Information about Norms and Data Sets

No norms are available on the IPIP website, for reasons explained below.

One should be very wary of using canned "norms" because it isn't obvious that one could ever find a population of which one's present sample is a representative subset. Most "norms" are misleading, and therefore they should not be used.

Far more defensible are local norms, which one develops oneself. For example, if one wants to give feedback to members of a class of students, one should relate the score of each individual to the means and standard deviations derived from the class itself. To maximize informativeness, one can provide the students with the frequency distribution for each scale, based on these local norms, and the individuals can then find (and circle) their own scores on these relevant distributions.
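
The local-norm procedure described above can be sketched in a few lines. This is a minimal illustration, assuming each student has one integer scale score (as with summed IPIP scales) and that the sample standard deviation of the class is an acceptable spread estimate. The function names and the example scores are hypothetical. Each score is expressed as a z-score relative to the class itself, and a simple text frequency distribution is built so students can locate their own scores.

```python
from collections import Counter
from statistics import mean, stdev


def local_norms(scores):
    """Return the class mean and sample standard deviation for one scale."""
    return mean(scores), stdev(scores)


def z_scores(scores):
    """Express each student's score relative to the class, in SD units."""
    m, sd = local_norms(scores)
    return [(s - m) / sd for s in scores]


def frequency_table(scores):
    """Build one text row per possible score between the class min and max.

    Assumes integer scale scores; each '*' marks one student with that score.
    """
    counts = Counter(scores)
    return [
        f"{value:3d} | {'*' * counts[value]}"
        for value in range(min(scores), max(scores) + 1)
    ]


if __name__ == "__main__":
    # Hypothetical class data for a single scale.
    class_scores = [32, 35, 28, 40, 35, 31, 38, 35, 29, 33]
    m, sd = local_norms(class_scores)
    print(f"class mean = {m:.2f}, SD = {sd:.2f}")
    for score, z in zip(class_scores, z_scores(class_scores)):
        print(f"score {score}: z = {z:+.2f}")
    print("\n".join(frequency_table(class_scores)))
```

Using the class's own mean and SD keeps the comparison group identical to the group receiving feedback, which is the point of local norms.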

That said, some researchers might still be interested in comparing their data to existing data sets or in reanalyzing an existing data set. 

Data from surveys administered to the Eugene-Springfield Community Sample (ESCS) are the basis for many of the statistical properties of IPIP scales reported on the IPIP website. These data can now be accessed from the Harvard Dataverse. Those wishing to access this archive are strongly encouraged to follow the first link, labeled "(0) Documentation and Sample Demographics," and to read the file labeled "Read Me First.txt" before trying to access any of the data at this site.

There are also two other known data archive sites on the Web, neither of which is officially associated with the IPIP project. Use these data at your own risk.

The first archive site is The Open-Source Psychometrics Project, which contains raw data collected online from a number of personality scales, including the 50-item IPIP inventory of the Big-Five factor markers.

Those interested in the data at the Open-Source Psychometrics Project should note that this site actually contains two data sets for the Big-Five factor markers. The first data set, with 19,719 cases, can be accessed with a link labeled BIG5 in the Download column of a table of data sets. However, this data set has many problems, and we recommend avoiding it. A PDF copy of the site's version of the test, from when the site operated under an earlier name, shows that the site used the NEO PI-R labels for the five scores instead of the correct lexical labels, so it is not clear whether factor IV is scored in the direction of Neuroticism or Emotional Stability in the data set. This PDF also uses an unconventional scoring method instead of the standard IPIP scoring method and presents norms in terms of item means instead of scale means, which has confused users.
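
To make the difference between item means and scale scores concrete, here is a minimal sketch, assuming the usual IPIP convention: responses coded 1 (Very Inaccurate) to 5 (Very Accurate), negatively keyed items reversed as 6 minus the response, and the item values summed to give the scale score. The function names and the keying pattern in the example are hypothetical and do not reproduce any particular IPIP scale. Because a summed score equals the item mean times the number of items, a "norm" reported as an item mean must be multiplied by the number of items before it can be compared with summed scale scores.

```python
def score_scale(responses, keys):
    """Sum one scale using IPIP-style keying (assumed convention).

    responses: item responses coded 1-5
    keys: +1 for a positively keyed item, -1 for a negatively keyed item
    """
    if len(responses) != len(keys):
        raise ValueError("one key per item is required")
    total = 0
    for response, key in zip(responses, keys):
        if not 1 <= response <= 5:
            raise ValueError(f"response out of range: {response}")
        # Negatively keyed items are reversed: 1 -> 5, 2 -> 4, ..., 5 -> 1.
        total += response if key > 0 else 6 - response
    return total


def item_mean_to_scale_score(item_mean, n_items):
    """Convert an item-mean 'norm' into the equivalent summed scale score."""
    return item_mean * n_items


if __name__ == "__main__":
    # Hypothetical 10-item scale with alternating keys.
    keys = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
    responses = [4, 2, 5, 1, 3, 3, 4, 2, 5, 1]
    print("scale score:", score_scale(responses, keys))
    print("item mean 3.5 as a scale score:", item_mean_to_scale_score(3.5, 10))
```

For this hypothetical respondent the summed score is 42 on a possible range of 10 to 50; comparing it directly with an item-mean norm such as 3.5 would be meaningless without the conversion to 35.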

If you want raw data for the Big-Five factor markers from The Open-Source Psychometrics Project website, you are much better off using their second data set, listed further down in their table. That data set contains over a million cases and is well documented in the codebook provided within the zipped file. Its scales are labeled correctly, indicating that the site owners responded to earlier confusion surrounding their first data set.

The second data archive site is Johnson's data repository on the Open Science Framework. This site contains raw data for the 300-item IPIP representation of the NEO PI-R and Johnson's (2014) 120-item IPIP-NEO.
