"Validity" Indices for IPIP Measures

This is a most controversial topic, so be warned! Most inventory developers assume that validity indices are necessary. BUT, they may well be wrong: see Piedmont et al. (2000), Johnson (2005), and Dudley et al. (2005).

One can easily develop one's own validity indices after selecting a particular set of IPIP scales of interest. The validity indices that one might consider include:

(a) OVERUSE OF THE SAME RESPONSE OPTION: For each subject, count how many times each response option was chosen, and how many items were left unanswered, across the total set of items administered. After examining the resulting distribution for each response option, one may want to exclude subjects who give too many responses of the same kind, particularly those who omit many items.

(b) COMMONALITY VERSUS DEVIANCY: Select the items with the most extreme response splits (those that nearly everyone endorses or nearly everyone rejects), half in each direction, and score each subject's responses on a total deviancy scale keyed toward the uncommon response to each item. After examining the distribution of subjects' deviancy scores, one may elect to exclude subjects whose deviancy scores resemble those expected from random responders.

(c) SEMANTIC INCONSISTENCY: Intercorrelate all of the items across the total sample, and select the item pairs with the highest positive correlations and those with the strongest negative correlations, to serve as indices of synonym consistency and of antonym consistency, respectively. For each subject and for each of the two sets of pairs, compute the correlation between his/her responses to the first and second members of the pairs, entering each pair twice, once in each order (AB and BA). After examining the bivariate frequency distribution of these two indices of semantic consistency, one may elect to exclude subjects who look like quasi-random responders. BUT, see Costa and McCrae (1997) and, especially, Kurtz and Parrish (2001).

(d) SOCIAL DESIRABILITY: Compute the mean response for each item across the total sample of subjects, and then correlate each subject's responses with those item means across the total set of items. After examining the distribution of subjects' social-desirability correlations, one might elect to exclude subjects with negative correlations, near-zero correlations, and/or extraordinarily high correlations.
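The four indices above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the IPIP's own procedure: responses are assumed to be integers on a 1-to-5 scale, with None marking an omitted item; the deviancy scale in (b) is assumed to take the items with the most extreme sample means, half from each end, reverse-scoring the high-mean items so that higher scores mean rarer answers; and every function and constant name (pearson, item_means, option_counts, deviancy_scores, item_correlations, pair_consistency, semantic_consistency, social_desirability, SCALE_MIN, SCALE_MAX) is hypothetical. No exclusion cutoffs are built in, since the text leaves those to the researcher after inspecting the distributions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Optional

# ASSUMPTION: one row per subject, one column per item; responses are
# integers on a 1-5 scale, with None marking an omitted item.
Matrix = list[list[Optional[int]]]
SCALE_MIN, SCALE_MAX = 1, 5


def pearson(x: list[float], y: list[float]) -> Optional[float]:
    """Pearson correlation, or None if either vector has no variance."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_means(data: Matrix) -> list[float]:
    """Mean response to each item, ignoring omissions (assumes no all-blank item)."""
    cols = [[row[j] for row in data if row[j] is not None] for j in range(len(data[0]))]
    return [sum(c) / len(c) for c in cols]


def option_counts(data: Matrix) -> list[Counter]:
    """(a) How often each subject used each option; the None key counts omissions."""
    return [Counter(row) for row in data]


def deviancy_scores(data: Matrix, n_items: int) -> list[Optional[float]]:
    """(b) Mean score on a scale keyed toward the rare response (higher = more deviant)."""
    means = item_means(data)
    order = sorted(range(len(means)), key=lambda j: means[j])
    half = n_items // 2
    low, high = order[:half], order[len(order) - half:]  # rarely vs. commonly endorsed
    scores = []
    for row in data:
        keyed = [row[j] for j in low if row[j] is not None]
        # Reverse-score commonly endorsed items so a low answer counts as deviant.
        keyed += [SCALE_MIN + SCALE_MAX - row[j] for j in high if row[j] is not None]
        scores.append(sum(keyed) / len(keyed) if keyed else None)
    return scores


def item_correlations(data: Matrix) -> dict[tuple[int, int], float]:
    """Correlation for every item pair, using subjects who answered both items."""
    out = {}
    for a, b in combinations(range(len(data[0])), 2):
        both = [(row[a], row[b]) for row in data if row[a] is not None and row[b] is not None]
        r = pearson([p for p, _ in both], [q for _, q in both])
        if r is not None:
            out[(a, b)] = r
    return out


def pair_consistency(row: list[Optional[int]], pairs: list[tuple[int, int]]) -> Optional[float]:
    """Correlate first vs. second members of each pair, entering each pair as AB and BA."""
    used = [(row[a], row[b]) for a, b in pairs if row[a] is not None and row[b] is not None]
    x = [p for p, _ in used] + [q for _, q in used]
    y = [q for _, q in used] + [p for p, _ in used]
    return pearson(x, y)


def semantic_consistency(data: Matrix, n_pairs: int) -> list[tuple]:
    """(c) Per-subject (synonym consistency, antonym consistency)."""
    corrs = item_correlations(data)
    ranked = sorted(corrs, key=corrs.get)  # most negative pairs first
    antonyms, synonyms = ranked[:n_pairs], ranked[len(ranked) - n_pairs:]
    return [(pair_consistency(row, synonyms), pair_consistency(row, antonyms)) for row in data]


def social_desirability(data: Matrix) -> list[Optional[float]]:
    """(d) Correlation between each subject's responses and the item means."""
    means = item_means(data)
    out = []
    for row in data:
        answered = [j for j, v in enumerate(row) if v is not None]
        out.append(pearson([row[j] for j in answered], [means[j] for j in answered]))
    return out
```

Each function returns one value per subject, so the results can be tabulated or plotted before any exclusion rule is chosen. Omitted items are simply skipped in (b) through (d), which is only one of several reasonable ways to handle them; and with a small item pool, the "antonym" pairs picked in (c) may not actually be negatively correlated, so it is worth checking the selected correlations before relying on them.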

Niessen, Meijer, and Tendeiro (2016) compared the sensitivity and specificity of several methods for detecting careless respondents.


Costa, P. T., Jr., & McCrae, R. R. (1997). Stability and change in personality assessment: The Revised NEO Personality Inventory in the year 2000. Journal of Personality Assessment, 68, 86-94.

Dudley, N. M., McFarland, L. A., Goodman, S. A., Hunt, S. T., & Sydell, E. J. (2005). Racial differences in socially desirable responding in selection contexts: Magnitude and consequences. Journal of Personality Assessment, 85, 50-64.

Johnson, J. A. (2005). Ascertaining the validity of individual protocols from Web-based personality inventories. Journal of Research in Personality, 39, 103-129.

Kurtz, J. E., & Parrish, C. L. (2001). Semantic response consistency and protocol validity in structured personality assessment: The case of the NEO-PI-R. Journal of Personality Assessment, 76, 315-332.

Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2016). Detecting careless respondents in web-based questionnaires: Which method to use? Journal of Research in Personality, 63, 1-11.

Piedmont, R. L., McCrae, R. R., Riemann, R., & Angleitner, A. (2000). On the invalidity of validity scales: Evidence from self-reports and observer ratings in volunteer samples. Journal of Personality and Social Psychology, 78, 582-593.
