Personality Assessment Research Paper

The term personality is used by different theorists in widely different ways, and the practice of personality assessment is correspondingly varied. Nevertheless, most definitions of personality refer to features that characterize an individual and distinguish him or her from others, and most assessment procedures attempt to measure these features, usually in comparison to the average person. Different approaches to personality assessment differ in the variables they measure, in the source of information about the individual, and in the way information is evaluated. This research paper reviews the most common approaches to personality assessment (projective techniques, self-report questionnaires, observer ratings, and laboratory measures) and their status in contemporary psychological science and practice.

Personality variables are pervasive and enduring, and thus can be expected to have an impact on a variety of areas in the individual’s life. For example, the prototypic extrovert has a wide circle of friends, speaks out in class, does well in enterprising occupations, enjoys competitive sports, and has an optimistic outlook on life. In consequence, personality assessment is important in many applied areas. Psychiatrists who need to diagnose psychopathology, counselors who want to suggest meaningful vocational choices, and physicians concerned with behavioral health risk factors may all turn to personality assessment. Personality variables are important in forensic, developmental, educational, social, industrial, and clinical psychology, as well as personality psychology, the discipline which seeks a scientific understanding of personality itself. For all these purposes, accurate assessment of personality is crucial.

Personality is also of great importance to laypersons in everyday life and in such significant decisions as whom to vote for or marry. Lay evaluations of personality are in some respects unscientific and susceptible to many biases; in other respects they are extremely sophisticated interpretations of observed behavior. Much (though not all) of personality assessment consists of knowing how to systematize the information laypersons have about themselves and each other in order to capitalize on the strengths and reduce the limitations of lay perceptions of personality.

The scientific study of individual differences in personality can be traced to the work of Sir Francis Galton in the 1880s, and it has occupied many of the brightest minds in psychology since. During the 1950s and 1960s, personality assessment underwent a period of crisis, based in part on humanistic objections to the depersonalizing labeling that much assessment seemed to foster, and in part on real (although exaggerated) technical problems with assessment instruments. Considerable progress has been made in the past 30 years in both personality theory and test construction, and today personality assessment is once again assuming a central role in psychology.

II. Assessment Methods and Instruments

A. Projective Techniques

The single most influential theory of personality is psychoanalysis, a complex system developed by Sigmund Freud and elaborated by a host of his followers. Briefly, psychoanalysis sees human personality as the result of conflict between the individual’s sexual and aggressive impulses and society’s demand for their control. In the course of early development, people evolve characteristic ways of resolving these conflicts which guide their adult behavior, particularly their interpersonal relationships. Because the underlying conflicts are psychologically painful and threatening, both the impulses and the defenses against them are repressed from consciousness. From this perspective, individuals never really know themselves and hide their most important features from those around them.

Thus, psychoanalytic theory poses formidable problems for personality assessment. The central information is not merely unavailable; it is systematically distorted. The analyst must make elaborate inferences on the basis of free associations, dreams, and slips of the tongue. But patients may not recall dreams or make revealing slips, and psychoanalysts need a dependable source of information that can be gathered as needed. Projective tests were designed to fill this need.

Rorschach’s inkblots are a series of 10 cards shown to the patient or subject, who is asked to explain what he or she sees in them. The basic premise is that these abstract blots, having no meaning of their own, will act as a screen onto which the inner conflicts, impulses, and emotions of the patient will be projected. They will of course still be disguised—otherwise they would be censored by the patient’s defenses—but they can be interpreted by the knowledgeable analyst just as an X-ray can be read by a skilled radiologist.

The projective technique is an ingenious approach to the problem of assessing unconscious conflicts, and the window it promises into the depths of the mind is extremely appealing. The Rorschach continues to be one of the most widely used instruments in personality assessment, and dozens of variations (including the Holtzman Inkblot Technique) and scoring systems have been developed.

It is therefore more than a little unfortunate that the scientific basis of these instruments—and of psychoanalysis itself—is highly questionable. Different interpreters draw very different conclusions from the same set of responses, and few rigorous studies have demonstrated that inkblot scores predict important external criteria. While clinical psychologists still rely heavily on the Rorschach, academic personality researchers have almost entirely abandoned it. A search of abstracts in the personality research field’s most important publication, the Journal of Personality and Social Psychology, showed that of over 4000 articles appearing between 1974 and 1992, only four studies employed the Rorschach. (Rorschach studies do still appear regularly in more clinically oriented journals.)

All projective techniques use responses to relatively unstructured, ambiguous stimuli on the assumption that these will elicit spontaneous expressions of psychologically important features. This general approach to assessment is not limited to psychoanalytic theories of personality, but can also be applied to better supported theories about needs, motives, or traits. The Thematic Apperception Test, or TAT, shows a series of drawings about which individuals are asked to tell stories. The responses can be scored in a relatively straightforward fashion—for example, a story about overcoming obstacles in pursuit of a goal is scored as evidence of a need for achievement—and when so scored they typically show somewhat better evidence of scientific validity.

Note that these approaches do not assume that the characteristics they assess are repressed. When asked, people who tell stories about achievement or intimacy often report that they are high in achievement striving or nurturance. These projective tests apparently do not reveal a level of personality from which self-reports are excluded.

B. Objective Tests: Self-Reports

Projective tests are usually contrasted with objective tests, typically questionnaires in which subjects are asked to describe themselves by answering a series of questions. For example, a measure of conscientiousness may ask 10 questions such as “Do you always keep your desk clean?” and “Are you devoted to your work?” The test is considered objective because it can be scored directly, without the need for clinical interpretation: The number of conscientiousness items to which an individual responds true is that individual’s score, and higher scores indicate higher levels of conscientiousness.

That basic paradigm (asking a standard set of questions and scoring responses with a predetermined key) has been used in thousands of assessment applications. Intelligence tests, vocational interest inventories, mood indicators, and measures of psychopathology as well as personality scales have adopted this model. Its scientific appeal lies in the fact that it can be repeated at different times and with different subjects, and consequently its accuracy can be evaluated. A whole branch of statistics, psychometrics, has been developed to analyze responses, both for what they tell us about the individual and for what they tell us about the quality of the test. Psychometric analyses provide information that can allow researchers to improve the quality of the test by changing the questions or the response format or the interpretation of the results.
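
The scoring rule described above can be illustrated with a minimal Python sketch. The 10-item key, the item numbering, and the sample answers are hypothetical and are not taken from any published inventory; the function name score_scale is likewise only an illustrative choice.

```python
# Hypothetical 10-item conscientiousness key: every item is keyed "true".
CONSCIENTIOUSNESS_KEY = {item: True for item in range(1, 11)}


def score_scale(responses, key):
    """Count the responses that match the keyed direction."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


# Illustrative respondent: "true" to items 1-7, "false" to items 8-10.
responses = {item: item <= 7 for item in range(1, 11)}
score = score_scale(responses, CONSCIENTIOUSNESS_KEY)
print(score)  # 7
```

Because the key is fixed in advance, any two scorers applying it to the same answer sheet must arrive at the same score, which is the sense in which such a test is objective.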

If psychoanalysis formed the theoretical basis of projective tests, then trait psychology must be considered the basis of objective personality tests. Briefly, trait psychologies hold that individuals differ in a number of important ways that are usually thought to be continuously and normally distributed. Just as a few people are short, a few tall, and most average in height, so some people may be very agreeable, some very antagonistic, and most intermediate along this psychological dimension. Unlike moods, traits are enduring dispositions; and unlike specific habits, they are general and pervasive patterns of thoughts, feelings, and actions. Scores on trait measures should therefore be relatively constant, and scale items measuring different aspects of the trait should go together. These theoretical premises are the basis of the psychometric requirements of retest reliability and internal consistency in personality scales.

Among the most important personality questionnaires in current use are the Minnesota Multiphasic Personality Inventory (MMPI), developed in the 1940s to measure aspects of psychopathology; the Sixteen Personality Factor Questionnaire and the Eysenck Personality Questionnaire, representing the personality theories of Raymond B. Cattell and Hans J. Eysenck, respectively; the Myers-Briggs Type Indicator, which is based on C. G. Jung’s theory of psychological types; the California Psychological Inventory, a set of scales intended to tap folk concepts, the personality constructs used in everyday life; and the Personality Research Form, a psychometrically sophisticated measure of needs or motives. The NEO Personality Inventory is a more recent addition, based on new discoveries about the basic dimensions of personality. In addition to these omnibus inventories which all measure a variety of traits, there are a number of individual scales that are widely used in personality research, such as the self-monitoring scale and the locus of control scale.

C. Objective Tests: Observer Ratings

The vast majority of personality assessments are made on the basis of either projective tests or self-reports, but observer ratings provide a powerful alternative that is increasingly used in both research and clinical contexts. The clinical interview is a kind of observer rating, because the clinician not only asks questions, but also observes the reactions of the patient to the interview process. With a few exceptions (such as the Structured Interview for the Type A Behavior Pattern), these observations are not standardized, and thus they share with projective tests potential problems of unreliability.

There are, however, assessment methods that are both objective and observer based. These methods apply the same psychometric principles used in self-report questionnaires to ratings from informants. One simple and effective way to do this is by rephrasing questions in the third person. Instead of asking the individual, “Are you devoted to your work?” we could ask her spouse, “Is she devoted to her work?” One advantage of observer ratings is that we are not limited to a single respondent. It is possible to obtain ratings from friends, relatives, and neighbors, and there is evidence that aggregating or averaging several ratings yields better information on the individual.
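
Aggregation across informants can be sketched as follows. The three informants, the three items, and the 1-to-5 rating format are assumptions made for illustration; simple averaging is only one of several possible ways to combine raters.

```python
from statistics import mean

# Hypothetical 1-5 ratings of one target person on three items.
ratings = {
    "spouse":   [4, 5, 3],
    "friend":   [5, 4, 3],
    "neighbor": [3, 4, 2],
}


def aggregate_ratings(ratings_by_rater):
    """Average each item across raters, then sum the item means."""
    item_means = [mean(item) for item in zip(*ratings_by_rater.values())]
    return item_means, sum(item_means)


item_means, composite = aggregate_ratings(ratings)
print(item_means, composite)
```

The idiosyncratic errors of any one rater carry less weight in the averaged item scores than they would in that rater's ratings taken alone.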

Ranking methods provide an alternative to observer rating questionnaires. In these methods, all the members of a group rank each other on a series of characteristics. For example, all the members of a fraternity may be asked to decide which fraternity members are most and least talkative, and rank order all the other members between them. In the assessment center method, a group of expert raters (typically psychologists) interacts with a group of subjects over a period of a few days, observing them in both standardized and unstructured situations. They then make personality ratings, perhaps by checking descriptive adjectives.

The advantages of these different forms of gathering information from observers are still debated, as are the relative merits of self-reports versus observer ratings. Fortunately, however, many recent studies have shown general agreement between many different objective methods of assessing personality. This consensual validation of observations about personality traits forms an essential basis for scientific personality psychology.

D. Objective Tests: Laboratory Procedures

Researchers dedicated to objectivity have often hoped that personality could be assessed by laboratory tests that did not depend on the judgments of individuals. Quantity of salivation in response to a drop of lemon juice, perspiration as measured by the galvanic skin response, and dilation of the pupil have all been proposed as measures of personality attributes. This approach to personality assessment has been of limited value; physiological responses typically have shown only modest and inconsistent relations to personality variables.

Yet accumulating evidence on the heritability of most personality traits suggests that there is some genetic and presumably physiological basis for many traits. Increasing sophistication in our understanding of the brain and new techniques such as magnetic resonance imaging may one day lead to discoveries about personality/brain relations with implications for assessment. At present, however, our best source of information about personality is the individual and those who know him or her well.

III. Evaluating Assessment Methods

A. Reliability and Validity

Although well-constructed objective measures of personality are valuable scientific tools, it should not be assumed that all objective measures are well-constructed. Psychometricians have, however, established a series of criteria by which scales can be evaluated. It is traditional to divide these into reliability criteria and validity criteria, although the distinction between the two is somewhat artificial. In essence, both require that the scale perform in ways that are consistent with its intended theoretical interpretation.

The two most common forms of reliability are internal consistency and retest reliability. If each of the items in a scale is considered to be an indicator of the same underlying trait, it seems reasonable to require that they all agree with each other. Cronbach’s coefficient alpha is a commonly used measure of this internal consistency of scale items. Internal consistency can be increased by discarding items that show limited agreement with other items, or by adding more items of the same kind (longer scales are more reliable because the errors introduced by individual items tend to cancel each other out in the long run).
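
As a concrete illustration of internal consistency, the following sketch computes coefficient alpha from its standard formula, k/(k - 1) times one minus the ratio of summed item variances to the variance of total scores. The five respondents and three items are invented data, and the function name cronbach_alpha is an illustrative choice.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list of k item scores per respondent."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # regroup the data item by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses of five people to a three-item scale (1-5 format).
data = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))  # 0.94
```

When items agree closely, the variance of the total scores is large relative to the summed item variances, and alpha approaches 1.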

For narrow constructs, the higher the internal consistency, the better. For broad constructs, however, higher internal consistency is not necessarily better because it may be purchased with a loss of generality. For example, a measure of general psychological distress that consisted of the items, “I am fearful,” “I am nervous,” and “I am anxious” would probably have high internal consistency, but it would offer a very narrow measure of distress focused exclusively on anxiety. Depression, frustration, shame, and other aspects of psychological distress are omitted. By including items to measure them, we would probably produce a scale with lower internal consistency but higher fidelity to the broad theoretical construct of psychological distress.

Test–retest reliability refers to the reproducibility of scores on different occasions. We would not expect major changes in personality over a 2-week period, so if individuals score very differently when they complete the questionnaire twice over this interval, it suggests that there are problems with the test.
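
Retest reliability is conventionally summarized as the correlation between scores from the two occasions. The sketch below computes a Pearson correlation by hand; the scores for five respondents tested 2 weeks apart are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scale scores for the same five people on two occasions.
time1 = [12, 18, 25, 30, 22]
time2 = [14, 17, 27, 29, 20]
retest_r = pearson_r(time1, time2)
print(round(retest_r, 2))  # 0.96
```

A coefficient near 1 indicates that respondents kept nearly the same rank order across the interval; a low value over so short a period would point to problems with the test rather than to real personality change.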

In essence, questions about reliability ask whether the scale elicits responses that are consistent across items and across time. Without some minimum of reliability, it is hard to argue that the scale measures anything meaningful, so reliability is often taken as a prerequisite to validity. Validity refers to the degree to which a scale actually measures the construct it is intended to measure. A spelling test might have excellent internal consistency and retest reliability, but no validity at all if it were intended to be used as a measure of extroversion.

The central problem in establishing the validity of a test is that we rarely have completely satisfactory external criteria. A good measure of agreeableness–antagonism would separate agreeable from antagonistic people, but, without giving the test, how do we know who is agreeable and who is antagonistic? No single answer is usually sufficient, so we rely on a pattern of evidence in evaluating the construct validity of a scale. We may correlate it with other scales that measure similar constructs (e.g., scales measuring trust and altruism), or we may see if it distinguishes between known groups that should differ on the dimension (e.g., social workers versus convicted felons), or we may compare self-reports with ratings on the same scale made by spouses or peers.

All of these studies would give information on the convergent validity of the scale, but they would not necessarily speak to its discriminant validity. The criterion of discriminant validity requires that scales be unrelated to scales which measure theoretically different constructs. If a test is designed to measure agreeableness, it should not be strongly related to intelligence, because intelligence is theoretically independent of agreeableness. In order to establish discriminant validity, a scale must be related to a series of other measures, especially those with which it is apt to be confounded. The strongest designs for construct validity usually require that multiple methods be used for assessing multiple traits, and that stronger correlations be seen for measures of the same trait obtained from different methods than for measures of different traits obtained from the same method.
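
The multitrait, multimethod logic can be made concrete with a small sketch. Two traits (agreeableness and intelligence, following the example above) are each measured by two methods (self-report and peer rating) on six hypothetical people; all scores are invented, and the comparison rule shown is a simplified version of the criterion described in the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six people, keyed by (trait, method).
scores = {
    ("agreeableness", "self"): [1, 2, 3, 4, 5, 6],
    ("agreeableness", "peer"): [1, 3, 2, 4, 6, 5],
    ("intelligence", "self"): [4, 1, 6, 2, 5, 3],
    ("intelligence", "peer"): [4, 2, 6, 1, 5, 3],
}

convergent, discriminant = [], []
for (key_a, x), (key_b, y) in combinations(scores.items(), 2):
    same_trait = key_a[0] == key_b[0]
    same_method = key_a[1] == key_b[1]
    if same_trait and not same_method:
        convergent.append(pearson_r(x, y))    # same trait, different method
    elif same_method and not same_trait:
        discriminant.append(pearson_r(x, y))  # different trait, same method

# Construct validity is supported when every convergent correlation
# exceeds every same-method discriminant correlation in magnitude.
supports_validity = min(convergent) > max(abs(r) for r in discriminant)
print(convergent, discriminant, supports_validity)
```

In these invented data the two convergent correlations are large and the two same-method discriminant correlations are near zero, which is the pattern a valid pair of scales should show.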

Table I gives an example of convergent and discriminant validity across instruments and observers. Five basic dimensions of personality—neuroticism, extroversion, openness to experience, agreeableness, and conscientiousness—are measured by self-reports on adjective rating scales, and by peer ratings on a questionnaire measure, the NEO Personality Inventory. The convergent correlations (given in boldface) show substantial agreement—far greater agreement than would be expected by chance. By contrast, the discriminant correlations (e.g., between peer-rated neuroticism and self-reported openness to experience) are much smaller and generally do not exceed chance. Such data provide evidence that both instruments measure the intended constructs with considerable success.

Table I. Convergent and Discriminant Validity of Measures of Five Basic Dimensions of Personality

B. Sources of Error and Bias; Response Styles

Most personality questionnaires consist of a series of statements that the respondent must answer either true or false or rate on a scale (e.g., from strongly disagree to strongly agree). As anyone who has taken such a test knows, the items are often ambiguous and sometimes of dubious relevance. The question, “Are you devoted to your work?” might be interpreted in several ways. Some respondents might compare their devotion to work with their commitment to family. Some might compare their own devotion to that of their co-workers. Retired or unemployed respondents might not know how to respond. Even with the sincerest cooperation, respondents may not give the response the test developer intended.

Further, some respondents may not be sincerely cooperative. They may respond carelessly or at random simply to be finished with the task. Or they may wish to present a flattering picture of themselves to the tester. One of the most troubling discoveries of personality psychology was that laypeople are exquisitely sensitive to the social desirability of items and can, if so instructed, fake most personality tests.

Another common problem is acquiescent responding. It was discovered long ago that individuals differ in the tendency to agree with statements, regardless of content. So-called yea-sayers interpret items in ways that allow them to endorse most of them; nay-sayers find something in most items to which they object. If all the items are keyed in the same direction—that is, if true or agree responses are always indicative of the trait—then scale scores will confound measurement of the trait with measurement of acquiescent tendencies. Two such scales might show a positive correlation even if they measured very different traits, because both might also measure acquiescent tendencies.

This particular response style can be controlled quite effectively by creating scales with balanced keying: Half the items are scored in the positive direction, half in the negative. For example, we might measure conscientiousness by including the item “I often fail to keep my promises,” and giving points for conscientiousness if the respondent disagrees. In responses to a balanced scale, acquiescent tendencies cancel themselves out, leaving a purer measure of the trait.
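
The cancellation can be seen in a short sketch. The four-item scale, its keying pattern, and the 1-to-5 response format are hypothetical; negatively keyed items are reverse-scored before summing.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 1 = strongly disagree, 5 = strongly agree

# Hypothetical four-item balanced scale:
# True = positively keyed item, False = negatively keyed item.
KEYING = [True, False, True, False]


def score_balanced(responses, keying):
    """Sum responses after reverse-scoring the negatively keyed items."""
    total = 0
    for response, positive in zip(responses, keying):
        total += response if positive else SCALE_MIN + SCALE_MAX - response
    return total


yea_sayer = [5, 5, 5, 5]      # agrees with everything
nay_sayer = [1, 1, 1, 1]      # disagrees with everything
conscientious = [5, 1, 4, 2]  # responds according to item content

print(score_balanced(yea_sayer, KEYING))      # 12
print(score_balanced(nay_sayer, KEYING))      # 12
print(score_balanced(conscientious, KEYING))  # 18
```

Both the pure yea-sayer and the pure nay-sayer land on the scale midpoint (12 of a possible 4 to 20), whereas the respondent who answers according to content obtains a high score.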

Similar strategies have been developed for dealing with other response styles. For example, random responding can be detected by including a set of items that virtually no one would endorse if they were paying attention and cooperating (e.g., “I keep an elephant in my basement”). Endorsing several such items would suggest random responding, and test results should be considered invalid. Cooperative respondents, however, may find the inclusion of such “trick questions” offensive. An alternative way of detecting one common form of random responding is by looking for a string of identical responses on an answer sheet, which may indicate thoughtless, repetitive responding merely intended to finish the questionnaire. This is an unobtrusive measure of random responding.
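
Both checks are easy to automate. In the sketch below, the 12-item answer sheet, the positions of the trick items, and any cutoff an assessor might apply to the results are hypothetical; real inventories set such thresholds empirically.

```python
from itertools import groupby


def longest_identical_run(responses):
    """Length of the longest string of consecutive identical answers."""
    return max((len(list(run)) for _, run in groupby(responses)), default=0)


def infrequency_count(responses, infrequency_items):
    """Number of trick items (0-based positions) answered true."""
    return sum(1 for i in infrequency_items if responses[i] == "T")


# Hypothetical answer sheet: T, F, T, then seven F's in a row, then T, F.
sheet = list("TFT" + "F" * 7 + "TF")
TRICK_ITEMS = [3, 9]  # assumed positions of the infrequency items

print(longest_identical_run(sheet))           # 7
print(infrequency_count(sheet, TRICK_ITEMS))  # 0
```

Here the trick items raise no alarm, but the run of seven identical answers might prompt a closer look at the protocol.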

The greatest attention has been paid to the problem of socially desirable responding. Many scales have been devised in the hopes that they could identify individuals who responded on the basis of the desirability of an item rather than its accuracy as a description of their personality. Researchers routinely include such scales in construct validity studies to estimate the discriminant validity of the scales of interest from socially desirable response tendencies. Unfortunately, however, no good measure of desirable responding per se has yet been developed, and most research suggests that attempts to correct for social desirability do more harm than good.

The root of the problem is that statements have both substantive and evaluative meanings. Anyone who wished to appear in a good light would endorse the item “I always try to do my best,” but so would highly conscientious individuals who are scrupulously honest in their responses. It is impossible to determine from the response alone whether the individual really has desirable characteristics or is presenting a falsely favorable picture of him- or herself.

Two general strategies appear to be useful for dealing with this problem. First, in most cases it appears that respondents are more truthful than psychologists anticipated. Even though they can endorse desirable items when instructed to do so, test takers normally do not, when asked to be honest and accurate. Research volunteers have little incentive to distort their responses, and clients in counseling and psychotherapy should be convinced by the assessor that accurate responding will be in their best interest. Mutual respect and trust between test administrators and test takers is usually the best basis for assuring valid results.

However, in some cases there may be good reasons for mistrusting self-reports. The responses of prison inmates who describe themselves as saints when being evaluated for parole should be regarded with considerable skepticism. In these cases, the most appropriate tactic may be to obtain observer ratings from knowledgeable and impartial informants. The current availability of validated observer-rating questionnaires (such as Form R of the NEO Personality Inventory) makes that approach feasible.

None of these approaches to scale construction or administration eliminates all the limitations of personality assessment by questionnaire. The inevitable ambiguity of items and respondents’ imperfect knowledge of themselves or the individuals they rate mean that personality measures lack the precision that we admire in the physical sciences. The data in Table I show that our assessments are on the right track, but they can also be interpreted to show that our measurements are far from perfect. Both self-reports and observer ratings are useful tools that give valuable information about personality, and either is acceptable for use in research on groups. For the intensive understanding of the individual (e.g., in psychotherapy), it is desirable to obtain both self-reports and informant ratings, and all inferences about personality traits should be considered provisional, subject to revision or refinement as new information becomes available.

C. Content and Comprehensiveness in Personality Questionnaires

Psychometric theory gives general guidelines for constructing and evaluating measures of psychological characteristics, but it gives little guidance about what should be measured. For decades, one of the central problems in personality psychology was the proliferation of hundreds of scales measuring aspects of personality that some researcher or theorist thought important in understanding human beings. Many of the most eminent personality psychologists were those who offered a system, a model of personality structure that specified the most important aspects of personality and thus brought some kind of order to the chaos of competing ideas.

Factor analysis has frequently been used as the statistical technique for studying personality structure. Factor analysis is a mathematical procedure which condenses the information about intercorrelations among many variables by detecting groups of variables that covary separately from other groups. These groups of variables define a factor, a dimension along which individuals can be ranked. For example, the individual traits of trust, straightforwardness, altruism, compliance, modesty, and tender-mindedness covary to define the broad dimension of agreeableness.

In the days before computers, a factor analysis might consume months of computational labor, and it is not surprising that early factor analysts tended to defend whatever structure they first uncovered. As a result, for many decades disputes raged about whether there were 2 basic factors, or 3, or 5, or 10, or 16. Failure to resolve this issue lowered the credibility of the field, and paralyzed much personality research: How could we study personality and aging, say, unless we knew which aspects of personality we needed to measure as individuals aged?

In 1961 two Air Force psychologists, Ernest Tupes and Raymond Christal, factored data from several different studies and concluded that five, and only five, major factors seemed to recur. Their work was largely ignored during the next 20 years, but around 1980 interest in the five-factor model revived. Initially, these five factors were seen as the basic dimensions underlying trait adjective terms used by laypersons and encoded in the natural language (terms such as nervous, enthusiastic, original, accommodating, and careful). Questionnaire measures, including the NEO Personality Inventory, were then developed to measure these five factors (see Table I for names of the factors). Subsequent research showed that the same five factors were also found in most of the theoretically based questionnaires that had previously been constructed. For example, the four scales of the Myers-Briggs Type Indicator correspond to four of the five factors (introversion–extroversion to extroversion, sensation–intuition to openness, thinking–feeling to agreeableness, and perception–judgment to conscientiousness).

The same basic dimensions have been recovered in cross-cultural analyses of personality (including studies conducted in Hebrew, German, and Chinese), in self-reports and observer ratings, in men and women, and in young, middle-aged, and older adults. Although disagreements remain over the precise nature and scope of the factors, there is a general consensus that these five represent basic and universal features of personality. (Intelligence, another fundamental dimension of individual differences, is generally considered to be outside the realm of personality proper.) The five-factor model thus provides a general answer to the question of what personality traits should be measured: A comprehensive assessment must include measures of all five factors.

However, the five factors themselves are too broad to give a detailed picture of the individual. Anxiety and depression are both aspects of neuroticism, and in general, people who are anxious also tend to be depressed. But some anxious people are not depressed, and some depressed people are not anxious, and it is extremely important to clinical psychologists to determine whether a patient is anxious, depressed, or both. A global measure of neuroticism would not provide that information; instead, more specific scales are needed to provide the details. The same could be said for all five factors.

While most personality psychologists see the need for assessment of personality at this more specific level, there is no consensus about which specific traits should be measured, or even how to go about identifying the most important specific traits. Advocates of circumplex models suggest that we expand the five-factor model by measuring traits that represent combinations of pairs of factors. For example, friendliness is related to both extroversion and agreeableness, so it might merit separate assessment.

Other researchers believe that there are a large number of important traits—perhaps 100 or more—that should be separately analyzed; the five-factor model could then be used primarily to organize the results. In the Revised NEO Personality Inventory, 30 separate traits identified from an analysis of the psychological literature are measured by facet scales, and global domain scales are formed by summing groups of six of them, as shown in Table II. This scheme encourages hierarchical personality assessment at both the more specific and more general levels.
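
The hierarchical scoring scheme can be sketched for a single domain. The six agreeableness facets are those named earlier in this paper; the facet scores are invented for illustration, and the function name domain_scores is an illustrative choice rather than part of any published scoring program.

```python
# Facets of one domain, as listed in the discussion of factor analysis above.
FACETS_BY_DOMAIN = {
    "agreeableness": [
        "trust", "straightforwardness", "altruism",
        "compliance", "modesty", "tender-mindedness",
    ],
}


def domain_scores(facet_scores, facets_by_domain):
    """Form each global domain score by summing its six facet scores."""
    return {
        domain: sum(facet_scores[facet] for facet in facets)
        for domain, facets in facets_by_domain.items()
    }


# Hypothetical facet scores for one respondent.
facet_scores = {
    "trust": 20, "straightforwardness": 18, "altruism": 25,
    "compliance": 15, "modesty": 17, "tender-mindedness": 22,
}
print(domain_scores(facet_scores, FACETS_BY_DOMAIN))  # {'agreeableness': 117}
```

The same respondent can thus be described at the general level (a single agreeableness score) and at the specific level (for example, high altruism but relatively low compliance).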

Table II. Global Domains and Specific Facets in the Revised NEO Personality Inventory

IV. Personality Assessment and Personality Theory

Human beings have always tried to understand themselves and the people around them; scientific psychology has made real, if slow, progress in this endeavor over the past century. Theories of test construction and validation and psychometric techniques have provided the technical basis for developing sound measures, and the five-factor model specifies which aspects of personality should be measured. As usual in science, there is a continuing interaction between theory and measurement: The more we know about human personality, the better our techniques for measuring it, and the better we measure it, the more we are able to refine our theories.

Psychoanalysis, behaviorism, and humanistic psychologies in turn dominated personality psychology, and each led to serious problems for personality assessment. Self-reports about thoughts, feelings, and actions were considered trivial by psychoanalysts, who believed that the important psychological variables were unconscious, and unscientific by behaviorists, who preferred observation of behavior in laboratory settings. Humanistic psychologists sometimes opposed assessment on principle, because they believed that rating individuals on a fixed set of dimensions was depersonalizing.

Trait psychology has always coexisted with these other schools, but has rarely been dominant. But personality assessments based on the principles of trait psychology have shown themselves to be scientifically defensible and useful in applied contexts. For these reasons, trait psychology and the five-factor model appear poised to be the dominant paradigm in personality psychology in the next century.


