Handbook of Research Methods in Social & Personality Psych: Summary

The Handbook of Research Methods in Social and Personality Psychology (second edition) covers conceptual and practical issues in research design to help social and personality psychologists develop and execute better research.


About the Authors:
Authors vary by chapter because the Handbook's goal is to provide knowledge and guidance from experienced, authoritative sources: each topic is covered by the psychology researchers most experienced in that specific area.

We Do Research To Scratch A Mental Itch

Many scientists are curious and motivated by a thirst for knowing, comprehending, and making sense of the world.

So beyond the need to publish research in order to advance at work, eat, and feed their families :), scientists are also driven to find the truth: to test hypotheses, to resolve conflicts among different theories, or to resolve a discrepancy:

People are wired to detect discrepancies and want to resolve them. One prime way to start a program of research is precisely to mind the cognitive gap. That is, scientists especially notice theoretical discrepancies, empirical inconsistencies, missing links in evidence, counterintuitive patterns, and all manner of knowledge that just does not fit (Fiske, 2004a).

Research Design and Issues of Validity

The definition of validity is:

Validity refers to “the best available approximation to the truth or falsity of propositions” (Cook & Campbell, 1979, p. 37).

How a study is designed and conducted affects the validity of the conclusions of course, but validity must be evaluated in light of the purposes of the research.

Internal VS External VS Construct Validity

In Campbell’s original terminology:

  • Internal validity refers to the truth value that can be assigned to the conclusion that a cause-effect relationship between an independent variable and a dependent variable has been established within the context of the particular research setting. The question here is whether changes in the dependent measure were produced by variations in the independent variable (or manipulation, in the case of an experiment) in the sense that the change would not have occurred without that variation.

Threats to internal validity are any third factors that may contribute to the variation (correlation does not imply causation). If third factors are at play, the correlation between X and Y is spurious.
Although holding third variables constant helps with internal validity, the best approach is random assignment of participants to the different levels of the manipulated factor.

  • External validity refers to the generalizability of the causal finding, that is, whether it can be concluded that the same cause-effect relationship would be obtained across different participants, settings, and methods.

In simpler words: internal validity concerns how truthful and sound the study itself is; external validity concerns how well the findings generalize from the study to the broader population and the real world; and construct validity concerns how truthfully a study captures the concept and dynamic it examines.

Other Types of Validity

  • Construct validity refers to the extent to which a causal relationship can be generalized from the particular methods and operations of a specific study to the theoretical constructs and processes they were meant to represent.

Questions of external validity and construct validity, however, can rarely be addressed within the context of a single research design and require systematic, programmatic studies that address a particular question across different participants, operations, and research settings.

  • Robustness refers to whether an effect can occur across different settings and people
  • Ecological validity answers the question of “is it representative of what actually happens in real life, in the real world?”

The 3 Purposes of Research

Empirical research in social psychology can be differentiated in 3 broad categories:

  • Demonstration, to empirically establish the existence of a phenomenon or relationship
  • Causation, to establish a cause-effect linkage between specific variables (i.e., “if X then Y”)
  • Explanation, to establish the processes that govern the linkage between variations in X and Y

Research Design & Statistical Significance

Research design is the systematic planning of research to permit valid conclusions.

It includes, for example:

  • Specification of the population to be studied
  • Treatments to be administered
  • Dependent variables to be measured

Design-related issues such as the number of participants used in a study and the way they are allocated to conditions affect statistical conclusion validity (especially statistical power).

Statistical Significance

Statistical significance of the results of an experiment indicates that we can safely generalize beyond the specific participants to other “generally similar” participants.

Power of A Study

The power of a study or test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true.

Power depends on the chosen level of significance, the size of the difference we look for (effect size), the variability of the measured variables, and the sample size.

However, often the only element under the researcher's direct control is sample size.
Therefore, methods have been developed to approximate the sample size needed to obtain the desired power.
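As a rough sketch of such an approximation, here is the standard normal-approximation formula for a two-group comparison (the function name and example values are illustrative, not from the Handbook):

```python
# Normal-approximation of the per-group sample size needed for a
# two-group comparison at a given effect size, alpha, and power.
# A stdlib-only sketch; numbers below are illustrative.
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group(0.5))  # ~63 per group to detect a "medium" effect with .80 power
```

Note how quickly the requirement grows as effects shrink: a "small" effect of 0.2 needs roughly six times as many participants per group as a "medium" effect of 0.5.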

Most studies have too low power:

Cohen (1962) found that published studies in social psychology had a median power to detect a “medium-sized” effect of only .46. This means that even researchers who are clever enough to be testing a hypothesis that actually was true and had a medium-sized effect have less than an even chance of finding a significant result.

Researchers Overestimate Power & Odds of Replication Success

researchers frequently overestimate the likelihood of a replication being successful. After all, the original study demonstrated the effect, so shouldn’t a similar study be able to also? However, Greenwald, Gonzalez, Harris, and Guthrie (1996) showed that if an original study produced a significant effect at the .05 level, the chance of obtaining significance in an exact, independent replication using the same N is quite low. Only if the first study produced p < .005 does the replication have a power level of .80 to detect the effect at the .05 level! This is an instance of researchers’ general overoptimism about power levels and their consequent tendency to run low-powered studies, which has frequently been noted and decried
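The Greenwald et al. numbers can be illustrated with a quick normal-approximation calculation: treat the original study's observed z as the true effect and ask how often an exact same-N replication would again reach p < .05. This is a sketch under that assumption, not the authors' exact method:

```python
# Replication power under a normal approximation: the original two-tailed
# p-value is converted to a z, that z is taken as the true effect, and we
# compute the chance the replication's z again exceeds the critical value.
# A stdlib-only sketch; the function name is invented for illustration.
from statistics import NormalDist

def replication_power(p_original: float, alpha: float = 0.05) -> float:
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_original / 2)  # z of the original two-tailed p
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical z for the replication
    return z.cdf(z_obs - z_crit)           # chance the replication is significant

print(round(replication_power(0.05), 2))   # 0.5: a p = .05 result replicates only half the time
print(round(replication_power(0.005), 2))  # 0.8: p < .005 is needed for .80 replication power
```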

Programs Are The Main Way To Advance Social Psychology

Isolated studies often have relatively little scientific impact and, in isolation, they don’t allow for the development of general theories.

Individual studies, though, can have a much larger impact when they are part of a larger group of studies, coordinated at the program level to test different sub-hypotheses of a larger hypothesis or overarching theory.

The research program is arguably the unit that is most important in the advancement of social psychology as a field.

Say the authors:

in social psychology today the most important unit of research is not the individual study (…) “real research investigations usually involve multiple studies conducted as part of an ongoing conceptual elaboration of a particular topic.”
Such a series of conceptually related studies – a research program – is most often associated with a given investigator and his or her collaborators, but may include work from several distinct laboratories.
The research program is not conveniently handled by the classic concepts of research design (power, generalizability, etc.), which focus on an individual study. Nor is it well captured by the meta-analysis movement (e.g., Hedges & Olkin, 1985), which focuses on drawing empirical conclusions from large numbers of heterogeneous studies on a given issue.
a single study can be almost definitively assumed to have good internal validity (based on its use of experimental design), but the broader forms of validity – construct and external – almost always emerge only from a series of interrelated studies that can compensate for each others’ weaknesses.
They also bring theoretical structure to a series of studies. Programmatically related studies generally focus on theoretical questions and build on one another rather than addressing superficially interesting but atheoretical questions in scattershot fashion.

Randomized Designs

The truly ideal conditions under which a causal effect could be observed cannot be tested.

The ideal condition would be to have the same exact person, under the same exact condition, repeat the exact same behavior, while changing only the treatment condition.

In other words, we’d need a time machine.

So, since causality cannot be observed directly, we can only develop research designs that permit us to infer causality, by making specific assumptions.

The next best thing we have is randomization:

Randomization approaches this ideal by approximately equating the treatment and control groups on all possible baseline covariates prior to any treatment delivery. Participants are assigned to treatment conditions using a method that gives every participant an equal chance of being assigned to the treatment and control conditions.
Random assignment means that the variable representing the treatment condition (treatment vs. control) can be expected on average to be independent of any measured or unmeasured variable prior to treatment.
Following random assignment, each participant then receives the treatment condition (e.g., experimental treatment vs. comparison [control] treatment) to which he or she was assigned. The responses of each participant are then measured after the receipt of the treatment or comparison condition.
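A minimal sketch of what simple random assignment looks like in practice (the participant IDs, seed, and function name are invented for illustration):

```python
# Simple random assignment to two conditions: shuffle, then split evenly.
# A stdlib-only sketch; names and values are illustrative.
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants, then split them evenly into treatment and control."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # every participant has an equal chance of either group
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomly_assign(range(100), seed=42)
print(len(groups["treatment"]), len(groups["control"]))  # 50 50
```

Because group membership is determined by the shuffle alone, it is, on average, independent of every measured and unmeasured pre-treatment variable, which is exactly the property the quoted passage describes.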

Field Research Methods

Field research is characterized by a higher degree of naturalism.

When assessing the degree to which studies qualify as field studies, one must consider the naturalism of four aspects of the study: (1) participants, (2) the intervention and its target, (3) the obtrusiveness of intervention delivery, and (4) the assessed response to the intervention.

Behavior Genetic Research Methods

This chapter frankly felt overly critical of genetic research to me.

The authors seemed to dislike the “obsession” around genes and heritability, which I can understand; and yet a personal dislike for how certain new methods are used or discussed shouldn’t cast doubt on the methods themselves.

For example, the authors say that “we believe that the world would be a less confusing and contentious place without heritability coefficients, at least if one is concerned with a more complex and uncontrollable aspect of behavior than, say, milk production in cows”.
To me, that felt unnecessarily caustic, like meeting bias with more bias (or, shall we say, throwing the baby out with the bathwater).

The authors conclude that:

Other than the important task of disconfirming any remnants of blank-slate environmentalism mistakenly held over from previous eras of behaviorism or psychoanalysis, this effort was in our view not especially productive. Heritability is greater than zero for all individual differences, and takes a determinate value for none of them. Figuring out how “genetic” traits are, either in absolute terms or relative to each other, is a lost cause: Everything is genetic to some extent and nothing is completely so. There is little more to be said.

Self-Organization: Group Behavioral Patterns Transcend Individuals

The term “self-organization” is used to refer to behavioral patterns that emerge from the interactions that bind the components of a system (social or otherwise) into a collective, synergistic system, while not being dictated a priori by a centralized controller.

Implicit VS Explicit & Direct VS Indirect

Measurement outcomes can be described as:

  • Implicit when the impact of the to-be-measured psychological attribute on participants’ responses is unintentional, resource-independent, unconscious, or uncontrollable
  • Explicit when the impact of the to-be-measured psychological attribute on participants’ responses is intentional, resource-dependent, conscious, or controllable (cf. Bargh, 1994; Moors & De Houwer, 2006)

For example, a measure of racial attitudes may be described as implicit if it reflects participants’ racial attitudes even when they do not have the goal to express these attitudes (i.e., unintentional) or despite the goal to conceal these attitudes (i.e., uncontrollable).


  • Direct when the measurement outcome is based on participants’ self-assessment of the to-be-measured attribute (e.g., when participants’ racial attitudes are inferred from their self-reported liking of black people).
  • Indirect when the measurement outcome is not based on a self-assessment (e.g., when participants’ racial attitudes are inferred from their reaction time performance in a speeded categorization task) or when it is based on a self-assessment of attributes other than the to-be-measured attribute (e.g., when participants’ racial attitudes are inferred from their self-reported liking of a neutral object that is quickly presented after a black face)

People Who THINK They Have High Self-Esteem But Actually Have Low Self-Esteem Are Defensive And More Racist

Combinations of high self-esteem on explicit measures and low self-esteem on implicit measures have been shown to predict defensive behaviors, such as favoring one’s in-group over out-groups and dissonance-related attitude change (e.g., Jordan, Spencer, Zanna, Hoshino-Browne, & Correll, 2003).

To Ensure Validity, Keep Experimenters Ignorant

“Ignorant” of what’s been measured in the subjects, we mean.

Because experimenters can involuntarily, subtly prime the subjects to display the behavior they are being measured on:

it has long been known that experimenter’s knowledge of hypotheses can sometimes produce the hypothesized effect in often quite subtle ways (Rosenthal, 1966). Indeed, Doyen, Klein, Pichon, and Cleeremans (2012) have recently claimed that such effects as the elderly stereotype priming effect on behavior stems entirely from the experimenter’s awareness of the participant’s priming condition, and the consequent subtle differences in how those participants were treated. However, their criticism cannot account for the effect observed in Bargh et al. (1996, Study 2) precisely because in those studies, as explained earlier, the appropriate steps were taken to ensure that the experimenter remained unaware of the participant’s condition.

Also read:

Sense-Making: We Develop (False) Narratives Around Random Events

People have a tendency to weave narratives and stories around events that may not be connected in the way they think, or not connected at all.

In psychology, it’s called “sense making”, and it means to interpret events in light of later events and/or to conform to implicit theories and beliefs (Ross, 1989).

Moving Beyond “Coefficient Alpha”

Why has alpha become the gold standard of measurement reliability?
We suspect it is the relative ease with which alpha is both obtained and computed.
Alpha is the “least effort” reliability index; it can be used as long as the same participants responded to multiple items thought to indicate the same construct.

Although laypeople and scientists alike love a simple number to meet, and although, unfortunately, that’s what many researchers have often chased, there is no “magical alpha threshold” that guarantees validity:

How Large Should Alpha Be? It Depends on the Construct. Students often ask questions like “my scale has an alpha of .70 – isn’t that good enough?” and they are frustrated when the answer is “that depends.” Although it would be nice to have a simple cookbook for measurement decisions, there is no particular level of alpha that is necessary, adequate, or even desirable in all contexts.

To better interpret alpha:

alpha needs to be interpreted in terms of its two main parameters – interitem correlation and scale length – and in the context of how these two parameters fit the nature and definition of the construct to be measured.

Consider a researcher who wants to measure the broad construct of extraversion, which includes sociability, assertiveness, and talkativeness, and has constructed a scale with the following items: “I like to go to parties,” “Parties are a lot of fun for me,” “I do not enjoy parties,” (reverse scored), and “I’d rather go to a party than spend the evening alone.” Note that these items are essentially paraphrases of each other and represent the same item content (liking parties) stated in slightly different ways. Cattell (1972) called these kinds of scales “bloated specifics” – they have high alphas simply because the item content is so redundant and interitem correlations are very high. Thus, alphas in the high .80s or even .90s, especially for short scales, may not indicate an impressively reliable scale but instead signal redundancy or narrowness in item content.
Although such redundant items increase alpha, they do not add unique (and thus incremental) information and can often be omitted in the interest of efficiency, suggesting that the scale can be abbreviated without much loss of information (see Robins & Hendin, 1999).
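As a rough illustration of how alpha behaves, here is a minimal sketch computing it from raw item scores using the classic formula (the formula is standard; the function name and example data are invented for illustration):

```python
# Cronbach's alpha from an items matrix (rows = respondents, cols = items):
# alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores).
# A stdlib-only sketch; the example data are invented to show redundancy.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of item scores per respondent."""
    k = len(items[0])                        # number of items
    totals = [sum(row) for row in items]     # total score per respondent
    item_vars = [pvariance([row[i] for row in items]) for i in range(k)]
    return k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))

# Near-duplicate ("bloated specific") items yield a very high alpha
# even though the scale covers almost no breadth of content:
redundant = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 3]]
print(round(cronbach_alpha(redundant), 2))  # 0.98
```

The high value here comes purely from the near-identical columns, mirroring the point above: a .90s alpha on a short scale can signal redundancy rather than an impressively reliable measure.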

Attenuation Paradox of Alpha

This phenomenon is also known as the attenuation paradox because increasing the internal consistency of a test beyond a certain point will not enhance construct validity and may even come at the expense of validity when the added items emphasize one part of the construct (e.g., party-going) over other important parts (e.g., assertiveness).



Obviously, this is not a book written for practical life application.
However, some of the most “practical” wisdom includes:

Brainstorming Doesn’t Work

Brainstorming is a good example of how BS spreads:

There is some guy with made-up credentials who speaks with great authority and confidence, sharing a cool-sounding “hack” with clear rules, relatively easy to replicate and, of course, with supposedly big benefits.

The message spreads, and supposed opinion leaders uncritically accept the BS as gospel and seek to monetize it themselves, whether in status or money.

Osborn (1957) made rather extravagant claims for the efficacy of group brainstorming – for example, “the average person can think up twice as many ideas when working with a group than when working alone” (Osborn, 1957, p. 229). Unfortunately, systematic research has failed to substantiate these claims. To the contrary, a sizeable literature (see Diehl & Stroebe, 1987, Mullen, Johnson, & Salas, 1991, and Nijstad, 2009 for reviews) has consistently shown that brainstorming groups produce both fewer and poorer-quality ideas than equal-sized, identically instructed nominal groups (i.e., groups whose members work in isolation and whose total output is determined by pooling members’ output, eliminating any redundant ideas).

The possible reasons why have to do with natural group dynamics:

Substantial progress has been made in identifying the sources of this process loss in brainstorming groups, with production blocking (i.e., the fact that only one person can talk [and, perhaps, think] at a time in the face-to-face group), production matching (i.e., social comparison and modeling of low levels of productivity), and evaluation apprehension (i.e., fear of negative evaluation for voicing ideas in the group context) all emerging as contributing processes (Diehl & Stroebe, 1987; Paulus & Dzindolet, 1993; Stroebe & Diehl, 1994).

The Myers-Briggs Type Indicator Personality Test Is Unfounded

Referring to the early and least scientific approaches to personality measurements, the authors write:

At the other extreme of the dust bowl empiricists were those psychologists who had detailed theories they did not doubt.
Thus they felt free to focus solely on the content and face validity of their measures.
Variously labeled the rational, intuitive, or deductive approach, they easily generated items on the basis of their theories.
The resulting scales, face-valid with obvious item content, proved remarkably popular, if not always with other researchers then certainly with the test-taking public. In fact, this approach gave birth to the Myers-Briggs Type Indicator (MBTI; Myers & McCaulley, 1985), based partly on Carl Jung’s type theories.
Without much evidence for its external, structural, or substantive validity, the MBTI nonetheless became the most popular personality questionnaire in this country.


To the eternal embarrassment of research psychologists, the MBTI continues to be used at major research universities in applied contexts, such as counseling and career advising.


We don’t often admit this outside the family, but psychological scientists do often get ideas from personal experience.

On science being very slow to root out old, ineffective myths and supplant them with newer, more rigorous findings:

As Jacob Cohen (1990) concluded from his 40 years of research on methodology, the “inertia” of methodological advance is enormous “but I do not despair…these things take time” (p. 1311).


The Handbook of Research Methods in Social and Personality Psychology is a good, if long and sometimes overly complex, textbook for psychology researchers.

The goal is to help researchers improve the quality of their work and to generally do an even better job at advancing science.
The readership is supposed to be experienced and, although nobody would say it outright, to have a high IQ. So it’s normal and expected that the book is complex.
However, that can easily become a cop-out: complexity should never be an excuse not to make a text easier and faster to read and internalize, as well as more engaging.
And I felt that some of the authors missed many opportunities to make the content easier to both understand and internalize when it was possible to do so.

The quality, as well as the style, varies among chapters because each chapter is written by a different researcher, one highly experienced in that specific field.

Overall, it can be good for people who are more familiar with research and research terminology to better assess studies and the quality and reliability of their results.

It’s not good for the layperson, nor for readers of entertaining pop-psychology, because it would be way too complex (and boring for most).

Check the best books to read or get this book on Amazon.
