Two years ago, I idly surfed my way to a harmless-seeming article from 2004 by Denny Borsboom, Gideon Mellenbergh, and Jaap van Heerden entitled The Concept of Validity. More than a decade had passed since its publication, and I had never heard of it. Egocentrically, this seemed like reason enough to surf right past it. Then I skimmed the abstract. Intrigued, I proceeded to read the first few paragraphs. By that point, I was hooked: I scrapped my plans for the next couple of hours so I could give this article my complete attention. This was a paper I needed to read immediately.
I’ve thought about The Concept of Validity every day for the past two years. I have mentioned or discussed or recommended The Concept of Validity hundreds of times. My zeal for The Concept of Validity is the zeal of an ex-smoker. The concept of validity in The Concept of Validity has led to a complete reformatting of my understanding of validity, and of measurement in general—and not just in the psychological sciences, but in the rest of the sciences, too. And those effects have oozed out to influence just about everything else I believe about science. The Concept of Validity is the most important paper you’ve probably never heard of.*
The concept of validity in The Concept of Validity is so simple that it’s a bit embarrassing even to write it down, but its simplicity is what makes it so diabolical, and so very different from what most in the social sciences have believed validity to be for the past 60 years.
According to Borsboom and colleagues, a scientific device (let’s label it D) validly measures a trait or substance (which we will label T), if and only if two conditions are fulfilled:
(1) T must exist;
(2) T must cause the measurements on D.
That’s it. That is the concept of validity in The Concept of Validity.
What is most conspicuous about the concept of validity in The Concept of Validity is what it lacks. There is no talk of score meanings and interpretations (à la Cronbach and Meehl). There is no talk of integrative judgments involving considerations of the social or ethical consequences of how scores are put to use (à la Messick). There’s no talk of multitrait-multimethod matrices (à la Campbell and Fiske), nomological nets (Cronbach and Meehl again), or any of the other theoretical provisos, addenda, riders, or doo-dads with which psychologists have been burdening their concepts of validity since the 1950s. Instead, all we need, and all we must have, for valid measurement is the fulfillment of two conditions: (1) a real force or trait or substance (2) whose presence exerts a causal influence on the physical state of a device. Once those conditions are fulfilled, a scientist can read off the physical changes to the device as measurements of T. And voilà: We’ve got valid measurement.
Borsboom and colleagues’ position is such a departure from 20th-century notions of validity precisely because they are committed to scientific realism, a stance to which many mid-20th-century philosophers of science were quite allergic. But most philosophers of science have gotten over their aversion to scientific realism now. In general, they’re mostly comfortable with the idea that there could be hidden realities that are responsible for observable experience. Realism seemed like a lot to swallow in 1950. It doesn’t in 2017.
As soon as you commit to scientific realism, there is a kind of data you will prize more highly than any other for assessing validity, and that’s causal evidence. What a realist wants more than anything else on earth or in the heavens is evidence that the hypothesized invisible reality (the trait, or substance, or whatever) is causally responsible for the measurements the device produces. Every other productive branch of science is already working from this definition of validity. Why aren’t the social sciences?
For some of the research areas I’ve messed around with over the past few years, the implications of embracing the concept of validity in The Concept of Validity are profound, and potentially nettlesome. If we follow Borsboom and colleagues’ advice, we may discover that some scientific devices do indeed provide valid measurement: the trait or substance T they supposedly measure actually seems to exist (fulfilling Condition #1), and there is good evidence that T is causally responsible for physical features of the device that can be read off as measurements of T (fulfilling Condition #2). In other areas, the validity of certain devices as measures looks less certain because even though we can be reasonably confident that the trait or substance T exists, we cannot be sure that changes in T are responsible for the physical changes in the device. In still other areas, it’s not clear that T exists at all, in which case there’s no way that the device can be a measure of T.
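To make the case where T exists but may not be driving the device a little more concrete, here is a toy simulation in Python. It is only a sketch for illustration, not anything taken from Borsboom and colleagues’ paper: the function names, the background factor c, and every coefficient are made up. It compares two imaginary worlds. In one, T causes the readings on D. In the other, a background factor causes both T and D, so the readings track T without being caused by it.

```python
import random
import statistics


def simulate(t_causes_d, n=5000, seed=0, set_t=None):
    """Toy model; all names and coefficients are hypothetical.

    Returns a list of (t, d) pairs. If set_t is given, T is fixed to
    that value by intervention instead of varying naturally.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        c = rng.gauss(0, 1)  # background factor that influences T
        if set_t is None:
            t = c + rng.gauss(0, 0.5)  # T varies naturally
        else:
            t = set_t  # the experimenter sets T directly
        # The device responds either to T itself or only to the background factor.
        source = t if t_causes_d else c
        d = 2.0 * source + rng.gauss(0, 0.5)
        pairs.append((t, d))
    return pairs


def pearson(pairs):
    """Pearson correlation between T and the device readings."""
    ts = [t for t, _ in pairs]
    ds = [d for _, d in pairs]
    mean_t, mean_d = statistics.mean(ts), statistics.mean(ds)
    cov = sum((t - mean_t) * (d - mean_d) for t, d in pairs)
    var_t = sum((t - mean_t) ** 2 for t in ts)
    var_d = sum((d - mean_d) ** 2 for d in ds)
    return cov / (var_t * var_d) ** 0.5


def intervention_effect(t_causes_d):
    """Change in the mean reading when T is moved from 0 to 1 by hand."""
    low = statistics.mean([d for _, d in simulate(t_causes_d, set_t=0.0)])
    high = statistics.mean([d for _, d in simulate(t_causes_d, set_t=1.0)])
    return high - low


for label, flag in [("T causes D", True), ("T does not cause D", False)]:
    r = pearson(simulate(flag))
    effect = intervention_effect(flag)
    print(f"{label}: correlation = {r:.2f}, effect of moving T = {effect:.2f}")
```

In both worlds the readings are strongly correlated with T. Only the intervention tells them apart: moving T by hand shifts the readings by about 2 when T drives the device, and not at all when it doesn’t. That is the sense in which a realist prizes causal evidence over correlational evidence.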
I will look at some of these scenarios more closely in an upcoming post.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
*Weirdly, The Concept of Validity does not come up in Google Scholar. I’ve seen this before, actually. Why does this happen?