Tag Archives: experimental economics

Behavioral Altruism is an Unhelpful Scientific Category

Altruism has been a major topic in evolutionary biology since Darwin himself, but altruism (the word) did not appear even once in Darwin’s published writings.[1] The omission of the word from Darwin’s writings about the phenomenon is hardly surprising: Altruism had appeared in print for the first time only eight years before The Origin of Species. Its coiner was a Parisian philosopher named Auguste Comte.

Capitalizing on the popularity he had already secured for himself among liberal intellectuals in both France and England, Comte argued that Western civilization needed a complete intellectual renovation, starting from the ground up. Not one to shrink from big intellectual projects, Comte set out to do this re-vamping himself, resulting in four hefty volumes. Comte’s diagnosis: People cared too much for their own welfare and too little for the welfare of humanity. The West, Comte thought, needed a way of doing society that would evoke less égoisme, and inspire more altruisme.

Comte saw a need for two major changes. First, people would need to throw out the philosophical and religious dogma upon which society’s political institutions had been built. In their place, he proposed we seek out new principles, grounded in the new facts emerging from the new sciences of the human mind (such as the fast-moving scientific field of phrenology), human society (sociology), and animal behavior (biology).

Second, people would need to replace Christianity with a new religion in which humanity, rather than the God of the Abrahamic religions, was the object of devotion. In Comte’s new world, the 12-month Gregorian calendar would be replaced with a scientifically reformed calendar consisting of 13 months (each named after a great thinker from the past—for example, Moses, Paul the Apostle, Gutenberg, Shakespeare, and Descartes) of 28 days each (throw in a “Day of the Dead” at the end and you’ve got your 365-day year). Also, the Roman Catholic priesthood would be replaced with a scientifically enlightened, humanity-loving “clergy” with Comte himself—no joke—as the high priest.

Comte’s proposals for a top-down re-shaping of Western society didn’t get quite the reception he was hoping for (though they caught on better than you might think: If you’re ever in Paris or Rio, pay a visit to the Temples of Humanity that Comte’s followers founded around the turn of the 20th century). In England especially, the scientific intelligentsia’s response was frosty. On the advice of his friend Thomas Huxley, Darwin also steered clear of all things Comtean, including altruism.

Nevertheless, altruism was in the air, and its warm reception among British liberals at the end of the 19th century is how the word percolated into everyday language. It’s also why the word is still in heavy circulation today. The British philosopher Herbert Spencer, an intellectual rock star of his day, was a great admirer of Comte, and he played a major role in establishing a long-term home for altruism in the lexicons of biology, social science, and everyday discourse.[2] Spencer used the term altruism in three different senses—as an ethical ideal, as a description of certain kinds of behavior, and as a description for a certain kind of human motivation. (He wouldn’t have understood how to think about it as an evolutionary concept.)[3]

Here, I want to look at Spencer’s second use of the word altruism—as a description of a class of behaviors—because I think it is a deeply flawed scientific concept, despite its wide usage. At the outset, I should note that as a Darwinian concept—an evolutionary pathway by which natural selection can create complex functional design by building traits in individuals that cause them to take actions that increase the rate of replication of genes locked inside their genetic relatives’ gonads—altruism has none of the conceptual problems that behavioral altruism has.

With his behavioral definition of altruism, Spencer meant to refer to “all action which, in the normal course of things, benefits others instead of benefiting self.”[4] A variant of this definition is embraced today by many economists and other social scientists, who use the term behavioral altruism to classify all “costly acts that confer benefits on other individuals.”[5] Single-celled organisms are, in principle, as capable of Spencerian behavioral altruism as humans are. Social scientists who subscribe to the behavioral definition of altruism have applied it to a wide range of human behaviors. Have you ever jumped into a pool to save a child or onto a hand grenade to spare your comrades? Donated money to your alma mater or a charity? Given money, a ride, or directions to a stranger? Served in the military? Donated blood, bone marrow, or a kidney? Reduced, re-used, or recycled? Adopted a child? Held open a door for a stranger? Shown up for jury duty? Volunteered for a research experiment? Taken care of a sick friend? Let someone in front of you in the check-out line at the grocery store? Punished or scolded someone for breaking a norm or for being selfish? Taken found property to the lost and found? Tipped a server in a restaurant in a city you knew you’d never visit again? Pointed out when a clerk has undercharged you? Lent your fondue set or chain saw to a neighbor? Shooed people away from a suspicious package at the airport? If so, then you, according to the behavioral definition, are an altruist.[6]

Some economists seek to study behavioral altruism in the laboratory with experimental games in which researchers give participants a little money and then measure what they do with it. The Trust Game, which involves two players, is a great example. We can call the first actor the Investor because he or she is given a sum of money—say, $10—by the experimenter, some or all of which he or she can send to the other actor, whom we might call the Trustee. The Investor knows that every dollar he or she entrusts to the Trustee gets multiplied by a fixed factor—say, 3—so if the Investor transfers $1 to the Trustee, the Trustee now has $3 more in his or her account as a result of the Investor’s $1 transfer. Likewise, the Investor knows that the Trustee will subsequently decide whether to transfer some money back. Under these circumstances, according to some experimental economists, if the Investor sends money to the Trustee, it is “altruistic” because it is a “costly act that confers an economic benefit upon another individual.”[7] But the lollapalooza of behavioral altruism doesn’t stop there: It’s also altruistic, per the behavioral definition that economists embrace, if the Trustee transfers money back to the Investor. Here, too, one person is paying a cost to provide a benefit to another person.
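The Trust Game’s payoff structure is simple enough to sketch in a few lines of code. This is only an illustration using the numbers from the example above (a $10 endowment and a 3× multiplier); the function and variable names are mine, not anything drawn from the economics literature.

```python
def trust_game(endowment, multiplier, invested, returned):
    """Compute final payoffs for the Investor and the Trustee.

    invested: amount the Investor sends (0 <= invested <= endowment)
    returned: amount the Trustee sends back (0 <= returned <= invested * multiplier)
    """
    assert 0 <= invested <= endowment, "Investor can send at most the endowment"
    pot = invested * multiplier  # the Trustee receives the multiplied transfer
    assert 0 <= returned <= pot, "Trustee can return at most the multiplied pot"
    investor_payoff = endowment - invested + returned
    trustee_payoff = pot - returned
    return investor_payoff, trustee_payoff

# The example from the text: if the Investor sends $1 and the Trustee returns
# nothing, the Investor ends the game with $9 and the Trustee with $3.
print(trust_game(10, 3, 1, 0))  # (9, 3)
```

Note that both “altruistic” acts fall out of the same arithmetic: the Investor’s transfer is a cost that benefits the Trustee, and any back-transfer is a cost that benefits the Investor.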

Notice that motives don’t matter for behavioral altruism. (To social psychologists like Daniel Batson, altruism is a motivation to raise the welfare of another individual, pure and simple. Surprising as it might seem, this is also, in fact, a conceptually viable scientific category. But that’s another blog post.) All that matters for a behavior to be altruistic is that it entails costs to actors and benefits to recipients. Clearly, donating a kidney or donating blood is costly to the donor and beneficial to the recipient, but even when you hold a door open for a stranger, you pay a cost (a few seconds of your time and a calorie or so worth of physical effort) to deliver a benefit to someone else. By this definition, even an insurance company’s agreement to cover the MRI for your (possibly) torn ACL qualifies: After all, the company pays a cost (measured in the thousands of dollars) to provide you with a benefit (magnetic confirmation either that you need surgery or that your injury will probably get better after a little physical therapy).

But a category that lumps together recycling, holding doors for strangers, donating kidneys, serving in the military, and handing money over to someone in hopes of securing a return on one’s investment—simply because they all involve costly acts that confer benefits on others—is a dubious scientific category. Good scientific categories, unlike “folk categories,” are natural kinds—as Plato said, they “carve nature at its joints.” Rather than simply sharing one or more properties that are interesting to a group of humans (for example, social scientists who are interested in a category called “behavioral altruism”), they should share common natural essences, common causes, or common functions. Every individual molecule with the chemical formula H2O is a member of a natural kind—water—because all such molecules share the same basic causes (elements with specific atomic numbers that interact through specific kinds of bonds). These deep properties are common to every molecule of H2O that has ever existed and that ever will exist. Natural kinds are not just depots for things that have some sort of gee-whiz similarity.[8]

If behavioral altruism is a natural kind, then knowing that a particular instance of behavior is “behaviorally altruistic” should enable me to draw some conclusions about its deep properties, causes, functions, or effects. But it doesn’t. All I know is that I’ve done something that meets the definition of behavioral altruism. Even though I have, on occasion, shown up for jury duty, held doors open for strangers, received flu shots, loaned stuff to my neighbors, and even played the trust game, simply knowing that they are all instances of “behavioral altruism” does not enable me to make any non-trivial inferences about the causes of my behavior. By the purely behavioral definition of altruism, I could show up for jury duty to avoid being held in contempt of court, I could give away some old furniture because I want to make some space in my garage, and I could hold the door for someone because I’m interested in getting her autograph. The surface features that make these three behaviors “behaviorally altruistic” are, well, superficial. Knowing that they’re behaviorally altruistic gives me no new raw materials for scientific inference.

So if behavioral altruism isn’t a natural kind, then what kind of kind is it? Philosophers might call it a folk category, like “things that are white,” or “things that fit in a bread box,” or “anthrosonic things,” which comprise all of the sounds people can make with their bodies—for example, hand-claps, knuckle- and other joint-cracking, the lub-dub of the heart’s valves, the pitter-patter of little feet, sneezes, nose-whistles, coughs, stomach growls, teeth-grinding, and beat-boxing. Anthrosonics gets points for style, but not for substance: My knowing that teeth-grinding is anthrosonic does not enable me to make any new inferences about the causes of teeth-grinding because anthrosonic phenomena do not share any deep causes or functions.

Things that are white, things that can fit in a bread box, anthrosonics, things that come out of our bodies, things we walk toward, et cetera—and, of course, behavioral altruism—might deserve entries in David Wallechinsky and Amy Wallace’s entertaining Book of Lists[9], but not in Galileo’s Book of Nature. They’re grab-bags.

~

[1] Dixon (2013).
[2] Spencer (1870-1872, 1873, 1879).
[3] Dixon (2005, 2008, 2013).
[4] Spencer (1879), p. 201.
[5] Fehr and Fischbacher (2003), p. 785.
[6] See, for instance, Silk and Boyd (2010), Fehr and Fischbacher (2003); Gintis, Bowles, Boyd, & Fehr (2003).
[7] Fehr and Fischbacher (2003), p. 785.
[8] Slater and Borghini (2011).
[9] Wallechinsky and Wallace (2005).

REFERENCES

Dixon, T. (2005). The invention of altruism: Auguste Comte’s Positive Polity and respectable unbelief in Victorian Britain. In D. M. Knight & M. D. Eddy (Eds.), Science and beliefs: From natural philosophy to natural science, 1700-1900 (pp. 195-211). Hampshire, England: Ashgate.

Dixon, T. (2008). The invention of altruism: Making moral meanings in Victorian Britain. Oxford, UK: Oxford University Press.

Dixon, T. (2013). Altruism: Morals from history. In M. A. Nowak & S. Coakley (Eds.), Evolution, games, and God: The principle of cooperation (pp. 60-81). Cambridge, MA: Harvard University Press.

Fehr, E., & Fischbacher, U. (2003). The nature of human altruism. Nature, 425, 785-791.

Gintis, H., Bowles, S., Boyd, R., & Fehr, E. (2003). Explaining altruistic behavior in humans. Evolution and Human Behavior, 24, 153-172.

Silk, J. B., & Boyd, R. (2010). From grooming to giving blood: The origins of human altruism. In P. M. Kappeler & J. B. Silk (Eds.), Mind the gap: Tracing the origins of human universals (pp. 223-244). Berlin: Springer Verlag.

Slater, M. H., & Borghini, A. (2011). Introduction: Lessons from the scientific butchery. In J. K. Campbell, M. O’Rourke, & M. H. Slater (Eds.), Carving nature at its joints: Natural kinds in metaphysics and science (pp. 1-31). Cambridge, MA: MIT Press.

Spencer, H. (1870-1872). Principles of psychology. London: Williams and Norgate.

Spencer, H. (1873). The study of sociology. London: H. S. King.

Spencer, H. (1879). The data of ethics. London: Williams and Norgate.

Wallechinsky, D., & Wallace, A. (2005). The book of lists: The original compendium of curious information. Edinburgh, Scotland: Canongate Books.

The Myth of Moral Outrage

This year, I am a senior scholar with the Chicago-based Center for Humans and Nature. If you are unfamiliar with this Center (as I was until recently), here’s how they describe their mission:

The Center for Humans and Nature partners with some of the brightest minds to explore humans and nature relationships. We bring together philosophers, biologists, ecologists, lawyers, artists, political scientists, anthropologists, poets and economists, among others, to think creatively about how people can make better decisions — in relationship with each other and the rest of nature.

In the year to come, I will be doing some writing for the Center, starting with a piece that has just appeared on their web site. In The Myth of Moral Outrage, I attack the winsome idea that humans’ moral progress over the past few centuries has ridden on the back of a natural human inclination to react with a special kind of anger—moral outrage—in response to moral violations against unrelated third parties:

It is commonly believed that moral progress is a surfer that rides on waves of a peculiar emotion: moral outrage. Moral outrage is thought to be a special type of anger, one that ignites when people recognize that a person or institution has violated a moral principle (for example, do not hurt others, do not fail to help people in need, do not lie) and must be prevented from continuing to do so . . . Borrowing anchorman Howard Beale’s tag line from the film Network, you can think of the notion that moral outrage is an engine for moral progress as the “I’m as mad as hell and I’m not going to take this anymore” theory of moral progress.

I think the “Mad as Hell” theory of moral action is probably quite flawed, despite the popularity that it has garnered among many social scientists who believe that humans possess “prosocial preferences” and a built-in (genetically group-selected? culturally group-selected?) appetite for punishing norm-violators. I go on to describe the typical experimental result that has given so many people the impression that we humans do indeed possess prosocial preferences that motivate us to spend our own resources for the purpose of punishing norm violators who have harmed people whom we don’t know or otherwise care about. Specialists will recognize that the empirical evidence that I am taking to task comes from that workhorse of experimental economics, the third-party punishment game:

…[R]esearch subjects are given some “experimental dollars” (which have real cash value). Next, they are informed that they are about to observe the results of a “game” to be played by two other strangers—call them Stranger 1 and Stranger 2. For this game, Stranger 1 has also been given some money and has the opportunity to share none, some, or all of it with Stranger 2 (who doesn’t have any money of her own). In advance of learning about the outcome of the game, subjects are given the opportunity to commit some of their experimental dollars toward the punishment of Stranger 1, should she fail to share her windfall with Stranger 2.

Most people who are put in this strange laboratory situation agree in advance to commit some of their experimental dollars to the purpose of punishing Stranger 1’s stingy behavior. And it is on the basis of this finding that many social scientists believe that humans have a capacity for moral outrage: We’re willing to pay good money to “buy” punishment for scoundrels.
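The payoff logic of the third-party punishment game can be sketched in a few lines. One caveat up front: the 3-to-1 deduction ratio below (each dollar the subject spends on punishment removes three dollars from Stranger 1) is a common convention in these experiments, but the exact ratio varies by study, and all the names and numbers here are my own illustrative assumptions.

```python
def third_party_punishment(dictator_endowment, shared,
                           punisher_endowment, punishment_spent,
                           deduction_ratio=3):
    """Payoffs after Stranger 1 (the dictator) shares `shared` dollars with
    Stranger 2, and the third-party subject spends `punishment_spent` of her
    own endowment to punish Stranger 1."""
    assert 0 <= shared <= dictator_endowment
    assert 0 <= punishment_spent <= punisher_endowment
    stranger1 = dictator_endowment - shared - punishment_spent * deduction_ratio
    stranger2 = shared
    punisher = punisher_endowment - punishment_spent
    return punisher, stranger1, stranger2

# A subject with $5 spends $2 to punish a dictator who kept all of a $10 pot:
print(third_party_punishment(10, 0, 5, 2))  # (3, 4, 0)
```

The point the code makes plain is that punishment is costly for everyone: the subject ends up poorer, Stranger 1 ends up poorer, and Stranger 2 gains nothing at all from the punishment.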

In the rest of the piece, I go on to point out the rather serious inferential limitations of the third-party punishment game as it is typically carried out in experimental economists’ labs. I also point to some contradictory (and, in my opinion, better) experimental evidence, both from my lab and from other researchers’ labs, that gainsays the widely accepted belief in the reality of moral outrage. I end the piece with a proposal for explaining what the appearance of moral outrage might be for (in a strategic sense), even if moral outrage is actually not a unique emotion (that is, a “natural kind” of the type that we assume anger, happiness, grief, etc. to be) at all.

I don’t want to steal too much thunder from the Center‘s own coverage of the piece, so I invite you to read the entire piece over on their site. Feel free to post a comment over there, or back over here, and I’ll be responding in both places over the next few days.

As I mentioned above, I’ll be doing some additional writing for the Center in the coming six months or so, and I’ll be speaking at a Center event in New York City in a couple of months, which I will announce soon.

The Trouble with Oxytocin, Part III: The Noose Tightens for The Oxytocin–>Trust Hypothesis

Might be time to see about having that Oxytocin tattoo removed…

When I started blogging six months ago, I kicked off Social Science Evolving with a guided tour of the evidence for the hypothesis that oxytocin increases trusting behavior in the trust game (a laboratory workhorse of experimental economics). The first study on this topic, authored by Michael Kosfeld and his colleagues, created a big splash, but most of the studies in its wake failed to replicate the original finding. I summarized all of the replications in a box score format (I know, I know: Crude. So sue me.) like so:

Box Score, December 2013

By my rough-and-ready calculations, at the end of 2013 there were about 1.25 studies’ worth of successful replications of the original Kosfeld results, but about 3.75 studies’ worth of failed replications (see the original post for details). Even six months ago, the empirical support for the hypothesis that oxytocin increases trust in the trust game was not looking so healthy.

I promised that I’d update my box score as I became aware of new data on the topic, and a brand new study has just surfaced. Shuxia Yao and colleagues had 104 healthy young men and women play the trust game with four anonymous trustees. One of those four trustees (the “fair” trustee) returned enough of the subject’s investment to cause the subject and the trustee to end up with equal amounts of money; the other three trustees (designated as the “unfair players”) declined to return any money to the subject at all.

Next, subjects were randomly assigned to receive either the standard dose of intranasal oxytocin, or a placebo. Forty-five minutes later, participants were told that they would receive an instant message from the four players to whom they had entrusted money during the earlier round of the trust game. The “fair” player from the earlier round, and one of the “unfair” players, sent no message at all. The second unfair player sent a cheap-talk sort of apology, and the third unfair player offered to make a compensatory monetary transfer to the subject that would make their payoffs equal.

Finally, study participants took part in a “surprise” round of the trust game with the same four strangers. The researchers’ key question was whether the subjects who had received oxytocin would behave in a more trusting fashion toward the four players from Round 1 than the participants who received a placebo instead.

They didn’t.

In fact, the only hint that oxytocin did anything at all to participants’ trust behaviors was a faint statistical signal that oxytocin caused female participants (but not male participants) to treat the players from Round 1 in a less trusting way. If anything, oxytocin reduced women’s trust. I should note, however, that this females-only effect for oxytocin was obtained using a statistically questionable procedure: The researchers did not find a statistical signal of an interaction between oxytocin and subjects’ sex, and without such a signal, their separation of the men’s and the women’s data for further analyses really wasn’t licensed. But regardless, the Yao data fail to support the idea that oxytocin increases trusting behavior in the trust game.

It’s time to update the box score:

Box Score, June 2014

In the wake of the original Kosfeld findings, 1.25 studies worth of results have accumulated to suggest that oxytocin does increase trust in the trust game, but 4.75 studies worth of results have accumulated to suggest that it doesn’t.

It seems to me that the noose is getting tight for the hypothesis that intranasal oxytocin increases trusting behavior in the trust game. But let’s stay open-minded a while longer. As ever, if you know of some data out there that I should be including in my box score, please send me the details. I’ll continue updating from time to time.

The Trouble with Oxytocin, Part II: Extracting the Truth from Oxytocin Research

Two weeks ago, the Society for Personality and Social Psychology (SPSP) held its annual meeting in Austin, TX. I tried to get there myself, as I had been invited to give a talk on the measurement of oxytocin in social science research as part of the “Social Neuroendocrinology” pre-conference. However, some things were brewing on the home front that kept me in Miami. Undeterred, the pre-conference organizers arranged for me to give my talk via Skype, which worked out reasonably well.

In this essay, I’ve turned some of that talk into the second installment in my “The Trouble with Oxytocin” series (the first installment is here). It’s a bit wonkish, focusing as it does on the importance of a bioanalytical technique called extraction, but it’s an important topic nonetheless. Many of the social scientists who are studying oxytocin have decided that they can skip this step entirely. As a result of their decision to take this shortcut, it’s quite possible that many scientific claims about the personality traits, emotions, and relationship factors that influence circulating oxytocin levels are—how to put this diplomatically?—without adequate basis in fact. I’ll substantiate this claim anon, but first, a bit of nomenclature.

A Bit of Nomenclature

Applied researchers generally measure oxytocin in bodily fluids by immunoassay—a technique so ingenious that the scientists who developed it received a Nobel Prize in 1977. Simplifying greatly, to develop an immunoassay for Substance X, you inject animals (probably rabbits) with Substance X and wait for the animal(s) to produce an immune reaction. To the extent that one of the antibodies an animal produces in response to Substance X is sensitive to Substance X, but not to other substances that can masquerade as Substance X, you may be in a position to conclude that you have successfully produced a “Substance X antibody.” With that antibody in hand, you’ve got the most important ingredient for developing an immunoassay.

Antibodies can be used to make several types of immunoassays, but two types are prominent in the oxytocin field: Radioimmunoassays (RIA) and Enzyme-Linked Immunosorbent Assays (ELISA, or EIA). Both methods are widely accepted (although ELISAs don’t require the analysts to handle radiation—a benefit to be sure). I wanted to familiarize you with these terms here at the outset only because I don’t want my toggling back and forth between them to distract you. The focal issue for our purposes here is the issue of extraction.

To Be Exact, You Must Extract

Extraction is a set of preliminary processes an analyst can use to separate Substance X from other substances in a sample of (for instance) blood plasma that might interfere with the immunoassay’s ability to quantify precisely how much Substance X is in the sample. I’m going to skip the details, but you can read up here. Antibodies can bind to all sorts of substances that are not Substance X (for example, proteins, other peptides, or their degradation products) if you’re not careful to remove that other stuff first. More relevant for our purposes here, researchers have known for a really long time that a failure to extract before conducting immunoassays for plasma oxytocin will result in profound overestimates of how much oxytocin is actually in the sample.

This is not some well-kept industry secret. The manufacturers of some of the more widely used commercial ELISAs have been admonishing the users of their assays to extract samples since at least 2007. Below is a snip from an instruction manual bearing a 2006 copyright. (The admonition gets repeated in this 2013-copyright instruction manual also):

Instruction Manual

What the manufacturers are showing here (see the two columns of data on the left) is that when they performed their oxytocin assay on a sample of human blood plasma without performing an extraction step, they read off an oxytocin concentration of 2,761 pg/ml (picograms, or 10⁻¹² grams, per milliliter). When they performed the extraction step on the same sample, they got a value of 3.4 pg/ml—three orders of magnitude smaller. Plain English translation: “There are some substances in human blood plasma that fool our antibody into believing they’re oxytocin molecules. You’d better get rid of those imposters before you run our assay on your sample. After you do that, we think you’ll be OK.” Keep this value of 3.4 pg/ml in mind. As I’ll show you below, it’s the sort of value, more or less, that one ought to be expecting from assays that actually measure oxytocin.

Like I say, the need for extraction is no secret. Basic biological researchers who study oxytocin have been extracting their samples since The Waltons had a prime-time slot on CBS. But extraction takes a lot of time, so it is expensive. Perhaps this is why a team of researchers started to skip the extraction step in the early 2000s.[1] In no time at all, other social scientists were following in their footsteps, and with that, a Pandora’s box was opened. Most social scientists just stopped extracting, often citing the originators of this custom to justify their choice.

In what follows, I’ll chronicle what happened to the social science literature on oxytocin as a result of this fateful methodological choice. Table 1, below, is from a paper that Armando Mendez, Pat Churchland, and I published last year.[2] It illustrates the typical oxytocin values one can expect to see in samples of extracted plasma measured by radioimmunoassay versus the values one can expect to see when using one of the commercial ELISAs on raw (i.e., unextracted) plasma.

Table 1. From McCullough, Churchland, and Mendez (2013)

A few things stand out in Table 1. First, when you measure oxytocin in blood plasma using RIA on extracted samples, you typically find that healthy, non-pregnant women and men have oxytocin levels of somewhere between 0 and 10 picograms per milliliter of blood plasma. This is consistent with that value of 3.4 pg/ml that I suggested you keep in mind from the 2006 instructions that came with that assay kit.

Below are some values that Ben Tabak, our neuroscience/biochemistry colleagues, and I obtained on 35 women whose oxytocin we measured in five different samples of plasma. Mean values were in the 1-2 picogram range.[3]

Adapted from Tabak et al. (2011)

The Tabak et al. (2011) sample was small. We had oxytocin values for only a few dozen women, so I won’t be offended if you don’t want to place too much trust in them, but here are some values that Tim Smith and his colleagues obtained with an RIA on extracted samples from 180 male-female couples: Again, their mean values hovered around 1-2 picograms per milliliter.[4]

From Smith et al. (2013)

So this is very reassuring. The values that we got, and the values that Smith and his colleagues got, are very consistent with the 1-10 pg/ml range that we’ve come to expect over the past 35 years.

Table 1, again. From McCullough, Churchland, and Mendez (2013)

But now take a look at the right side of Table 1 above to see what happens when you assay plasma for oxytocin using commercial ELISAs without extraction. It doesn’t matter whether you’re studying healthy non-pregnant women, healthy non-pregnant men, pregnant women, or new mothers: You’re going to get mean oxytocin values in the 200-400 pg/ml range, that is, values that are 100 to 200 times higher than what you get with RIAs on extracted samples.

Consider, for instance, the data below, which come from this paper, which the authors accurately described in the abstract as “[u]tilizing the largest sample of plasma OT to date (N = 473).” They found a mean value for men of approximately 400 pg/ml and a mean value for women of around 359 pg/ml.[5]

From Weisman et al. (2013)

Mean values of 200, 300, and 400 pg/ml for oxytocin in unextracted plasma are not exceptions to an otherwise orderly corpus of findings. They are what you should expect to find if you perform an oxytocin assay without extraction. For instance, the data below, from this paper, show the sorts of oxytocin values you can expect to find in the plasma of pregnant and recently pregnant women when you use ELISA on raw plasma:[6]

From Feldman et al. (2007)

The values above are measured in picomolar (pM) rather than in pg/ml, but oxytocin has a molecular mass of 1007 Daltons, so by sheer coincidence one picomolar of oxytocin is roughly equivalent to one pg/ml. In other words, these authors also got mean values for oxytocin using an ELISA on raw plasma that are way too high—and look at the upper end of those ranges—3,648 pg/ml! There’s just no good reason for believing that there could be 300 picograms of OT—much less 3,648—in a milliliter of blood plasma.
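If you want to check that pM-to-pg/ml conversion for yourself, the arithmetic is short. This is just a sanity check using the 1007 g/mol molecular mass given above; the helper function name is mine.

```python
OXYTOCIN_MW = 1007.0  # oxytocin's molecular mass in g/mol (Daltons)

def picomolar_to_pg_per_ml(pm):
    """Convert a picomolar concentration of oxytocin to picograms per ml."""
    grams_per_liter = pm * 1e-12 * OXYTOCIN_MW  # mol/L -> g/L
    return grams_per_liter * 1e12 / 1000.0      # g/L -> pg/L -> pg/ml

print(picomolar_to_pg_per_ml(1))  # ~1.007, i.e., 1 pM is about 1 pg/ml
```

So a reading in picomolar can be compared directly, give or take 0.7%, with the pg/ml values discussed everywhere else in this post.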

Why are these ELISAs giving such high values? There’s nothing wrong in principle with using an ELISA to measure OT in plasma, even though some of the commercial assays have used antibodies whose sensitivity and specificity are far from ideal. (This is an extremely important issue, by the way, but not the one to tackle here.) Instead, the predominant reason why researchers are getting such wacky values from these ELISAs is that they’re skipping the extraction step.

How do I know? Because I know what happens if you do extract your samples before you assay them via ELISA. Our research group found that when you extract your samples before you analyze them with a certain commercial ELISA kit, the mean values drop from somewhere around 358 pg/ml to somewhere around 1.8 pg/ml—just as you’d expect, given the admonitions in the manufacturer’s instructions.[7] And here are some extracted values that Karen Grewen and her colleagues got for 20 healthy breastfeeding mothers when they used the same ELISA that gave Weisman et al. those values in the 300-400 pg/ml range for raw plasma.[8] ELISAs can give plausible values if you extract first.

[Figure: extracted plasma oxytocin values, from Grewen, Davenport, and Light (2010)]

Estimating OT from Unextracted Samples: Is There Any Signal Amidst the Noise?

Of course, none of this would matter very much if there were some way to statistically transform the OT values you obtain from unextracted plasma into the values you would have obtained from extracted plasma, but that doesn’t seem to be the case: The evidence currently available suggests that the values from the two methods are, quite possibly, uncorrelated.

We looked at this issue in our 2011 paper.[7] We had 39 plasma samples, which we analyzed with one of the most widely used commercial ELISAs, both before and after extraction. The correlation coefficients ranged from -.14 to .09, depending on distributional assumptions. Kelly Robinson and her colleagues recently came to the same conclusion with their own data: 52 samples of blood plasma from seals.[9] In fairness, I have to acknowledge another study that found a very high correlation (0.89) between the oxytocin values derived from extracted samples and those obtained from unextracted samples, but that study was based on very little data (11 samples of blood serum, rather than plasma, from rhesus monkeys), so it would be a mistake to give it too much weight.[10]
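If it helps to make the comparison concrete, here is a minimal sketch of the check in question, using invented numbers rather than any study's actual data: pair each sample's extracted value with its unextracted value, and see whether the correlation differs from zero.

```python
# Hypothetical illustration (NOT data from any study discussed here):
# do OT values from unextracted plasma track the values obtained from
# the same samples after extraction?
import random

random.seed(1)

n = 39  # sample size borrowed from the 2011 study; the values are invented
extracted = [random.uniform(0.5, 4.0) for _ in range(n)]        # plausible pg/ml
unextracted = [random.uniform(200.0, 500.0) for _ in range(n)]  # typical raw-plasma values

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Because the two series here were generated independently, r hovers near 0,
# which is essentially what the real paired-sample comparisons have found.
print(round(pearson_r(extracted, unextracted), 2))
```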

Conclusion

So, what shall we conclude about oxytocin assays on unextracted plasma, given the data we have to go on at this point? Well, on the plus side, raw plasma is cheaper and quicker to assay than extracted plasma. Nobody disputes that. On the minus side, if you don’t extract those samples before you assay them, you apparently convert those ingenious oxytocin assays into random number generators, and there are cheaper ways to generate random numbers.

For ten years, many social scientists who study oxytocin have been side-stepping an expensive but evidently crucial extraction step. If you’ve come to believe that the trust of a stranger, or sharing a secret, or sensitive parenting, or mother-infant bonding, or your mental health, can influence (or is influenced by) how much oxytocin is coursing through your veins, you might want to take a second look. Chances are, those findings came from studies that used immunoassays on unextracted plasma (it’s easy to know for sure: just check the papers’ Method sections), and if so, there’s little compelling reason to think the results are accurate.

Now, if any researchers out there have data that can prove that we should be taking the results from immunoassays on unextracted samples at face value, they would do the field a great favor by making those results public, and at that point I will happily concede that all my worrying has been for naught. Even better, perhaps someone could conduct a large, pre-registered study on the correlation of OT values from extracted versus raw plasma. Pre-registration is easy, and it would increase the inferential value of such a study immensely. In any case, more data on this topic would be most welcome. I, for one, would love to know whether we should be taking the results of studies on raw plasma seriously, or whether we’d be better off dragging them into the recycle folder.

References

1.         Kramer, K.M., et al., Sex and species differences in plasma oxytocin using an enzyme immunoassay. Canadian Journal of Zoology, 2004. 82: p. 1194-1200.

2.         McCullough, M.E., P.S. Churchland, and A.J. Mendez, Problems with measuring peripheral oxytocin: Can the data on oxytocin and human behavior be trusted? Neuroscience and Biobehavioral Reviews, 2013. 37: p. 1485-1492.

3.         Tabak, B.A., et al., Oxytocin indexes relational distress following interpersonal harms in women. Psychoneuroendocrinology, 2011. 36: p. 115-122.

4.         Smith, T.W., et al., Effects of couple interactions and relationship quality on plasma oxytocin and cardiovascular reactivity: Empirical findings and methodological considerations. International Journal of Psychophysiology, 2013. 88: p. 271-281.

5.         Weisman, O., et al., Plasma oxytocin distributions in a large cohort of women and men and their gender-specific associations with anxiety. Psychoneuroendocrinology, 2013. 38: p. 694-701.

6.         Feldman, R., et al., Evidence for a neuroendocrinological foundation of human affiliation: Plasma oxytocin levels across pregnancy and the postpartum period predict mother-infant bonding. Psychological Science, 2007. 18: p. 965-970.

7.         Szeto, A., et al., Evaluation of enzyme immunoassay and radioimmunoassay methods for the measurement of plasma oxytocin. Psychosomatic Medicine, 2011. 73: p. 393-400.

8.         Grewen, K.M., R.E. Davenport, and K.C. Light, An investigation of plasma and salivary oxytocin responses in breast- and formula-feeding mothers of infants. Psychophysiology, 2010. 47: p. 625-632.

9.         Robinson, K.J., et al., Validation of an enzyme-linked immunoassay (ELISA) for plasma oxytocin in a novel mammal species reveals potential errors induced by sampling procedure. Journal of Neuroscience Methods, in press.

10.       Michopoulos, V., et al., Estradiol effects on behavior and serum oxytocin are modified by social status and polymorphisms in the serotonin transporter gene in female rhesus monkeys. Hormones and Behavior, 2011. 58: p. 528-535.

The Trouble with Oxytocin, Part I:
Does OT Actually Increase Trusting Behavior?

It’s the holiday season, when many people try to clear a little mental space for thoughts about peace on earth and good will toward humanity. In this spirit, I thought I’d inaugurate this blog with a close look at an endocrine hormone that, according to some researchers, can promote trust, generosity, empathy, and, yes, even world peace. I’m referring, of course, to oxytocin (OT).

I’ve been involved with a few research projects on OT over the past few years, mostly in collaboration with my former PhD student Ben Tabak (plus some other colleagues here in Miami), but I’ve made no secret of my concerns about the validity of the techniques that scientists use to measure and manipulate OT experimentally. I also remain unconvinced that intranasally administered OT even makes it into the human brain in the first place. (Many experts think the brain is involved in the control of behavior, so this particular gap in our scientific knowledge seems to me like a problem that OT researchers should be taking a lot more seriously.)

I’ll probably write about these issues in the future, but for now I want to look closely at a much more circumscribed OT-related idea that took the scientific world by storm a few years back. This is the notion that spraying a little OT up people’s noses causes them to become more trusting toward strangers. Let’s look at the initial test of this hypothesis, as well as the evidence that emerged in the wake of the initial experiment, with the goal of estimating the strength of the evidence both for, and against, this charming idea.

The Kosfeld (2005) Experiment

In the very first experiment on oxytocin’s effect on trusting behavior, which bore the definitive title “Oxytocin increases trust in humans” [1], Kosfeld and colleagues randomly assigned 58 healthy men to receive either OT, or an equivalent amount of placebo, via a nasal spray. After the sprays had been given a chance to “kick in” (50 minutes), participants played four rounds (each time with different partners) of the Trust Game—one of the workhorses of experimental economics. The Trust Game is a two-player game in which one player takes on the role of the Investor (these are the subjects whose oxytocin-influenced behavior matters for our purposes here), and the other takes on the role of the Trustee. The Trust Game is hard to describe succinctly, but the Kosfeld paper has a helpful illustration.

[Figure: schematic of the Trust Game, from Kosfeld et al. (2005)]

The Trust Game is a two-stage game. In Stage 1, the Investor chooses how much money (in the Kosfeld experiment, either 0, 4, 8, or 12 “monetary units,” or “MU”) from a bolus of 12 MUs (which the experimenter provides) to transfer to an anonymous Trustee. (Participants are told that these MUs will be converted into real cash after the experiment ends.) The experimenters typically triple the transfer on its way to the Trustee. As a consequence, if the Investor sends 4 MU to the Trustee from her bolus of 12 MU (second branch from the left, marked “4”), the Trustee will finish Stage 1 with her original 12 MU, plus the additional 4 MU × 3 = 12 MU that result from the 4-MU transfer (after the experimenters multiply it by 3). In contrast, the Investor will be left with 12 – 4 = 8 MU at the end of Stage 1.

In Stage 2, the Trustee is given a choice to send as much or as little of her 24 MU back to the Investor as she wishes. This is called a back-transfer. If the Trustee chooses to send 0 back, she keeps all 24 MU for herself. Anything she sends back to the Investor gets subtracted from the Trustee’s 24 MUs and added to the 8 MU that remained in the Investor’s account at the end of Stage 1. The game is called the trust game under the assumption that people generally like money and prefer to have as much of it as possible. Under this assumption, it makes sense to conceptualize Investors’ choices about how much to send to their Trustees during Stage 1 as measures of their trust that the Trustees will reciprocate during Stage 2.
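For readers who like to see the bookkeeping spelled out, the payoff arithmetic above can be sketched as a small function (the names and structure are mine; the 12-MU endowment and tripling factor follow the Kosfeld design as described):

```python
# Payoffs in the Trust Game as described above: each player starts with a
# 12-MU endowment, and the Investor's transfer is tripled en route.

ENDOWMENT = 12
MULTIPLIER = 3

def trust_game_payoffs(transfer, back_transfer):
    """Return (investor_payoff, trustee_payoff) in MUs.

    transfer:      MUs the Investor sends in Stage 1 (0, 4, 8, or 12)
    back_transfer: MUs the Trustee returns in Stage 2
    """
    trustee_pot = ENDOWMENT + transfer * MULTIPLIER  # Trustee's 12 MU + tripled transfer
    assert 0 <= back_transfer <= trustee_pot
    investor = ENDOWMENT - transfer + back_transfer
    trustee = trustee_pot - back_transfer
    return investor, trustee

# The worked example from the text: the Investor sends 4 MU, so the Trustee
# holds 12 + 4*3 = 24 MU at the end of Stage 1.
print(trust_game_payoffs(4, 0))   # (8, 24): the Trustee keeps everything
print(trust_game_payoffs(4, 12))  # (20, 12): the Trustee splits the gains
```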

So, the key question is this: Did OT increase Investors’ Stage 1 transfers in the Kosfeld experiment? That is, did OT increase their trusting behavior? Here’s what the authors wrote: “The investors’ average transfer is 17% higher in the oxytocin group (Mann-Whitney U-test; z = -1.897, P = 0.029, one-sided), and the median transfer in the oxytocin group is 10MU, compared to a median of only 8MU for subjects in the placebo group” (p. 674). The figure below, also from the Kosfeld paper, shows the distribution of transfers for the OT group and the placebo group.

[Figure: distribution of Investor transfers in the OT and placebo groups, from Kosfeld et al. (2005)]

Look at the far right side of the figure: The difference in the percentages of participants in the OT and placebo conditions who transferred all of their MUs (12) to their four Trustees is really quite arresting. The authors summarize this result on p. 674: “Out of the 29 subjects, 13 (45%) in the oxytocin group showed the maximal trust level [that is, they entrusted all of their MUs to their Trustees on all 4 rounds], whereas only 6 of the 29 subjects (21%) in the placebo group showed maximal trust.” Mind you, a statistical purist would likely have winced at the researchers’ use of a one-tailed statistical test—especially since the difference in the distributions for the two groups would not have registered as statistically significant at p < .05 (which signals that the results would be expected less than 5% of the time in a world in which the null hypothesis is true) with a two-tailed test. Nevertheless, just by looking at the figure you can understand why the authors got excited by their data.
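The one- vs. two-tailed distinction is not academic here: the reported z of 1.897 sits just under the .05 threshold one-tailed and just over it two-tailed. A quick check of my own, via the normal approximation (the survival-function helper is written from scratch using the complementary error function):

```python
# One- vs. two-tailed p-values for the reported Mann-Whitney z = 1.897,
# using the standard normal approximation.
import math

def normal_sf(z):
    """P(Z > z) for a standard normal variable (survival function)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 1.897
p_one_tailed = normal_sf(z)
p_two_tailed = 2 * normal_sf(z)

print(round(p_one_tailed, 3))  # 0.029 -- matches the value Kosfeld et al. report
print(round(p_two_tailed, 3))  # 0.058 -- just misses the conventional .05 cutoff
```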

The Kosfeld paper has become a citation classic. Google Scholar tells me that it has been cited 1,673 times as of today (by way of comparison, Watson and Crick’s 1953 Nature paper on the structure of DNA, which has also been sort of important for science, has been cited 9,130 times). But is it correct? That is to say, are the Kosfeld findings robust enough to license the conclusion that oxytocin really does increase trust in humans? Allow me to lay out the post-Kosfeld evidence so you can make up your own mind. I have located five post-Kosfeld experiments that examined the effects of intranasal OT on trusting behavior in the trust game, and I restrict my remarks to those experiments only. (I’m ignoring studies of people’s self-reported trust of strangers, for example, as well as a few other experiments that have used experimental games other than the trust game.) I have scored each of these five replication experiments as either a successful replication or a failure to replicate (or some admixture of success and failure). (Caveat lector: None of these studies is an exact replication of Kosfeld.)

The Post-Kosfeld Experiments

Replication 1: Baumgartner et al. (2008) In 2008, Baumgartner and colleagues ran a reasonably close replication of the Kosfeld experiment, though they modified the protocol so participants could play the trust games while their brains were being scanned via fMRI.[2] Forty-nine men, randomly assigned to receive either OT or placebo, played a series of six trust games (interleaved with six other kinds of games, which I’m ignoring) with anonymous partners. At the end of the first six trust games, Investors received the feedback that only 50% of their Trustees had made back-transfers. After this disappointing feedback, the Investors played six new trust games (interleaved with some other games) with six new anonymous partners. The figure below, from the supplemental online materials for the paper, shows the main results.

[Figure: pre- and post-feedback transfers in the OT and placebo groups, from the supplemental materials of Baumgartner et al. (2008)]

As you can see on the left side of the figure, OT did not meaningfully increase trust during the first six “Pre-Feedback” rounds. Baumgartner and colleagues mostly passed over those results, however, focusing their discussion instead on the right side of the figure: In the six “Post-Feedback” Trust Games, OT participants entrusted significantly more money to their Trustees, on average, than did the placebo participants.

But it seems to me that we, as dispassionate consumers, are ill-advised to discount the lack of OT-vs.-placebo differences on the Pre-Feedback rounds: I myself am going to score them as an unambiguous “failure to replicate.” Nevertheless, it’s nearly Christmas, and science would stop progressing if we were unwilling to open our minds to new ideas, so I’m happy to score the results from the post-feedback rounds as a “successful replication” of Kosfeld. I am going to score Baumgartner, then, as a 50% successful replication and a 50% failure to replicate.

Replication 2: Mikolajczak et al. [3] Mikolajczak and colleagues randomly assigned 60 healthy men to either OT or placebo, and then had them play ten trust games with partners who had been described as “reliable,” and ten with partners who had been described as “unreliable” (and some other trials that aren’t directly relevant here). Men in the OT group entrusted more money, on average, to partners who had been described as “reliable” than did men in the placebo group, although there was no OT-vs.-placebo difference in the amounts entrusted to partners who had been described as “unreliable.” The results for the “reliable” partners can be interpreted as a reasonably successful replication of Kosfeld, and a good story can be told for why the results for “unreliable” partners are not a failure to replicate Kosfeld, but I’m not sure whether we can just ignore the lack of OT effects for unreliable partners entirely. I am going to score Mikolajczak as a 75% successful replication and a 25% failure to replicate. I admit that this is a hard one to call, though, and other people of good will could come to different conclusions about how to score this study.

Replication 3: Barraza (2010). Jorge Barraza [4] found that 44 healthy men who received OT did not invest more money in four consecutive trust games than did 22 men who received placebo (disclosure: I was an outside reader of Jorge’s dissertation, and co-authored a paper based on some of the results he obtained during that work). I’m calling this one a 100% failure to replicate. Take note that Investors played their four games with a single anonymous partner, with feedback on the back-transfers after each game, which makes this experiment a bit different from the others included here. Even so, it’s a mistake to exclude Barraza if we want to know whether Kosfeld and colleagues were right to claim that “Oxytocin increases trust in humans.”

Replications 4 and 5: Klackl et al. (2012) and Ebert et al. (2013). Only two more to go. Klackl and colleagues performed a fairly close replication of the 2008 Baumgartner paper with 40 healthy men (sans fMRI) and found that participants who received OT did not, on average, send more money to partners during six pre-feedback games, or during six post-feedback games.[5] (This study, therefore, is not only a failure to replicate Kosfeld, but also a failure to replicate Baumgartner.) Finally, Ebert et al. found that 26 people (13 who had been diagnosed with Borderline Personality Disorder and 13 non-diagnosed controls; mostly women) were no more trusting of 20 strangers in a series of trust games following OT administration than they were following administration of a placebo (all 26 participants did OT trials on one occasion, and placebo trials on another occasion, with counterbalancing).[6] On this basis, I’m calling Ebert, too, a 100% failure to replicate.

Summing Up

So, does OT increase trust in humans? The Kosfeld experiment found a faint statistical signal (remember, p = .029, one-tailed) for an effect of OT across a series of trust games with different Trustees, but statistical hard-liners who would insist on a p value less than .05—two-tailed—might reasonably argue that Kosfeld did not even find a phenomenon in need of replication to begin with. That said, the post-feedback rounds from Baumgartner look quite consistent with the claim that OT increases trusting behavior, as do Mikolajczak’s results for “reliable” partners (though I can’t convince myself to call Mikolajczak a 100% successful replication because of the failure to find effects for the “unreliable” partners). On the other hand, the pre-feedback rounds from Baumgartner, and the results from Barraza, Klackl, and Ebert, look to me like out-and-out failures to replicate Kosfeld.  (Plus, I’m going to weight 25% of the Mikolajczak results as a failure to replicate; again, I don’t think we can just ignore the lack of effects for unreliable partners, or pretend that the original Kosfeld hypothesis explicitly entails such a pattern.)

Adding up these scores, then, leads me to conclude that the original Kosfeld results have been succeeded by 1.25 studies’ worth of successful replications and 3.75 studies’ worth of failures to replicate. Here’s the box score for the replications:

 

Replication        Success   Failure
1. Baumgartner       .50       .50
2. Mikolajczak       .75       .25
3. Barraza           0        1.0
4. Klackl            0        1.0
5. Ebert             0        1.0
Total               1.25      3.75
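For what it's worth, the tallies can be reproduced mechanically from the per-study scores (which are, of course, my own judgment calls):

```python
# Box score: replication-success fraction per study; the failure
# fraction is simply the complement.
scores = {
    "Baumgartner": 0.50,
    "Mikolajczak": 0.75,
    "Barraza":     0.00,
    "Klackl":      0.00,
    "Ebert":       0.00,
}

success = sum(scores.values())
failure = sum(1 - s for s in scores.values())

print(success, failure)             # 1.25 3.75
print(round(failure / success, 1))  # 3.0 -- failures outnumber successes 3:1
```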

With the relevant post-Kosfeld data favoring failures to replicate by 3:1, I think a dispassionate reader is justified in not believing that OT increases trusting behavior–at least not in the context of the trust game. Should we do a few more studies just to make sure? Fine by me, but it seems to me that we, as a field, should have some sort of stopping rule that would tell us when to turn away from this hypothesis entirely–as well, of course, as how much data in support of the hypothesis we would need to justify our acceptance of it. In addition, I’m struck by the fact that no one has ever gotten around to reporting the results of an exact replication of Kosfeld. In light of the Many Labs Project’s recent successes in identifying experimental results that do and do not replicate, I’d personally be content to believe the results of several (five, perhaps?) large-N, coordinated, pre-registered exact replications of the Kosfeld experiment. But until then, or until new data come in that are relevant to this question, I know what I am going to believe.

By the way, if you don’t like how I scored the studies, I would be curious to know how you would synthesize these results to come to your own conclusion. Also, there could be other data on this topic out there that I have failed to include. If you’ll let me know about them, I’ll get around to incorporating them here and updating my box score accordingly.

References

1.         Kosfeld, M., et al., Oxytocin increases trust in humans. Nature, 2005. 435: p. 673-676.

2.         Baumgartner, T., et al., Oxytocin shapes the neural circuitry of trust and trust adaptation in humans. Neuron, 2008. 58: p. 639-650.

3.         Mikolajczak, M., et al., Oxytocin makes people trusting, not gullible. Psychological Science, 2010. 21: p. 1072-1074.

4.         Barraza, J.A., The physiology of empathy: Linking oxytocin to empathic responding. 2010, Unpublished Doctoral Dissertation, Claremont Graduate University: Claremont, CA.

5.         Klackl, J., et al., Who’s to blame? Oxytocin promotes nonpersonalistic attributions in response to a trust betrayal. Biological Psychology, 2012. 92: p. 387-394.

6.         Ebert, A., et al., Modulation of interpersonal trust in borderline personality disorder by intranasal oxytocin and childhood trauma. Social Neuroscience, 2013. 8: p. 305-313.