The Trouble with Oxytocin, Part I:
Does OT Actually Increase Trusting Behavior?

It’s the holiday season, when many people try to clear a little mental space for thoughts about peace on earth and good will toward humanity. In this spirit, I thought I’d inaugurate this blog with a close look at an endocrine hormone that, according to some researchers, can promote trust, generosity, empathy, and, yes, even world peace. I’m referring, of course, to oxytocin (OT).

I’ve been involved with a few research projects on OT over the past few years, mostly in collaboration with my former PhD student Ben Tabak (plus some other colleagues here in Miami), but I’ve made no secret of my concerns about the validity of the techniques that scientists use to measure and manipulate OT experimentally. I also remain unconvinced that intranasally administered OT even makes it into the human brain in the first place. (Many experts think the brain is involved in the control of behavior, so this particular gap in our scientific knowledge seems to me like a problem that OT researchers should be taking a lot more seriously.)

I’ll probably write about these issues in the future, but for now I want to look closely at a much more circumscribed OT-related idea that took the scientific world by storm a few years back. This is the notion that spraying a little OT up people’s noses causes them to become more trusting toward strangers. Let’s look at the initial test of this hypothesis, as well as the evidence that emerged in the wake of the initial experiment, with the goal of estimating the strength of the evidence both for, and against, this charming idea.

The Kosfeld (2005) Experiment

In the very first experiment on oxytocin’s effect on trusting behavior, which bore the definitive title “Oxytocin increases trust in humans” [1], Kosfeld and colleagues randomly assigned 58 healthy men to receive either OT or a placebo via a nasal spray. After the sprays had been given a chance to “kick in” (50 minutes), participants played four rounds (each time with different partners) of the Trust Game—one of the workhorses of experimental economics. The Trust Game is a two-player game in which one player takes on the role of the Investor (these are the subjects whose oxytocin-influenced behavior matters for our purposes here), and the other takes on the role of the Trustee. The Trust Game is hard to describe succinctly, but the Kosfeld paper has a helpful illustration.

[Figure: decision tree of the Trust Game, from Kosfeld et al. (2005)]

The Trust Game is a two-stage game. In Stage 1, the Investor chooses how much money (in the Kosfeld experiment, either 0, 4, 8, or 12 “monetary units,” or “MU”) from a bolus of 12 MUs (which the experimenter provides) to transfer to an anonymous Trustee. (Participants are told that these MUs will be converted into real cash after the experiment ends.) The experimenters typically triple the transfer on its way to the Trustee. As a consequence, if the Investor sends 4 MU to the Trustee from her bolus of 12 MU (second branch from the left, marked “4”), the Trustee will finish Stage 1 with her original 12 MU, plus the 4 MU * 3 = 12 MU that result from the Investor’s tripled transfer. In contrast, the Investor will be left with 12 – 4 = 8 MU at the end of Stage 1.

In Stage 2, the Trustee is given a choice to send as much or as little of her 24 MU back to the Investor as she wishes. This is called a back-transfer. If the Trustee chooses to send 0 back, she keeps all 24 MU for herself. Anything she does send back to the Investor is subtracted from the Trustee’s 24 MU and added to the 8 MU that remained in the Investor’s account at the end of Stage 1. The game is called the Trust Game under the assumption that people generally like money and prefer to have as much of it as possible. Under this assumption, it does make sense to conceptualize Investors’ choices about how much to send to their Trustees during Stage 1 as measures of their trust that the Trustees will reciprocate during Stage 2.
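Because this arithmetic trips people up, here is a minimal sketch of one round’s payoffs in code (the function name and structure are mine, not anything from the Kosfeld paper):

```python
def trust_game_payoffs(investment, back_transfer, endowment=12, multiplier=3):
    """Compute final payoffs for one round of a Kosfeld-style Trust Game.

    Both players start with the same endowment; the Investor's transfer is
    multiplied (tripled, in Kosfeld et al.) on its way to the Trustee, who
    may then send any portion of her total holdings back.
    """
    trustee_total = endowment + investment * multiplier  # Trustee after Stage 1
    if not 0 <= back_transfer <= trustee_total:
        raise ValueError("back-transfer must lie between 0 and the Trustee's holdings")
    investor_final = endowment - investment + back_transfer
    trustee_final = trustee_total - back_transfer
    return investor_final, trustee_final

# The worked example from the text: Investor sends 4 MU, Trustee returns nothing.
print(trust_game_payoffs(4, 0))   # -> (8, 24)
# If the Trustee instead returns 12 MU, the Investor's gamble pays off.
print(trust_game_payoffs(4, 12))  # -> (20, 12)
```

Sending nothing is the Investor’s safe option; any positive transfer is a bet that the Trustee will send back more than the amount risked, which is why Stage 1 transfers are read as trust.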

So, the key question is this: Did OT increase Investors’ Stage 1 transfers in the Kosfeld experiment? That is, did OT increase their trusting behavior? Here’s what the authors wrote: “The investors’ average transfer is 17% higher in the oxytocin group (Mann-Whitney U-test; z = -1.897, P = 0.029, one-sided), and the median transfer in the oxytocin group is 10MU, compared to a median of only 8MU for subjects in the placebo group” (p. 674). The figure below, also from the Kosfeld paper, shows the distribution of transfers for the OT group and the placebo group.
[Figure: distribution of Investors’ transfers in the OT and placebo groups, from Kosfeld et al. (2005)]


Look at the far right side of the figure: The difference in the percentages of participants in the OT and placebo conditions who transferred all of their MUs (12) to their four Trustees is really quite arresting. The authors summarize this result on p. 674: “Out of the 29 subjects, 13 (45%) in the oxytocin group showed the maximal trust level [that is, they entrusted all of their MUs to their Trustees on all 4 rounds], whereas only 6 of the 29 subjects (21%) in the placebo group showed maximal trust.” Mind you, a statistical purist would likely have winced at the researchers’ use of a one-tailed statistical test—especially since the difference in the distributions for the two groups would not have registered as statistically significant at p < .05 (which signals that the results would be expected less than 5% of the time in a world in which the null hypothesis is true) with a two-tailed test. Nevertheless, just by looking at the figure you can understand why the authors got excited by their data.
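Incidentally, the one-tailed-versus-two-tailed point is easy to check from the reported z statistic. Treating that z as standard normal (as the Mann-Whitney test does asymptotically), the two-tailed p is simply double the one-tailed p. A quick sketch, not the authors’ own computation:

```python
from scipy.stats import norm

z = -1.897  # the Mann-Whitney z reported by Kosfeld et al.

p_one_tailed = norm.cdf(z)          # area in the one predicted tail
p_two_tailed = 2 * norm.sf(abs(z))  # area in both tails

print(f"one-tailed p = {p_one_tailed:.3f}")  # 0.029, matching the paper
print(f"two-tailed p = {p_two_tailed:.3f}")  # 0.058, just over the .05 cutoff
```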

The Kosfeld paper has become a citation classic. Google Scholar tells me that it has been cited 1,673 times as of today (by way of comparison, Watson and Crick’s 1953 Nature paper on the structure of DNA, which has also been sort of important for science, has been cited 9,130 times). But is it correct? That is to say, are the Kosfeld findings robust enough to license the conclusion that oxytocin really does increase trust in humans? Allow me to lay out the post-Kosfeld evidence so you can make up your own mind. I have located five post-Kosfeld experiments that examined the effects of intranasal OT on trusting behavior in the trust game, and I restrict my remarks to those experiments only. (I’m ignoring studies of people’s self-reported trust of strangers, for example, as well as a few other experiments that have used experimental games other than the trust game.) I have scored each of these five replication experiments as a successful replication, a failure to replicate, or some admixture of the two. (Caveat lector: None of these studies is an exact replication of Kosfeld.)

The Post-Kosfeld Experiments

Replication 1: Baumgartner et al. (2008). In 2008, Baumgartner and colleagues ran a reasonably close replication of the Kosfeld experiment, though they modified the protocol so participants could play the trust games while their brains were being scanned via fMRI.[2] Forty-nine men, randomly assigned to receive either OT or placebo, played a series of six trust games (interleaved with six other kinds of games, which I’m ignoring) with anonymous partners. At the end of the first six trust games, Investors received the feedback that only 50% of their Trustees had made back-transfers. After this disappointing feedback, the Investors played six new trust games (interleaved with some other games) with six new anonymous partners. The figure below, from the supplemental online materials for the paper, shows the main results.
[Figure: average transfers in the pre-feedback and post-feedback trust games, from the supplemental materials of Baumgartner et al. (2008)]


As you can see on the left side of the figure, OT did not meaningfully increase trust during the first six “Pre-Feedback” rounds. Baumgartner and colleagues mostly ignored those results, however, focusing their discussion instead on the right side of the figure: In the six “Post-Feedback” trust games, OT participants entrusted significantly more money to their Trustees, on average, than did the placebo participants.

But it seems to me that we, as dispassionate consumers, are ill-advised to discount the lack of OT-vs.-placebo differences on the Pre-Feedback rounds: I myself am going to score them as an unambiguous “failure to replicate.” Nevertheless, it’s nearly Christmas, and science would stop progressing if we were unwilling to open our minds to new ideas, so I’m happy to score the results from the post-feedback rounds as a “successful replication” of Kosfeld. I am going to score Baumgartner, then, as a 50% successful replication and a 50% failure to replicate.

Replication 2: Mikolajczak et al. (2010). Mikolajczak and colleagues [3] randomly assigned 60 healthy men to either OT or placebo, and then had them play ten trust games with partners who had been described as “reliable,” and ten with partners who had been described as “unreliable” (and some other trials that aren’t directly relevant here). Men in the OT group entrusted more money, on average, to partners who had been described as “reliable” than did men in the placebo group, although there was no OT-vs.-placebo difference in the amounts entrusted to partners who had been described as “unreliable.” The results for the “reliable” partners can be interpreted as a reasonably successful replication of Kosfeld, and a good story can be told for why the results for “unreliable” partners are not a failure to replicate Kosfeld, but I’m not sure whether we can just ignore the lack of OT effects for unreliable partners entirely. I am going to score Mikolajczak as a 75% successful replication and a 25% failure to replicate. I admit that this is a hard one to call, though, and other people of good will could come to different conclusions about how to score this study.

Replication 3: Barraza (2010). Jorge Barraza [4] found that 44 healthy men who received OT did not invest more money in four consecutive trust games than did 22 men who received placebo (disclosure: I was an outside reader of Jorge’s dissertation, and co-authored a paper based on some of the results he obtained during that work). I’m calling this one a 100% failure to replicate. Take note that Investors played their four games with a single anonymous partner, with feedback on the back-transfers after each game, which makes this experiment a bit different from the others included here. Even so, it’s a mistake to exclude Barraza if we want to know whether Kosfeld and colleagues were right to claim that “Oxytocin increases trust in humans.”

Replications 4 and 5: Klackl et al. (2012) and Ebert et al. (2013). Only two more to go. Klackl and colleagues performed a fairly close replication of the 2008 Baumgartner paper with 40 healthy men (sans fMRI) and found that participants who received OT did not, on average, send more money to partners during six pre-feedback games, or during six post-feedback games.[5] (This study, therefore, is not only a failure to replicate Kosfeld, but also a failure to replicate Baumgartner.) Finally, Ebert et al. found that 26 people (13 who had been diagnosed with Borderline Personality Disorder and 13 non-diagnosed controls; mostly women) were no more trusting of 20 strangers in a series of trust games following OT administration than they were following administration of a placebo (all 26 participants did OT trials on one occasion, and placebo trials on another occasion, with counterbalancing).[6] On this basis, I’m calling Ebert, too, a 100% failure to replicate.

Summing Up

So, does OT increase trust in humans? The Kosfeld experiment found a faint statistical signal (remember, p = .029, one-tailed) for an effect of OT across a series of trust games with different Trustees, but statistical hard-liners who would insist on a p value less than .05—two-tailed—might reasonably argue that Kosfeld did not even find a phenomenon in need of replication to begin with. That said, the post-feedback rounds from Baumgartner look quite consistent with the claim that OT increases trusting behavior, as do Mikolajczak’s results for “reliable” partners (though I can’t convince myself to call Mikolajczak a 100% successful replication because of the failure to find effects for the “unreliable” partners). On the other hand, the pre-feedback rounds from Baumgartner, and the results from Barraza, Klackl, and Ebert, look to me like out-and-out failures to replicate Kosfeld.  (Plus, I’m going to weight 25% of the Mikolajczak results as a failure to replicate; again, I don’t think we can just ignore the lack of effects for unreliable partners, or pretend that the original Kosfeld hypothesis explicitly entails such a pattern.)

Adding up these scores, then, leads me to conclude that the original Kosfeld results have been succeeded by 1.25 studies’ worth of successful replications and 3.75 studies’ worth of failures to replicate. Here’s the box score for the replications:

Study                        Successful replication    Failure to replicate
Baumgartner et al. (2008)    0.50                      0.50
Mikolajczak et al. (2010)    0.75                      0.25
Barraza (2010)               0.00                      1.00
Klackl et al. (2012)         0.00                      1.00
Ebert et al. (2013)          0.00                      1.00
Total                        1.25                      3.75

With the relevant post-Kosfeld data favoring failures to replicate by 3:1, I think a dispassionate reader is justified in not believing that OT increases trusting behavior, at least not in the context of the trust game. Should we do a few more studies just to make sure? Fine by me, but it seems to me that we, as a field, should have some sort of stop rule that would tell us when to turn away from this hypothesis entirely (and, of course, how much data in support of the hypothesis we would need to justify accepting it). In addition, I’m struck by the fact that no one has ever reported an exact replication of Kosfeld. In light of the Many Labs projects’ recent successes in identifying experimental results that do and do not replicate, I’d personally be content to believe the results of several (five, perhaps?) large-N, coordinated, pre-registered exact replications of the Kosfeld experiment. But until then, or until new data come in that are relevant to this question, I know what I am going to believe.
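If you’d like to fiddle with my admittedly judgment-laden weights, the whole box score reduces to a few lines of code (the scores below are simply my readings of the five studies, as described above):

```python
# Fraction of each study scored as a successful replication of Kosfeld et al.;
# the remainder of each study counts toward "failure to replicate."
scores = {
    "Baumgartner (2008)": 0.50,  # post-feedback rounds only
    "Mikolajczak (2010)": 0.75,  # "reliable" partners only
    "Barraza (2010)": 0.00,
    "Klackl (2012)": 0.00,
    "Ebert (2013)": 0.00,
}

successes = sum(scores.values())    # studies' worth of successful replication
failures = len(scores) - successes  # studies' worth of failure to replicate

print(f"{successes:.2f} successes vs. {failures:.2f} failures "
      f"(ratio {failures / successes:g}:1)")
# -> 1.25 successes vs. 3.75 failures (ratio 3:1)
```

Readers who score Mikolajczak or Baumgartner differently can swap in their own weights and see how far the 3:1 ratio moves.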

By the way, if you don’t like how I scored the studies, I would be curious to know how you would synthesize these results to come to your own conclusion. Also, there could be other data on this topic out there that I have failed to include. If you’ll let me know about them, I’ll get around to incorporating them here and updating my box score accordingly.


1. Kosfeld, M., et al., Oxytocin increases trust in humans. Nature, 2005. 435: p. 673-676.

2. Baumgartner, T., et al., Oxytocin shapes the neural circuitry of trust and trust adaptation in humans. Neuron, 2008. 58: p. 639-650.

3. Mikolajczak, M., et al., Oxytocin makes people trusting, not gullible. Psychological Science, 2010. 21: p. 1072-1074.

4. Barraza, J.A., The physiology of empathy: Linking oxytocin to empathic responding. 2010, Unpublished Doctoral Dissertation, Claremont Graduate University: Claremont, CA.

5. Klackl, J., et al., Who’s to blame? Oxytocin promotes nonpersonalistic attributions in response to a trust betrayal. Biological Psychology, 2012. 92: p. 387-394.

6. Ebert, A., et al., Modulation of interpersonal trust in borderline personality disorder by intranasal oxytocin and childhood trauma. Social Neuroscience, 2013. 8: p. 305-313.


18 thoughts on “The Trouble with Oxytocin, Part I: Does OT Actually Increase Trusting Behavior?”

  1. Bill Skaggs (@weskaggs)

    Hi, a few comments regarding methodology. First, a one-tailed test seems appropriate given that the hypothesis didn’t just predict a difference, it predicted the direction of the difference.
    Second, a marginal p value is marginal, and one should not try to invent rules for turning it into a definite yes or definite no.
    Third, introducing a “stop rule” is the worst possible response to this sort of thing. The more mixed-up the evidence, the greater the need for additional evidence. The only viable “stop rule” is that experiments may cease when the understanding is so thoroughly settled that a single additional experiment will not have any impact on it.

    1. mmcculloughmiami Post author

      Bill: Thanks for the thoughts. There are certainly instances when one-tailed p values are appropriate. In the case of the Kosfeld paper, a one-tailed test would have been appropriate if the researchers only cared whether OT caused /increases/ in trust and were unwilling to let their data be informative about the opposite possibility: that OT causes /decreases/ in trust. In social science, where the links between theory and predictions are typically quite loose, it is often difficult to generate a prediction so ironclad that one is justified in ignoring one tail. But would the authors of the Kosfeld paper have ignored a finding that OT seemed to reduce trust? It’s conceivable; if so, their choice to use a one-tailed test was a defensible one.

      Of course you’re right that we throw away valuable information by using p values to generate binary yes/no decisions. That’s always the case, but if there’s anything we’ve learned from the replication crisis in social science, it’s that p < .05 is probably far too liberal to protect us from false discoveries anyway.

      Your point about stop rules strikes a similar chord. Strength of evidence for a given hypothesis is a continuous quantity, but at the end of the day, we need some guidance on what to believe given the data. I think the Bayes factor, which is the ratio of the probability of the data under the alternative hypothesis to the probability of the data under the null hypothesis, is probably our best way forward in using social science data to draw inferences about how the world works. As the Bayes factor increases, inferences that the alternative hypothesis is correct are strengthened, even though there is still no bright line that we can use for making real yes/no decisions.
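To make the logic concrete with a toy example of my own (nothing to do with the OT studies): for two point hypotheses, the Bayes factor is just a likelihood ratio.

```python
from scipy.stats import binom

# Toy data: 60 heads in 100 coin flips. How strongly do the data favor a
# biased coin (p = 0.6) over a fair coin (p = 0.5)?
k, n = 60, 100

likelihood_h1 = binom.pmf(k, n, 0.6)  # probability of the data under H1
likelihood_h0 = binom.pmf(k, n, 0.5)  # probability of the data under H0

bayes_factor = likelihood_h1 / likelihood_h0
print(f"BF10 = {bayes_factor:.1f}")  # roughly 7.5: modest evidence favoring H1
```

In real applications the alternative is usually composite, so the numerator becomes the likelihood averaged over a prior on the effect size; the point-hypothesis version above just conveys the logic.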

  2. Bill Skaggs (@weskaggs)

    Actually the point I was trying to make is sort of the opposite — not that using p values to generate binary decisions throws away information, but rather that it creates spurious information. A p value of 0.05 is so weak that it usually contains much less than one bit of information. Treating it as a binary threshold artificially increases that to a full bit. It inflates the meaning of the evidence. I personally think it was a mistake for the field to set 0.05 as the working threshold for publishability, and that using 0.01 would give us a substantially more robust literature.

    Regards, Bill

    1. mmcculloughmiami Post author

      I agree with you, Bill. If one goes further and combines the practice of insisting on smaller p values (perhaps even .005 or .001) with the practice of obtaining much larger samples than what we have become accustomed to finding acceptable, the robustness of our literatures will improve. Or, even better, we could do away with NHST entirely and focus on confidence intervals and Bayes factors.

  3. Pingback: Monday Miscellany: ACT, Autism, Anorexia » Gruntled & Hinged

  4. Gidi Nave

    Hi Bill, thank you for this important post!
    We must admit that at the moment we simply cannot tell for sure whether OT increases trust, and as we all know, OT has also been found to increase many other types of behavior that do not promote world peace, like out-group hate. Two points that I think should be taken into account in your analysis, which is not a "by the book" meta-analysis:
    1. A failure to get a significant p value (p<0.05) is not necessarily a failure to replicate. The results are often consistent with both the hypothesis and the null. (See for example this recent discussion in the behavioral econ community, published by very serious people: ). In the "failure" of replication 1, for example, the effect direction seems to be the right way, and the error bars are pretty big. I did not check the numeric values, but this "null" effect may well be unable to reject the 17%-increase "hypothesis". Regarding studies 3 and 4, I haven't looked at the direction of the interactions, but there could be the same confound; I especially suspect that because of the smaller participant groups compared to the Kosfeld paper (roughly half the subjects). No one would claim that if we ran this study with 10 subjects alone and failed to get p<0.05 this would be meaningful evidence, right?

    2. It is important to control for possible other confounding factors, such as general risk attitude, that may increase "trust" behavior. This was properly done in the Kosfeld study, and the paper seems especially convincing because of that. I'm not sure if it was done in the rest of the papers.
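Gidi's sample-size point can be made concrete with a quick simulation (the effect size and group sizes below are illustrative choices of mine, not estimates from any of the studies): even a real, medium-sized effect will often fail to reach p < .05 in small groups.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

def power_sim(n_per_group, effect_size=0.5, n_sims=1000, alpha=0.05):
    """Estimate the power of a two-sided Mann-Whitney test when a true
    effect of `effect_size` SDs separates the two groups."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)
        _, p = mannwhitneyu(treated, control, alternative="two-sided")
        hits += p < alpha
    return hits / n_sims

# With ~29 per group (as in Kosfeld), a medium-sized true effect is missed
# roughly half the time; with ~13 per group it is missed far more often.
print(power_sim(29), power_sim(13))
```

In other words, a nonsignificant result from a small study is only weak evidence against a hypothesis, which is part of why a simple tally of "failures" understates the remaining uncertainty.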

    1. Mike McCullough Post author

      Dear Gidi: Thanks for your comment. With only six studies in this area, it might look science-y to do a meta-analysis, but the six studies I wrote about differ from one another in so many ways that it would be difficult to derive truly comparable effect size and standard error estimates. They’re probably not estimating a single population parameter, so I went with a narrative review of the studies instead.

      Once you’ve proven that the effect of oxytocin on trust game behavior is different from its effect on general non-social risk-taking (assuming it has any effects on trust game behavior at all), I see no reason to continue to rule out that alternative interpretation in each new study. Once an alternative explanation has been ruled out (killed), we don’t need to keep digging it up over and over so that we can kill it again and again.

  5. Pingback: The Trouble with Oxytocin, Part II: Extracting the Truth from Oxytocin Research | Social Science Evolving

  6. Maria

    “I also remain unconvinced that intranasally administered OT even makes it into the human brain in the first place” – do you have any papers in mind that have examined this?

  7. Pingback: The Trouble with Oxytocin, Part 3: The Noose Tightens for The Oxytocin–>Trust Hypothesis | Social Science Evolving

  8. Moïra Mikolajczak

    Dear Michael, thank you for your post. For the moment, my team and I have not been able to replicate our own results about the impact of OT on trust. We are about to submit a “failed-replication” paper for publication and hope that it will go through…

    1. Mike McCullough Post author

      Thanks for your note, Moïra: It is good that you are trying to publish your replication! It is still hard to get replications published, but really good for the field. Would love to see it when you have a draft that you are comfortable circulating.

  9. Nick Phelps

    Hi Mike,

    Have you written/heard of anything further about OT with regards to the BBB? I’ve spoken to a number of neuropharmacologists who share your doubts about whether intranasal OT can enter the brain. It seems, however, that researchers in this field are hedging their bets A) on the possibility that intranasal administration bypasses the BBB (I believe there is some experimental evidence by which we might buy this proposition as it pertains to other drugs/hormones) and B) that intranasal OT in rats/mice seems to increase brain levels after about an hour. Is there any evidence for or against intranasal OT entering the brain in humans?


    1. Mike McCullough Post author

      Nick–There seems to be some decent evidence that intranasally administered OT makes it into the noses of rats, and some mixed evidence that it makes it into the CSF of macaques, but human noses are quite different from rat noses, and I think the macaque data are not as strong as I’d personally prefer.

      With regard to whether OT makes it into the human central nervous system, I’m aware of this paper:

      which Neuroskeptic has very thoughtfully criticized here:

  10. Pingback: Human Oxytocin Research Gets a Drubbing | Social Science Evolving

  11. Pingback: Probando | manneporte

  12. Pingback: Oxytocin | novaturiente
