This post is a longer-form treatment of the Cap and Trade idea for controlling false positives in science that Dave Kelly and I outlined in our brief letter, which appeared in this week’s issue of Nature. It provides more background and additional details that we simply couldn’t cover in a 250-word letter.
First, the background. For the past several years, as many readers are surely aware, a replication crisis has been roiling the sciences. The problem, quite simply, is that some (probably large) proportion of published scientific findings are false. Many remedies have been proposed for addressing the replication crisis, including (1) system-wide changes in how researchers are trained in statistics and research methods; (2) exhortations to greater statistical and methodological virtue among researchers; (3) higher editorial standards for journal editors and reviewers; and (4) journal reforms that would require more transparency from investigators about hypothesis formulation, research methods, data collection, and data analysis as a condition for publication.
Most of these remedies are sensible, but Nature has suggested here and here that NIH officials have been contemplating an even more radical measure: Some sort of audit system in which independent laboratories would be tasked with trying to reproduce recently published scientific results from particular fields. An audit-based system would have its merits, but a cap and trade system might work even better. Our proposal rests on the idea that false positives are a form of pollution: I call it false positive pollution.
False Positives are Pollution
False positives fit the standard economic definition of pollution: They impose opportunity costs on others when they are emitted into the environment. If all published research findings were correct (i.e., if the false discovery rate were zero), then any single conclusion from any single research paper (“Drug X is safe and effective,” say, or “Cap and trade systems reduce false positives in scientific literatures”) could form the basis for confident decision-making. You could read a published paper and then take action on the basis of its conclusions, knowing that those conclusions reflected true states of the world.
However, the more false positive pollution a literature contains, the more costly, on average, it becomes to make decisions on the basis of any published finding. The recent Tamiflu debacle provides a vivid case study: The reason drug companies, governments, and individuals got so excited about Tamiflu as a treatment for flu was that their decision-making was distorted by irreproducible research results. The Tamiflu misadventure features false positive pollution doing what it does best: imposing costs on others, to the tune of $20 billion in wasted public expenditures (not to mention the harm the drugs might have done to their consumers, and the opportunity costs associated with not pursuing possible alternatives).
Likewise, if a published scientific article led you erroneously to believe that a particular laboratory technique was a good way to manipulate some variable in your research, and then you went on to base your PhD work on that technique, only to find that it did not work for you (because it actually doesn’t work for anybody), then false positive pollution would have caused you to devote time and resources to hocus-pocus rather than the pursuit of something that could have produced actual scientific knowledge. This is one of the costs of false positive pollution that should really bother graduate students, post-docs, and anyone who cares about their career development: Trainees with just as much scientific promise as any others end up wasting their valuable training time on illusions. False positive pollution sets careers back.
A cap and trade system might be useful for reducing false positive pollution in the same way that cap and trade systems have, over the past 45 years, helped to reduce sulphur dioxide, nitrogen oxide, lead additives in gasoline, and even over-fishing. Below, I outline some of the steps we’d need to undertake as a society to implement a cap and trade system to control false positive pollution.
Step 1: Measuring Existing Levels of False Positive Pollution
The first step forward could be to estimate how much false positive pollution is emitted annually, which would require independent replications of random samples of published findings from the prior year. What we would be trying to estimate is the proportion of published experiments, out of the 1.5 million or so that are published each year worldwide, whose results cannot be independently reproduced even when the original protocols are followed exactly. I rather admire the way this was done in the Many Labs Replication Project: Several lab directors agree on the experimental protocol [ideally in collaboration with the investigator(s) who originally published the study] and then go back to their labs and re-run the experiment. The results from all of their independent attempts to replicate are then statistically aggregated to determine whether the original result is a true positive or a false positive.
Expensive, yes, but don’t let the expense distract you for the moment. Good research takes money, and we’re already hemorrhaging money through the production of false knowledge (keep the image of those warehouses full of Tamiflu vividly in your mind). Why not invest in trying to understand how much money we’re actually wasting and what we might do about it?
Step 2: Determining An Optimal Level of False Positive Pollution
Once we had an estimate of how much false positive pollution is emitted annually, we’d need to figure out how much false positive pollution we’d like to live with. A 100% pollution-free research literature would be nice. So would 100% pollution-free air. However, “100% pollution-free air” is an unrealistic goal. Compliance would be too expensive, and it would come with too many undesirable side effects. Likewise, a research literature that’s 100% free of false positive pollution sounds great, but that’s a goal that cannot be attained without adversely affecting the overall research enterprise. False positives are going to happen, even to scientists who have done their best to avoid them (after all, there is no such thing as a study with 100% statistical power). There must be some amount of false positive pollution we can tolerate.
One way to set an acceptable level of false positive pollution would be to measure the costs and benefits associated with the average false positive emission. How much money is wasted each time a researcher emits an erroneous “finding?” And how much would it cost to prevent such an event? These benefits and costs are likely to vary quite a lot from field to field, so I see good, plentiful work for economists here. In any case, with those data in hand, it should be possible to estimate the optimal amount of false positive pollution that we should be willing to tolerate—that is, the amount that maximizes society-wide benefits relative to costs.
But there’s actually a simpler way to set an acceptable level: Society tacitly endorses the idea that we can live with a 5% false positive pollution rate each time we accept the results of a study in which the significance threshold was set at .05. That’s what p < .05 actually means: “In a world in which the null hypothesis is true, we’d only get results as extreme as those we obtained in this study in 5 out of 100 exact replications.” We could simply make a 5% FPP emissions rate our explicit society-wide ideal.
Step 3: Setting Goals
Once key stakeholders have agreed upon an acceptable annual level, whether that acceptable level is derived by measuring costs and benefits (as outlined above), or by the “5% fiat” approach, an independent regulatory body would be in a position to set goals (with stakeholder input, of course) for reducing the annual FPP emissions rate down to the acceptable level. (In the United States, the regulatory body might be the NIH, the NSF, or some agency that does the regulatory work on behalf of all of the federal agencies that sponsor scientific research; an international regulatory body might resemble the European Union’s Emissions Trading System.)
I’ll illustrate here with a simplified example that assumes a global regulatory agency and a global trading market. Let’s assume that the global production of scientific papers is 1,500,000 papers per year. Now, suppose the goal is to reduce the global false positive emission rate from, say, 50% of all research findings (I use this estimate here merely for argument’s sake; nobody knows what the field-wide FPP emission rate is, though for some journals and sub-fields it could be as high as 80%) to 5%, and we want to accomplish that goal at the rate of one percentage point per year over a 45-year period. (In our Nature correspondence, space limitations forced Dave and me to envision a move from the current emission levels to 5% emissions in a single year. The scenario I’m presenting here is more complex, but it’s also considerably less draconian.)
Our approach relies on the issuance of false positive pollution (FPP) permits. These permits allow research organizations to emit some false positive pollution, but the number of available permits, and thus, the total amount of pollution emitted annually, is strictly regulated. In Year 1, the Agency would distribute enough FPP permits to cover only 49% of the total global research output (or 1,500,000*.49 = 735,000 false positive permits). The number of permits distributed to each research-producing institution (universities are canonical examples of research-sponsoring institutions, as are drug companies) would be based on each institution’s total research output. Highly productive institutions would get more, and less productive ones would get fewer, but for all institutions, the goal would be to provide them with enough permits to allow a 49% emissions rate in Year 1. After the agency distributes the first year’s supply of FPP permits, it’s up to each individual research-sponsoring institution to determine how it wants to limit its false positive pollution to 49%. In Year 2, the number of permits distributed would go down a little further, a little further in the year after that, and so on until the 5% ideal was reached.
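To make the arithmetic concrete, here is a minimal sketch of that declining cap. It assumes, purely for illustration, that global output stays flat at 1,500,000 papers per year and that the starting emission rate really is 50%; the function names are hypothetical and not part of any actual proposal.

```python
# Hypothetical sketch of the declining permit cap described above.
ANNUAL_PAPERS = 1_500_000   # global output assumed in the text (held constant here)
START_RATE_PCT = 50         # illustrative starting emission rate, in percent
TARGET_RATE_PCT = 5         # the "5% fiat" goal
STEP_PCT = 1                # yearly reduction, in percentage points


def permit_schedule(papers=ANNUAL_PAPERS, start=START_RATE_PCT,
                    target=TARGET_RATE_PCT, step=STEP_PCT):
    """Return a list of (year, cap_percent, permits_issued) tuples."""
    schedule = []
    year = 0
    cap = start
    while cap > target:
        cap = max(cap - step, target)   # never drop below the target
        year += 1
        schedule.append((year, cap, papers * cap // 100))
    return schedule


def institution_permits(papers_published, cap_percent):
    """Permits for one institution: enough to cover cap_percent of its output."""
    return papers_published * cap_percent // 100


schedule = permit_schedule()
print(schedule[0])                     # (1, 49, 735000)
print(schedule[-1])                    # (45, 5, 75000)
print(institution_permits(2000, 49))   # 980
```

Under these assumptions, the global permit supply shrinks from 735,000 in Year 1 to 75,000 in Year 45, and an institution publishing 2,000 papers would start with 980 permits.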
By the way, there are lots of ways to make the distribution process fair to small businesses, independent scientists, and middle-school science fair geniuses (including, for example, exempting small research enterprises and individuals, so long as the absolute value of their contributions to FPP is trivially small), so it’s not fair to dismiss my idea on the basis of such objections. Cap and trade systems can be extremely flexible.
Step 4: Monitoring and Enforcement
Once the FPP permits have been distributed for the year, the regulatory agency would turn to another important task: Monitoring. In the carbon sector, monitoring of individual polluters can be accomplished with electronic sensors at the point of production, so the monitoring can be extremely precise and comprehensive. In the research sector, this level of precision and comprehensiveness would be impossible. We’d have to make do with random samples of research-producing institutions’ research output from the prior year. (Yes, some research studies would be difficult to replicate because the experiment or data set is literally unrepeatable. Complications like these, again, are just details; they don’t render a cap-and-trade approach unworkable by any means.) If the estimated FPP emissions for any research-sponsoring institution exceeded (by more than some margin of error) the amount covered by the FPP permits the institution possessed at the time, the institution would be forced to purchase additional permits from other institutions that had done a better job of getting their FPP emissions under control. If you, as a research institution, could get your FPP emissions rate down to 40% in Year 1, you’d have a bunch of permits available to sell on the market to institutions that hadn’t done enough to get their emissions under control. In a cap and trade system, there is money to be made by institutions that take their own false positive pollution problems seriously.
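The monitoring step might be sketched roughly as follows. Everything here is an assumption made for illustration: the normal-approximation margin of error, the rule that an institution must buy permits only when even the lower bound of its estimated emissions exceeds its holdings, and the rule that it may sell whenever its point estimate falls below its holdings. A real regulator would have to choose these rules itself.

```python
import math


def permit_balance(permits_held, papers_published, sample_size,
                   failed_replications, z=1.96):
    """Compare audited emissions against permits held (hypothetical rules).

    Returns a positive number of surplus permits the institution could sell,
    a negative number it would have to buy, or 0 if its estimated emissions
    exceed its permits but only within the margin of error.
    """
    rate = failed_replications / sample_size
    # Normal-approximation margin of error for a sampled proportion.
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    estimate = round(failed_replications * papers_published / sample_size)
    lower_bound = max(rate - margin, 0.0) * papers_published

    if lower_bound > permits_held:   # clearly over the cap: must buy
        return permits_held - estimate
    if estimate < permits_held:      # under the cap: surplus to sell
        return permits_held - estimate
    return 0                         # over, but within the margin of error


# An institution with 1,000 papers and 490 permits (a 49% cap),
# audited with a random sample of 100 replications:
print(permit_balance(490, 1000, 100, 40))   # 90   (40% emissions: can sell)
print(permit_balance(490, 1000, 100, 70))   # -210 (70% emissions: must buy)
print(permit_balance(490, 1000, 100, 55))   # 0    (55%: within the margin)
```

The third case shows why the margin of error matters: with only 100 sampled studies, an observed 55% failure rate is not clearly distinguishable from the 49% cap.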
The Virtues of a Cap and Trade System
Cap and trade systems have many virtues that suit them well to addressing the replication crisis. Here are a few examples:
- Cap and trade systems use shame effectively. On one hand, they enable us to clearly state what is bad about false positives in a way that reduces righteous indignation, shame-based secrecy, and all of the pathologies these moralistic reactions create. On the other hand, were we to make information about institutions’ sales and purchases of false positive permits publicly available, then institutions would face the reputational consequences that would come from being identified publicly as flagrant polluters. Likewise, permit-sellers would come to be known as organizations whose research was relatively trustworthy. These reputational incentives would motivate all institutions—even Big Pharma and universities with fat endowments, which could afford to buy all the excess permits they desired on the open market—to get their emissions problems under control.
- Cap and trade systems don’t rely on appeals to personal restraint, which are subject to public goods dilemmas. (Fewer false positives are good for everyone, of course, but I’m best off if I enjoy the benefits of your abstemiousness while I continue polluting whenever I feel like it.) Cap and trade systems do away with these sorts of free-rider problems.
- Cap and trade systems encourage innovation: Each research-sponsoring institution is free to come up with its own policies for limiting the production of false positives. Inevitably, these innovations will diffuse out to other institutions, increasing cost-effectiveness in the entire sector.
- A cap and trade system would be less chilling to individual investigators than a simple audit-and-sanction system would be because a cap-and-trade system would require institutions, and not just investigators, to share in the compliance burden. Research-sponsoring institutions take the glory for their scientists’ discoveries (and the overhead); they should also share the responsibility for reform.
- Most importantly, cap and trade systems reduce pollution where it is cheapest to do so first. All of the low-hanging fruit will be picked in the first year, and harder-to-implement initiatives will be pursued in the successive years. This means that we could expect tangible progress in getting our problems with false positives under control right away. Audit systems do not possess this very desirable feature.
Wouldn’t a Cap and Trade System Be Expensive?
Elizabeth Iorns estimated that it costs $25,000 to replicate a major pre-clinical experiment that involves in vitro and/or animal work. I don’t know that well-conducted laboratory-based behavioral experiments are that much cheaper (at least, once you’ve factored in the personnel time for running the study, analyzing the data properly, and writing up the paper). So all of those replications for goal-setting and monitoring purposes are going to cost a lot of money.
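For a rough sense of scale, here is a back-of-the-envelope sketch of what one audit sample might cost. It assumes the textbook sample-size formula for estimating a proportion, one replication attempt per sampled finding (a Many Labs-style design would need several), and the $25,000 figure applied to every study. The function names are hypothetical.

```python
import math

COST_PER_REPLICATION = 25_000   # dollars; the Iorns estimate quoted above


def replications_needed(margin, expected_rate=0.5, z=1.96):
    """Sample size for estimating a proportion to within +/- margin.

    expected_rate=0.5 is the most conservative (largest-sample) assumption.
    """
    n = (z ** 2) * expected_rate * (1 - expected_rate) / margin ** 2
    return math.ceil(n)


def audit_cost(margin, cost_per_replication=COST_PER_REPLICATION):
    """Total cost, in dollars, of one audit sample at the given precision."""
    return replications_needed(margin) * cost_per_replication


print(replications_needed(0.05))   # 385
print(audit_cost(0.05))            # 9625000
```

Under these assumptions, pinning down a single emission rate to within five percentage points takes 385 replications, or roughly $9.6 million, and that figure would be multiplied across every field or institution audited separately.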
But bear in mind, as I already explained, that false positives are expensive, too, and they produce no societal benefit. In fact, what they produce is harm. It costs as much money to produce a false positive as it does to produce a true positive, but the money devoted to producing a false positive is wasted. (If it’s true that the United States spends around $70 billion/year on basic research, and if even 10% of the resultant findings are false positives, which is almost surely a gross underestimate, then the U.S. alone is using $7 billion per year to buy pollution.) Also, Tamiflu. What if we used some of the money we’re currently using to buy pollution to make sure that the rest of our research budget is spent not on the production of more pollution, but instead on true positives and true negatives, that is, results that actually have value to society?
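The waste estimate is simple enough to check in a few lines. This sketch assumes, as the paragraph above does, that a false positive costs as much to produce as any other finding; the 50% figure is only the for-argument’s-sake rate used earlier in this post, not a measurement.

```python
US_BASIC_RESEARCH_BUDGET = 70_000_000_000   # dollars per year, as cited above


def wasted_dollars(annual_budget, false_positive_pct):
    """Dollars spent producing false positives at a given emission rate (%)."""
    return annual_budget * false_positive_pct // 100


for pct in (10, 50):
    print(pct, wasted_dollars(US_BASIC_RESEARCH_BUDGET, pct))
# 10 7000000000
# 50 35000000000
```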
Cap and Trade: Something For Everyone (In Congress)
Here’s the final thing I like about the cap-and-trade idea: It has something for both liberals and conservatives. (I presume that enacting a project this big, which would have such a huge impact on how federal research dollars are spent, would require congressional authorization, and possibly the writing of new laws, but perhaps I am wrong about that.) Liberals venerate science as a source of guidance for addressing societal problems, so they should be motivated to champion legislation that helps restore science’s damaged reputation. Conservatives, for their part, like market-based solutions, private sector engagement, and cutting fraud, waste, and abuse, so the idea should appeal to them as well. In a Congress as impotent as the 113th U.S. Congress has been, can you think of another issue that has as much to offer both sides of the aisle?