The Reproducibility Initiative: A Good Idea in Theory that Won’t Work in Practice
The failure of scientists to independently confirm much of the data contained in “hot” academic publications is casting a long shadow over the biopharmaceutical industry. Research groups at Amgen and Bayer reported that the data in a significant percentage of published “breakthrough” papers from academic scientists could not be confirmed in their labs. Given that Big Pharma has increasingly turned to academic investigators as a source of molecular targets for new drugs, this represents a Big Problem. I have argued that this lack of experimental reproducibility represents an especially acute problem for virtual biotech companies, which lean on contractors to do most of their R&D and have no internal lab facilities in which they can try to replicate the data. A new proposal, the Reproducibility Initiative, has recently been established to create a pathway for verifying experimental data outside of the lab that generated it. Science Exchange, a for-profit online marketplace of laboratory services, is coordinating this new initiative in partnership with the open access journal PLoS ONE.
The basic concept behind the Science Exchange marketplace is that it enables scientists to hire service providers to perform experiments that are beyond the capabilities of their own labs. The Reproducibility Initiative is layered on top of this marketplace, with the goal of addressing the irreproducibility problem. Researchers sign up with the Reproducibility Initiative to have their work replicated. An advisory panel finds an outside lab group to perform the studies (which are done anonymously), and the results are then shared with the investigators. Researchers who wish to avail themselves of the Initiative must pay for the confirmation work to be done. In addition, they must pay a 5 percent transaction fee to Science Exchange for tapping into its network.
While the goal of this new initiative is admirable (and likely profitable for Science Exchange), I don’t think the approach will work in the real world. Here’s why this initiative will not pan out, despite its good intentions:
Running an academic research lab is expensive, and grants that support these efforts are hard to get. As a result, budgets are very tight, focused on the “must have” items and not the “would be nice” ones. The Principal Investigator in a lab has to find a way to pay himself or herself as well as the lab’s post-docs, grad students, technicians, and other personnel. The lab may need to buy very expensive equipment (or contribute towards the expense of a core facility, like an electron microscope or nuclear magnetic resonance machine), and to invest in a costly stream of consumable reagents (e.g. cell culture dishes, chemical supplies, growth media for cells).
This doesn’t leave a lot of leftover money to pay someone else to replicate your studies. Elizabeth Iorns, the CEO of Science Exchange, has said she is hopeful that granting agencies will eventually fund these replication efforts. This is extremely difficult to picture in the current economic climate, where there are serious concerns that the NIH budget will get significantly whacked in the upcoming sequestration process. Even without sequestration, it is hard to fathom that funding agencies would decide to significantly cut the number of grants they award in order to spend the money essentially replicating already obtained results. Iorns estimates that the replication studies might cost only 10 percent as much as the original studies. I have no idea how this number (which sounds way too low to me) was generated, or how these studies could be done so inexpensively by another lab.
Beyond the obvious problems with the proposed academic funding situation, another concern is the notion that the original authors have the option to “republish” their results in the journal PLoS ONE, with a link to the original publication. Who will pay for the additional publication costs? Should people wishing to cite the science refer to the original publication, the rehashed confirmatory paper, or both? Since this confirming publication is optional, labs that learn their work could not be replicated are highly unlikely to advertise this fact by publishing the finding. Other ethical and practical concerns related to the “replicating” papers have been raised as well by those with a focus on scholarly publications. Finally, would the failure to ask for “reproducibility” funds as part of a grant application be considered tantamount to admitting that the investigator doing the work viewed it as second rate?
Most scientists have a strong regard for their ability to do science. The good ones are trained to test and retest their hypotheses, to look for alternate explanations of their data. They run additional experiments that may prove their ideas wrong, but will enable them to refine and improve their theories. I’ve known a lot of these people, and I don’t see them as having a strong desire to pay an anonymous commercial lab to validate their work, especially if the cost for this is coming out of their grants. I doubt they would value a certificate from the Reproducibility Initiative as being worth all the time, effort, and money involved in acquiring it. Would earning this certificate really make their data more trustworthy in the eyes of other scientists in their field? And if the commercial lab can’t reproduce their findings, will they think that the problem lies with the people they work with every day, or with anonymous folks that they have never met?
Scientists know that if their work is truly important, other investigators in their field will try to replicate and expand on their discoveries. Scientists with little confidence in their own abilities aren’t likely to be doing cutting edge research, and therefore also wouldn’t be strong candidates to test out the Reproducibility Initiative. It is clear that scientists of all types need to properly supervise the people in their labs who are actually performing the experiments. Lab notebooks need to be reviewed with a critical eye, and junior scientists need to be carefully trained to avoid both deliberate and inadvertent bias in running their experiments. This training process should be a key part of virtually every academic investigator’s job description.
I would imagine that many academic scientists would consider it a significant waste of time to (1) wait while the Reproducibility Initiative’s advisory board searches for a suitable laboratory to rerun their key experiments, and (2) then wait until said experiments have been performed and the results reported. Furthermore, since the testing is done anonymously, there will be serious concerns about the competence of those tasked with rerunning the experiments. Scientists are used to anonymity in dealing with the reviewers of their manuscripts, but in that case they can always appeal a bad review to the editor. How can scientists appeal the results if they are told that the anonymous lab could not reproduce their data? Such concerns will distract scientists from planning the next round of experiments, writing future grant applications, keeping current with the scientific literature, and any number of other scholarly activities. Should they put their follow-on studies on hold, waiting for confirmation of their data?
Failure to have their data replicated could mean that their original data were indeed incorrect. However, another strong possibility is that there were simply some technical problems in running the studies. Does this mean that they should now ask the advisory panel to find a second commercial lab to attempt to reproduce their original findings, adding further to their costs? If the second lab can replicate the data, does this mean that the academic investigator should seek a refund on fees paid to the first lab? Finally, how would the replication studies be funded when the original work is a collaborative effort between three different laboratories with disparate levels of support? The bottom line: engaging an unidentified outside lab to try to reproduce original research findings could easily lead to a lengthy, expensive, and wasteful trip down the rat hole.
Some types of data are easy to replicate, whereas others are much more difficult to reproduce. Cutting-edge experiments may be harder to replicate than “average” data, because, as one of my profs used to tell me, “if it was easy, someone else would have already done it”. If you’ve worked in a lab for any length of time, you know that when trying to reproduce someone else’s results, the devil is in the details. Sometimes using a different brand of test tube is all it takes to turn a positive result into a negative one. Let’s imagine trying to replicate the following experiment: the injection of Protein X into mice with breast cancer caused their tumors to shrink. Let’s focus on just a single reagent (Protein X) needed to reproduce these findings to illustrate what could go wrong:
Obtaining a sufficient supply of Protein X to reproduce the data may be problematic. Experiments done in animals usually need a fairly large amount of protein, and the purity of the material is also quite important. Is the Confirmation Lab going to produce its own supply of Protein X, or is it going to be supplied by the Innovator Lab (i.e. the group that produced the original data)? Producing sufficient amounts of a bioactive, purified recombinant protein for injection into numerous mice for days or weeks is not a trivial (or inexpensive) undertaking.
Suppose the Innovator Lab simply supplies Protein X to the Confirmation Lab to make it easier to do the biology experiments. This would reduce the time and expense of doing the replication study, but it would be a bad idea scientifically. Suppose the Innovator Lab’s vial of Protein X has been contaminated with another protein that has anti-cancer properties (it happens). If the Confirmation Lab does not test for contamination, then it could easily get the same result as the Innovator Lab and reach the same wrong conclusion: that Protein X has a specific anti-cancer activity.
Suppose a pure, truly active batch of Protein X is shipped by the Innovator Lab, but becomes inactivated during shipment, or while sitting on the lab bench at the Confirmation Lab (this too happens). This would lead to a false negative result.
Probably the best idea would be for the Confirmation Lab to make its own supply of Protein X, but this potentially creates new problems. If the original Innovator Lab experiment is not reproduced, how do we know that the reason isn’t that the Confirmation Lab produced an inactive batch of Protein X? This would then lead to even more experiments when the Confirmation Lab shipped its batch of Protein X back to the Innovator Lab for retesting of the original result.
Other elements of the experiment could also result in false positive results or false negatives. There could be genetic differences between the mice used in the two labs, or with the tumor cells, which are notoriously susceptible to genetic drift.
Suppose the Confirmation Lab actually manages to replicate the original anti-tumor results. Does this mean that Protein X actually works as advertised as an anti-cancer agent? Actually, it does not. The result could be due to some indirect effect that is unrelated to the current hypothesis. Let me share an example of this from my own experience.
I was tasked as a graduate student with screening a variety of compounds in mice to see if any of them might have anti-estrogenic activities (that is, they oppose the action of female hormones). I dove into the experiments and quickly identified a molecule as having the desired effect. The result was extremely reproducible and the purity of the test compound was beyond question. Rather than take the results at face value, however, I undertook a search for alternative hypotheses that would prove my result false. While examining the treated mice, I noticed that they were sleeping much more than mice of their age usually do. Delving further, I realized that the drug that I was injecting was a known sedative, so this made sense. And because the animals were sleeping most of the day, I guessed that they were not eating or drinking as much as normal. I put the mice on a scale and found that they had lost a significant amount of weight during the brief treatment period. Perhaps the apparent anti-estrogenic effect of the molecule was actually a result of the fact that the mice were simply not taking in enough food and water? I tested this idea by seeing if simply restricting the food and water intake of untreated mice would have the same anti-estrogenic effect I observed when injecting the test compound. This did, in fact, turn out to be the case. The net result: the drug was not anti-estrogenic; it only appeared to be via an indirect side effect. We published our data, though negative, as a cautionary tale for other researchers.
Sometimes it takes a while to see your biological data confirmed, and other times it’s quick. I published my first science paper in 1980 detailing genetic variations in an enzymatic activity isolated from different types of mice. It was a solid project for a young graduate student, but the publication had little scientific impact at the time. Over the years I occasionally searched the scientific literature to see if anyone had uncovered a molecular basis for the genetic differences I had measured. Finally, in 2000, another lab followed up on my old paper and identified the single amino acid change that was responsible for the different enzymatic profiles I had uncovered. I was happy to see that my data were confirmed, albeit 20 years after my original publication. This time frame upset neither me nor, I would guess, the rest of the scientific community. In contrast, the data in two high-profile papers of mine from 1990 and 1993 were already confirmed by the time they appeared in print.
Biology is amazingly complex, and uncovering its molecular underpinnings can be an extraordinarily difficult undertaking. The inability to reproduce scientific experiments may result from the data being truly wrong, or may simply reflect some technical complexity that has not been recognized. I know of no simple way to resolve this issue, and I don’t believe the Reproducibility Initiative, as structured, is likely to achieve this goal.
Here’s a better idea: ask industry to pay for replication studies in fields of interest to them. After all, it was industry groups that raised the alarm about the irreproducibility issue; they would be among the primary beneficiaries of these efforts, and they have the financial wherewithal to tackle the problem. This could be done independently of the academic groups that originally did the work. Fortuitously, 10 Big Pharma companies have just joined forces to form a new non-profit entity called TransCelerate, whose primary mission is “to solve common drug development challenges”. Reproducibility of academic data is an obvious place for these companies to start. Venture capitalist Bruce Booth recently pointed out that Big BioPharma companies are sitting on a big pile of cash with few good ideas of how to spend it. As a result, they have committed some $75B to stock buybacks in the past two years. Surely they could direct a little money from the petty cash box to tackle this irreproducibility issue? Whether industry takes on this challenge or not, I expect that the most important science will continue to be assessed by independent researchers trying to replicate and expand on discoveries old and new.