
The Reproducibility Initiative: A Good Idea in Theory that Won’t Work in Practice



additional experiments that may prove their ideas wrong, but will enable them to refine and improve their theories. I’ve known a lot of these people, and I don’t see them as having a strong desire to pay an anonymous commercial lab to validate their work, especially if the cost for this is coming out of their grants. I doubt they would value a certificate from the Reproducibility Initiative as being worth all the time, effort, and money involved in acquiring it. Would earning this certificate really make their data more trustworthy in the eyes of other scientists in their field? And if the commercial lab can’t reproduce their findings, will they think that the problem lies with the people they work with every day, or with anonymous folks that they have never met?

Scientists know that if their work is truly important, other investigators in their field will try to replicate and expand on their discoveries. Scientists with little confidence in their own abilities aren’t likely to be doing cutting-edge research, and therefore also wouldn’t be strong candidates to test out the Reproducibility Initiative. It is clear that scientists of all types need to properly supervise the people in their labs who are actually performing the experiments. Lab notebooks need to be reviewed with a critical eye, and junior scientists need to be carefully trained to avoid both deliberate and inadvertent bias in running their experiments. This training process should be a key part of virtually every academic investigator’s job description.


I would imagine that many academic scientists would consider it a considerable waste of time to (1) wait while the Reproducibility Initiative’s advisory board searches for a suitable laboratory to rerun their key experiments, and (2) then wait until said experiments have been performed and the results reported. Furthermore, since the testing is done anonymously, there will be serious concerns about the competence of those tasked with rerunning the experiments. Scientists are used to anonymity in dealing with the reviewers of their manuscripts, but in that case they can always appeal a bad review to the editor. How can scientists appeal the results if they are told that the anonymous lab could not reproduce their data? Such concerns will distract scientists from planning the next round of experiments, writing future grant applications, keeping current with the scientific literature, and any number of other scholarly activities. Should they put their follow-on studies on hold, waiting for confirmation of their data?

Failure to have their data replicated could mean that their original data was indeed incorrect. However, another strong possibility is that there were simply some technical problems in running the studies. Does this mean that they should now ask the advisory panel to find a second commercial lab to attempt to reproduce their original findings, adding further to their costs? If the second lab can replicate the data, does this mean that the academic investigator should seek a refund on fees paid to the first lab? Finally, how would funding of the replication studies be handled when the original work is a collaborative effort among three different laboratories with disparate levels of support? The bottom line: engaging an unidentified outside lab to try to reproduce original research findings could easily lead to a lengthy, expensive, and wasteful trip down the rat hole.


Some types of data are easy to replicate, whereas others are much more difficult to reproduce. Cutting-edge experiments may be harder to replicate than “average” data because, as one of my profs used to tell me, “if it was easy, someone else would have already done it.” If you’ve worked in a lab for any length of time, you know that when trying to reproduce someone else’s results, the devil is in the details. Sometimes using a different brand of test tube is all it takes to turn a positive result into a negative one. Let’s imagine trying to replicate the following experiment: the injection of Protein X into mice with breast cancer caused their tumors to shrink. To illustrate what could go wrong, let’s focus on just a single reagent (Protein X) needed to reproduce these findings:

Obtaining a sufficient supply of Protein X to reproduce the data may be problematic. Experiments done in animals usually need a fairly large amount of protein, and the purity of the material is also quite important. Is the Confirmation Lab going to produce its own supply of Protein X, or is it going to be supplied by the Innovator Lab (i.e. the group that produced the original data)? Producing sufficient amounts of a bioactive, purified recombinant protein for injection into numerous mice for days or weeks is not a trivial (or inexpensive) undertaking.

Suppose the Innovator Lab simply supplies Protein X to the Confirmation Lab to make it easier to do the biology experiments. This would reduce the time and expense of the replication study, but it would be a bad idea scientifically. Suppose the Innovators’ vial of Protein X has been contaminated with another protein that has anti-cancer properties (it happens). If the Confirmation Lab does not test for contamination, it could easily get the same result as the Innovators and come to the same wrong conclusion: that Protein X has a specific anti-cancer activity. Now suppose a pure, truly active batch of Protein X is shipped by the Innovator Lab but becomes inactivated during shipment, or while sitting on the lab bench at the Confirmation Lab (this too happens). This would lead to a false negative result.

Probably the best idea would be for the Confirmation Lab to make its own supply of Protein X, but this potentially creates new problems. If the original Innovator experiment is not reproduced, how do we know that the reason isn’t that the Confirmation Lab produced an inactive batch of Protein X? This would then lead to even more experiments, with the Confirmation Lab shipping its batch of Protein X back to the Innovators for retesting of the original result.

Other elements of the experiment could also result in false positive results or false negatives. There could be genetic differences between the mice used in the two labs, or with the tumor cells, which are notoriously susceptible to genetic drift.

Suppose the Confirmation Lab actually manages to replicate the original anti-tumor results. Does this mean that Protein X actually works as advertised as an anti-cancer agent? Actually, it does not. The result could be due to some indirect effect that is unrelated to the hypothesis being tested. Let me share an example of this from my own experience.

I was tasked as a graduate student with screening a variety of compounds in mice to see if any of them might have anti-estrogenic activities (that is, they oppose the action of female hormones). I dove into the experiments and quickly identified a molecule as having the desired effect. The result was extremely reproducible and the purity of the test compound was beyond question. Rather than take the results at face value, however, I undertook a search for alternative hypotheses that would prove my result false. While examining the treated mice, I noticed that they were sleeping much more than mice of their age usually do. Delving further, I realized that the drug that I was injecting was a known sedative, so this made sense. And because the animals were sleeping most of the day, I guessed that they were not eating or drinking as much as normal. I put the mice on a scale and found that they had lost a significant amount of weight during the brief treatment period. Perhaps the apparent anti-estrogenic effect of the molecule was actually a result of the fact that the mice were simply not taking in enough food and water? I tested this idea by seeing if simply restricting the food and water intake of untreated mice would have the same anti-estrogenic effect I observed when injecting the test compound. This did, in fact, turn out to be the case. The net result: the drug was not anti-estrogenic; it only appeared to be via an indirect side effect. We published our data, though negative, as a cautionary tale for other researchers.

Final Thoughts

Sometimes it takes a while to see your biological data confirmed, and other times it’s quick. I published my first science paper in 1980 detailing genetic variations in an enzymatic activity isolated from different types of mice. It was a solid project for a young graduate student, but the publication had little scientific impact at the time. Over the years I occasionally searched the scientific literature to see if anyone had uncovered a molecular basis for the genetic differences I had measured. Finally, in 2000, another lab followed up on my old paper and identified the single amino acid change that was responsible for the different enzymatic profiles I had uncovered. I was happy to see that my data were confirmed, albeit 20 years after my original publication. This time frame upset neither me, nor, I would guess, the rest of the scientific community. In contrast, the data in two high-profile papers of mine from 1990 and 1993 were already confirmed by the time they appeared in print.

Biology is amazingly complex, and uncovering its molecular underpinnings can be an extraordinarily difficult undertaking. The inability to reproduce scientific experiments may result from the data being truly wrong, or may simply reflect some technical complexity that has not been recognized. I know of no simple way to resolve this issue, and I don’t believe the Reproducibility Initiative, as structured, is likely to achieve this goal.

Here’s a better idea: ask industry to pay for replication studies in fields of interest to them. After all, it was industry groups that raised the alarm about the irreproducibility issue; they would be among the primary beneficiaries of these efforts, and they have the financial wherewithal to tackle the problem. This could be done independently of the academic groups that originally did the work. Fortuitously, 10 Big Pharma companies have just joined forces to form a new non-profit entity called TransCelerate, whose primary mission is “to solve common drug development challenges.” Reproducibility of academic data is an obvious place for these companies to start. Venture capitalist Bruce Booth recently pointed out that Big BioPharma companies are sitting on a big pile of cash with few good ideas of how to spend it. As a result, they have committed some $75B to stock buybacks in the past two years. Surely they could direct a little money from the petty cash box to tackle this irreproducibility issue? Whether industry takes on this challenge or not, I expect that the most important science will continue to be assessed by independent researchers trying to replicate and expand on discoveries old and new.


Stewart Lyman is Owner and Manager of Lyman BioPharma Consulting LLC in Seattle. He provides strategic advice to clients on their research programs, collaboration management issues, and preclinical data reviews.



  • sony2005

    The problem with your argument is that it assumes that fraud is only a small part of the problem. Varying degrees of fraud ARE the problem, and the worst part is that it is accentuated in the top journals, the so-called cutting-edge research. The culture in academia is such that unless you get a paper in certain journals with a certain frequency, you will not get a 1) job, 2) tenure, or 3) grants. The peer review system is not honorable because people know their identity will be deduced for a paper or, worse, revealed for a grant, as study sections have multiple members. This generates fear of reprisal, and for good reason; it happens. In addition, there is widespread cronyism that perpetuates hoarding of grants and jobs among those within the right circles. Finally, this problem is accentuated by emerging technologies that lack enough experts for truly comprehensive reviews. Genome-wide scans and protein/genome interaction analyses are an example. Many who don’t understand the technology won’t admit it, and are blinded by the flashiness of the data. It is completely broken, and pharma would do better to revitalize their own R&D divisions. As for the NIH and basic science, nothing short of a complete revamp of the peer review system and a seismic culture change will help.