Scientific Reproducibility: Raising the Standards for Biomedicine

1/3/14

The vast majority of findings published in high-profile biomedical journals can’t be reproduced by independent laboratories. Multiple groups have come to this same shocking conclusion in recent years, and it has deservedly generated considerable attention.

It matters because the ability to reproduce a result is fundamental to establishing the legitimacy of a new research finding. These early-stage results provide the foundation for studies that will ultimately be performed in people. Furthermore, the time invested in attempting to confirm an irreproducible result carries a real opportunity cost: it distracts scientists and physicians from other, more fruitful areas of endeavor.

It is important to realize that this debate, highlighting the deficiencies in our scientific process, is taking place openly within the scientific community. That openness is a sign of the strength of our scientific system.

What constitutes ‘reproducibility’?

Each scientific publication has several component parts that fit together to drive an overall conclusion. This conclusion is typically summarized in one ‘big idea’ that is captured in the title of the scientific paper. While many people think peer review is designed to screen out flawed experiments, that’s not really what it does. Peer reviewers don’t have the time or inclination to repeat the experiments themselves to check accuracy—they mainly just read the manuscripts to make sure the scientists are drawing reasonable conclusions based on the data they’ve gathered.

At one level, a system for checking scientific reproducibility could demand that each component be reproduced in specific detail. However, that demand seems unrealistic, particularly in biological systems, where natural variation is expected. On the other hand, it seems completely reasonable that the ‘big idea’ or major conclusion should withstand close scrutiny. So while there may be considerable variability between individual experiments, it should still be possible to substantiate the general conclusion.

What has shocked so many in biomedicine over the last couple of years isn’t that investigators were unable to reproduce the specific details of their experiments. The shocking part is that many could not confirm even the ‘big idea’ when their experiments were repeated in a blinded fashion. Even the title of the paper could not be confirmed.

Once it became clear that investigators could not confirm their own work, a review of the details of the individual experiments provided the explanation. The experiments had not been performed using what most would regard as standard scientific methodology: blinding researchers to their data during experiments; repeating experiments; reporting all results; using appropriate controls; avoiding inappropriate data selection, or “cherry picking,” after an experiment is done; and applying statistics appropriately.
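
To see how much damage just one of these lapses can do, consider “cherry picking” in isolation. The short Python sketch below is an illustration, not something from the article: in every simulated experiment both groups are drawn from the same distribution, so there is no true effect and any “significant” result is a false positive. A lab that honestly reports its single run is fooled about 5 percent of the time; a lab that quietly repeats the experiment five times and reports only the best run is fooled more than 20 percent of the time (1 − 0.95^5 ≈ 0.23).

    # Illustrative simulation (not from the article) of how selective
    # reporting inflates false positives. Requires numpy and scipy.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def null_p_value(n=10):
        """One null experiment: both groups come from the same distribution,
        so there is no true effect and any p < 0.05 is a false positive."""
        a = rng.normal(size=n)
        b = rng.normal(size=n)
        return stats.ttest_ind(a, b).pvalue

    trials = 10_000

    # Honest lab: runs the experiment once and reports that result.
    honest = sum(null_p_value() < 0.05 for _ in range(trials)) / trials

    # Cherry-picking lab: runs it five times, reports only the best p-value.
    cherry = sum(min(null_p_value() for _ in range(5)) < 0.05
                 for _ in range(trials)) / trials

    print(f"honest false-positive rate:        {honest:.3f}")  # roughly 0.05
    print(f"cherry-picked false-positive rate: {cherry:.3f}")  # roughly 0.23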

What, if anything, has changed?

The inability to substantiate key research findings is not just a recent phenomenon. It also seems unlikely that this is peculiar to biomedical research.

In fact, over the years, a key aspect of scientific meetings has been the conversations that take place outside the conference room, where …

C. Glenn Begley is Chief Scientific Officer and Senior Vice President for Research and Development at TetraLogic Pharmaceuticals in Malvern, PA.

  • Ken

    This is a really nicely written and balanced article. I believe a key issue, though, is the incentives and disincentives for PIs around reproducibility and how these are weighed against one another: (1) prestige; (2) reputation; (3) expense relative to resources. One, perhaps simple-minded, notion would be to publish alongside each paper a score/index reflecting its statistical rigor/design. In principle, all papers should at least meet a minimum threshold. However, I would argue that even within the “universe” of accepted papers, some are far more rigorous than others, and there would be a nice, “crisp” incentive if the title page of the paper carried a “rigor ranking,” e.g. “AAA,” “AAB,” etc. This separate evaluation would provide: (1) an incentive to be as rigorous as possible, perhaps making it worth using more resources to do so; or (2) conversely, the option of spending fewer resources to get a still-publishable study, but with a somewhat lower “rigor score.” I’m sure there are multiple problems with this framework, but perhaps it would at least give a push in the right direction.

  • Hank

    Indeed well written. I would have liked to learn more, though, about what is actually being done to mitigate the problem, and about ideas for what could be done, like the suggestions in the previous comment. The academic institutions certainly need to get their act together, as do the boards and SABs of companies.

  • A Suhrbier

    Scientists are pressured to undertake and publish research that’s popular among scientists, not research that is useful in the real world. See the link below to an online article I published recently:

    http://www.abc.net.au/science/articles/2013/11/27/3899981.htm

  • K. Francis

    Well written, and it really brings attention to something we should all be reminded of daily. There is no single solution, but perhaps emphasizing the issue at the undergraduate or high school level may help raise awareness later in one’s career.