Scientific Reproducibility: Raising the Standards for Biomedicine
The vast majority of findings published in high-profile biomedical research publications can’t be reproduced by independent laboratories. Multiple groups have come to this same shocking conclusion in recent years, and it has deservedly generated considerable attention.
It matters because the ability to reproduce a result is fundamental to establishing the legitimacy of a new research finding. These early-stage results provide the foundation for studies that will ultimately be performed in people. Furthermore, the time invested in attempting to confirm an irreproducible result has a real opportunity cost; it distracts scientists and physicians from other more fruitful areas of endeavor.
It is important to realize that this debate, which highlights the deficiencies in our scientific process, is taking place openly within the scientific community. This is a sign of the strength of our scientific system.
What constitutes ‘reproducibility’?
Each scientific publication has several component parts that fit together to drive an overall conclusion. This conclusion is typically summarized in one ‘big idea’ that is captured in the title of the scientific paper. While many people think peer review is designed to screen out flawed experiments, that’s not really what it does. Peer reviewers don’t have the time or inclination to repeat the experiments themselves to check accuracy—they mainly just read the manuscripts to make sure the scientists are drawing reasonable conclusions based on the data they’ve gathered.
At one level, a system for checking on scientific reproducibility could demand that each component be reproduced in specific detail. However, this demand seems unrealistic, particularly in biological systems where natural variation is expected. On the other hand, it seems completely reasonable that the ‘big idea’ or major conclusion should withstand close scrutiny. So while there may be considerable variability between individual experiments, it should still be possible to substantiate the general conclusion.
What has shocked so many in biomedicine over the last couple of years isn’t that many investigators are unable to reproduce the specific details of their experiments. The shocking part is that many cannot even confirm the ‘big idea’ when their experiments are performed blinded. Even the title of their paper was not confirmed.
Once it became clear that investigators themselves could not confirm their own work, a review of the details of individual experiments provided the explanation. Experiments were not performed using what most would regard as standard scientific methodology (blinding researchers to their data during experiments; repeating experiments; reporting all results; using appropriate controls; avoiding inappropriate data selection, or “cherry picking”, after an experiment is done; using statistics appropriately).
What, if anything, has changed?
The inability to substantiate key research findings is not just a recent phenomenon. It also seems unlikely that this is peculiar to biomedical research.
In fact, over the years, a key aspect of scientific meetings has been the conversations that took place outside the conference room, where investigators would privately share the high-profile work that no one else could reproduce. However, the issue has probably been amplified and become more public as a result of the convergence of several factors. These include the plethora of new ‘top-tier’ journals spawned over recent decades: Nature alone now has approximately 20 subsidiary journals carrying the ‘Nature’ label. This means that more experiments are getting the visibility, and scrutiny, that comes with publication in a ‘top-tier’ journal.
The proximity of modern biomedical research to increasingly informed, vigilant and vocal patient-advocate groups also serves to bring this into sharp focus. These groups expect research ‘breakthroughs’ to be quickly translated into improved human health, an entirely reasonable expectation given that the research is funded by the public purse. Many more studies are being highlighted by media outlets, which, again, increases the visibility and scrutiny of the underlying research. The attention provided by the media, although important, is not entirely new. Even as a practicing clinician over 25 years ago, I would dread Monday morning clinics when another ‘breakthrough’ had been announced in newspapers over the weekend, and I would have to explain to patients that the finding was still years away from reaching real-world patient care, if it ever would.
What is driving this?
It is important to acknowledge that the vast majority of researchers want to see their work translated into something that benefits humankind. Equally, industry researchers, journal editors, investors, governments and certainly patients want to see important discoveries made, and turned into improved treatments. As a result, medical research has delivered tremendous benefits to human health: this represents a wonderful return on investment and is unprecedented in human history. And biomedical research will continue to deliver. But there is room for tangible improvements that will provide further benefit.
The problem we have is not a problem of scientific fraud, as some have suggested. It is not a failure of the scientific method. Rather, it is a consequence of the lack of rigorous application of scientific principles in an intensely and increasingly competitive research environment, in which scientists are scrambling to get their share of a shrinking national research budget. It is also the consequence of a system that currently turns a blind eye to a lack of rigor.
What can be done to improve this?
It is instructive to consider how the clinical-trials process has evolved over the last several decades. While there is still room for improvement, now there is an expectation that good quality clinical trials will be performed by investigators who don’t know which patients are getting the experimental treatment and which are in the control group during the course of the study. These studies have appropriate controls, have a pre-specified hypothesis, ensure data is not excluded at the whim of the investigator after the fact, and utilize rigorous statistical analyses.
This is not the current situation in the majority of preclinical biomedical research. Simply addressing investigator bias in preclinical studies would represent a major step forward. However, there appears to be a general reluctance to incorporate this level of objectivity into the experimental design of preclinical research, including among the leading scientific journals.
What is being done, and where will this take us?
Although some scientists are clearly able to self-censor and impose a level of experimental rigor to ensure the robustness of their results, many appear unable or unwilling to impose the same level of scientific self-control. As a result, funding agencies, journals and others are initiating changes to improve data quality. Somewhat disappointingly, although host institutions enjoy the kudos that come when their investigators get results published in top journals, these same institutions have not yet demonstrated a willingness to establish or enforce scientific standards among their investigators. This seems short-sighted, as institutions that proactively take the lead will enjoy a distinct advantage as places known for producing high-quality, reproducible research.
It is reasonable to ask what degree of irreproducibility is acceptable as investigators work at the boundary of knowledge and push forward into the unknown. Clearly, we must be prepared to tolerate some level of inaccuracy. In fact, a false result gathered in the heat of competition can sometimes drive further advances as the error is later corrected by other scientists. But unlike the current reality, it is reasonable to expect that these false results are the exception rather than the rule. The majority of scientific discoveries in the biomedical sphere should be sufficiently rigorous and robust to allow other investigators to build on that work and move the field forward. Hopefully the changes instituted as a result of this debate will help further strengthen our discovery processes.