Bad Statistics, and Bad Training, Are Sabotaging Drug Discovery

1/6/14

One of the most widely read college textbooks in the 1960s and ’70s was How to Lie with Statistics by Darrell Huff. Despite the humorous title, the serious intent of the book (written by a journalist, not a statistician) was to illustrate how common errors in the use of statistics frequently lead to misleading conclusions.

Though popular, it may not have done the job it was intended to do. More than three decades later, mathematician John Allen Paulos revisited the subject in his book Innumeracy: Mathematical Illiteracy and Its Consequences. Unfortunately, problems with the proper use of statistics still appear to be a serious and widespread concern. A recent paper by statistician Valen Johnson of Texas A&M University in College Station suggested that as many as one in four published scientific studies reaches a false conclusion because it relies on weak statistical standards.

The idea that many researchers simply aren’t running and analyzing their experiments properly surfaced in yet another article, this one focused on understanding why mouse models so often fail to predict drug responses in human diseases. A survey of 76 influential animal studies “found that half used five or fewer animals per group”, and many failed to properly randomize mice into control and treated groups. In a similar vein, a recent study compared data from two research groups that had tested cancer cell lines for their susceptibility to anti-cancer drugs. While some of the drugs gave similar results in both studies, the majority did not.
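
To get a feel for why five animals per group is so limiting, here is a back-of-the-envelope simulation (the effect size and noise level are invented for illustration, not taken from the studies above): even when a treatment truly shifts the outcome by a full standard deviation, a study this small detects it at p < 0.05 only about a third of the time.

```python
# Illustrative simulation: statistical power with 5 animals per group.
# The "true" effect size (1 standard deviation) is an assumption, not data
# from any of the studies discussed above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group, true_effect, n_trials = 5, 1.0, 10_000

hits = 0
for _ in range(n_trials):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    if stats.ttest_ind(treated, control).pvalue < 0.05:
        hits += 1

print(f"Power with n={n_per_group} per group: ~{hits / n_trials:.0%}")  # roughly 30%
```

Run with larger groups, the same effect is detected far more reliably, which is one reason underpowered studies so often fail to replicate.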

These findings dovetail with the widely reported observations by researchers at Amgen and Bayer Healthcare, who were unable to reproduce the data in most of the high-profile academic papers they tested. Their failure to replicate these experiments left them with a morass of untrustworthy data, and they decided not to move forward with plans to develop new medicines based on this information. This quagmire is territory where Big Pharma doesn’t want to find itself, given its increasing reliance on academia for new drug candidates and the widespread downsizing of its internal research programs.

Recognizing that this failure to replicate experiments is a serious problem in science, a for-profit company called Science Exchange put forth a potential solution known as the Reproducibility Initiative. I’ve argued previously that the Reproducibility Initiative has its heart in the right place, but that it will fail for a number of reasons, including a lack of grant funding to pay for repeating the experiments along with several other scientific and cultural issues. There is good news to report: a philanthropic organization contributed $1.3M to have 50 high-profile cancer biology articles put through the validation wringer; results are expected by the end of 2014. The average cost of $26,000 per article to repeat certain key experiments from these papers is quite high; where that funding would come from …

Stewart Lyman is Owner and Manager of Lyman BioPharma Consulting LLC in Seattle. He provides strategic advice to clients on their research programs, collaboration management issues, and preclinical data reviews.



  • David Miller

    A good start would be a certification/recertification requirement for CME credits specific to biostatistics. The number of practicing clinicians who cannot answer the following question correctly is staggering:

    Two identical trials and patient populations. Which drug is the superior drug for patients?

    Drug A: p-value = 0.001, Hazard Ratio=0.89
    Drug B: p-value = 0.01, Hazard Ratio=0.69

    That’s the easy question. Here’s the one that almost everyone screws up, clinicians, researchers, and patients alike:

    Two identical trials and patient populations. Which drug is the superior drug for patients?

    Drug C: Median survival 2.0 months, Hazard Ratio = 0.49
    Drug D: Median survival 3.5 months, Hazard Ratio = 0.78

    • Anonymous

      I’m gonna say B & C. BTW I have absolutely no experience or background education in this field, but I do find it interesting and I’d like to know the right answer. Thanks, David.

      • david

        You’re right.

        In the first example, people often assume the p-value has something to do with efficacy. It does not. All the p-value tells you is how likely it would be to see a difference this large if chance alone were at work.

        The second example highlights the median versus the entire population. The median tells you how one patient from each arm performed, the middle patient. That’s useful information, but the hazard ratio is a far superior measure because it describes how the entire population performed, those who didn’t respond as well and those who responded really well. Especially when looking at interim (immature) data, the median can be pretty deceptive (see the quick simulation after this thread).

        • Anonymous

          Awesome. Thank You!!!
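
To make the median-versus-hazard-ratio point concrete, here is a minimal simulation. The survival times and responder fraction are invented and are not meant to reproduce the exact figures in David’s example: a hypothetical drug that produces a durable benefit in roughly a third of patients ends up with nearly the same median as one that helps everyone modestly, yet its hazard ratio, which reflects the whole population, is far better.

```python
# Invented survival times, purely to illustrate median vs. hazard ratio.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
lam = np.log(2) / 2.0                        # control hazard -> median ~2.0 months
control = rng.exponential(1 / lam, n)

# Hypothetical "Drug D": everyone benefits modestly (true hazard ratio ~0.57)
drug_d = rng.exponential(1 / (0.57 * lam), n)

# Hypothetical "Drug C": ~35% of patients are long-term responders, the rest unchanged
drug_c = rng.exponential(1 / lam, n)
drug_c[rng.random(n) < 0.35] *= 8            # big benefit in the tail, small median shift

def crude_hazard_ratio(treated, control):
    # For exponential data with no censoring, the maximum-likelihood hazard is
    # (number of events) / (total follow-up time); the ratio compares the two arms.
    return (len(treated) / treated.sum()) / (len(control) / control.sum())

print(f"Control: median {np.median(control):.1f} months")
for name, arm in [("Drug C", drug_c), ("Drug D", drug_d)]:
    print(f"{name}: median {np.median(arm):.1f} months, "
          f"hazard ratio ~ {crude_hazard_ratio(arm, control):.2f}")
```

With these made-up numbers the two drugs show similar medians (roughly 3.4 vs. 3.5 months) but very different hazard ratios (roughly 0.3 vs. 0.6), because the hazard ratio "sees" the long-term responders that the median ignores.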

  • http://www.hollyip.com/ Suleman Ali

    When you’re looking at statistical significance, you can influence the p-value a great deal by choosing how the parameters are defined. For example, if you have a reading of 70 versus 65, with 60 in the control group, you can either compare 70 to 65 directly or subtract the control value from both and compare 10 with 5, which can improve the statistical significance. Statisticians can choose either to cooperate with such approaches or to remain objective. But if the result is not significant, it won’t get published. (A rough illustration follows below.)
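
One way to read that example (with invented numbers) is as the choice between analysing raw readings across groups and analysing each subject’s change from its own control or baseline measurement; the change-from-baseline analysis typically yields a far smaller p-value, and the two analyses can land on opposite sides of p = 0.05 even though the underlying measurements are the same.

```python
# Invented data: how the choice of endpoint can change statistical significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 15
baseline_a = rng.normal(60, 8, n)                 # large between-subject variation
baseline_b = rng.normal(60, 8, n)
followup_a = baseline_a + rng.normal(10, 3, n)    # group A improves by ~10 (to ~70)
followup_b = baseline_b + rng.normal(5, 3, n)     # group B improves by ~5 (to ~65)

# Analysis 1: compare the raw follow-up readings (~70 vs ~65);
# the large between-subject variation makes this comparison noisy.
print("raw readings:         p =", stats.ttest_ind(followup_a, followup_b).pvalue)

# Analysis 2: compare change-from-baseline (~10 vs ~5);
# removing each subject's baseline leaves much less noise around the same difference.
print("change from baseline: p =",
      stats.ttest_ind(followup_a - baseline_a, followup_b - baseline_b).pvalue)
```

Neither analysis is inherently wrong, which is exactly why the endpoint should be specified before the data are analyzed rather than chosen after seeing which version is significant.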

  • jl

    Interesting topic. One reason results don’t show up is also the warped financial models used to cherry-pick projects, and the incredible amount of cost loaded onto product development.
    best