[Editor’s note: this post first appeared on the FasterCures blog.]
Bringing the ideas of “open source” into the pharmaceutical process is far from simple. It requires a careful understanding of the realities of open source as a software development process as well as the realities of therapy research, development, and regulatory approval.
The open source metaphor holds enormous power for reaching our goal of faster cures. The key is to understand when and where in the process the open source metaphor can immediately “port” over, and where we need to take the ideas behind open source – distribution, peer production, low transaction costs, and political freedoms – and do some translational research of our own to bring them into local context. That’s the only way we’ll realize the massive potential of a transition to open systems in drug discovery.
First, let’s look at open source in software. Software is a human construct, built in languages designed for its creation, and governed by a harmonized and powerful international copyright regime (the drawbacks of this regime are widely known and discussed elsewhere, but are out of scope for this context). Developers have spent decades embedding abstraction and modularization into software development. And the tools of software development are widely available at low or zero cost: a computer, an internet connection, and the willingness to learn to write code.
This is the foundation for what most people mean when they think about “open source” – a loose collection of individuals, connected by technology, coming together for a variety of reasons to collectively create a product that is larger than the sum of its parts, distributed through computer networks at costs far lower than traditional commercial products. These products emerge in technical frameworks that track and reward the small edits and changes that improve software over time, and allow collective governance of projects without a centralized command-and-control system.
And most importantly, these products contain within themselves political freedoms: freedom to contribute, to change, to edit, to distribute, to reuse. The freedoms are embedded in copyright licenses that pass rights on from person to person, that travel with the documents that contain the software code. It is a remarkable thing, open source. It would have sounded insane in the 1970s.
And we’ve seen open source move from software to culture. The most obvious example is Wikipedia. Despite all the obvious reasons why no one would ever trust an online encyclopedia edited by pseudonymous users, one that contains entries on topics no academic would deem worthy, Wikipedia not only exists but has been empirically demonstrated to be as accurate as the best traditional peer-reviewed encyclopedias (this study was disputed at the time by Encyclopaedia Britannica, which has ironically signaled a move to a peer production model itself).
So it’s no surprise that the vision of a loose collection of individuals coming together to discover cures, connected by technology, empowered by technology, is having its moment in the sun for drug discovery. But the discovery process is a different animal than the software development process. A simple mapping of open source doesn’t exist.
The final product is not a modular piece of software, but a chemical entity, or a device. It must be manufactured at a certain level of quality, and cannot be distributed at zero marginal cost. The legal regimes are those of trade secret and patent, not copyright. And since drugs and devices cannot – yet – be tested in silico, there is a major human component not present in software – the humans who courageously volunteer for study, and their political rights.
So given this reality, where in the process can we make a rapid transition to open source approaches?
We start with the knowledge construction space around human biology. Knowledge is closer to code than anything else in the pharmaceutical value chain: it can be captured digitally, transmitted at zero marginal cost, and it’s not something that payors will reimburse as part of care. There is a massive public investment in the creation of knowledge about health and biology, in the form of data and scholarly papers. And there is momentum at state, federal, international, and institutional levels to expose knowledge for open reuse and recombination (despite some objections and lobbying efforts by knowledge brokers like publishers and scholarly societies).
Perhaps most telling, there is movement from within the industry itself to move towards an open source approach to biological knowledge. In the past five years, three distinct projects have been initiated from within the world’s largest pharmaceutical companies – a group not known for aggressive pro-sharing stances – to create pre-competitive spaces for data sharing and analysis.
1. First, in 2009, Merck spun out the Rosetta Inpharmatics unit into a non-profit organization called Sage Bionetworks (disclosure – I serve on the management team). Sage Bionetworks is focused on the platforms and services required for distributed knowledge creation. Sage distributes not only the knowledge modeling processes built inside Merck, but also technology platforms that allow teams of geographically dispersed scientists to collectively analyze data, that allow the tracking of individual contributions to complex projects, and that allow patients to engage directly in the research process.
Sage Bionetworks’ Synapse platform has supported internal research teams in publishing more than a paper per month for more than four years, as well as the Cancer Genome Atlas Pan-Cancer Consortium (18 papers in press or published) and the DREAM computational challenges. This is validation that the analysis of data doesn’t need a large company’s walls and support systems. It demonstrates that tasks can be broken into modules, that contributions can be tracked and rewarded, and that the outcomes can be integrated into the larger systems of scientific knowledge distribution. All of these are key proof points in the advance of open source methods in the life sciences.
2. A second example is the release of tranSMART by Johnson & Johnson and Recombinant Data Corporation. tranSMART is an open source knowledge management platform that combines a data warehouse, access to federated open and commercial databases, and a dataset explorer that integrates and extends the open source i2b2 application, Lucene text indexing, and GenePattern analytical tools.
tranSMART also enables investigators to search published literature and other text sources to evaluate their analyses in context. Data in the platform is aligned to allow identification and analysis of associations between phenotypic and biomarker data, and it is normalized to conform with CDISC and other standards, facilitating search and analysis across different data sources. tranSMART was used initially by pharmaceutical researchers in Johnson & Johnson’s Centocor R&D division, and the tranSMART Foundation recently released a major new version of the software under an open source license. tranSMART has had real success penetrating the industrial knowledge management market, with more than twenty adoptions.
Taken together, Sage’s Synapse and tranSMART are evidence of the very real emergence of common platforms – which are a precondition for the kind of peer production we associate with the open source metaphor.
3. Third, the community awaits the launch of Project DataSphere from the CEO Roundtable on Cancer. Driven by Sanofi scientists, DataSphere promises a universal platform to share oncology clinical trial data sets among researchers, industry, academia, advocacy, and others in a collaborative effort that aims to transform “big data” into novel solutions for cancer patients. Since DataSphere is not yet released, we cannot examine the inner workings of its technology and governance, but early presentations indicate a model more inspired by low transaction costs than other elements of open source: a consortium to manage and broker access to data subject to both trade secret and privacy protection, with technical connections to platforms for collaborative analysis and knowledge management.
I look forward to their innovator presentation at Partnering for Cures in a couple of weeks, where they will be among the 30 cross-sector programs presenting their approaches.
These three projects together represent a sea change in the pre-competitive landscape for pharmaceutical development. But it’s notable that each of them focuses on the biology. Whether it’s early-stage data like TCGA’s, or late-stage data like clinical trials, the data is about targets and bodies – not the lead compounds. This is where we’re likely to see the most movement out of industry, and indeed this level of progress would have been unthinkable just a decade ago at the height of the first genomics bubble. But when three industry titans like Merck, J&J, and Sanofi are driving sharing, it’s fair to say the idea has traction.
In coming posts I’ll examine how non-traditional players, including patient groups and access-to-knowledge advocates, are fighting to bring open systems to the parts of the discovery process that the industry is resisting: clinical trials, lead development and optimization, and novel financing models.