First Comes The $1,000 Genome, Then Comes The $10,000 Analysis
While everyone in the world of sequencing has heard this cynical joke, the joke is about to be on us as an industry. We have focused our time, energy and investment on the ability to sequence the genome in a day, and have largely ignored the computational power required to extract and analyze the genomic data in a timeframe that keeps it relevant. This disparity risks slowing the pace of research and creates a scenario where the joke becomes “the genome in a day, but with the year-long analysis.”
The vast majority of academic and core laboratories are not prepared for the massive wave of data that will be coming off multiple new “Genome-in-a-Day” technology offerings. Most labs are already short on computing resources and struggle just to store their volumes of data. For most sequencing centers, the average analysis pipeline of basic alignment, consensus calling, variant detection and filtering takes 14 days, even with a large cluster. For most, the exorbitant cost of building more computing infrastructure is not a realistic short-term solution and ultimately will not shorten computation time.
The answer that many labs have been dabbling with is going to “the cloud.” Unfortunately, as many researchers have discovered, the cloud is not as welcoming as its white, fluffy exterior suggests. At the outset of a project, the learning curve just to start a computing instance is relatively steep. Additionally, there are many misconceptions about what the cloud can actually do to speed up a computation. Many researchers assume that if they put their existing analysis software in “the cloud” and purchase analysis time on 1,000 machines, the computation will automatically scale. Wrong. Software that was not written to split its work across machines will not run faster simply because more machines are available.
As you might imagine, researchers are sorely disappointed with this outcome while at the same time struggling in a deluge of data that needs to be analyzed. Concurrently, the need for rapid, large-scale genomic analysis continues to rise, so the industry is at a technical impasse. Instrument producers, while perfecting “the genome in a day,” have not offered solutions to this analytics crisis, which raises the question: why would any lab purchase a genome-in-a-day sequencing system if it can’t analyze the results in a day?
There is a new generation of cloud-based bioinformatics available. To date, the industry has taken a “wait and see” approach to the next generation of technology, and now we have a looming bottleneck. It is critical for the industry to share its best practices, identify core investment dollars and support continued aggressive development of bioinformatics software that works in a cloud environment. Without industry support and bioinformatics technology that allows for analysis in a day, lab managers, researchers and sequencing instrument companies are running headlong into a data disaster.