Amazon’s Cloud Computing Service Sees Opportunity in Genomic Data Overload
The future of biology, if Amazon has its way, will be in the cloud.
The Seattle-based online retailer (NASDAQ: AMZN) has generated buzz over the past few years with its foray into cloud computing through Amazon Web Services. This is the model in which customers rent server space on a pay-as-you-go basis and access their data anytime via the Internet. It’s supposed to allow small businesses, governments, and anybody else to save money and avoid hassles by not having to buy and maintain their own in-house servers. The model is credited with enabling a new generation of lean tech startups to build businesses using far less capital.
Biological researchers haven’t embraced the new model as quickly as their tech brethren, but the cloud computing wave is coming to life sciences, says one of Amazon’s biotech liaisons, Deepak Singh. The trend is coming out of necessity. Gene sequencing has moved at a breakneck pace of innovation over the past few years, as instrument makers like San Diego-based Illumina and Carlsbad, CA-based Life Technologies have lowered the cost of sequencing an entire human genome to as little as $10,000. Upstarts like Mountain View, CA-based Complete Genomics seek to sequence entire genomes for as little as $5,000, while a rival, Pacific Biosciences, is aiming to sequence genomes in 15 minutes. Since every human genome has 6 billion chemical units of DNA, this faster and cheaper form of sequencing is creating enormous datasets that somebody will need to store, analyze, compare, and visualize. Without that capability, it’s just a vast pile of data that doesn’t really lead to valuable new insights for medicine.
Computing challenges have become a “serious blocker” to people trying to make sense of the genomic wave, Singh says. And Amazon has made it a high priority over the past couple of years to become the company that stores genomic data in a cheaper and more accessible way for researchers. Customers, Singh says, “have started looking at the cloud very seriously as a possible option. Over the last year or so, that curiosity has turned into serious adoption.”
Amazon’s pay-as-you-go, rented server model has attracted partners and customers all over the country. The Broad Institute of MIT and Harvard in Cambridge, MA, is a user, along with Harvard Medical School. Life Technologies, an instrument maker, and Seattle-based Geospiza, a bioinformatics software company, have a partnership to use Amazon’s servers to store genomic data. Palo Alto, CA-based DNAnexus, an intriguing bioinformatics startup, has built its business model around using Amazon Web Services. And one of the leading evangelists for cloud computing in genomic research is C. Titus Brown, a computer science and microbiology professor at Michigan State University, who is teaching students how to use Amazon Web Services to store the data for their experiments.
Precisely how important this is to Amazon, a company with $24.5 billion in revenue in 2009, is hard to say. In keeping with Amazon’s close-to-the-vest culture, Singh would offer only vague adjectives when I asked for specifics on the number of customers, the percentage of Amazon Web Services business that comes from life sciences customers, the number of employees devoted to this effort, and the size of the market that Amazon ultimately sees for cloud computing services in life sciences.
There are still major barriers to be cleared before this can become a real earnings driver for Amazon. Much sequencing is done at centralized labs around the country that have already invested in expensive servers to store their data in a secure place on campus, so there’s an incentive to keep using those tools to get the most value out of them. Some labs are generating so much data from the instruments that they aren’t always sure they have enough bandwidth to transmit it all to Amazon’s servers. And the raw experimental data is so precious to a biologist’s career that it can be hard to just send it away to a vendor for safekeeping, rather than keep it under lock and key on campus.
And many researchers struggle with how to analyze, visualize and interpret the data being spit out by the sequencing machines. The software that needs to run on top of Amazon’s storage capacity and databases—the bioinformatics piece of the puzzle—is still a cottage industry with home-made programs, piecemeal open source alternatives, and a lot of researchers still using old-school spreadsheets like Microsoft Excel.
Yet even before companies like DNAnexus achieve major market traction with simple and easy-to-use bioinformatics software, many researchers still feel compelled to store the data in anticipation of the day when it will be easier to sift through. And Amazon isn’t the only company wooing them. Microsoft and Google have their own cloud computing services to offer. At least one competitor, Seattle-based Isilon Systems, is making visible inroads in the life sciences market by selling clustered servers. Isilon now generates 15 to 16 percent of its revenues from life sciences customers, up from 2 percent in early 2008, CEO Sujal Patel said at a recent Xconomy forum. Isilon’s customer roster includes a lot of heavy hitters, like Merck, Genentech, Sanofi-Aventis, Bristol-Myers Squibb, Illumina, Complete Genomics, the Broad Institute of MIT and Harvard, Stanford University, and Johns Hopkins University.
Amazon’s Singh knows this terrain well himself. He got his doctorate in chemistry from Syracuse University, and spent eight years of his career in the biotech industry, including stints at San Diego-based Accelrys and Seattle’s Rosetta Inpharmatics. For the past two years, he’s been working as a business development manager for Amazon Web Services, with a particular emphasis on getting to know what the life sciences market wants from cloud computing.
Amazon has done a number of things to ease the transition for customers to cloud-based storage, Singh says. It has worked to obtain public databases and make them available to researchers. One example, made available last month, is the data from three recently completed pilot projects of the 1,000 Genomes Project. Putting that data out there for researchers, and enabling them to share it, has generated a lot of interest. “There’s been a lot of demand for 1,000 genomes pilot data in Amazon Web Services,” Singh says.
Showing the scientific value is one part of the equation, but the business proposition is just as important. Amazon’s case is a pretty simple one. A lab can make a big capital expenditure upfront, but its computing needs usually have peaks and valleys. That means the lab isn’t fully utilizing its server capacity all the time. Plus, the new breed of sequencing instruments is getting so cheap and fast that there’s no way a lab can really anticipate its server capacity needs in the future. Instead of buying expensive equipment and risking having it maxed out in a year or two, the argument goes, why not lean on Amazon and its basically limitless capacity and flexible pay-as-you-go pricing model?
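The peaks-and-valleys argument comes down to simple arithmetic. The short Python sketch below is a back-of-the-envelope illustration, not Amazon’s pricing or any lab’s real budget: the monthly demand pattern, the $300 per node-month cost of owning hardware, and the $500 per node-month rental price are all hypothetical placeholders. It compares a cluster sized for peak demand (paid for every month) with renting only the capacity actually used.

```python
# Back-of-the-envelope comparison of owning vs. renting compute capacity.
# ASSUMPTION: every number here is a hypothetical placeholder, not a real price.

def owned_cost(peak_nodes, cost_per_node_month, months):
    """An in-house cluster must be sized for the peak and is paid for every month."""
    return peak_nodes * cost_per_node_month * months

def rented_cost(monthly_demand, price_per_node_month):
    """Pay-as-you-go: pay only for the node-months actually used."""
    return sum(monthly_demand) * price_per_node_month

def utilization(monthly_demand, peak_nodes):
    """Fraction of the owned capacity that is actually busy over the period."""
    return sum(monthly_demand) / (peak_nodes * len(monthly_demand))

# Hypothetical year: a steady 10-node load with a 40-node spike every third month.
demand = [10, 10, 40] * 4
peak = max(demand)

own = owned_cost(peak, cost_per_node_month=300.0, months=len(demand))
rent = rented_cost(demand, price_per_node_month=500.0)

print(f"utilization of owned cluster: {utilization(demand, peak):.0%}")
print(f"owned: ${own:,.0f}  rented: ${rent:,.0f}")
```

Under these made-up numbers the owned cluster sits idle half the time, so renting comes out cheaper even at a higher per-unit price. With flat, constant demand the comparison flips, which is why the argument hinges on how uneven a lab’s workload really is.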
That may make sense for a lot of labs, but Amazon has found it needs to tailor its product a bit more for life sciences customers. Customer feedback prompted Amazon to make one curious old-economy concession to help with cloud computing. For some customers who don’t have the bandwidth to efficiently communicate with Amazon’s cloud, the company has set up what it calls an “import/export” service, which allows a lab to save its data on disk and physically ship it to Amazon via FedEx. This helps customers who want to know they can get their data to and from Amazon in a reliable way, on a predictable time schedule, without hogging too much bandwidth on campus.
As much as academic labs might be interested in what Amazon is offering, they can only do as much as their funding agencies really allow. And there are rumblings that the U.S. National Institutes of Health, the world’s primary funding agency for biomedical research, might be facing budget cuts in the not-so-distant future.
Budget cuts at NIH could actually benefit Amazon, Singh says. They could put pressure on labs to be more careful with their capital spending, think a little harder about whether to build their own server clusters, and look more closely at alternatives like cloud computing. Even though the cloud is supposed to be cheaper, Amazon has felt the need to offer customers discounts. The company has started offering a cloud infrastructure service with less backup capability for the data, or “redundancy.” That offering is still a more durable backup option than a lab can build on its own, Singh says, and it comes at a lower price than Amazon’s regular cloud offering. The “Reduced Redundancy” service makes sense, he says, for a dataset that can be quickly reproduced if it is lost, such as the results of a sequencing run on a blood sample that can simply be run again.
Most of the interest in Amazon’s offering is from academic labs, rather than from biotech companies and Big Pharma, Singh says. If Amazon can gain a toehold first in academia, it will almost certainly look to carry that momentum into Big Pharma, which spent an estimated $45.8 billion on R&D last year. Big Pharma spends all that money and is still living with an abysmal success rate, in which only one out of 10 drugs that enter testing ever becomes an approved product.
Sequencing of individual human genomes, and analysis of how they differ, is one of the ways researchers are hoping to someday lower the cost and increase the odds of success in developing new medicines. It’s a long-term trend that Amazon wants to be in a position to profit from.
“Over the past 12 months, there’s been significant interest. All you have to do is look at conference agendas,” Singh says. “The nature of the conversation has shifted from ‘What should we do?’ to ‘What are we doing, and how should we do it?’”