Edico Genome Aims at Data Processing Bottleneck in Whole Genome Sequencing
With the arrival of next-generation gene sequencing machines like the Illumina (NASDAQ: ILMN) HiSeq X Ten, medicine has been moving to develop new ways of using genomic data to treat patients. Last month, for example, J. Craig Venter unveiled plans to sequence the entire genome of every patient entering the UC San Diego Moores Cancer Center as an initial goal for his latest startup, Human Longevity Inc.
At the same time, though, it’s becoming clear that generating genomic data for thousands of cancer patients involves working with very large numbers, and that means a wave of new opportunities for innovation is emerging as genomics and Big Data come together. One company moving to catch this wave is Edico Genome, a San Diego startup founded last year to fix a bottleneck in the way data generated by the HiSeq X Ten and other next-generation sequencing machines is processed.
Edico has developed a specialized computer processor for ordering the readout of nucleotides—A, C, T, or G—from short segments of DNA generated by next-generation sequencing technology so they align with a reference genome. It’s a process that genomics specialists refer to as “mapping.”
It is a Big Data problem. The human genome consists of roughly 3.2 billion nucleotide base pairs (made of that four-letter alphabet of DNA) that encode between 20,000 and 25,000 genes. Next-generation sequencing technology cuts the DNA molecule into millions of short segments to “read” the sequence and digitize the results. What comes out is a very large data file that can range from 150 gigabytes to more than 320 gigabytes. An average-size, 200-gigabyte data file would be roughly equivalent to 800 big city phone books—from the days when people used their phone books.
But the data file still consists of millions of segments of DNA that must be mapped to a reference genome. Think of throwing 800 telephone books into a paper shredder, and then trying to reassemble the millions of strips to make sense of the information.
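To make the shredded-phone-book picture concrete, here is a minimal sketch of mapping in Python. It is a toy illustration only, not Edico’s algorithm or any production aligner: the reference string, the reads, and the function names `build_index` and `map_read` are all invented for this example. It assumes each read’s first few letters match the reference exactly, and it ignores insertions and deletions.

```python
from collections import defaultdict


def build_index(reference, k):
    """Record the start positions of every k-letter substring of the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return (position, mismatches) for the best placement of a read, or None.

    The first k letters of the read are looked up in the index (the "seed"),
    then the whole read is compared letter by letter at each candidate position.
    """
    best = None
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


# Hypothetical 27-letter "reference genome" and three short reads.
reference = "ACGTTGCATGACCGTAAGCTTAGGCTA"
reads = ["TGACCGTA", "GCTTAGGC", "CATGACGG"]

index = build_index(reference, k=4)
placements = {read: map_read(read, reference, index, k=4) for read in reads}

for read, hit in placements.items():
    print(read, hit)
```

The third read differs from the reference by one letter, and it is still placed because one mismatch is tolerated. A real genome multiplies this work across roughly 3.2 billion reference positions and millions of reads, which is why the job is usually spread over a cluster of servers.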
Today, companies like Illumina use clusters of computer servers to map these random DNA segments to a reference genome, a process that typically takes about 20 hours, depending on the number of servers being used. Some companies, such as four-year-old Bina Technologies of Redwood City, CA, have developed new ways to optimize these server clusters to reduce the time, cost, and complexity of matching millions of short DNA sequences so they are in the correct order.
Edico CEO Pieter van Rooyen says his company’s technology can dramatically lower the cost of mapping a genome, and is more accurate and power-efficient than existing technologies. Van Rooyen says Edico’s proprietary Dragen processor, mounted on a card that is dedicated to genomics processing and plugs into a standard computer expansion bus (much like a graphics processing card), could be installed in any next-generation sequencing machine, and would reduce the time needed to map a genome from 20 hours to 20 minutes.
In contrast to a computer server that operates according to a generalized set of instructions, the Dragen processor has been optimized specifically for genomic sequencing. As a result, van Rooyen says the Dragen processor can process in a single clock cycle what takes a server 4,000 clock cycles to accomplish.
“We can more than handle the data,” van Rooyen says. “This truly takes [genome sequencing] from a long process to more of a push-button sequencing, and helps address all the other issues surrounding genome sequencing, like IT staffing and data storage.”
Van Rooyen says the underlying innovation of Edico’s technology is in the way the company implemented the genome-mapping algorithm, incorporating data compression techniques into a Field Programmable Gate Array (FPGA), a processor that is configured for specialized use after it is manufactured.
“It’s not a hard problem in a sense, but it is a large problem,” said Tim Hunkapiller, a Seattle-based genomics expert who reviewed Edico’s chip design and technology. “They took a hard-core approach. It’s a really highly efficient implementation of the mapping algorithm… The secret sauce, if nothing else, is that they just did a really good job.”
Hunkapiller is a consultant for Carlsbad, CA-based Life Technologies, now part of Thermo Fisher. According to van Rooyen, Hunkapiller reviewed Edico’s technology on behalf of a high-profile investor who was considering backing the company.
Van Rooyen, a South African with a doctorate in electrical engineering, says he came up with the idea for Edico’s technology years ago, when he was working in South Africa as a strategic consultant in mobile health. He was searching for technologies that could help remote clinics diagnose tuberculosis. In assessing the feasibility of gene sequencing, he realized there was a bottleneck in processing the genomic data.
“The idea was to build a processor that could replace all the servers,” van Rooyen said.
He returned to San Diego to start Edico Genome in early 2013 with co-founders Robert McMillen and Michael Ruehle, who are both systems architects with many patents in a variety of computer-related technologies.
Van Rooyen, who is named on more than 110 patent filings, has lived and worked in San Diego for over a decade. He was previously a co-founder of San Diego-based ecoATM, the mobile device recycling startup acquired for $350 million in 2013 by Outerwall (NASDAQ: OUTR), and San Diego’s Zyray Wireless, a fabless provider of wireless processors that was acquired by Broadcom (NASDAQ: BRCM) for $100 million in 2004.
The co-founders provided Edico’s initial funding, and later raised some additional funding from friends and family and Qualcomm Labs. The company has adopted a lean business model, and has shaved some operating costs since it was admitted to EvoNexus, the free incubator operated by San Diego’s CommNexus industry group.
Still, FPGA processors can be relatively expensive, and one expert has questioned whether Edico’s technology can be commercialized at an affordable price. But Hunkapiller said cost is unlikely to be much of a factor for Edico, in part because the company already is moving its design to a standard ASIC (Application Specific Integrated Circuit) processor, which typically can be mass-produced at low cost.
“There are other appliances out there [to accelerate genomic data processing],” Hunkapiller said. “But nobody has been able to get significant market share. They add speed, but they do not address the cost issue.”
Van Rooyen’s commercialization plan calls for mounting the Dragen processor on cards that have been customized to work with specific genome sequencing machines, such as Illumina’s HiSeq X Ten. The company has working prototypes, and van Rooyen says, “We already have alpha customers signed up.”
Van Rooyen says a facility using Illumina’s HiSeq X Ten machines to sequence 150 human genomes every three days would be able to save $6 million over a four-year period by using Edico’s technology.
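For a rough sense of scale, the figures van Rooyen cites can be combined in a few lines of Python. This is back-of-the-envelope arithmetic only: it assumes the facility runs continuously, 365 days a year, and that the 20-hour and 20-minute mapping times apply to every genome. Neither assumption comes from Edico, and the per-genome figure is derived here rather than quoted by the company.

```python
# Figures quoted in the article.
SERVER_HOURS_PER_GENOME = 20.0     # mapping time on a server cluster
DRAGEN_MINUTES_PER_GENOME = 20.0   # mapping time claimed for the Dragen card
GENOMES_PER_BATCH = 150
DAYS_PER_BATCH = 3
CLAIMED_SAVINGS_USD = 6_000_000
YEARS = 4

# Assumption (not from the article): the facility runs every day of the year.
DAYS_PER_YEAR = 365

speedup = SERVER_HOURS_PER_GENOME * 60 / DRAGEN_MINUTES_PER_GENOME
genomes_per_day = GENOMES_PER_BATCH / DAYS_PER_BATCH
total_genomes = genomes_per_day * DAYS_PER_YEAR * YEARS
savings_per_genome = CLAIMED_SAVINGS_USD / total_genomes

# Mapping time needed per day of sequencer output, before and after.
server_hours_per_day = genomes_per_day * SERVER_HOURS_PER_GENOME
dragen_hours_per_day = genomes_per_day * DRAGEN_MINUTES_PER_GENOME / 60

print(f"Speedup: {speedup:.0f}x")
print(f"Genomes over {YEARS} years: {total_genomes:,.0f}")
print(f"Implied savings per genome: ${savings_per_genome:,.2f}")
print(f"Mapping hours per day: {server_hours_per_day:,.0f} vs {dragen_hours_per_day:.1f}")
```

Under those assumptions, the claimed time reduction is a 60-fold speedup, and the $6 million figure works out to roughly $82 for each of about 73,000 genomes.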
That would make it easier for hospitals and other healthcare providers to use genome sequencing to better diagnose heart disease, inflammatory disease, prenatal disease, and other conditions, van Rooyen said. “What we enable are the resources for clinical genomics,” he said. “It’s a lot easier for companies to test for all these things, because it would be a lot cheaper.”