Recent news articles discuss how NextBio scales technology to handle genomics data
As genomic data makes its way from specialized laboratories into routine healthcare evaluations, it is perhaps appropriate that announcements of the latest sequencers were made at the Consumer Electronics Show in Las Vegas rather than the upcoming Advances in Genome Biology and Technology conference.
New machines from Illumina and Life Technologies only strengthen the idea that 2012 is, in fact, the year of the $1000 genome. Led by a trickle of individual success stories, genome sequencing appears to be on the verge of altering the clinical landscape. While the FDA and other regulatory bodies resolve the consumer issues of reimbursement and regulation that are crucial to patients, here at NextBio we tackle an intermediate problem: making sense of the data.
As sequencing technologies grow in scale and fall in cost, platforms that can turn the large amounts of data they generate into comprehensible information become critical to the success of the $1000 genome in the clinic.
At NextBio, we envision a process of data interpretation that is as accessible and disruptive as the new sequencing technologies that create big data. In a recent article in Biotechniques, CEO Saeid Akhtari discusses the ways we integrate genomic data and patient molecular profiles and convert them into clinically useful information. As he describes NextBio in Andrew Wiecek’s article, “Our system is constantly learning and finding new connections between all these billions of data points.”
A single human genome encodes some of these billions of data points that define an individual, beginning with 3 billion base pairs of DNA sequence. Layered over this genetic code are footnotes and additional instructions in the form of epigenetic changes, alterations in RNA expression, somatic mutations and copy number variations. Sequencing the DNA alone produces about 150 GB of compressed data, which in turn requires about half a terabyte of storage for processing.
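To give a feel for how these per-genome figures add up, here is a rough back-of-the-envelope sketch in Python. It uses only the two numbers quoted above; the function name, the linear scaling across genomes, and the reading of "half a terabyte" as 500 GB (decimal units) are assumptions made for illustration, not a description of any actual storage plan.

```python
# Per-genome figures quoted in the post; decimal units (1 TB = 1000 GB) assumed.
COMPRESSED_GB_PER_GENOME = 150
WORKING_GB_PER_GENOME = 500  # "about half a terabyte", taken here as 500 GB


def cohort_storage_tb(n_genomes: int) -> tuple[float, float]:
    """Return (compressed_tb, working_tb) for n_genomes.

    Assumes storage scales linearly with the number of genomes,
    which ignores any deduplication or shared reference data.
    """
    compressed_tb = n_genomes * COMPRESSED_GB_PER_GENOME / 1000
    working_tb = n_genomes * WORKING_GB_PER_GENOME / 1000
    return compressed_tb, working_tb


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        compressed, working = cohort_storage_tb(n)
        print(f"{n:>6} genomes: {compressed:,.2f} TB compressed, "
              f"{working:,.2f} TB for processing")
```

Under these assumptions, even a modest study of a hundred genomes already reaches tens of terabytes of working storage, which is why the rest of this post focuses on scale.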
Using this data to improve healthcare means processing all of an individual’s information and placing it in the context of a cloud of related research on disease genetics, drugs, clinical trials and GWAS information. The NextBio platform brings all these kinds of biomedical data into a common integrated space. We curate and enrich these data types for relevant information, and then create the ‘billions of correlations’ described in the Biotechniques feature.
We accomplish this by scaling existing technologies to deal with these massive amounts of data, in part by using Hadoop, one of the technologies that power other ‘big data’ companies like Google, eBay and Amazon. Though Hadoop has been used to identify trends in shopping habits and to build seasonal flu maps, NextBio is one of the first to apply the technology to genomics-based solutions for clinical advances. Read more about how we scale techniques to solve big data challenges in the full article by Todd Weiss at ComputerWorld.
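For readers unfamiliar with Hadoop, the sketch below illustrates the MapReduce programming model it implements, using plain single-process Python rather than Hadoop itself. It is not NextBio's pipeline: the record layout, the function names and the sample values are all hypothetical, chosen only to show the map, shuffle and reduce steps on a toy "count variants per gene" task.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record layout: (sample_id, gene, variant_label). Toy data only.
Record = tuple[str, str, str]


def map_phase(records: Iterable[Record]) -> Iterator[tuple[str, int]]:
    """Map step: emit a (gene, 1) pair for every variant record."""
    for _sample_id, gene, _variant in records:
        yield gene, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_variants_per_gene(records: Iterable[Record]) -> dict[str, int]:
    """Run the three steps in sequence, in a single process."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    toy_records = [
        ("sample_a", "BRCA1", "variant_1"),
        ("sample_b", "BRCA1", "variant_2"),
        ("sample_b", "TP53", "variant_3"),
    ]
    print(count_variants_per_gene(toy_records))
```

In a real Hadoop cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle, which is what lets the same simple pattern scale to very large datasets.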