Big data for the Biologist, Scientist and Clinician
Biologists, scientists and clinicians might tend to ignore all the hype and discussion around “big data”, assuming that it does not concern their profession and is of no importance in their work. In fact, the conversations about ‘predictive analysis’ in the context of consumer behavior on the web, mobile devices, and retail might reinforce that thinking. However, while this consumer transactional data is commonly unstructured and runs to millions, perhaps billions, of records of a few hundred bytes each, it is nowhere close to the ~350 gigabytes in raw form for the full genome of a single person. It is no surprise, then, that at a recent Intel webinar conference focused on the challenges, solutions, and best practices of big data, NextBio’s expertise in using big data technologies for genomic data was showcased alongside PayPal’s and Forrester Research’s ability to work with consumer data.
Genomics in Oncology
A newly diagnosed cancer patient and their family might have barely moved from the first stage to the next of the five stages of grief but, regardless of the course of their emotional journey, their cancer vocabulary begins to grow from day one. Biopsy, staging, chemo, neutropenia… words that had meant nothing to them before take on a very tangible meaning, invoking thoughts of hospital beds and days spent being sick. As the battle with cancer continues, their vocabulary continues to evolve as well. Remission, transplant, … relapse, metastasis, morphine, … hospice; relief alternating with despair, a roller coaster ride with too many ups and downs.
In the last few years, a newer set of words has begun to make its way into the cancer lexicon: whole genome sequencing (WGS), targeted therapies, biomarkers… words that are becoming associated with some recent successes and cautious optimism as we relentlessly search for a cure for cancer. These partial successes, the understanding that cancer is a genetic disease, and the decreasing cost of whole genome sequencing raise important questions about making tumor sequencing an integral part of cancer treatment.
Central to this discussion are several different scientific and social issues. On the scientific side, intratumor heterogeneity, challenges with data interpretation and management, and physician training in genomics dominate the conversation. In the social area, cost and insurance coverage, and ethical issues remain center stage.
Genomics, HIPAA and Informed Consent
In 1968, Andy Warhol talked about a future in which everyone would have fifteen minutes of fame, and the quote captured everyone’s attention, coming back again and again in a variety of forms and fashions. Those of us who work in health care and genomics can be just as captivated by the value of anonymity and privacy.
Interns at NextBio learn to set the stage for a unique kind of data exploration
At NextBio, genomic data snakes through the hands of scientific teams and the automated pipelines they design, connecting people intellectually and socially. Each department is responsible for its own piece of the NextBio puzzle as well as for helping new team members cultivate their skills.
The data curation team at NextBio mimics the work of a heart, channeling in the public genomic data that’s essential to the NextBio platform. Beatrice Chiu, who graduated from the Molecular and Cell Biology program at UC Berkeley, began her journey at NextBio as a Web Product intern, conducting usability tests to optimize the NextBio user interface. She switched over to the curation team earlier this year to help with a large-scale GWAS (genome-wide association study) tagging project. As Beatrice explains, “All studies in the NextBio database usually have a minimum of a biodesign tag, like disease vs. normal, response to a drug, etc. and then a more specific phenotype tag, say for a disease. GWAS studies can also get classified using case-control or other association tags.”
Back for more excitement? We hope you’ve been following us closely, because this blog now has contests (with Rules, even), prizes and now, a quiz!
All these answers are on the blog. Think you can find them? Take our quiz and find out! All participants’ names will be entered in a drawing for one of three gift cards from Starbucks, Amazon or iTunes (your choice!). Entries must be received by August 19, 2011 to be eligible for prizes.
“Because of the way data is scaling up, we need to build systems focusing on high-throughput computational tools that are also biologically relevant.”
Mimicking the current data explosion in biology, a molecular biologist who began his career working on a single bacterial gene now handles terabytes of data from genomes, gene expression, and much more on a daily basis. After ten years in biotechnology research with Affymetrix, Venugopal Valmeekam moved to NextBio’s Biocomputing team, where members work to develop pipelines to handle curated data.
Next in our series on data analysis and life at NextBio, read on to find out more about what the Biocomputing team does!
NB: What is your background and how did you start working in this field?
I’m actually a biologist originally. I finished a Ph.D. in Molecular Biology and worked at Cold Spring Harbor as a post-doctoral researcher after I graduated. Then I moved to the position of resident scientist at Affymetrix, where they were looking for someone with a biology background who wasn’t afraid of computers to build applications to analyze genomic data.
“We have to bring ‘genome-drug’ interactions to (physicians’) attention just as we currently bring ‘drug-drug’ interactions to their attention.”
Adverse drug events account for over 700,000 deaths each year, and nearly 30% of these are attributed to interactions of drug combinations. Public databases curate hundreds of thousands of gene variants linked to disease risks every year. Mining these diverse sources could help us learn how genetic variations, drug targets and clinical parameters come together to influence human health. Putting this wealth of scientific data to effective use with computational tools is something we’ve discussed on the blog before as well.
Russ Altman, who began working at the “intersection of molecular biology and medical informatics” over ten years ago, is the founder of PharmGKB (PharmacoGenomics Knowledge Base), a database that curates and disseminates information about gene-drug-disease relationships. The professor of bioengineering, genetics and medicine at Stanford University is also on the Scientific Advisory Board at NextBio, and spoke to us about genomics and the future of medicine.
“The interesting thing with biological data is that using new [software] technologies makes such a difference to what you can do with the data.”
Programming at NextBio could mean using software tools named after toy elephants or occasionally bribing the Curation team with chocolate. But working behind the scenes is still serious business. As Dan Grammas, Senior Software Engineer, says, “I’m not just working to protect someone’s computer from a virus. The work we do here is relevant to people’s lives: researchers, clinicians, patients.”
Curation scientists keep track of all the data that’s published and import it into the NextBio pipeline. Software engineers process curated data, sorting and validating it so results can be accurately scored and categorized. Dan has been programming for several decades, and now develops APIs and pipelines to validate data imported into NextBio. Here’s what he has to say about where data goes when it ‘vanishes’ behind a progress bar that says “Processing”.
“In biology, the ability to create information has increased tremendously, faster than the traditional journal system gives the ability to propagate, review, endorse, and remix.”
The Web is an information source to most of us. But it’s also a dynamic, interactive medium, fluid as much in its substance as in its focus. In some ways, the same could be said of scientific data and the trajectory of research, especially in bioinformatics and genomics. As an ever-growing information repository and source, genomic data is constantly re-interpreted to increase our understanding of disease risks, pharmacogenomics, personalized medicine, and much more.
Sepandar Kamvar is no stranger to large amounts of confusing data. After all, he co-authored the book “We Feel Fine: An Almanac of Human Emotion”. Formerly head of personalization at Google and currently on the technical advisory boards of organizations as diverse as Etsy and NextBio, the assistant professor of Computational and Mathematical Engineering at Stanford University spoke with NextBio about the future of scientific information exchange.
“Everybody knows about genes that are studied. How do you find information about genes that aren’t? Where do you even start looking?”
As a graduate student, I was always amazed by, and a little skeptical of, any software that promised to help with my data woes. I’m still curious to know what goes on behind the scenes, so to speak, when a website manages to take my search terms and raw data and turn them into pretty graphs and new correlations. Watch this space to find out about the “behind the scenes” people and ideas that shape NextBio.
We’re excited to bring you “Life @ NextBio”, a series which spotlights our curators, engineers, advisors, and others as they talk about life and work at NextBio. This week, meet Aisha Furqan, associate scientist in Curation, recent graduate from the Biological Sciences Department at Cal Poly Pomona, and enthusiastic NextBio user.