Interns at NextBio learn to set the stage for a unique kind of data exploration
At NextBio, genomic data snakes through the hands of scientific teams and the automated pipelines they design, connecting people intellectually and socially. Each department is responsible for its own piece of the NextBio puzzle and for helping new team members cultivate their skills.
The data curation team at NextBio mimics the work of a heart, channeling in the public genomic data that’s essential to the NextBio platform. Beatrice Chiu, who graduated from the Molecular and Cell Biology program at UC Berkeley, began her journey at NextBio as a Web Product intern, conducting usability tests to optimize the NextBio user interface. She switched over to the curation team earlier this year to help with a large-scale GWAS (genome-wide association study) tagging project. As Beatrice explains, “All studies in the NextBio database usually have a minimum of a biodesign tag, like disease vs. normal, response to a drug, etc., and then a more specific phenotype tag, say for a disease. GWAS studies can also get classified using case-control or other association tags.”
“Because of the way data is scaling up, we need to build systems focusing on high-throughput computational tools that are also biologically relevant.”
Mimicking the current data explosion in biology, a molecular biologist who began his career working on a single bacterial gene now handles terabytes of data from genomes, gene expression, and much more on a daily basis. After ten years in biotechnology research with Affymetrix, Venugopal Valmeekam moved to NextBio’s Biocomputing team, where members work to develop pipelines to handle curated data.
Next in our series on data analysis and life at NextBio, read on to find out more about what the Biocomputing team does!
NB: What is your background and how did you start working in this field?
I’m actually a biologist originally. I finished a Ph.D. in Molecular Biology and worked at Cold Spring Harbor as a post-doctoral researcher after I graduated. Then I moved to the position of resident scientist at Affymetrix, where they were looking for someone with a biology background who wasn’t afraid of computers to build applications to analyze genomic data.
“We have to bring ‘genome-drug’ interactions to (physicians’) attention just as we currently bring ‘drug-drug’ interactions to their attention.”
Adverse drug events account for over 700,000 deaths each year, and nearly 30% of these are attributed to interactions of drug combinations. Public databases curate hundreds of thousands of gene variants linked to disease risks every year. Mining these diverse sources could help us learn how genetic variations, drug targets and clinical parameters come together to influence human health. Using computational tools to make effective use of this wealth of scientific data is something we’ve discussed on the blog before.
Russ Altman, who began working at the “intersection of molecular biology and medical informatics” over ten years ago, is the founder of PharmGKB (PharmacoGenomics Knowledge Base), a database that curates and disseminates information about gene-drug-disease relationships. The professor of bioengineering, genetics and medicine at Stanford University is also on the Scientific Advisory Board at NextBio, and spoke to us about genomics and the future of medicine.
“The interesting thing with biological data is that using new [software] technologies makes such a difference to what you can do with the data.”
Programming at NextBio could mean using software tools named after toy elephants or occasionally bribing the Curation team with chocolate. But working behind the scenes is still serious business. As Dan Grammas, Senior Software Engineer, says, “I’m not just working to protect someone’s computer from a virus. The work we do here is relevant to people’s lives: researchers, clinicians, patients.”
Curation scientists keep track of all the data that’s published and import it into the NextBio pipeline. Software engineers process curated data, sorting and validating it so results can be accurately scored and categorized. Dan has been programming for several decades and now develops APIs and pipelines to validate data imported into NextBio. Here’s what he has to say about where data goes when it ‘vanishes’ behind a progress bar that says “Processing”.
“In biology, the ability to create information has increased tremendously, faster than the traditional journal system gives the ability to propagate, review, endorse, and remix.”
The Web is an information source for most of us. But it’s also a dynamic, interactive medium, fluid as much in its substance as in its focus. In some ways, the same could be said of scientific data and the trajectory of research, especially in bioinformatics and genomics. As an ever-growing information repository, genomic data is constantly reinterpreted to increase our understanding of disease risks, pharmacogenomics, personalized medicine, and much more.
Sepandar Kamvar is no stranger to large amounts of confusing data. After all, he co-authored the book “We Feel Fine: An Almanac of Human Emotion”. Formerly head of personalization at Google and currently on the technical advisory boards of organizations as diverse as Etsy and NextBio, the assistant professor of Computational and Mathematical Engineering at Stanford University spoke with NextBio about the future of scientific information exchange.
“Everybody knows about genes that are studied. How do you find information about genes that aren’t? Where do you even start looking?”
As a graduate student, I was always amazed by, and a little skeptical of, any software that promised to help with my data woes. I’m still curious to know what goes on behind the scenes, so to speak, when a website manages to take my search terms and raw data and turn them into pretty graphs and new correlations. Watch this space to find out about the “behind the scenes” people and ideas that shape NextBio.
We’re excited to bring you “Life @ NextBio”, a series which spotlights our curators, engineers, advisors, and others as they talk about life and work at NextBio. This week, meet Aisha Furqan, associate scientist in Curation, recent graduate from the Biological Sciences Department at Cal Poly Pomona, and enthusiastic NextBio user.
Finally, the moment we’ve all been waiting for, a standing ovation for… Kelly Bouchonville! Congratulations to NextBio’s first-place Travel Grant winner!
Effective communication of data is one of the most important and also most overlooked aspects of any research career. In an age of digital technology and free and instant access to databases containing a plethora of information, it is vital that one be able to easily interpret and assemble that data. No longer is it enough to look at data from a single species and/or experiment. One must broaden the scope to look at effects seen in a wide array of organisms and/or experiments carried out under a range of conditions. Additionally, many research outcomes are no longer single gene-centric, but must take entire pathways and systems into account.
As a graduate student nearing completion of a Ph.D., I have found it increasingly important to be able to integrate data from numerous studies, both in a variety of conditions for my organism of choice and for single genes/proteins of interest in a range of organisms. The tools available through NextBio have simplified some parts of the integration process. Additionally, NextBio tools facilitate broadening the implications of results by providing correlations and additional studies of interest, making it easy to see how specific results in one organism translate into an adverse effect in another organism.
A big round of applause, please, for… Elena Piskounova! Congratulations to NextBio’s second-place Travel Grant winner!
One of the main interests of my PhD work has been a subset of the Argonaute family of proteins called the Piwi proteins. Piwi proteins have been shown to play a key role in male germ cell maintenance and to function with a novel class of small non-coding RNAs called piRNAs (Piwi-interacting RNAs). However, the mechanisms by which Piwi proteins control transposable elements and DNA methylation have not been completely elucidated.
When I was introduced to the capabilities of NextBio, there were several key features that I found particularly useful. I first performed a simple search for the different members of the Piwi protein subfamily. I was particularly struck by the convenient organization of the search results, which allowed me to see the instances in which the different Piwi genes have been analyzed in various tissues and diseases, as well as in drug studies. This has given me insight into the various systems that are being used to study Piwi proteins. Furthermore, delving deeper into these results allowed me to identify the less obvious studies that did not truly focus on Piwi proteins, yet contained valuable information on their regulation nonetheless.
Drum roll, please… Congratulations, Bibhash Mukhopadhyay, and thanks to all participants for entering.
NextBio: A versatile one-stop-shop for researchers (and a data junkie’s paradise)!
I am currently writing my doctoral dissertation at Baylor College of Medicine on a gene involved in retinal degeneration, and I have had internship experience. Both of these responsibilities involve assimilating and organizing a large amount of information to create mental snapshots that can be recalled and applied to the specific contexts that interest me. For my dissertation, I need to interpret primary research data in the context of the existing literature, or “knowledge base”, whereas the internship required integration of technical and clinical data for enumerating commercial utility.