Interns at NextBio learn to set the stage for a unique kind of data exploration
At NextBio, genomic data snakes through the hands of scientific teams and the automated pipelines they design, connecting people intellectually and socially. Each department is responsible for its own piece of the NextBio puzzle, as well as for helping new team members cultivate their skills.
The data curation team at NextBio mimics the work of a heart, channeling in the public genomic data that’s essential to the NextBio platform. Beatrice Chiu, who graduated from the Molecular and Cell Biology program at UC Berkeley, began her journey at NextBio as a Web Product intern, conducting usability tests to optimize the NextBio user interface. She switched over to the curation team earlier this year to help with a large-scale GWAS (genome-wide association study) tagging project. As Beatrice explains, “All studies in the NextBio database usually have a minimum of a biodesign tag, like disease vs. normal, response to a drug, etc., and then a more specific phenotype tag, say for a disease. GWAS studies can also get classified using case-control or other association tags.”
“Everybody knows about genes that are studied. How do you find information about genes that aren’t? Where do you even start looking?”
As a graduate student, I was always amazed by, and a little skeptical of, any software that promised to help with my data woes. I’m still curious to know what goes on behind the scenes, so to speak, when a website manages to take my search terms and raw data and turn them into pretty graphs and new correlations. Watch this space to find out about the “behind the scenes” people and ideas that shape NextBio.
We’re excited to bring you “Life @ NextBio”, a series that spotlights our curators, engineers, advisors, and others as they talk about life and work at NextBio. This week, meet Aisha Furqan, associate scientist in Curation, recent graduate of the Biological Sciences Department at Cal Poly Pomona, and enthusiastic NextBio user.
If you’ve looked at our NextBio Publications page lately, you’ve probably noticed that the list of publications from authors who have used NextBio to make novel connections is growing at a steady pace. To this, we add a publication from the Scientific and Computational Biology group at NextBio itself:
Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data
Ilya Kupershmidt, Qiaojuan Jane Su, Anoop Grewal, Suman Sundaresh, Inbal Halperin, James Flynn, Mamatha Shekar, Helen Wang, Jenny Park, Wenwu Cui, Gregory D. Wall, Robert Wisotzkey, Satnam Alag, Saeid Akhtari, Mostafa Ronaghi
PLoS ONE 5(9): e13066. doi:10.1371/journal.pone.0013066
In this article, we explain our processes for data curation and the computational methods by which signatures are compared in NextBio to yield novel findings. We also include four use cases that illustrate how this all comes together for the purpose of investigating brown preadipocytes and brown fat lineage. We hope you find the paper illustrative of how you can apply NextBio’s platform to discovery in your area of research. And don’t forget to cite this publication when you make novel discoveries using NextBio.
By Anoop Grewal
The work of a curator never ends. Sometimes I anthropomorphize the raw data I’m about to work with, imagining it eating potato chips and lazing away on a couch in desperate need of conditioning. “Alright data, get ready for a little exercise,” I say and start it off with some stretching, or more accurately, assessing the quality of the data by examining the experimental design and sample annotations. Next on the program is weight lifting, which consists of applying the appropriate statistical analyses to process the data. Don’t think we neglect aerobic activity! To this end, the NextBio Curation team runs each comparison through rigorous tagging by applying relevant biological labels. At this point, the resulting comparisons (now called biosets) are finally fit and ready to be imported into the NextBio platform, where they can be scored against the thousands of other datasets.
The aerobics, or tagging of datasets with biomedical terms, goes a long way in helping us achieve our mission of rendering the huge volume of public domain data into formats that better serve the needs of all researchers. Typically, experimental results are tagged with multiple biological concepts. For example, a leukemia study in which B-cell acute lymphoblastic leukemia is compared to acute myeloid leukemia using human blood samples would be tagged with the source tissue, peripheral blood mononuclear cells, as well as with the two diseases. And the terms used for tagging aren’t just any biological terms that come to mind. Rather, they are derived from accredited biomedical vocabularies.
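
For readers who think in code, here is a rough sketch of what tagging against a controlled vocabulary could look like, using the leukemia example above. It is only an illustration, not how the NextBio platform is implemented: the `CONTROLLED_VOCABULARY` dictionary, the `Bioset` class, and the `tag_bioset` function are all hypothetical names invented for this example, and a real biomedical vocabulary would contain far more terms than the three shown.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an accredited biomedical vocabulary.
# Only the terms from the leukemia example are included here.
CONTROLLED_VOCABULARY = {
    "tissue": {"peripheral blood mononuclear cells"},
    "disease": {
        "B-cell acute lymphoblastic leukemia",
        "acute myeloid leukemia",
    },
}


@dataclass
class Bioset:
    """A curated comparison plus the biological tags attached to it."""
    name: str
    tags: dict = field(default_factory=dict)  # category -> set of terms


def tag_bioset(bioset, category, term):
    """Attach a term only if it appears in the controlled vocabulary."""
    allowed = CONTROLLED_VOCABULARY.get(category, set())
    if term not in allowed:
        raise ValueError(f"{term!r} is not a recognized {category} term")
    bioset.tags.setdefault(category, set()).add(term)
    return bioset


# Tag the example study with its source tissue and both diseases.
study = Bioset("B-ALL vs. AML, human blood")
tag_bioset(study, "tissue", "peripheral blood mononuclear cells")
tag_bioset(study, "disease", "B-cell acute lymphoblastic leukemia")
tag_bioset(study, "disease", "acute myeloid leukemia")
```

In this sketch, a free-form label such as "blood cancer" would be rejected with a `ValueError`, which mirrors the idea that tags come from an agreed vocabulary and not from whatever term comes to mind.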