By Anoop Grewal
The work of a curator never ends. Sometimes I anthropomorphize the raw data I’m about to work with, imagining it lazing on a couch, eating potato chips, in desperate need of conditioning. “Alright, data, get ready for a little exercise,” I say, and start it off with some stretching, or more accurately, with assessing the quality of the data by examining the experimental design and sample annotations. Next on the program is weight lifting, which consists of applying the appropriate statistical analyses to process the data. Don’t think we neglect aerobic activity! To this end, the NextBio Curation team puts each comparison through rigorous tagging, applying relevant biological labels. At this point, the resulting comparisons (now called biosets) are finally fit and ready to be imported into the NextBio platform, where they can be scored against the thousands of other datasets.
The aerobics, or tagging of datasets with biomedical terms, goes a long way toward helping us achieve our mission of rendering the huge volume of public domain data into formats that better serve the needs of all researchers. Typically, experimental results are tagged with multiple biological concepts. For example, a leukemia study in which B-cell acute lymphoblastic leukemia is compared to acute myeloid leukemia using human blood samples would be tagged with the source tissue, peripheral blood mononuclear cells, as well as with the two diseases. And the terms used for tagging aren’t just any biological terms that come to mind. Rather, they are derived from accredited biomedical vocabularies.
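For readers who think in code, here is a rough sketch of what tagging against a controlled vocabulary might look like. It is only an illustration, not how the NextBio platform is implemented: the names `VOCABULARY`, `Tag`, and `Comparison` are hypothetical, and the tiny in-memory dictionary merely stands in for real accredited vocabularies.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for accredited vocabularies: category -> allowed terms.
# A real system would draw these from published biomedical vocabularies.
VOCABULARY = {
    "tissue": {"peripheral blood mononuclear cells"},
    "disease": {
        "B-cell acute lymphoblastic leukemia",
        "acute myeloid leukemia",
    },
}


@dataclass(frozen=True)
class Tag:
    category: str
    term: str


@dataclass
class Comparison:
    name: str
    tags: list = field(default_factory=list)

    def add_tag(self, category, term):
        """Attach a tag only if the term appears in the controlled vocabulary."""
        if term not in VOCABULARY.get(category, set()):
            raise ValueError(f"{term!r} is not a known {category} term")
        self.tags.append(Tag(category, term))


# The leukemia example from the text: one tissue tag and two disease tags.
study = Comparison("B-ALL vs. AML, human blood")
study.add_tag("tissue", "peripheral blood mononuclear cells")
study.add_tag("disease", "B-cell acute lymphoblastic leukemia")
study.add_tag("disease", "acute myeloid leukemia")
```

In this sketch, a term that merely comes to mind, such as "blood cancer", is rejected with a `ValueError` because it is not in the vocabulary.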