Data specification is rewarded with funding opportunities

This blog is devoted to data specification (including data organization, data description, data retrieval and data sharing) in the life sciences and in medicine. You might wonder why anyone would consider data specification important enough to merit a blog of its own. Well, there are many reasons, which I hope to describe in future posts. For today, just consider an RFA (request for applications) announced yesterday by the NIH.

The RFA is entitled: Genome-Wide Studies in Biorepositories with Electronic Medical Record Data.

It's a very good RFA. Basically, if you have a tissue repository and the tissues are linked to a hospital EMR (Electronic Medical Record), the NHGRI (National Human Genome Research Institute) is interested in receiving a grant application from you.

The NHGRI will perform or pay for genomic studies on collected tissues [provided by awardees] that can be integrated with clinical data in the EMR.

This approach rewards institutions that have made serious efforts in biorepository science, EMR data organization and genomic testing (basically, the bread-and-butter of biomedical data specification).

Why is data specification (for biorepository data and EMR data) so important? Why can't the NHGRI just figure everything out by experiments conducted on cells in a culture dish?

Several years ago, there was a lot of hype about the imminent impact of pharmacogenomics on medical care. Everyone would get drugs specifically tailored to their own genome. Well, that was six or seven years ago, and with very few exceptions, physicians are still prescribing drugs the old-fashioned way (one drug/dose fits all, unless there are bad side-effects or a poor response, in which case, try another drug/dose).

Before you can get much benefit from pharmacogenomics, you need to collect a lot of phenotypic (treatment, outcome, clinical, historical, physical) data on a lot of patients and match these with genotypic data. There's no substitute for clinical correlation with millions of patients. To integrate genotypic and phenotypic data, you need to have large amounts of organized and specified data.
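To make the matching step concrete, here is a minimal sketch in Python of what linking phenotypic and genotypic records might look like. Everything in it is hypothetical and for illustration only: the record fields, the patient identifiers, the drug and variant names, and the link_records function are my own assumptions, not anything taken from the RFA or from a real EMR or biorepository system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    # Hypothetical clinical record, as it might be drawn from an EMR.
    patient_id: str
    drug: str
    outcome: str


@dataclass(frozen=True)
class Genotype:
    # Hypothetical genomic result, as it might come from a repository tissue.
    patient_id: str
    variant: str


def link_records(phenotypes, genotypes):
    """Pair each phenotype record with every genotype record that shares its patient_id."""
    # Index the genotype records by patient so each lookup is cheap.
    by_patient = {}
    for g in genotypes:
        by_patient.setdefault(g.patient_id, []).append(g)

    linked = []
    for p in phenotypes:
        # Patients with no genotype record contribute nothing to the result.
        for g in by_patient.get(p.patient_id, []):
            linked.append((p, g))
    return linked


# Made-up example data; none of it describes real patients.
phenotypes = [
    Phenotype("P001", "drug_A", "good response"),
    Phenotype("P002", "drug_A", "adverse event"),
    Phenotype("P003", "drug_A", "good response"),
]
genotypes = [
    Genotype("P001", "variant_X"),
    Genotype("P002", "variant_Y"),
]

linked = link_records(phenotypes, genotypes)
```

Notice that the whole thing depends on both data sets using the same, consistently specified patient identifier. Patient P003 has clinical data but no genotype record, so that patient simply drops out of the linked set. Multiply that by millions of patients and you can see why sloppy specification is so costly.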

It will take years and years before we have rich collections of well-annotated medical data sets on large numbers of patients. Smart data specification is one of the hurdles that we, as a society, must clear. Yesterday's RFA announcement is a step in the right direction.

-Jules Berman
