Wednesday, June 18, 2008

Interoperability efforts: please re-think

I'm sure that every reader of this blog has noticed that the word "interoperability" has become hackneyed, along with "standards" and "data integration".

What is wrong with having interoperability for hospitals, and interoperability for biomedical scientists, and interoperability for cancer researchers, and so on?

The problem is that interoperability should extend across the individual data domains. It is self-defeating to spend a lot of money (usually taxpayers') on efforts that try to achieve information interoperability for one interest group or another.

Even if you achieve interoperability within each group, you'll still be faced with making the hospital data integrate with the medical research data, and the cancer research data, and so on.

Wouldn't it make a lot more sense to develop generalized methods for describing data, of any kind? That's what this blog is chiefly about (though I often digress). Anyone with data should use a general syntax for describing the data and for relating the data to other data.

The method for describing and relating data is RDF (Resource Description Framework). The best way to advance data interoperability is to start with an RDF-literate workforce.
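To make the idea concrete, here is a minimal sketch in plain Python (no RDF library) of how two separate data domains can describe their data as subject-predicate-object triples and then be combined without any special integration step. Everything named here is an assumption made up for illustration: the example.org namespaces, the "diagnosis", "depicts", and "format" properties, the specimen and image identifiers, and the helper names Literal, to_ntriples, and about. The output lines follow the general shape of the N-Triples serialization of RDF, but this is not a complete or validated implementation of it.

```python
# Hypothetical namespaces for illustration only (example.org is reserved for examples).
HOSPITAL = "http://example.org/hospital/"
RESEARCH = "http://example.org/research/"
VOCAB = "http://example.org/vocab/"


class Literal(str):
    """Marks an object value as a plain string rather than a URI."""


def to_ntriples(subject, predicate, obj):
    """Write one (subject, predicate, object) statement as an N-Triples-style line."""
    if isinstance(obj, Literal):
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        term = f'"{escaped}"'
    else:
        term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {term} ."


def about(uri, triples):
    """Return every statement in which the URI appears as subject or as a URI object."""
    return [
        t for t in triples
        if t[0] == uri or (t[2] == uri and not isinstance(t[2], Literal))
    ]


# Made-up identifiers: one specimen known to the hospital, one image held by researchers.
specimen = HOSPITAL + "specimen/S-001"
image = RESEARCH + "image/IMG-42"

# Each domain describes its own data using the same triple syntax.
hospital_data = [
    (specimen, VOCAB + "diagnosis", Literal("adenocarcinoma")),
]
research_data = [
    (image, VOCAB + "format", Literal("JPEG")),
    (image, VOCAB + "depicts", specimen),  # relates research data to hospital data
]

# Because both sides share one syntax, merging is simple concatenation.
merged = hospital_data + research_data

if __name__ == "__main__":
    for statement in merged:
        print(to_ntriples(*statement))
```

Calling about(specimen, merged) returns both the hospital's diagnosis statement and the researchers' "depicts" statement, which is the point: once data of any kind is described this way, statements from different domains can be related through shared identifiers.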

Last year I addressed a group of about 100 scientists who had convened to discuss image standards in pathology (my field). I asked for a show of hands from those who had "heard of" RDF. Only two or three people raised their hands.

This is a really big problem. How can you achieve interoperability if nobody speaks the language of data specification, RDF?

I've posted, with Bill Moore, a primer on medical image specification using RDF. It's not a bad place to start, if you're interested in the subject.

-Jules Berman

My book, Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information, was published in 2013 by Morgan Kaufmann.

I urge you to explore my book. Google Books has prepared a generous preview of the book contents.

tags: big data, metadata, data preparation, data analytics, data repurposing, datamining, data mining, biomedical informatics, standards organizations, resource description framework, xml, data integration