Saturday, October 27, 2007

National Cancer Institute Thesaurus

The National Cancer Institute (NCI) Thesaurus is a free medical vocabulary available in OWL format from:

It's an impressive document; very few standardized vocabularies have been prepared as formal ontologies. The creators wisely used the semantics of OWL (Web Ontology Language), an ontology language built on RDF.
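
To give a feel for what an OWL file offers, here is a minimal sketch that reads class names and their parent classes from a small RDF/XML fragment using only Python's standard library. The fragment and its class names are made up for illustration and are not copied from the NCI Thesaurus file; the element names (owl:Class, rdfs:label, rdfs:subClassOf) are the standard OWL and RDFS ones. The helper name read_classes is likewise just an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical OWL fragment in RDF/XML. The classes are illustrative only
# and are NOT taken from the actual NCI Thesaurus distribution.
OWL_SAMPLE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="#Neoplasm">
    <rdfs:label>Neoplasm</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="#Urinary_Tract_Neoplasm">
    <rdfs:label>Urinary Tract Neoplasm</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Neoplasm"/>
  </owl:Class>
</rdf:RDF>
"""

# Standard namespace URIs for RDF, RDFS, and OWL.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def read_classes(owl_text):
    """Return {class_id: (label, [parent_ids])} for each named owl:Class."""
    root = ET.fromstring(owl_text)
    classes = {}
    for cls in root.findall("owl:Class", NS):
        about = cls.get(RDF_ABOUT)
        if about is None:
            continue  # skip anonymous classes that carry no identifier
        class_id = about.lstrip("#")
        label = cls.findtext("rdfs:label", default=class_id, namespaces=NS)
        parents = [
            parent.get(RDF_RESOURCE).lstrip("#")
            for parent in cls.findall("rdfs:subClassOf", NS)
            if parent.get(RDF_RESOURCE)
        ]
        classes[class_id] = (label, parents)
    return classes


if __name__ == "__main__":
    for class_id, (label, parents) in read_classes(OWL_SAMPLE).items():
        print(class_id, "|", label, "|", parents)
```

A real OWL file of this size would likely call for a streaming parser or a dedicated RDF library; the point here is only that the class and subclass relationships are machine-readable.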

The NCI Thesaurus contains terms related to the interests of the NCI, including the names of many neoplasms.

This vocabulary has been curated for over a decade by in-house ontologists (NCI employees), contractors, and domain consultants (including some pathologists). It is updated monthly. A lot of money has gone into the development of the NCI Thesaurus, and it is one of the most worked-on vocabularies in the medical field.

The NCI Thesaurus has been reviewed by Barry Smith and colleagues, who found it somewhat lacking.

"RESULTS: We found many mistakes and inconsistencies
with respect to the term-formation principles used,
the underlying knowledge representation system,
and missing or inappropriately assigned verbal and
formal definitions."
Ceusters W, Smith B, Goldberg L.
A terminological and ontological analysis of the
NCI Thesaurus. Methods Inf Med. 2005;44(4):498-507.

My question is, "If the Thesaurus contains many different knowledge domains (medications, general diseases, neoplasms, etc.), how can it adequately cover all of its constituent domains?" In the neoplasm domain, it is missing many thousands of names of neoplasms. The terminology may be sufficient for its intended purpose (meeting the needs of the NCI community), but because it is not comprehensive, the NCI Thesaurus will not necessarily serve those who want a thesaurus that comes close to including the names of ALL neoplasms.

Also, there doesn't seem to be any single organizing principle for the neoplasm domain. Some neoplasms are subclassed by their anatomic site (e.g., urinary tract neoplasm). Others are subclassed by their tissue type (e.g., soft tissue neoplasm). And so on. This is allowable in an ontology, so long as the ontology maintains consistency and competence (the ability to answer questions about the members of classes). But I wonder if this is the best way of organizing tumors. Of course, I'm deeply biased. The Developmental Lineage Classification and Taxonomy of Neoplasms has a single organizing principle.
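
The idea of competence can be sketched in a few lines of Python. In this toy hierarchy, one tumor is given two parents, one by anatomic site and one by tissue type, and two small functions answer questions about class membership. The parent assignments and the names PARENTS, ancestors, is_a, and members are assumptions made for this example; they do not reproduce how the NCI Thesaurus actually classifies any tumor.

```python
# Hypothetical parent links: each class maps to its direct superclasses.
# These assignments are illustrative and NOT taken from the NCI Thesaurus.
PARENTS = {
    "Neoplasm": [],
    "Urinary Tract Neoplasm": ["Neoplasm"],   # subclassed by anatomic site
    "Soft Tissue Neoplasm": ["Neoplasm"],     # subclassed by tissue type
    "Bladder Leiomyosarcoma": [
        "Urinary Tract Neoplasm",
        "Soft Tissue Neoplasm",
    ],
}


def ancestors(term, parents=PARENTS):
    """Collect every superclass reachable from term, following all parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def is_a(term, candidate, parents=PARENTS):
    """Answer the question: is term a (proper) subclass of candidate?"""
    return candidate in ancestors(term, parents)


def members(candidate, parents=PARENTS):
    """List every class that falls under candidate, in alphabetical order."""
    return sorted(t for t in parents if is_a(t, candidate, parents))


if __name__ == "__main__":
    print(is_a("Bladder Leiomyosarcoma", "Soft Tissue Neoplasm"))
    print(members("Urinary Tract Neoplasm"))
```

With several parents per class, the same tumor turns up under more than one branch, which is workable as long as every such query still returns a consistent answer. A classification with a single organizing principle would give each class exactly one parent, so each tumor would have only one path back to the root.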

The NCI Thesaurus is an impressive piece of work and definitely worth looking over.

- Jules J. Berman, Ph.D., M.D.