Thursday, March 13, 2008

Updated files for the Neoplasm Classification now available

Updated versions of the Neoplasm Classification are now available:

The Neoplasm Classification contains over 135,000 classified names of neoplasms, arranged in a biological hierarchy based on the developmental lineage of each tumor. It is the largest and most comprehensive neoplasm nomenclature in existence. It is available as a simple XML file, an RDF ontology, or a plain flat file.

These files were prepared by Jules J. Berman. The first version of this file was created on November 15, 2003; the current modifications were made on March 13, 2008.

The following applies to the distributed documents:

Copyright (c) 2007-2008 Jules J. Berman. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is available at:

The files are provided "as is", without warranty of any kind, expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the author or copyright holder be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

An explanation of the classification can be found in the following
two publications, which should be cited in any publication or work that may
result from any use of this file.

Berman JJ. Tumor classification: molecular analysis meets Aristotle.
BMC Cancer 4:8, 2004.

Berman JJ. Biomedical Informatics. Jones and Bartlett Publishers,
Sudbury, MA, 2007.

In the Neoplasm Classification, every classified neoplasm name is coded with a "C" followed by a 7-digit number other than 0000000 or 0000001.

For example, "C9168000" = rectal signet ring adenocarcinoma
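A minimal Python sketch of the code format just described: it checks whether a string is a classified neoplasm code, meaning a "C" followed by exactly seven digits, excluding the two reserved values C0000000 and C0000001. The function name is_classified_code is hypothetical and is not part of the distributed files.

```python
import re

# Classified neoplasm codes: "C" followed by exactly seven ASCII digits.
CLASSIFIED_PATTERN = re.compile(r"C[0-9]{7}")

# The two "C" codes reserved for unclassified terms.
RESERVED_CODES = {"C0000000", "C0000001"}


def is_classified_code(code: str) -> bool:
    """Return True if `code` has the form of a classified neoplasm code."""
    return bool(CLASSIFIED_PATTERN.fullmatch(code)) and code not in RESERVED_CODES


print(is_classified_code("C9168000"))  # True (rectal signet ring adenocarcinoma)
print(is_classified_code("C0000000"))  # False (reserved code)
print(is_classified_code("C916800"))   # False (only six digits)
```

Using `fullmatch` rather than `match` ensures that strings with extra trailing characters, such as "C91680001", are rejected.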

In addition to the classified terms, the file contains four groups of unclassified terms, which are listed after the classified terms. They are coded as:

"C0000000"
"C0000001"
"S" followed by 7 digits
"ST" followed by 7 digits

The unclassified terms coded as "C0000000" include general cancer terms that do not specify any particular neoplasm; overly specific terms that provide so-called pre-coordinated annotations of terms contained elsewhere in the Classification; and valid terms that have not (yet) been added to the list of classified neoplasm terms.

Examples of non-specific cancer-related terms are:

borderline tumor
mucinous tumor
blast crisis
preinvasive carcinoma

Examples of overly specific terms are:

squamous carcinoma of the nasal vestibule
gastric non-hodgkin lymphoma of mucosa-associated lymphoid tissue
primary primitive neuroectodermal tumor of the kidney

The terms that are coded with "C0000001" are precancers and related conditions that have not yet been added to the list of classified terms.

The terms that are coded "S" followed by 7 digits are inherited syndromes that have a neoplastic component (i.e., the occasional or frequent appearance of neoplasms in the syndrome).

The terms that are coded "ST" followed by 7 digits are staging terms used by oncologists.
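The four unclassified groups and the classified terms can be told apart by their codes alone. The following Python sketch maps a code to a group label according to the patterns described above. The labels ("classified", "unclassified", "precancer", "syndrome", "staging", "unknown") are hypothetical names chosen for this example; the file itself uses only the codes.

```python
import re


def code_group(code: str) -> str:
    """Return a hypothetical group label for a Neoplasm Classification code."""
    if code == "C0000000":
        # General, overly specific, or not-yet-classified cancer terms
        return "unclassified"
    if code == "C0000001":
        # Precancers and related conditions not yet classified
        return "precancer"
    if re.fullmatch(r"C[0-9]{7}", code):
        # Any other "C" plus 7 digits is a classified neoplasm name
        return "classified"
    if re.fullmatch(r"ST[0-9]{7}", code):
        # Staging terms used by oncologists
        return "staging"
    if re.fullmatch(r"S[0-9]{7}", code):
        # Inherited syndromes with a neoplastic component
        return "syndrome"
    # Anything that fits none of the documented patterns
    return "unknown"


# "S1234567" and "ST1234567" are made-up codes used only to illustrate the patterns.
for c in ["C9168000", "C0000000", "C0000001", "S1234567", "ST1234567"]:
    print(c, code_group(c))
```

The two reserved "C" codes are checked first so that the general "C" pattern does not absorb them.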

The classification is intended for informatics projects that use computer parsing techniques. Programmers who need only the classified neoplasm names can add a simple filter that excludes the unclassified terms included in the file.
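A minimal Python sketch of such a filter, assuming the (code, term) pairs have already been extracted from one of the distributed files. The parsing step is not shown because the exact file layout is not described here, and the function name classified_only is hypothetical.

```python
import re

# Classified codes: "C" plus 7 digits, excluding the two reserved values.
CLASSIFIED = re.compile(r"C[0-9]{7}")
RESERVED = {"C0000000", "C0000001"}


def classified_only(entries):
    """Yield only the (code, term) pairs that carry a classified neoplasm code."""
    for code, term in entries:
        if CLASSIFIED.fullmatch(code) and code not in RESERVED:
            yield code, term


# Example pairs taken from the terms mentioned in this post.
entries = [
    ("C9168000", "rectal signet ring adenocarcinoma"),
    ("C0000000", "borderline tumor"),
    ("C0000000", "blast crisis"),
]
print(list(classified_only(entries)))
# [('C9168000', 'rectal signet ring adenocarcinoma')]
```

Writing the filter as a generator lets it process a large file line by line without holding all 135,000+ terms in memory.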

Additional information may be available from the author's web site:

The gzipped version of the RDF file (under 1 Megabyte)

The flat file version, listing each term followed by its lineage (gzipped file).

The plain old XML version, with no RDF semantics (gzipped file).

- Jules Berman

In June 2014, my book, entitled Rare Diseases and Orphan Drugs: Keys to Understanding and Treating the Common Diseases, was published by Elsevier. The book builds the argument that our best chance of curing the common diseases will come from studying and curing the rare diseases.

I urge you to read more about my book. There's a generous preview of the book at the Google Books site. If you like the book, please ask your librarian to purchase a copy of this book for your library or reading room.

tags: common disease, orphan disease, orphan drugs, rare disease, subsets of disease, disease genetics, medical nomenclature, ontology, classification, rdf, resource description framework, xml, semantic web