Sunday, October 28, 2007

Developmental Classification of Neoplasms now an RDF Ontology

Today I am publishing the first ontology version of the Developmental Lineage Classification and Taxonomy of Neoplasms. It is available for download in several file formats.

The full ontology is a 10-megabyte RDF file, large enough that some browsers may not be able to open it. On my computer, Internet Explorer opened the file without trouble, but it was too large for my Mozilla browser.
http://www.julesberman.info/neordf.xml


The file was validated using the W3C validator service at http://www.w3.org/rdf/validator/, with a caveat. The full ontology file (10+ megabytes) was too large for the validator, so I truncated the ontology: I kept all of the classes, subclasses, and properties, left out the repetitive list of terms, and validated the truncated file. I then ran the entire file through an XML parser to verify that it is well-formed. Together, the two checks cover the RDF logic (on the truncated file) and the XML structure (on the full file).
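For readers who want to repeat the well-formedness check on the full file, here is a minimal sketch in Python. It is not the parser I used for the check described above; it assumes you have saved the ontology locally as neordf.xml, and the function name is_well_formed is made up for this illustration. It uses the standard library's streaming SAX parser, so the whole file never has to sit in memory at once.

```python
import xml.sax


def is_well_formed(path):
    """Return (True, None) if the file parses as XML, else (False, message).

    Hypothetical helper for illustration only. It checks XML
    well-formedness, not RDF validity.
    """
    handler = xml.sax.ContentHandler()  # default handler ignores every event
    try:
        with open(path, "rb") as stream:
            xml.sax.parse(stream, handler)
    except xml.sax.SAXParseException as err:
        return False, f"line {err.getLineNumber()}: {err.getMessage()}"
    return True, None


if __name__ == "__main__":
    # Assumes the ontology was downloaded to the current directory.
    ok, problem = is_well_formed("neordf.xml")
    print("well-formed" if ok else f"not well-formed ({problem})")
```

A check like this only confirms that tags are balanced and the markup is legal XML. It says nothing about whether the RDF statements themselves make sense, which is why the truncated file still had to go through the RDF validator.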

The gzipped version of the RDF file (under 1 Megabyte).
http://www.julesberman.info/neorxml.gz
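
The gzipped file does not need to be unpacked to disk before it is read. The sketch below, which assumes the download was saved locally as neorxml.gz, streams through the compressed file and tallies how often each element name occurs. The function name count_tags is hypothetical, and the code makes no assumptions about which element names the ontology actually uses.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(gz_path):
    """Tally element names in a gzipped XML file without unpacking it to disk.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    with gzip.open(gz_path, "rb") as stream:
        # iterparse yields each element as its closing tag is read
        for _event, elem in ET.iterparse(stream, events=("end",)):
            counts[elem.tag] += 1
            elem.clear()  # discard text and children that are already counted
    return counts


if __name__ == "__main__":
    # Assumes the gzipped ontology is in the current directory.
    for tag, n in count_tags("neorxml.gz").most_common(10):
        print(n, tag)
```

Namespaced element names are reported by ElementTree in the form {namespace-uri}localname, so the RDF elements will appear with their full namespace in braces.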


The flat file version, listing each term followed by its lineage (gzipped file).
http://www.julesberman.info/neoself.gz


The plain old XML version, with no RDF semantics (gzipped file).
http://www.julesberman.info/neoclxml.gz

The ontology contains several parts:

1. The neoplasm classification proper (as illustrated in the schematic)



2. A listing of cancer terms that will probably never be entered into the proper classification (more about this later)

3. A listing of hyperplasias or hamartomas, some of which will be entered into the proper classification and others of which will remain in class Hyperplasia

4. A listing of precancer terms

5. A listing of syndromes associated with increased risk for cancer.

In this version, there are 5,841 classified types of neoplasms and 130,503 terms representing those 5,841 types.

This represents the largest nomenclature of neoplasms in existence and, with today's publication, the largest formal ontology (in RDF syntax) of neoplasm names.

Over the next few weeks, I'll post additional blog entries to further explain the RDF ontology files.

- Jules Berman