Thursday, June 28, 2007

Embryology specifications

The Nomina Embryologica Veterinaria (2nd Edition) is available online and is an authoritative, internationally recognized standard for specifying the embryologic derivation of anatomic structures.

At the same web site, you can download the Nomina Anatomica Veterinaria.

The Terminologia Anatomica is the authoritative standard for human anatomy. To the best of my knowledge, it is not freely available as a public electronic document.

The drawback of all three of these excellent sources is that their organization is somewhat old-fashioned.
Their anatomic hierarchies are conveyed by indenting terms in a long list, a layout that is not really conducive to computer parsing.
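
To show what parsing such a list involves, here is a minimal Ruby sketch that recovers child-parent pairs from indentation alone. The sample list, the two-space indent, and the method name parents_from_indentation are all assumptions made for illustration; none of the nomenclature documents is guaranteed to follow this layout.

```ruby
# Hypothetical indented list; the terms and the two-space indent are illustrative only.
INDENTED = <<~LIST
  embryo
    organ system
      visceral organ
        alimentary system
    cavities and their linings
LIST

# Convert an indented term list into a child => parent hash.
def parents_from_indentation(text, indent_width = 2)
  parent_of = {}
  stack = []                      # stack[depth] = most recent term seen at that depth
  text.each_line do |line|
    next if line.strip.empty?
    depth = line[/\A */].length / indent_width
    term = line.strip
    stack[depth] = term
    stack = stack[0..depth]       # discard deeper terms left over from earlier branches
    parent_of[term] = depth.zero? ? nil : stack[depth - 1]
  end
  parent_of
end

PARENTS = parents_from_indentation(INDENTED)
PARENTS.each { |child, parent| puts "#{child} <- #{parent || '(root)'}" }
```

The approach works only while the indentation is perfectly regular, which is the weakness of indentation-based hierarchies: a single stray space silently changes a term's parent.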

Another interesting embryologic source is the Ontology of Human Developmental Anatomy, found at http://www.ana.ed.ac.uk/anatomy/database/humat/mchome.html. This lists the anatomic parts for each Carnegie stage of development. Again, it is organized through indentation.

A very interesting embryology specification is found at http://www.berkeleybop.org/ontologies/obo-all/human-dev-anat-staged/human-dev-anat-staged.obo_xml, available as part of the OBO (Open Biomedical Ontologies) project.
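
To make the record layout concrete, here is a minimal sketch of pulling the identifier, name, and parent out of one term. The id, name, and to values are the example noted in the comments of the script later in this post; the surrounding term, relationship, and type tags are my assumption about the file's layout, and parse_term is a hypothetical helper.

```ruby
# One term record in an assumed layout. Only the id, name and to values
# are taken from the example in the script's comments.
SAMPLE_TERM = <<~XML
  <term>
    <id>EHDA:10028</id>
    <name>floor plate</name>
    <relationship>
      <type>part_of</type>
      <to>EHDA:10026</to>
    </relationship>
  </term>
XML

# Return the numeric id, the name and the parent id of a term,
# or nil when the record carries no part_of relationship.
def parse_term(record)
  return nil unless record.include?("part_of")
  {
    id:     record[/<id>EHDA:(\d+)<\/id>/, 1],
    name:   record[/<name>(.+)<\/name>/, 1],
    parent: record[/<to>EHDA:(\d+)<\/to>/, 1]
  }
end

p parse_term(SAMPLE_TERM)
```

Regular expressions are enough here because each value sits on its own line; a file with nested or multi-line values would call for a real XML parser.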




The hierarchy for each entry can be compiled from the XML file with the following Ruby script:

#!/usr/local/bin/ruby
#embryo.rb
#
#This Ruby script was created by Jules J. Berman and updated on 6/28/2007
#
#The software is provided "as is", without warranty of any kind,
#express or implied, including but not limited to the warranties
#of merchantability, fitness for a particular purpose and
#noninfringement. In no event shall the authors or copyright
#holders be liable for any claim, damages or other liability,
#whether in an action of contract, tort or otherwise, arising
#from, out of or in connection with the software or the use or
#other dealings in the software.
#
class Taxonomy < Hash
  def initialize
    @id_name = Hash.new       # entry id => term name
    @child_parent = Hash.new  # entry id => parent id
    @parent_child = Hash.new  # parent id => child id (last one added wins)
    @out = File.open("ancestor.txt","w")
  end

  def print(some_string)
    @out.print(some_string)
  end

  def add(name, entry_id, parent)
    @id_name[entry_id] = name
    @child_parent[entry_id] = parent
    @parent_child[parent] = entry_id
  end

  # Write "id name" lines to the given file, then close it
  def get_names_and_ids(file_handle)
    @id_name.each {|key,value| file_handle.print(key," ",value,"\n")}
    file_handle.close
  end

  # Print the entry, then walk up through its parents to the root
  def get_ancestors(first)
    @out.printf "%-8d %-s \n", first, @id_name[first]
    upper = @child_parent[first]
    get_ancestors(upper) if @id_name.has_key?(upper)
  end

  # True when the id was recorded as the child of some parent
  def check_descendant(first)
    @parent_child.value?(first)
  end

  # Print the entry, then follow the single stored child downward
  def get_descendants(first)
    @out.printf "%-8d %-s \n", first, @id_name[first]
    lower = @parent_child[first]
    get_descendants(lower) if @id_name.has_key?(lower)
  end
end

start = Time.now.to_f
class_finder = Taxonomy.new
# <id>EHDA:10028</id>
# <name>floor plate</name>
# <to>EHDA:10026</to>
taxon = File.open("berkeley.txt")
name_id_file = File.open("taxnames.txt","w")
# Read one term per call to gets; the closing term tag is assumed to be the record separator
$/ = "</term>"
while record = taxon.gets
  next if record !~ /part_of/
  record =~ /\<name\>(.+)\<\/name\>/
  name = $1.to_s
  record =~ /\<id\>EHDA\:([\d]+)\<\/id\>/
  entry_id = $1.to_s
  record =~ /\<to\>EHDA\:([\d]+)\<\/to\>/
  parent = $1.to_s
  class_finder.add(name, entry_id, parent)
end
class_finder.get_names_and_ids(name_id_file)
taxon_file = File.open("taxnames.txt")
$/ = "\n"
while record = taxon_file.gets
  next if record == "\n"
  record =~ /^[0-9]+/
  code = $&
  if (class_finder.check_descendant(code))
    class_finder.print("\/\/\n")
    class_finder.get_ancestors(code)
  end
end
print "\nTotal time\, ", ((Time.now.to_f - start).to_i), " seconds\n"
exit
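
One limitation is worth noting. The @parent_child hash in the script stores a single child per parent, so get_descendants can follow only one branch downward. Below is a minimal sketch of an alternative that keeps every child. ChildIndex is a hypothetical class, not part of the script above, and the id pairs are taken from the sample output shown later in this post.

```ruby
# Hypothetical helper, not part of the original script.
class ChildIndex
  def initialize
    # parent id => array of child ids
    @children = Hash.new { |hash, key| hash[key] = [] }
  end

  def add(entry_id, parent)
    @children[parent] << entry_id
  end

  # Depth-first list of every id found below the given id.
  def descendants(first)
    return [] unless @children.key?(first)
    @children[first].flat_map { |child| [child] + descendants(child) }
  end
end

INDEX = ChildIndex.new
# [child id, parent id] pairs
[
  ["6828", "6824"],  # gut -> alimentary system
  ["6829", "6828"],  # foregut -> gut
  ["6841", "6829"],  # pharynx -> foregut
  ["6871", "6841"],  # hyoid bone -> pharynx
  ["6852", "6841"]   # gland -> pharynx
].each { |child, parent| INDEX.add(child, parent) }

puts INDEX.descendants("6824").join(" ")
```

Storing an array per parent costs a little memory but lets one call list a whole subtree, including both children of the pharynx in this example.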

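The script takes about 20 seconds on a 2.5 GHz CPU to parse the 325 Megabyte human-dev-anat-staged.obo_xml file (read by the script under the local name berkeley.txt). The output file, ancestor.txt, lists the hierarchy for each entry, from the term itself up to the root, with // marking the start of each record. A few output records are: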

10331 ventral mesentery
10329 mesentery
10311 foregut-midgut junction
10255 gut
10251 alimentary system
10250 visceral organ
9739 organ system
9584 embryo
//
6870 endodermal epithelium
6868 ultimobranchial body
6852 gland
6841 pharynx
6829 foregut
6828 gut
6824 alimentary system
6823 visceral organ
6379 organ system
6053 embryo
//
9390 respiratory system
9097 visceral organ
8550 organ system
8384 embryo
//
7195 mesothelium
7193 pleural component
7169 intraembryonic coelom
7168 cavities and their linings
7167 embryo
//
6871 hyoid bone
6841 pharynx
6829 foregut
6828 gut
6824 alimentary system
6823 visceral organ
6379 organ system
6053 embryo
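
Records in this form are easy to read back. Here is a minimal sketch, assuming the output has been loaded into a string; the sample text repeats two of the records above, and lineages is a hypothetical helper name.

```ruby
# Two records copied from the sample output above.
SAMPLE_OUTPUT = <<~TXT
  //
  9390     respiratory system
  9097     visceral organ
  8550     organ system
  8384     embryo
  //
  7195     mesothelium
  7193     pleural component
  7169     intraembryonic coelom
  7168     cavities and their linings
  7167     embryo
TXT

# Split the text on the // lines and return one array of
# [id, name] pairs per record, ordered from term to root.
def lineages(text)
  text.split(%r{^//[ \t]*$}).map do |block|
    rows = block.each_line.map(&:strip).reject(&:empty?)
    next if rows.empty?
    rows.map { |row| row.split(/\s+/, 2) }
  end.compact
end

LINEAGES = lineages(SAMPLE_OUTPUT)
# Print each lineage from the root down to the term.
LINEAGES.each { |lineage| puts lineage.reverse.map(&:last).join(" > ") }
```

To process the real file, the same method could be given File.read("ancestor.txt") instead of the sample string.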

-Jules Berman