Specified Life: June 2007

Saturday, June 30, 2007

Newest version of Neoplasm Classification now available

The June 30, 2007 version of the Developmental Lineage Classification and Taxonomy of Neoplasms (also called the "Neoplasm Classification") is available for download as a no-cost gzip-compressed XML file distributed under the GNU document license.

The Neoplasm Classification consists of over 130,000 different terms for neoplasms arranged by synonymy (i.e., concept identifier) and by developmental lineage. The 130,000+ terms are distributed among approximately 5,800 different neoplasm concepts. The Neoplasm Classification has been described in a freely available BMC journal article.

Each classified neoplasm term is designated by a code identifier consisting of the letter "C" followed by a 7-digit number.

In addition to (and following within the file) the list of classified neoplasm terms are the following supplemental lists:

- a list of unclassified cancer related terms (all identified by the same identifier, "C0000000")

- a list of unclassified terms of precancers (all identified by the same identifier, "C0000001")

- a list of classified medical syndromes that are, in some cases, associated with the co-occurrence of neoplasms, each designated by an identifier consisting of the letter "S" followed by a 7-digit number.

- a list of classified staging terms for neoplasms, each designated by an identifier consisting of the letters "ST" followed by a 7-digit number.
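
The identifier conventions listed above can be checked with a simple pattern test. The following Ruby sketch is only an illustration: the method name identifier_kind, the returned symbols, and the sample identifiers are hypothetical and are not taken from the Neoplasm Classification file.

```ruby
# Hypothetical helper: sort an identifier into one of the lists described above.
# The method name, the symbols, and the sample identifiers are assumptions.
def identifier_kind(id)
  case id
  when "C0000000"    then :unclassified_cancer_term
  when "C0000001"    then :unclassified_precancer_term
  when /\AC\d{7}\z/  then :classified_neoplasm
  when /\AST\d{7}\z/ then :staging_term
  when /\AS\d{7}\z/  then :syndrome
  else nil # does not follow any of the listed conventions
  end
end

["C0000000", "C0000001", "C1234567", "S0000123", "ST0000045", "X12"].each do |id|
  puts "#{id} #{identifier_kind(id).inspect}"
end
```

The two reserved identifiers are tested before the general "C" pattern, because they would otherwise be reported as classified neoplasm terms.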

The total number of cancer-related terms in the file exceeds 146,000. To the best of my knowledge, this exceeds the size of any other cancer nomenclature by over 10-fold.

-Jules Berman

In June, 2014, my book, entitled Rare Diseases and Orphan Drugs: Keys to Understanding and Treating the Common Diseases was published by Elsevier. The book builds the argument that our best chance of curing the common diseases will come from studying and curing the rare diseases.

I urge you to read more about my book. There's a generous preview of the book at the Google Books site.

tags: common disease, orphan disease, orphan drugs, genetics of disease, disease genetics, rules of disease biology, rare disease, pathology, classifications, developmental biology, neoplasm, precancer

Thursday, June 28, 2007

Embryology specifications

The Nomina Embryologica Veterinaria (2nd Edition) is available online and is an authoritative, internationally recognized standard for specifying the embryologic derivation of anatomic structures.

At the same web site, you can download the Nomina Anatomica Veterinaria.

The Terminologia Anatomica is the authoritative standard for human anatomy. To the best of my knowledge, it is not available as a free, publicly available electronic document.

The drawback of all three of these excellent sources is that their organization is somewhat old-fashioned. Their anatomic hierarchies are determined via indentation of terms in a long list. They are not really conducive to computer parsing.

Another interesting embryologic source is the Ontology of Human Developmental Anatomy, found at http://www.ana.ed.ac.uk/anatomy/database/humat/mchome.html. This lists the anatomic parts for each Carnegie stage of development. Again, it is organized through indentation.
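
An indentation-based list can still be converted into explicit child-parent pairs, provided the indentation is consistent. The Ruby sketch below is a minimal illustration under that assumption; the method name parents_from_indentation, the two-space indent, and the sample terms are hypothetical and are not an excerpt from any of the sources named above.

```ruby
# Convert an indentation-based hierarchy into child => parent pairs.
# Assumes a fixed indent width and that no level is skipped.
def parents_from_indentation(text, indent_width = 2)
  child_parent = {}
  stack = []                       # stack[depth] holds the latest term seen at that depth
  text.each_line do |line|
    next if line.strip.empty?
    depth = line[/\A */].length / indent_width
    term = line.strip
    stack[depth] = term
    stack = stack[0..depth]        # drop deeper terms left over from an earlier branch
    child_parent[term] = depth.zero? ? nil : stack[depth - 1]
  end
  child_parent
end

sample = <<~LIST
  embryo
    organ system
      visceral organ
        alimentary system
        respiratory system
LIST

parents_from_indentation(sample).each do |child, parent|
  puts "#{child} <- #{parent.inspect}"
end
```

Real nomenclature files are rarely this regular (mixed tabs and spaces, wrapped lines, skipped levels), which is one reason indentation is a fragile way to carry a hierarchy.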

A very interesting embryology specification is found at
http://www.berkeleybop.org/ontologies/obo-all/human-dev-anat-staged/human-dev-anat-staged.obo_xml, available as part of the OBO (Open Biomedical Ontologies) project.

The hierarchy for each entry can be compiled from the XML file with the following Ruby script:

#!/usr/local/bin/ruby
#embryo.rb
#
#This Ruby script was created by Jules J. Berman and updated on 6/28/2007
#
#The software is provided "as is", without warranty of any kind,
#express or implied, including but not limited to the warranties
#of merchantability, fitness for a particular purpose and
#noninfringement. in no event shall the authors or copyright
#holders be liable for any claim, damages or other liability,
#whether in an action of contract, tort or otherwise, arising
#from, out of or in connection with the software or the use or
#other dealings in the software.
#
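class Taxonomy < Hash
  def initialize
    @id_name = Hash.new       # entry id => name
    @child_parent = Hash.new  # entry id => parent id
    @parent_child = Hash.new  # parent id => most recently added child id
    @out = File.open("ancestor.txt","w")
  end

  # Writes a string to the output file.
  def print(some_string)
    @out.print(some_string)
  end

  # Records one entry; a later child of the same parent replaces
  # the earlier one in @parent_child.
  def add(name, entry_id, parent)
    @id_name[entry_id] = name
    @child_parent[entry_id] = parent
    @parent_child[parent] = entry_id
  end

  # Writes every "id name" pair to file_handle, then closes it.
  def get_names_and_ids(file_handle)
    @id_name.each {|key,value| file_handle.print(key," ",value,"\n")}
    file_handle.close
  end

  # Prints an entry, then its parent, and so on up to the root.
  def get_ancestors(first)
    @out.printf "%-8d %-s \n", first, @id_name[first]
    upper = @child_parent[first]
    get_ancestors(upper) if @id_name.has_key?(upper)
  end

  # True if the entry is recorded as the child of some parent.
  def check_descendant(first)
    @parent_child.value?(first)
  end

  # Prints an entry, then its recorded child, and so on downward.
  def get_descendants(first)
    @out.printf "%-8d %-s \n", first, @id_name[first]
    lower = @parent_child[first]
    get_descendants(lower) if @id_name.has_key?(lower)
  end
end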

start = Time.now.to_f
class_finder = Taxonomy.new
# Each term record in the XML file contains lines such as:
# <id>EHDA:10028</id>
# <name>floor plate</name>
# <to>EHDA:10026</to>
taxon = File.open("berkeley.txt")
name_id_file = File.open("taxnames.txt","w")
$/ = "</term>"   # read one term record at a time
while record = taxon.gets
  next if record !~ /part_of/
  record =~ /<name>(.+)<\/name>/
  name = $1.to_s
  record =~ /<id>EHDA:(\d+)<\/id>/
  entry_id = $1.to_s
  record =~ /<to>EHDA:(\d+)<\/to>/
  parent = $1.to_s
  class_finder.add(name, entry_id, parent)
end
taxon.close
class_finder.get_names_and_ids(name_id_file)
taxon_file = File.open("taxnames.txt")
$/ = "\n"        # back to line-by-line reading
while record = taxon_file.gets
  next if record == "\n"
  record =~ /^[0-9]+/
  code = $&
  if class_finder.check_descendant(code)
    class_finder.print("//\n")
    class_finder.get_ancestors(code)
  end
end
taxon_file.close
print "\nTotal time, ", (Time.now.to_f - start).to_i, " seconds\n"
exit

The script takes about 20 seconds on a 2.5 GHz CPU to parse the 325 Megabyte human-dev-anat-staged.obo_xml file. The output file (ancestor.txt) lists the hierarchy for each entry. A few output records are:

10331 ventral mesentery
10329 mesentery
10311 foregut-midgut junction
10255 gut
10251 alimentary system
10250 visceral organ
9739 organ system
9584 embryo
//
6870 endodermal epithelium
6868 ultimobranchial body
6852 gland
6841 pharynx
6829 foregut
6828 gut
6824 alimentary system
6823 visceral organ
6379 organ system
6053 embryo
//
9390 respiratory system
9097 visceral organ
8550 organ system
8384 embryo
//
7195 mesothelium
7193 pleural component
7169 intraembryonic coelom
7168 cavities and their linings
7167 embryo
//
6871 hyoid bone
6841 pharynx
6829 foregut
6828 gut
6824 alimentary system
6823 visceral organ
6379 organ system
6053 embryo
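
Because the output records are separated by lines containing "//", they can be read back into arrays for further work. The following Ruby sketch is one way to do it; the method name read_lineages is hypothetical, and the sample text is copied from the output records shown above.

```ruby
# Read lineage records separated by "//" lines. Each record becomes an
# array of [id, name] pairs, ordered from the entry up to its root.
def read_lineages(text)
  text.split(%r{^//\s*$}).map do |record|
    record.each_line.map(&:strip).reject(&:empty?).map do |line|
      id, name = line.split(/\s+/, 2)
      [id, name]
    end
  end.reject(&:empty?)
end

sample = <<~OUT
  //
  9390 respiratory system
  9097 visceral organ
  8550 organ system
  8384 embryo
  //
  7195 mesothelium
  7193 pleural component
OUT

read_lineages(sample).each do |lineage|
  puts lineage.map { |_id, name| name }.join(" > ")
end
```

To process the real output, pass File.read("ancestor.txt") to the method in place of the sample text.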

-Jules Berman

My book, Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information was published in 2013 by Morgan Kaufmann.

I urge you to explore my book. Google books has prepared a generous preview of the book contents.

Wednesday, June 27, 2007

Are human tissues owned?

There's not a whole lot of legal precedent in the biorepository field, so when a case lands in an appeals court, it's very important.

The bare bones of the case are as follows: Dr. Catalona, a prostate cancer researcher at Washington University, saved tissues in a biorepository over several decades. When he took a job at another institution, he sent out a letter to many of the people whose tissues were stored in the biorepository. The letter asked for permission to transfer their tissues to his new employer. He received a number of letters granting the permission. Washington University stepped in and informed Dr. Catalona that the tissues would need to remain in the Washington University biorepository.

Wash. U. asked the court for a declaratory ruling that the tissues were owned by Washington University and could not be taken by Dr. Catalona. The appellate court upheld that the tissues were owned by Washington University.

The text of the ruling is publicly available.

I read over the entire ruling and found myself in total agreement with the court. It was very interesting to read the evidence that the court considered important for their decision.

There was one point in their ruling that I found troubling. The court noted that the tissues were donated to Washington University by patients (which assumes that patients were the original owners of the tissues). Once donated (according to the court), the tissues were owned by Wash. U. (not Dr. Catalona). The problem here is that there's been a long tradition of not assigning ownership for human tissues. I'm no lawyer, but it's been my impression that removed human tissues carry a variety of responsibilities (e.g., render a diagnosis, store in a paraffin archive, provide copies of diagnostic reports as needed) and rights (use tissues for biomedical research). Traditionally, all this is done through an exercise of rights and responsibilities, without an assignment of ownership.

Ownership is a mercantile concept and implies that the owner can sell the property. If you own a cow, that means you can sell the cow. If you own a house, you can sell the house. If you own a tissue sample, that would seem to mean that you can sell the tissue sample. But that is exactly why the tradition has been that tissue samples are never owned. Ethicists have commented that we shouldn't be selling tissues, and that implies that we shouldn't be owning tissues. We can use tissues for different purposes without assigning ownership. An analogy is that we breathe air from the atmosphere, but nobody owns the atmosphere. We also can be required not to pollute the air, even though we don't own the air. Basically, we can have a system of rights and obligations without assigning ownership.

Well, this court decision seems to break a long-held tradition. The court clearly thought it needed to determine who owns the tissues (Wash U or Dr. Catalona), and they ruled that the tissues were property owned by Wash U. So what happens next? Have archived tissues become property [that can be bought and sold]?

-Jules Berman

Science is not a collection of facts. Science is what facts teach us; what we can learn about our universe, and ourselves, by deductive thinking. From observations of the night sky, made without the aid of telescopes, we can deduce that the universe is expanding, that the universe is not infinitely old, and why black holes exist. Without resorting to experimentation or mathematical analysis, we can deduce that gravity is a curvature in space-time, that the particles that compose light have no mass, that there is a theoretical limit to the number of different elements in the universe, and that the earth is billions of years old. Likewise, simple observations on animals tell us much about the migration of continents, the evolutionary relationships among classes of animals, why the nuclei of cells contain our genetic material, why certain animals are long-lived, why the gestation period of humans is 9 months, and why some diseases are rare and other diseases are common. In “Armchair Science”, the reader is confronted with 129 scientific mysteries, in cosmology, particle physics, chemistry, biology, and medicine. Beginning with simple observations, step-by-step analyses guide the reader toward solutions that are sometimes startling, and always entertaining. “Armchair Science” is written for general readers who are curious about science, and who want to sharpen their deductive skills.

Tuesday, June 19, 2007

"Precancer" versus "early cancer"

Precancers are the lesions from which cancers grow. Some people question why we need to specify some lesions as precancers when we know that carcinogenesis is a multistep process and that every cancer traverses many un-named biological states as it develops into a fully malignant lesion. Why can't we recognize that precancers are just an early form of cancer and refer to each precancer by the name of its developed cancer? Can't we just use adjectives like "early stage" squamous carcinoma or "non-invasive" pancreatic carcinoma? Wouldn't that make life a lot easier than inventing names for the pre-invasive stage of every cancer?

Much as I like data simplification, it just can't be done in the case of the precancers. Precancers have specific, characteristic properties that separate them from cancers. Because of these properties, the treatment of precancers may be very different from the treatment of cancers. In fact, if we take full advantage of the biologic features that separate the precancers from the cancers, we may actually find that we can eliminate deaths from cancer.

What are these special properties of the precancers?

1. Precancers, unlike cancers, tend to regress. Cancers tend to grow and only rarely regress. Furthermore, in some cases, we can influence the rate of regression of the precancers with relatively non-toxic drugs. Understanding the biology of regression is something that we can only learn from the precancers.

2. When a precancer progresses, it progresses to cancer. But not all precancers progress. Many precancers just stay precancers indefinitely, as far as we can tell. Why should we think of a precancer as an early stage of a cancer if it never becomes a cancer?

3. Precancers that progress to cancer can apparently progress into more than one type of cancer. Consequently, there are more types of cancers than there are types of precancers. For instance, in the lung, squamous metaplasia/dysplasia of bronchial epithelium may give rise to bronchogenic squamous cell carcinoma, bronchogenic adenocarcinoma, bronchogenic small cell carcinoma, or bronchogenic mixed carcinoma. If a lesion can progress into any of several different lesions, it is impossible to pretend that the lesion is just an early form of one named cancer.

4. Precancers can be cured. When a precancer is cured, the cancer never develops. The treatments that we use for precancers are likely to be different from (and much less toxic than) the treatments that we use for cancers.

Because the biology of precancer is distinguishable from the biology of cancer, and because there are clinically useful reasons (i.e., treatment and prevention of cancer) to make these distinctions, the precancers should be curated as designated entities.

Jules Berman

Friday, June 15, 2007

Ruby Programming

I've recently written a book, "Ruby Programming for Medicine and Biology." The Table of Contents is available at the Jones and Bartlett Web Site.

In my opinion, it's important that biomedical informaticians become self-sufficient and less reliant on vendor-supplied applications. The simple act of writing your own programs is an empowering experience and permits us to develop and try new ideas, something that would not be feasible with commercial software.

For a long time, I've been an advocate of Perl (see my book).

Perl is very good when you want to do imperative programming (sometimes called procedural programming). Basically, in imperative programming, the program consists of the implementation of an algorithm in the syntax of the programming language. Each line of the program is another command that executes a step in the algorithm. Procedural programming is virtually the same as imperative programming. The only difference is that in procedural programming, a step in the algorithm may involve calling an external method (i.e., another algorithm). You can think of procedural programs as imperative programs with subroutines. This is what Perl does very well. Because it's easy to learn Perl syntax and because the built-in Perl commands and the available Perl modules provide most of the functionality that anyone would need in the biomedical field, Perl has become a very popular language among bioinformaticians.

The problem with Perl is that it is not well suited as a language that models and integrates biomedical classifications and ontologies. This last jargon-heavy sentence deserves a little explanation, but you probably don't need the standard essay on the data-intensive aspects of modern biomedicine. Suffice it to say that when you have lots and lots of complex data, you need some way to simplify the data and to relate one kind of data to other kinds of data. The best way to simplify data is with classifications or ontologies that can annotate data in a manner that everyone can understand and exchange. When you talk about classifications and ontologies, you're talking about data objects, object (instance) methods, class methods, inheritance, metadata descriptions, specifications, on and on. These are the things that object oriented languages provide.

Ruby is a great object oriented language because it is free, open source, has a very simple and logical syntax, and gracefully models existing biomedical classifications and ontologies. I tried using object-oriented Perl for my work with classifications and ontologies, but it just was not a good fit. I dabbled in Python (an excellent object-oriented programming language that has many of the features I was seeking), but it lacked a few things that I wanted.

Let's not get into an endless argument over Ruby v Python v Java. Let me just say that Python is fine (I won't get into my peeves regarding Java), but I chose Ruby because 1) its syntax was beautiful and simple, and I had no trouble learning the language; 2) it enforces single lineage inheritance (which greatly simplifies the language and fits well with the biological classifications that I work with), and 3) it uses the so-called open world paradigm for evaluating assertions, returning true, false or nil (rather than the true/false dichotomy of Perl and Python). I really need Ruby's "nil".
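
Two of the points above, single-lineage class inheritance and the use of nil, can be illustrated with a few lines of Ruby. This is only a sketch: the class names and the small lookup table are invented for the example and are not taken from any published classification.

```ruby
# Illustrative class names only (assumptions made for this example).
class Neoplasm; end
class Carcinoma < Neoplasm; end          # each class names exactly one superclass
class SquamousCarcinoma < Carcinoma; end

# Walk the single line of class ancestry, stopping before Object.
lineage = []
klass = SquamousCarcinoma
while klass && klass != Object
  lineage << klass.name
  klass = klass.superclass
end
puts lineage.join(" < ")

# Three possible answers: true, false, or nil when nothing has been asserted.
known_malignant = { "carcinoma" => true, "lipoma" => false }
["carcinoma", "lipoma", "unlisted term"].each do |term|
  puts "#{term}: #{known_malignant[term].inspect}"
end
```

The lookup for the unlisted term returns nil rather than false, so "not known" stays distinguishable from "known to be false".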

When do you use Perl, and when do you use Ruby? I use Perl whenever I want to create simple utility scripts (transforming one file into another file of a different structure, performing a single algorithm on an input, and so on). In the past, most of my work was this sort of thing. I don't use Ruby to create short utilities because Ruby is slower than Perl. A Ruby script will execute in about twice the time of a Perl script for the same algorithm. In my experience, this is true of object-oriented languages generally. The primary reason they run slowly is that they need to traverse their object libraries when methods are sent to objects.

I use Ruby for modeling biomedical domains. This usually means that if I'm using RDF, ontologies, classifications, objects, object libraries, I use Ruby.

Some of you may have heard of Ruby on Rails (RoR). This is a web server programming environment for creating simple, quick, elegant, object-oriented Web applications. It is wildly popular at the moment. It's just one more perk to learning Ruby.

-Jules Berman

Wednesday, June 13, 2007

Obviousness in Patents

On April 30, 2007, the U.S. Supreme Court, in a unanimous opinion, reversed a decision of the Court of Appeals and determined that a certain patent claim (read the opinion if you are interested in the details) was obvious (and thus not enforceable).

The text of the opinion is available.

As someone who has been following the patents issued on medical standards, and on the uses of medical standards, I have long been interested in the "obviousness" issue. For a device to be patented, it should be new, useful and non-obvious. The problem is that it can be difficult to determine when a device is obvious.

The Supreme Court opinion provides a fascinating discussion of the principles of obviousness. I particularly liked the discussion on pages 12-24, which addressed general issues of obviousness.

One issue related to the use of combinations of prior art in a novel manner. This is the most common way in which software gets patented. With virtually no exception, all software is built from algorithms that were already in existence. The novel combination of prior algorithms can produce a new, non-obvious and useful device.

My [layman's] interpretation of the Supreme Court opinion is that merely putting together prior art to make a new device can qualify for a patent only if the resulting device is unexpected, either because people in the field would not be expected to put the prior art together in the manner of the patent, or because combining the prior art yielded a result that people in the field would not have predicted.

It seems as though the decision raises the bar for patents, particularly patents that are built on prior art (e.g., all software and most software standards). I urge all those involved in software standards to read the Supreme Court decision and draw their own conclusions.

Jules Berman

Friday, June 1, 2007

Funding opportunity in precancer research

The U.S. National Cancer Institute (NCI) has put out an innovative Request for Applications (RFA) for precancer research. As you know, I support the idea that attacking precancers is the best way to eliminate human cancer. There's every reason to think that precancers can be treated successfully with low-toxicity agents that interfere with the pathways of precancer growth and progression or that enhance the pathways of precancer death. Most of the research in this field will be data-intensive. Those who know how to specify their data will probably welcome the data sharing provisions in the RFA.

The RFA, focused on breast precancers, just came out, and can be viewed at:

http://grants.nih.gov/grants/guide/rfa-files/RFA-CA-07-047.html

Release/Posted Date: May 30, 2007
Opening Date: September 14, 2007
Letters of Intent Receipt Date: October 14, 2007

The RFA uses an R01 funding mechanism (that's good).

The RFA cites our November 2004 conference on precancers that was co-sponsored by George Washington University.

From the RFA: "The NCI as well as experts in the extramural scientific community recommend further research related to the biology of the pre-malignant state in human breast cancer. An expert panel convened at the November 2004 NCI Workshop on Pre-Cancers identified delineation of the biological, genetic, and functional characteristics of pre-cancers as major scientific needs (Cancer Detect Prev. 2006;30(5):387-94). The distinctive early lesions that occur have characteristic properties that should permit them to be detected, diagnosed, and prevented from progressing to invasive cancer. The Workshop participants noted a number of impediments to conducting research on pre-cancers, including:

* insufficient understanding of normal and pre-cancer biology;
* limited access to appropriate specimens;
* a highly subjective, histology-based classification scheme; and
* the lack of strategic partnerships among research communities."

-Jules Berman

tags: cancer research, data sharing, funding, nci, precancer, science