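# Collect every English-language "iform" word found in MRCONSO
open (TEXT, "MRCONSO") or die "Cannot open MRCONSO: $!";
while ($line = <TEXT>)
  {
  next if ($line !~ /ENG/);            # keep English-language records only
  if ($line =~ /\b[a-z]+iform\b/i)
    {
    $term = lc($&);
    $subhash{$term} = "";              # hash keys hold the unique terms
    }
  }
close (TEXT);
foreach $key (sort keys %subhash)
  {
  print "$key\n";
  }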
The MRCONSO file (previously called the MRCON file) is the large (more than 800 megabytes) UMLS Metathesaurus file that contains all of the Metathesaurus terms. It is available free from the U.S. National Library of Medicine, but you need to register and complete an online license agreement before the Metathesaurus files will be released to you.
The Perl script (above) can be easily modified for simple extraction projects. If you're interested in learning Perl to help you with biomedical projects, you might want to read my book, Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby (Chapman & Hall/CRC Mathematical and Computational Biology).
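As one example of such a modification, here is a minimal sketch that takes the word ending as a parameter and looks only at the language and term fields of each record, rather than matching against the whole line. It assumes the pipe-delimited MRCONSO layout in which the language (LAT) is the second field and the term string (STR) is the fifteenth. The subroutine name extract_terms and the sample records are hypothetical, made up for illustration; they are not real UMLS content.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sorted, unique, lowercased words ending in $suffix that occur
# in the term string of English-language records read from $fh.
# Assumption: pipe-delimited records, LAT in field 2, STR in field 15.
sub extract_terms {
    my ($fh, $suffix) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my @field = split /\|/, $line;
        next unless @field >= 15 && $field[1] eq 'ENG';
        # Collect every matching word in the term, not just the first
        while ($field[14] =~ /\b([a-z]+\Q$suffix\E)\b/gi) {
            $seen{lc $1} = 1;
        }
    }
    return sort keys %seen;
}

# Made-up sample records, for illustration only (not real UMLS content)
my $sample = join "\n",
    'C0000001|ENG|P|L1|PF|S1|Y|A1||||SRC|PT|X1|Cribriform pattern|0|N||',
    'C0000002|SPA|P|L2|PF|S2|Y|A2||||SRC|PT|X2|Ejemplo filiform|0|N||',
    'C0000003|ENG|P|L3|PF|S3|Y|A3||||SRC|PT|X3|Fusiform and vermiform shapes|0|N||',
    '';

# Read from an in-memory filehandle so the sketch runs without MRCONSO
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my @terms = extract_terms($fh, 'iform');
close($fh);
print "$_\n" for @terms;
```

To run this against the real file, open MRCONSO in place of the in-memory sample and pass that filehandle to the subroutine. Testing the language field directly is stricter than matching ENG anywhere in the line, so a term that merely contains the letters "ENG" in a non-English record is not picked up.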
Here is the complete list of "iform" words:
In June 2014, my book, Rare Diseases and Orphan Drugs: Keys to Understanding and Treating the Common Diseases, was published by Elsevier. The book builds the argument that our best chance of curing the common diseases will come from studying and curing the rare diseases.
I urge you to read more about my book. There's a generous preview of the book at the Google Books site. If you like the book, please ask your librarian to purchase a copy for your library or reading room.
- Jules J. Berman, Ph.D., M.D.
tags: common disease, orphan disease, orphan drugs, rare disease, disease genetics, biomedical informatics, perl programming