I just uploaded a new web page containing equivalent scripts written in Ruby, Perl and Python. Each script expands the European Bioinformatics Institute's taxonomy.dat file (over 100 megabytes) so that every species entry includes its complete phylogenetic lineage.
Instructions for downloading taxonomy.dat are included at the site. This is an incredible file. There are over 400,000 species listed in taxonomy.dat.
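For a feel of what the lineage expansion involves, here is a minimal Python sketch. It is not the code posted at the site. It assumes that each record in taxonomy.dat is a block of "KEY : value" lines (including ID, PARENT ID and SCIENTIFIC NAME) and that each record ends with a line containing only "//". Check those assumptions against your downloaded copy. The function names and the three sample records are made up for illustration.

```python
# Sketch only: assumes "KEY : value" lines and "//" record terminators.
# The sample taxa below are invented, not taken from taxonomy.dat.
SAMPLE = """\
ID                        : 1
PARENT ID                 : 0
RANK                      : no rank
SCIENTIFIC NAME           : root
//
ID                        : 10
PARENT ID                 : 1
RANK                      : genus
SCIENTIFIC NAME           : Examplegenus
//
ID                        : 11
PARENT ID                 : 10
RANK                      : species
SCIENTIFIC NAME           : Examplegenus demonstrans
//
"""


def parse_records(lines):
    """Yield one dict per record, splitting each line at its first colon."""
    record = {}
    for line in lines:
        line = line.strip()
        if line == "//":
            if record:
                yield record
            record = {}
        elif ":" in line:
            key, value = line.split(":", 1)
            # Keep only the first value seen for a repeated key.
            record.setdefault(key.strip(), value.strip())
    if record:  # final record without a trailing "//"
        yield record


def build_tables(records):
    """Return two dicts: id -> parent id, and id -> scientific name."""
    parent_of, name_of = {}, {}
    for rec in records:
        if "ID" not in rec:
            continue
        tax_id = rec["ID"]
        parent_of[tax_id] = rec.get("PARENT ID")
        name_of[tax_id] = rec.get("SCIENTIFIC NAME", tax_id)
    return parent_of, name_of


def lineage(tax_id, parent_of, name_of):
    """Walk parent links upward, returning names from the taxon to the top."""
    chain = []
    seen = set()  # guards against a cycle in the parent links
    while tax_id in parent_of and tax_id not in seen:
        seen.add(tax_id)
        chain.append(name_of[tax_id])
        tax_id = parent_of[tax_id]
    return chain


if __name__ == "__main__":
    parent_of, name_of = build_tables(parse_records(SAMPLE.splitlines()))
    for tax_id in parent_of:
        print(tax_id, " > ".join(lineage(tax_id, parent_of, name_of)))
```

To try the sketch on the real file, pass an open file handle to parse_records in place of SAMPLE.splitlines(). The two lookup tables are held in memory, so every ancestor can be found no matter where its record sits in the file.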
Ruby, Perl and Python scripts:
http://www.julesberman.info/taxon.htm
As discussed in an earlier blog post, the site for looking up the lineage of a single species, entered via a query box, is at:
http://www.julesberman.info/post.htm
- Jules Berman
My book, Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information, was published in 2013 by Morgan Kaufmann.
I urge you to explore my book. Google Books has prepared a generous preview of its contents.
tags: big data, metadata, data preparation, data analytics, data repurposing, datamining, data mining, classification, organisms, taxa, taxon, taxonomy, nomenclature, ruby programming, perl programming, python programming