Thursday, June 19, 2008

Biomedical Informatics Book

I just visited my book's Amazon page (6/19/08) to see how sales were doing, and I found that Amazon has reduced its price. They're currently selling it for $50.91 (a 32% savings) with free shipping. This is pretty good, because they usually sell it at full price or with only a negligible discount.

If you're curious about Biomedical Informatics, here is the table of contents.

0. Preface.

1. What is biomedical data, and what do we do with it?

1.1. Background.

1.2. The challenge of translational research.

1.3. Disasters in translational research.

1.4. The role of biomedical data in translational research.

1.5. Expertise in biomedical informatics.

1.6. The good news: no-cost tools.

1.7. The bad news: the high cost of human cooperation.

1.8. Realistic opportunities for biomedical informaticians.

2. The data of biomedical informatics.

2.1. Background.

2.2. Data files and databases.

2.3. Medical databases and hospital information systems.

2.4. Every patient must be uniquely identified within the system.

2.5. All data entered should be retrievable.

2.6. Entered data should only be modified with great caution.

2.7. The government as a source of biomedical data.

2.8. Your right to obtain government data - the Freedom of Information Act.

2.9. Access to research data discovered under U.S. grants.

2.10. Grantees strike back: the U.S. Bayh-Dole Act.

2.11. Intellectual property.

2.12. Fair use and other academic privileges.

2.13. Madey v. Duke and the erosion of academic privilege.

2.14. Further cautions on the use of proprietary software and data.

2.15. The often misunderstood concept of patient data "ownership".

2.16. Sharing data.

2.17. Legacy data.

2.18. Free, open source and proprietary software and data.

2.19. Undifferentiated software.

2.20. What are some of the open access biomedical databases?

2.21. Open access medical terminologies.

2.22. MeSH (the National Library of Medicine's Medical Subject Headings).

2.23. Taxonomy.

2.24. Disease data and epidemiologic data.

2.25. The impact of free and open source data and software on biomedical informatics.

3. Confidential biomedical data.

3.1. Background.

3.2. Human subject risks.

3.3. The risk to life and health as a direct result of a medical intervention.

3.4. The risk of loss of database functionality.

3.5. The differences between confidentiality and privacy.

3.6. Example: loss of privacy resulting from participation in a medical study.

3.7. Loss of confidentiality.

3.8. The responsibilities of biomedical informaticians to human subjects.

3.9. Patient record anonymization.

3.10. Patient record de-identification.

3.11. An example of the law of unintended consequences.

3.12. Violations against the Common Rule (in the U.S.).

3.13. Violations against HIPAA (in the U.S.).

3.14. Tort and violations against individuals.

3.15. What consents does the patient have on record?

3.16. Consented versus unconsented human subject research.

4. Standards for biomedical data.

4.1. Background.

4.2. The criticality of common standards.

4.3. The non-role of government (in the U.S.) in standards-making.

4.4. The hazards of creating a new standard.

4.5. Overview of standards development.

4.6. How are standards developed, approved and adopted?

4.7. The utility of non-standards.

4.8. The non-standard present - specifications and unique objects.

4.9. The non-standard future - data semantics.

4.10. Unique object identifiers.

4.11. Life science unique identifiers.

4.12. HL7 unique identifiers.

4.13. Unique problems associated with uniqueness.

4.14. Specifying information: do you have the time?

4.15. Introduction to meaning.

5. Just enough programming.

5.1. Background.

5.2. Why you should learn some fundamental programming.

5.3. Just enough Perl.

5.4. Downloading Perl.

5.5. File operations.

5.6. Perl script basics.

5.7. The directory path to Perl.

5.8. Accessing files.

5.9. The open1.pl script, line by line.

5.10. An 8-line Perl word processor.

5.11. Don't panic! Perl will forgive you.

5.12. Pseudocode for a general biomedical informatics program.

5.13. Interactively reading lines from a file.

5.14. Scanning enormous files quickly.

5.15. Getting just what you want with Perl regular expressions.

5.16. Pseudocode for common uses of regex (regular expression pattern matching).

5.17. Regular expression syntax.

5.18. Removing periods that do not delineate sentences.

5.19. Counting all the words in a text file.

5.20. Finding the frequency of occurrence of each word in a text file (Zipf distribution).

5.21. Creating a persistent database object.

5.22. Retrieving information from a persistent database object.

5.23. Validating XML tags using regular expressions.

5.24. What have we learned?

6. Programming common biomedical informatics tasks.

6.1. Background.

6.2. Computing a one-way hash for a word, phrase or file.

6.3. Simple statistics.

6.4. Invoking statistical tests through Perl modules.

6.5. Avoiding type 4 errors with resampling.

6.6. Using random numbers.

6.7. Resampling and Monte Carlo statistics.

6.8. How often can I have a bad day?

6.9. Rough test of the built-in random number generator.

6.10. The Monty Hall problem: solving what we cannot grasp.

6.11. Internal and external math modules for Perl.

6.12. Using external modules - fast Fourier transform.

6.13. Indexing text.

6.14. Searching large text files.

6.15. Finding needles fast using a binary-tree search of the haystack.

6.16. Clustering: algorithms that group similar objects.

6.17. Retrieving information from the internet.

6.18. Gene sequence parsing: finding palindromes in a gene database.

6.19. Why counting is non-trivial and important.

6.20. Why you should write your own counting programs.

6.21. Software utilities versus software applications.

6.22. Software evaluation.

7. Biomedical nomenclatures.

7.1. Background.

7.2. Big nomenclatures and small nomenclatures.

7.3. Curating nomenclatures.

7.4. Automatic expansion of a medical nomenclature.

8. Misbehaving text: dealing with poorly written medical text.

8.1. Background.

8.2. Spelling errors.

8.3. Homonymous terms.

8.4. Abbreviations that are sometimes both acronyms and shortened forms.

8.5. Prepositions and articles retained in an acronym.

8.6. Single expansions with multiple abbreviations.

8.7. Nonsense abbreviations.

8.8. Common usage that confounds meaning.

8.10. Pejorative abbreviations.

8.11. Locale-dependent abbreviations.

8.12. Classifying abbreviations by their expansion algorithms.

8.13. Ephemeral abbreviations.

8.14. Hyponymous abbreviations.

8.15. Polysemous abbreviations.

8.16. Abbreviations masquerading as words.

8.17. Fatal abbreviations: innocent victims of abbreviation drift.

8.18. Forbidden abbreviations.

9. Autocoding unstructured data (narrative text).

9.1. Background.

9.2. Machine translation.

9.3. Autocoding.

9.4. Human fallibility and the limitations of human-collected data.

9.5. A fast lexical autocoder.

9.6. Evaluating autocoders: dealing with precision and recall.

9.7. Other performance issues.

9.8. On-the-fly coded data retrieval without pre-coding.

9.9. Different philosophical approaches to term-based data retrieval.

9.10. Why it is important to have fast autocoding software.

10. Computational methods for de-identification and data scrubbing.

10.1. Background.

10.2. Anonymization, de-identification, data scrubbing.

10.3. Identifiers.

10.4. Stripping identifiers.

10.5. How good is good enough?

10.6. Scrubbing data.

10.7. De-identification algorithms.

10.8. Feasibility of de-identification.

10.9. Non-uniqueness and de-identification.

10.10. Leveraging some confidential information to learn more confidential information.

10.11. Performance considerations for de-identification software.

10.12. De-identification and data sharing patents.

11. Cryptography in biomedical informatics.

11.1. Background.

11.2. One-way hashing algorithms.

11.3. One-way hash weaknesses: dictionary attacks and collisions.

11.4. Zero-knowledge patient reconciliation.

11.5. Threshold protocol.

11.6. Electronic signatures.

12. Describing data with metadata.

12.1. Background.

12.2. Metadata, XML (eXtensible Markup Language) and RDF (Resource Description Framework).

12.3. Enforced and defined structure (XML rules and schemas).

12.4. Formal metadata (through the ISO 11179 specification).

12.5. Namespaces (sharing metadata).

12.6. Linking data via the internet.

12.7. Logic and meaning.

12.8. Self-awareness (embedded protocols and commands).

12.9. Integrating heterogeneous data with RDF.

12.10. Meaning requires a fully-specified subject.

12.11. Meaningful biomedical description with Notation 3.

12.12. The DAML extension of RDF.

12.13. OWL extension of DAML.

13. Simplifying complex data with classifications and ontologies.

13.1. Background.

13.2. The value of hospital information technology.

13.3. Understanding complexity.

13.4. The importance of data simplification.

13.5. Example case: a molecular classification of cancer.

13.6. Cancer nomenclatures, taxonomies, classifications and ontologies.

13.7. Practical limitations of classifications.

13.8. Ontologies: multi-class inheritance and logical inferences.

13.9. GO, the Gene Ontology that is not an ontology.

14. Clinical trials: the informatician lives in a statistical world.

14.1. Background.

14.2. Do we need clinical trials?

14.3. The length and expense of clinical trials.

14.4. An imaginary clinical trial.

14.5. Modeling a clinical trial.

14.6. What do models tell us?

14.7. The informatics of clinical trials.

14.8. Clinical trials need to be validated by post-trial experience.

15. Distributed computing.

15.1. Background.

15.2. Remote procedure calls, SOAP, web services and grid computing.

15.3. Data utopia.

15.4. Data dystopia.

16. Grantsmanship for biomedical informaticians.

16.1. Background.

16.2. Institutional risks from biomedical informatics research.

16.3. Funders' risks from biomedical informatics research.

16.4. Suggestions for biomedical informaticians who write grant applications.

17. A practical approach to ethics for biomedical informaticians.

17.1. Background.

17.2. Is it ever ok to lie?

17.3. When can you use unconsented identified medical records?

17.4. When can you use proprietary software and standards?

17.5. When is it ok to have conflicts of interest?

17.6. When is it ok to refuse consent?

17.7. Is it ethical to patent biomedical discoveries?

17.8. The etiquette of free software usage.

17.9. Hoarding research data.

17.10. Are there ethical alternatives to HIPAA's Safe Harbor de-identification method?

17.11. Can you use consented data for unconsented research?

17.12. When is it ethical to enforce copyright on medical research publications?

17.13. Is it ok to profit from tissue banking services?

17.14. How likely is a HIPAA lawsuit?

17.15. Being fair to the outraged patient.

17.16. When can I be wrong?

17.17. Closing platitudes.

18. References (commented).

19. Appendix.

19.1. The C programming language.

19.2. The Java programming language.

19.3. Perl, open source programming language.

19.4. Python, open source programming language.

19.5. Ruby, open source object-oriented programming language.

19.6. SWIG, open source glue tool.

19.7. Open Microscopy Environment (OME).

19.8. R, open source statistical programming language, and Bioconductor.

19.9. Open source BioPerl, Biopython, BioRuby.

19.10. Open source electronic laboratory notebook, neurosys.

19.11. Open source GIMP image software.

19.12. Open source NIH Image.

19.13. POV-Ray image rendering open source software.

19.14. Open source compression and archiving utilities (gzip, gunzip, tar, 7-zip, bunzip).

19.15. Cygwin, open source Unix/Linux emulator.

19.16. GnuPG, open source encryption tool.

19.17. Wget, web site mirroring software.

19.18. Open source indexing software (Swish-e and Lucene).

19.19. Open source word processing software (AbiWord and OpenOffice Writer).

19.20. Open source Emacs text editor.

19.21. Open source spreadsheet software.

19.22. Open source presentation software.

19.23. MUMPS, an ANSI-standard programming language for medical informatics.

19.24. MySQL, open source database software.

19.25. Protege, open source ontology editor.

19.26. VistA, a free hospital information system courtesy of the U.S. government.

19.27. CWM, a Closed World Machine for RDF (in Python).

19.28. PubMed and PubMed Central.

19.29. Resources from the National Center for Biotechnology Information.

19.30. Database issue of Nucleic Acids Research.

19.31. LocusLink and its successor, Entrez Gene.

19.32. Time stamping.

19.33. Google, as if you didn't already know.

19.34. SourceForge.

19.35. CVS, concurrent versions system.

19.36. CPAN, the Comprehensive Perl Archive Network.

19.37. Requests for Comments.

19.38. OMIM - Online Mendelian Inheritance in Man.

19.39. LOINC, Logical Observation Identifiers Names and Codes.

19.40. HL7 - Health Level 7.

19.41. SEER.

19.42. UMLS Metathesaurus.

19.43. Medical Subject Headings - MeSH.

19.44. Gene Ontology - GO.

19.45. OBO (Open Biological Ontologies).

19.46. USHIK metadata registry.

19.47. Neoplasm classification.

19.48. U.S. Census.

20. Glossary.

21. List of lists.

22. Index.

23. Author biography.

More book information is available from the publisher's web site.

-Jules Berman

key words: medical informatics, bioinformatics, Perl programming, biomedical data, medical confidentiality, medical privacy, HIPAA, big data, metadata, data preparation, data analytics, data repurposing, datamining, data mining
My book, Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information was published in 2013 by Morgan Kaufmann.



I urge you to explore my book. Google Books has prepared a generous preview of the book contents.