Thursday, March 3, 2016

Data Simplification: Contents

On March 23, my book, Data Simplification: Taming Information With Open Source Tools, will be published.

Here is a preview of the contents:
TABLE OF CONTENTS

Chapter 0. Preface
   References for Preface
   Glossary for Preface

Chapter 1. The Simple Life
   Section 1.1. Simplification drives scientific progress
   Section 1.2. The human mind is a simplifying machine
   Section 1.3. Simplification in Nature
   Section 1.4. The Complexity Barrier
   Section 1.5. Getting ready
   Open Source Tools for Chapter 1
      Perl
      Python
      Ruby
      Text Editors
      OpenOffice
      Command line utilities
      Cygwin, Linux emulation for Windows
      DOS batch scripts
      Linux bash scripts
      Interactive line interpreters
      Package installers
      System calls
   References for Chapter 1
   Glossary for Chapter 1

Chapter 2. Structuring Text
   Section 2.1. The Meaninglessness of free text
   Section 2.2. Sorting text, the impossible dream
   Section 2.3. Sentence Parsing
   Section 2.4. Abbreviations
   Section 2.5. Annotation and the simple science of metadata
   Section 2.6. Specifications Good, Standards Bad
   Open Source Tools for Chapter 2
      ASCII
      Regular expressions
      Format commands
      Converting non-printable files to plain-text
      Dublin Core
   References for Chapter 2
   Glossary for Chapter 2

Chapter 3. Indexing Text
   Section 3.1. How Data Scientists Use Indexes
   Section 3.2. Concordances and Indexed Lists
   Section 3.3. Term Extraction and Simple Indexes
   Section 3.4. Autoencoding and Indexing with Nomenclatures
   Section 3.5. Computational Operations on Indexes
   Open Source Tools for Chapter 3
      Word lists
      Doublet lists
      Ngram lists
   References for Chapter 3
   Glossary for Chapter 3

Chapter 4. Understanding Your Data
   Section 4.1. Ranges and Outliers
   Section 4.2. Simple Statistical Descriptors
   Section 4.3. Retrieving Image Information
   Section 4.4. Data Profiling
   Section 4.5. Reducing data
   Open Source Tools for Chapter 4
      Gnuplot
      MatPlotLib
      R, for statistical programming
      Numpy
      Scipy
      ImageMagick
      Displaying equations in LaTeX
      Normalized compression distance
      Pearson's correlation
      The ridiculously simple dot product
   References for Chapter 4
   Glossary for Chapter 4

Chapter 5. Identifying and Deidentifying Data
   Section 5.1. Unique Identifiers
   Section 5.2. Poor Identifiers, Horrific Consequences
   Section 5.3. Deidentifiers and Reidentifiers
   Section 5.4. Data Scrubbing
   Section 5.5. Data Encryption and Authentication
   Section 5.6. Timestamps, Signatures, and Event Identifiers
   Open Source Tools for Chapter 5
      Pseudorandom number generators
      UUID
      Encryption and decryption with OpenSSL
      One-way hash implementations
      Steganography
   References for Chapter 5
   Glossary for Chapter 5

Chapter 6. Giving Meaning to Data
   Section 6.1. Meaning and Triples
   Section 6.2. Driving Down Complexity with Classifications
   Section 6.3. Driving Up Complexity with Ontologies
   Section 6.4. The unreasonable effectiveness of classifications
   Section 6.5. Properties that Cross Multiple Classes
   Open Source Tools for Chapter 6
      Syntax for triples
      RDF Schema
      RDF parsers
      Visualizing class relationships
   References for Chapter 6
   Glossary for Chapter 6

Chapter 7. Object-oriented data
   Section 7.1. The Importance of Self-explaining Data
   Section 7.2. Introspection and Reflection
   Section 7.3. Object-Oriented Data Objects
   Section 7.4. Working with Object-Oriented Data
   Open Source Tools for Chapter 7
      Persistent data
      SQLite databases
   References for Chapter 7
   Glossary for Chapter 7

Chapter 8. Problem simplification
   Section 8.1. Random numbers
   Section 8.2. Monte Carlo Simulations
   Section 8.3. Resampling and Permutating
   Section 8.4. Verification, Validation, and Reanalysis
   Section 8.5. Data Permanence and Data Immutability
   Open Source Tools for Chapter 8
      Burrows Wheeler transform
      Winnowing and chaffing
   References for Chapter 8
   Glossary for Chapter 8


Over the next few weeks, I will be blogging on topics selected from Data Simplification: Taming Information With Open Source Tools. I hope I can convince you that this is a book worth reading.

- Jules Berman

key words: computer science, data analysis, data repurposing, data simplification, data wrangling, information science, simplifying data, taming data, jules j berman


Blog readers can use the discount code COMP315 at checkout for a 30% discount.
