Saturday, August 4, 2018

Second Edition of Principles and Practice of Big Data now on Science Direct

The second edition of my book, Principles and Practice of Big Data, has just been released and is available for purchase at many sites, including Amazon.

For those of you fortunate enough to have access to Science Direct, you can download chapters of my book at:

https://www.sciencedirect.com/science/book/9780128156094



TABLE OF CONTENTS

  Author's Preface to Second Edition 

  Author's Preface to First Edition 

  Chapter 1. Introduction
    Section 1.  Definition of Big Data
    Section 2.  Big Data Versus small data
    Section 3.  Whence Comest Big Data?
    Section 4.  The Most Common Purpose of Big Data is to Produce small data
    Section 5.  Big Data Sits at the Center of the Research Universe
    Section 6.  Case Study: From the Press: Big Claims for Big Data

  Chapter 2. Providing Structure to Unstructured Data
    Section 1.  Nearly all Data is Unstructured and Unusable in its Raw Form
    Section 2.  Term Extraction
    Section 3.  Autocoding
    Section 4.  Concordances
    Section 5.  Indexing
    Section 6.  Machine Translation
    Section 7.  Case Study: Sorted Lists (Why and Why Not)
    Section 8.  Case Study: Doublet Lists 
    Section 9.  Case Study: Ngram Lists 
    Section 10.  Case Study: Proximity Searches Using Only a Concordance  
    Section 11.  Case Study (Advanced): Burrows Wheeler Transform (BWT) 

  Chapter 3. Identification, Deidentification, and Reidentification
    Section 1.  What are Identifiers?
    Section 2.  Difference Between an Identifier and an Identifier System
    Section 3.  Generating Identifiers
    Section 4.  Really Bad Identifier Methods
    Section 5.  Registered Unique Object Identifiers
    Section 6.  Deidentification
    Section 7.  Reidentification
    Section 8.  Case Study: Data Scrubbing
    Section 9.  Case Study: Identifiers in Image Headers
    Section 10.  Case Study: Hospital Registration
    Section 11.  Case Study: One-Way Hashes

  Chapter 4. Metadata, Semantics, and Triples
    Section 1.  Metadata
    Section 2.  eXtensible Markup Language
    Section 3.  Namespaces
    Section 4.  Semantics and Triples
    Section 5.  Case Study: Syntax for Triples 
    Section 6.  Case Study: RDF Schema
    Section 7.  Case Study: RDF Parsers and the Fungibility of Triples
    Section 8.  Case Study: Dublin Core 

  Chapter 5. Classifications and Ontologies
    Section 1.  It's All About Object Relationships 
    Section 2.  The Difference Between Object Relationships and Object Similarities
    Section 3.  Classifications, the Simplest of Ontologies
    Section 4.  Ontologies, Classes with Multiple Parents
    Section 5.  Choosing a Class Model
    Section 6.  Paradoxes
    Section 7.  Class Blending
    Section 8.  Common Pitfalls in Ontology Development
    Section 9.  Case Study: An Upper Level Ontology 
    Section 10.  Case Study: Visualizing Class Relationships 
    Section 11.  Case Study: Bringing Order from Chaos with the Classification of Living Organisms

  Chapter 6. Introspection
    Section 1.  Knowledge of Self
    Section 2.  Data Objects
    Section 3.  How Big Data Uses Introspection 
    Section 4.  Case Study: Timestamping Data 
    Section 5.  Case Study: A Visit to the TripleStore 

  Chapter 7. Data Integration and Software Interoperability
    Section 1.  Another Big Problem for Big Data
    Section 2.  The Standard for Standards
    Section 3.  Standard Trajectories
    Section 4.  Specifications and Standards
    Section 5.  Versioning
    Section 6.  Compliance Issues
    Section 7.  Interfaces to Big Data Resources
    Section 8.  Case Study: Standardizing the Chocolate Teapot

  Chapter 8. Immutability and Immortality
    Section 1.  The Importance of Data that Cannot Change  
    Section 2.  Immutability and Identifiers
    Section 3.  Persistent Data Objects
    Section 4.  Coping with the Data that Data Creates
    Section 5.  Reconciling Identifiers Across Institutions
    Section 6.  Case Study: The Trusted Timestamp
    Section 7.  Case Study: Blockchains and Distributed Ledgers
    Section 8.  Case Study: Zero-Knowledge Reconciliation   

  Chapter 9. Assessing the Adequacy of a Big Data Resource
    Section 1.  Looking at the Data 
    Section 2.  The Minimal Necessary Properties of Big Data 
    Section 3.  Case Study: Utilities for Viewing and Manipulating Very Large Files
    Section 4.  Case Study: Flattened Data 
    Section 5.  Case Study: Data that Comes with Conditions 

  Chapter 10. Measurement
    Section 1.  Accuracy and Precision
    Section 2.  Data Range
    Section 3.  Counting
    Section 4.  Normalizing and Transforming Your Data
    Section 5.  Reducing Your Data
    Section 6.  Understanding Your Control
    Section 7.  Practical Significance of Measurements
    Section 8.  Case Study: Gene Counting
    Section 9.  Case Study: The Significance of Narrow Data Ranges
    Section 10.  Case Study (Advanced): Fast Fourier Transform
    Section 11.  Case Study (Advanced): Principal Component Analysis

  Chapter 11. Indispensable Tips for Fast and Simple Big Data Analysis
    Section 1.  Speed and Scalability
    Section 2.  Fast Operations, Suitable for Big Data, that Every Computer Supports
    Section 3.  Fast Correlation Methods
    Section 4.  Clustering 
    Section 5.  Methods for Data Persistence (Without Using a Database)
    Section 6.  Back-of-Envelope Computations for Big Data
    Section 7.  Fast Data Retrieval for Lists of any Size 
    Section 8.  Case Study: One-Pass Mean and Standard Deviation
    Section 9.  Case Study: Climbing a Classification
    Section 10.  Pre-computing lookup lists: Google's PageRank
    Section 11.  Case Study: A Database Example 
    Section 12.  NoSQL and other Non-Relational Big Data Databases

  Chapter 12. Finding the Clues in Large Collections of Data
    Section 1.  Denominators 
    Section 2.  Frequency Distributions
    Section 3.  Multimodality
    Section 4.  Outliers and Anomalies
    Section 5.  Case Study: Discarding the Noisiest Frequencies in a Data Signal
    Section 6.  Case Study: Predicting User Preferences
    Section 7.  Case Study: Multimodality in Legacy Data
    Section 8.  Case Study: Big and Small Black Holes

  Chapter 13. Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
    Section 1.  The Remarkable Utility of (Pseudo)Random Numbers 
    Section 2.  Resampling and Permutating 
    Section 3.  Case Study: Sample Size and Power Estimates
    Section 4.  Monte Carlo Simulations
    Section 5.  Case Study: Monty Hall Problem: Solving What We Cannot Grasp
    Section 6.  Case Study: Frequency of Unlikely String of Occurrences 
    Section 7.  Case Study: The Infamous Birthday Problem
    Section 8.  Case Study: A Bayesian Analysis of Insurance Costs 

  Chapter 14. Special Considerations in Big Data Analysis
    Section 1.  Theory in Search of Data 
    Section 2.  Data in Search of Theory
    Section 3.  Overfitting
    Section 4.  Bigness Bias
    Section 5.  Too Much Data
    Section 6.  Fixing Data
    Section 7.  Data Subsets in Big Data: Neither Additive nor Transitive
    Section 8.  Additional Big Data Pitfalls
    Section 9.  Case Study: Curse of Dimensionality

  Chapter 15. Big Data Failures and How to Avoid (Some of) Them
    Section 1.  Failure is Common
    Section 2.  Failed Standards
    Section 3.  Blaming Complexity
    Section 4.  Perils of Redundancy
    Section 5.  Save Time and Money; Don’t Protect Data that Does not Need Protection
    Section 6.  An Approach to Big Data that May Work For You
    Section 7.  After Failure
    Section 8.  Case Study: Cancer Biomedical Informatics Grid, a Bridge too Far
    Section 9.  Case Study: The Gaussian Copula Function

  Chapter 16. Legalities
    Section 1.  Responsibility for the Accuracy and Legitimacy of Data
    Section 2.  Rights to Create, Use, and Share the Resource
    Section 3.  Copyright and Patent Infringements Incurred by Using Standards
    Section 4.  Protections for Individuals
    Section 5.  Consent
    Section 6.  Unconsented Data
    Section 7.  Good Policies are a Good Policy
    Section 8.  Case Study: The "Inconclusive" Data Analysis
    Section 9.  Case Study: The Havasupai Story
    Section 10.  Case Study: Double-edged Sword of the U.S. Data Quality Act 

  Chapter 17. Data Sharing 
    Section 1.  What Is Data Sharing, and Why Don't We Do More of It?
    Section 2.  Common Complaints
    Section 3.  Case Study: Life on Mars
    Section 4.  Case Study: Who Shares Their Data 
    Section 5.  Case Study: National Patient Identifier

  Chapter 18. Data Reanalysis: Much More Important than Analysis
    Section 1.  First Analysis (Nearly) Always Wrong 
    Section 2.  Why Reanalysis is More Important than Analysis
    Section 3.  Case Study: Reanalysis of Old JADE Collider Data 
    Section 4.  Case Study: Vindication Through Reanalysis 
    Section 5.  Case Study: Finding New Planets from Old Data 

  Chapter 19. Repurposing Big Data
    Section 1.  What is Data Repurposing? 
    Section 2.  Dark Data, Abandoned Data, and Legacy Data 
    Section 3.  Case Study: From Postal Code to Demographic Keystone 
    Section 4.  Case Study: Fingerprints and Data-driven Forensics
    Section 5.  Scientific Inferencing from a Database of Genetic Sequences
    Section 6.  Case Study: Linking global warming to high-intensity hurricanes
    Section 7.  Case Study: Inferring climate trends with geologic data
    Section 8.  Case Study: Old tidal data, and the iceberg that sank the Titanic
    Section 9.  Case Study: Lunar Orbiter Image Recovery Project
    Section 10.  Case Study: The Cornucopia of the Natural Sciences

  Chapter 20. Societal Issues
    Section 1.  How Big Data Is Perceived by the Public
    Section 2.  Reducing Costs and Increasing Productivity with Big Data
    Section 3.  Public Mistrust
    Section 4.  Saving Us from Ourselves 
    Section 5.  Who is Big Data?
    Section 6.  Hubris and Hyperbole
    Section 7.  Case Study: The Citizen Scientists
    Section 8.  Case Study: 1984, by George Orwell

  




- Jules Berman
