Specified Life: Second Edition of Principles and Practice of Big Data now on Science Direct

Saturday, August 4, 2018

Second Edition of Principles and Practice of Big Data now on Science Direct

The second edition of my book, Principles and Practice of Big Data, has just been released and is available for purchase at many sites, including Amazon.

Those of you fortunate enough to have access to Science Direct can download chapters of my book at:

https://www.sciencedirect.com/science/book/9780128156094

TABLE OF CONTENTS

Author's Preface to Second Edition

Author's Preface to First Edition

Chapter 1. Introduction
Section 1. Definition of Big Data
Section 2. Big Data Versus small data
Section 3. Whence Comest Big Data?
Section 4. The Most Common Purpose of Big Data is to Produce small data
Section 5. Big Data Sits at the Center of the Research Universe
Section 6. Case Study: From the Press: Big Claims for Big Data

Chapter 2. Providing Structure to Unstructured Data
Section 1. Nearly all Data is Unstructured and Unusable in its Raw Form
Section 2. Term Extraction
Section 3. Autocoding
Section 4. Concordances
Section 5. Indexing
Section 6. Machine Translation
Section 7. Case Study: Sorted Lists (Why and Why Not)
Section 8. Case Study: Doublet Lists
Section 9. Case Study: Ngram Lists
Section 10. Case Study: Proximity Searches Using Only a Concordance
Section 11. Case Study (Advanced): Burrows Wheeler Transform (BWT)

Chapter 3. Identification, Deidentification, and Reidentification
Section 1. What are Identifiers?
Section 2. Difference Between an Identifier and an Identifier System
Section 3. Generating Identifiers
Section 4. Really Bad Identifier Methods
Section 5. Registered Unique Object Identifiers
Section 6. Deidentification
Section 7. Reidentification
Section 8. Case Study: Data Scrubbing
Section 9. Case Study: Identifiers in Image Headers
Section 10. Case Study: Hospital Registration
Section 11. Case Study: One-Way Hashes

Chapter 4. Metadata, Semantics, and Triples
Section 1. Metadata
Section 2. eXtensible Markup Language
Section 3. Namespaces
Section 4. Semantics and Triples
Section 5. Case Study: Syntax for Triples
Section 6. Case Study: RDF Schema
Section 7. Case Study: RDF Parsers and the Fungibility of Triples
Section 8. Case Study: Dublin Core

Chapter 5. Classifications and Ontologies
Section 1. It's All About Object Relationships
Section 2. The Difference Between Object Relationships and Object Similarities
Section 3. Classifications, the Simplest of Ontologies
Section 4. Ontologies, Classes with Multiple Parents
Section 5. Choosing a Class Model
Section 6. Paradoxes
Section 7. Class Blending
Section 8. Common Pitfalls in Ontology Development
Section 9. Case Study: An Upper Level Ontology
Section 10. Case Study: Visualizing Class Relationships
Section 11. Case Study: Bringing Order from Chaos with the Classification of Living Organisms

Chapter 6. Introspection
Section 1. Knowledge of Self
Section 2. Data Objects
Section 3. How Big Data Uses Introspection
Section 4. Case Study: Timestamping Data
Section 5. Case Study: A Visit to the TripleStore

Chapter 7. Data Integration and Software Interoperability
Section 1. Another Big Problem for Big Data
Section 2. The Standard for Standards
Section 3. Standard Trajectories
Section 4. Specifications and Standards
Section 5. Versioning
Section 6. Compliance Issues
Section 7. Interfaces to Big Data Resources
Section 8. Case Study: Standardizing the Chocolate Teapot

Chapter 8. Immutability and Immortality
Section 1. The Importance of Data that Cannot Change
Section 2. Immutability and Identifiers
Section 3. Persistent Data Objects
Section 4. Coping with the Data that Data Creates
Section 5. Reconciling Identifiers Across Institutions
Section 6. Case Study: The Trusted Timestamp
Section 7. Case Study: Blockchains and Distributed Ledgers
Section 8. Case Study: Zero-Knowledge Reconciliation

Chapter 9. Assessing the Adequacy of a Big Data Resource
Section 1. Looking at the Data
Section 2. The Minimal Necessary Properties of Big Data
Section 3. Case Study: Utilities for Viewing and Manipulating Very Large Files
Section 4. Case Study: Flattened Data
Section 5. Case Study: Data that Comes with Conditions

Chapter 10. Measurement
Section 1. Accuracy and Precision
Section 2. Data Range
Section 3. Counting
Section 4. Normalizing, and Transforming Your Data
Section 5. Reducing Your Data
Section 6. Understanding Your Control
Section 7. Practical Significance of Measurements
Section 8. Case Study: Gene Counting
Section 9. Case Study: The Significance of Narrow Data Ranges
Section 10. Case Study (Advanced): Fast Fourier Transform
Section 11. Case Study (Advanced): Principal Component Analysis

Chapter 11. Indispensable Tips for Fast and Simple Big Data Analysis
Section 1. Speed and Scalability
Section 2. Fast Operations, Suitable for Big Data, that Every Computer Supports
Section 3. Fast Correlation Methods
Section 4. Clustering
Section 5. Methods for Data Persistence (Without Using a Database)
Section 6. Back-of-Envelope Computations for Big Data
Section 7. Fast Data Retrieval for Lists of any Size
Section 8. Case Study: One-Pass Mean and Standard Deviation
Section 9. Case Study: Climbing a Classification
Section 10. Pre-computing lookup lists: Google's PageRank
Section 11. Case Study: A Database Example
Section 12. NoSQL and other Non-Relational Big Data Databases

Chapter 12. Finding the Clues in Large Collections of Data
Section 1. Denominators
Section 2. Frequency Distributions
Section 3. Multimodality
Section 4. Outliers and Anomalies
Section 5. Case Study: Discarding the Noisiest Frequencies in a Data Signal
Section 6. Case Study: Predicting User Preferences
Section 7. Case Study: Multimodality in Legacy Data
Section 8. Case Study: Big and Small Black Holes

Chapter 13. Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
Section 1. The Remarkable Utility of (Pseudo)Random Numbers
Section 2. Resampling and Permutating
Section 3. Case Study: Sample Size and Power Estimates
Section 4. Monte Carlo Simulations
Section 5. Case Study: Monty Hall Problem: Solving What We Cannot Grasp
Section 6. Case Study: Frequency of Unlikely String of Occurrences
Section 7. Case Study: The Infamous Birthday Problem
Section 8. Case Study: A Bayesian Analysis of Insurance Costs

Chapter 14. Special Considerations in Big Data Analysis
Section 1. Theory in Search of Data
Section 2. Data in Search of Theory
Section 3. Overfitting
Section 4. Bigness Bias
Section 5. Too Much Data
Section 6. Fixing Data
Section 7. Data Subsets in Big Data: Neither Additive nor Transitive
Section 8. Additional Big Data Pitfalls
Section 9. Case Study: Curse of Dimensionality

Chapter 15. Big Data Failures and How to Avoid (Some of) Them
Section 1. Failure is Common
Section 2. Failed Standards
Section 3. Blaming Complexity
Section 4. Perils of Redundancy
Section 5. Save Time and Money; Don’t Protect Data that Does not Need Protection
Section 6. An Approach to Big Data that May Work For You
Section 7. After Failure
Section 8. Case Study: Cancer Biomedical Informatics Grid, a Bridge too Far
Section 9. Case Study: The Gaussian Copula Function

Chapter 16. Legalities
Section 1. Responsibility for the Accuracy and Legitimacy of Data
Section 2. Rights to Create, Use, and Share the Resource
Section 3. Copyright and Patent Infringements Incurred by Using Standards
Section 4. Protections for Individuals
Section 5. Consent
Section 6. Unconsented Data
Section 7. Good Policies are a Good Policy
Section 8. Case Study: The "Inconclusive" Data Analysis
Section 9. Case Study: The Havasupai Story
Section 10. Case Study: Double-edged Sword of the U.S. Data Quality Act

Chapter 17. Data Sharing
Section 1. What Is Data Sharing, and Why Don't We Do More of It?
Section 2. Common Complaints
Section 3. Case Study: Life on Mars
Section 4. Case Study: Who Shares Their Data
Section 5. Case Study: National Patient Identifier

Chapter 18. Data Reanalysis: Much More Important than Analysis
Section 1. First Analysis (Nearly) Always Wrong
Section 2. Why Reanalysis is More Important than Analysis
Section 3. Case Study: Reanalysis of Old JADE Collider Data
Section 4. Case Study: Vindication Through Reanalysis
Section 5. Case Study: Finding New Planets from Old Data

Chapter 19. Repurposing Big Data
Section 1. What is Data Repurposing?
Section 2. Dark Data, Abandoned Data, and Legacy Data
Section 3. Case Study: From Postal Code to Demographic Keystone
Section 4. Case Study: Fingerprints and Data-driven Forensics
Section 5. Scientific Inferencing from a Database of Genetic Sequences
Section 6. Case Study: Linking global warming to high-intensity hurricanes
Section 7. Case Study: Inferring climate trends with geologic data
Section 8. Case Study: Old tidal data, and the iceberg that sank the Titanic
Section 9. Case Study: Lunar Orbiter Image Recovery Project
Section 10. Case Study: The Cornucopia of the Natural Sciences

Chapter 20. Societal Issues
Section 1. How Big Data Is Perceived by the Public
Section 2. Reducing Costs and Increasing Productivity with Big Data
Section 3. Public Mistrust
Section 4. Saving Us from Ourselves
Section 5. Who is Big Data?
Section 6. Hubris and Hyperbole
Section 7. Case Study: The Citizen Scientists
Section 8. Case Study: 1984, by George Orwell

- Jules Berman
