The Kindle edition is available now, and Amazon has a "look-inside" option on their book page. The print version will be available in a week or two, and Amazon is taking pre-orders. Here is the complete Table of Contents:
Acknowledgments xi
Author Biography xiii
Preface xv
Introduction xix

1. Providing Structure to Unstructured Data
   Background 1
   Machine Translation 2
   Autocoding 4
   Indexing 9
   Term Extraction 11

2. Identification, Deidentification, and Reidentification
   Background 15
   Features of an Identifier System 17
   Registered Unique Object Identifiers 18
   Really Bad Identifier Methods 22
   Embedding Information in an Identifier: Not Recommended 24
   One-Way Hashes 25
   Use Case: Hospital Registration 26
   Deidentification 28
   Data Scrubbing 30
   Reidentification 31
   Lessons Learned 32

3. Ontologies and Semantics
   Background 35
   Classifications, the Simplest of Ontologies 36
   Ontologies, Classes with Multiple Parents 39
   Choosing a Class Model 40
   Introduction to Resource Description Framework Schema 44
   Common Pitfalls in Ontology Development 46

4. Introspection
   Background 49
   Knowledge of Self 50
   eXtensible Markup Language 52
   Introduction to Meaning 54
   Namespaces and the Aggregation of Meaningful Assertions 55
   Resource Description Framework Triples 56
   Reflection 59
   Use Case: Trusted Time Stamp 59
   Summary 60

5. Data Integration and Software Interoperability
   Background 63
   The Committee to Survey Standards 64
   Standard Trajectory 65
   Specifications and Standards 69
   Versioning 71
   Compliance Issues 73
   Interfaces to Big Data Resources 74

6. Immutability and Immortality
   Background 77
   Immutability and Identifiers 78
   Data Objects 80
   Legacy Data 82
   Data Born from Data 83
   Reconciling Identifiers across Institutions 84
   Zero-Knowledge Reconciliation 86
   The Curator’s Burden 87

7. Measurement
   Background 89
   Counting 90
   Gene Counting 93
   Dealing with Negations 93
   Understanding Your Control 95
   Practical Significance of Measurements 96
   Obsessive-Compulsive Disorder: The Mark of a Great Data Manager 97

8. Simple but Powerful Big Data Techniques
   Background 99
   Look at the Data 100
   Data Range 110
   Denominator 112
   Frequency Distributions 115
   Mean and Standard Deviation 119
   Estimation-Only Analyses 122
   Use Case: Watching Data Trends with Google Ngrams 123
   Use Case: Estimating Movie Preferences 126

9. Analysis
   Background 129
   Analytic Tasks 130
   Clustering, Classifying, Recommending, and Modeling 130
   Data Reduction 134
   Normalizing and Adjusting Data 137
   Big Data Software: Speed and Scalability 139
   Find Relationships, Not Similarities 141

10. Special Considerations in Big Data Analysis
   Background 145
   Theory in Search of Data 146
   Data in Search of a Theory 146
   Overfitting 148
   Bigness Bias 148
   Too Much Data 151
   Fixing Data 152
   Data Subsets in Big Data: Neither Additive nor Transitive 153
   Additional Big Data Pitfalls 154

11. Stepwise Approach to Big Data Analysis
   Background 157
   Step 1. A Question Is Formulated 158
   Step 2. Resource Evaluation 158
   Step 3. A Question Is Reformulated 159
   Step 4. Query Output Adequacy 160
   Step 5. Data Description 161
   Step 6. Data Reduction 161
   Step 7. Algorithms Are Selected, If Absolutely Necessary 162
   Step 8. Results Are Reviewed and Conclusions Are Asserted 164
   Step 9. Conclusions Are Examined and Subjected to Validation 164

12. Failure
   Background 167
   Failure Is Common 168
   Failed Standards 169
   Complexity 172
   When Does Complexity Help? 173
   When Redundancy Fails 174
   Save Money; Don’t Protect Harmless Information 176
   After Failure 177
   Use Case: Cancer Biomedical Informatics Grid, a Bridge Too Far 178

13. Legalities
   Background 183
   Responsibility for the Accuracy and Legitimacy of Contained Data 184
   Rights to Create, Use, and Share the Resource 185
   Copyright and Patent Infringements Incurred by Using Standards 187
   Protections for Individuals 188
   Consent 190
   Unconsented Data 194
   Good Policies Are a Good Policy 197
   Use Case: The Havasupai Story 198

14. Societal Issues
   Background 201
   How Big Data Is Perceived 201
   The Necessity of Data Sharing, Even When It Seems Irrelevant 204
   Reducing Costs and Increasing Productivity with Big Data 208
   Public Mistrust 210
   Saving Us from Ourselves 211
   Hubris and Hyperbole 213

15. The Future
   Background 217
   Last Words 226

Glossary 229
References 247
Index 257

In the next few days, I'll be posting short excerpts from the book, along with commentary.

Best,
Jules Berman
key words: big data, heterogeneous data, complex datasets, Jules J. Berman, Ph.D., M.D., immutability, introspection, identifiers, de-identification, deidentification, confidentiality, privacy, massive data, lotsa data