If you are fortunate enough to have access to ScienceDirect, you can download chapters of my book at:
https://www.sciencedirect.com/science/book/9780128156094
TABLE OF CONTENTS

Author's Preface to Second Edition
Author's Preface to First Edition

Chapter 1. Introduction
  Section 1. Definition of Big Data
  Section 2. Big Data Versus small data
  Section 3. Whence Comest Big Data?
  Section 4. The Most Common Purpose of Big Data is to Produce small data
  Section 5. Big Data Sits at the Center of the Research Universe
  Section 6. Case Study: From the Press: Big Claims for Big Data

Chapter 2. Providing Structure to Unstructured Data
  Section 1. Nearly all Data is Unstructured and Unusable in its Raw Form
  Section 2. Term Extraction
  Section 3. Autocoding
  Section 4. Concordances
  Section 5. Indexing
  Section 6. Machine Translation
  Section 7. Case Study: Sorted Lists (Why and Why Not)
  Section 8. Case Study: Doublet Lists
  Section 9. Case Study: Ngram Lists
  Section 10. Case Study: Proximity Searches Using Only a Concordance
  Section 11. Case Study (Advanced): Burrows Wheeler Transform (BWT)

Chapter 3. Identification, Deidentification, and Reidentification
  Section 1. What are Identifiers?
  Section 2. Difference Between an Identifier and an Identifier System
  Section 3. Generating Identifiers
  Section 4. Really Bad Identifier Methods
  Section 5. Registered Unique Object Identifiers
  Section 6. Deidentification
  Section 7. Reidentification
  Section 8. Case Study: Data Scrubbing
  Section 9. Case Study: Identifiers in Image Headers
  Section 10. Case Study: Hospital Registration
  Section 11. Case Study: One-Way Hashes

Chapter 4. Metadata, Semantics, and Triples
  Section 1. Metadata
  Section 2. eXtensible Markup Language
  Section 3. Namespaces
  Section 4. Semantics and Triples
  Section 5. Case Study: Syntax for Triples
  Section 6. Case Study: RDF Schema
  Section 7. Case Study: RDF Parsers and the Fungibility of Triples
  Section 8. Case Study: Dublin Core

Chapter 5. Classifications and Ontologies
  Section 1. It's All About Object Relationships
  Section 2. The Difference Between Object Relationships and Object Similarities
  Section 3. Classifications, the Simplest of Ontologies
  Section 4. Ontologies, Classes with Multiple Parents
  Section 5. Choosing a Class Model
  Section 6. Paradoxes
  Section 7. Class Blending
  Section 8. Common Pitfalls in Ontology Development
  Section 9. Case Study: An Upper Level Ontology
  Section 10. Case Study: Visualizing Class Relationships
  Section 11. Case Study: Bringing Order from Chaos with the Classification of Living Organisms

Chapter 6. Introspection
  Section 1. Knowledge of Self
  Section 2. Data Objects
  Section 3. How Big Data Uses Introspection
  Section 4. Case Study: Timestamping Data
  Section 5. Case Study: A Visit to the TripleStore

Chapter 7. Data Integration and Software Interoperability
  Section 1. Another Big Problem for Big Data
  Section 2. The Standard for Standards
  Section 3. Standard Trajectories
  Section 4. Specifications and Standards
  Section 5. Versioning
  Section 6. Compliance Issues
  Section 7. Interfaces to Big Data Resources
  Section 8. Case Study: Standardizing the Chocolate Teapot

Chapter 8. Immutability and Immortality
  Section 1. The Importance of Data that Cannot Change
  Section 2. Immutability and Identifiers
  Section 3. Persistent Data Objects
  Section 4. Coping with the Data that Data Creates
  Section 5. Reconciling Identifiers Across Institutions
  Section 6. Case Study: The Trusted Timestamp
  Section 7. Case Study: Blockchains and Distributed Ledgers
  Section 8. Case Study: Zero-Knowledge Reconciliation

Chapter 9. Assessing the Adequacy of a Big Data Resource
  Section 1. Looking at the Data
  Section 2. The Minimal Necessary Properties of Big Data
  Section 3. Case Study: Utilities for Viewing and Manipulating Very Large Files
  Section 4. Case Study: Flattened Data
  Section 5. Case Study: Data that Comes with Conditions

Chapter 10. Measurement
  Section 1. Accuracy and Precision
  Section 2. Data Range
  Section 3. Counting
  Section 4. Normalizing and Transforming Your Data
  Section 5. Reducing Your Data
  Section 6. Understanding Your Control
  Section 7. Practical Significance of Measurements
  Section 8. Case Study: Gene Counting
  Section 9. Case Study: The Significance of Narrow Data Ranges
  Section 10. Case Study (Advanced): Fast Fourier Transform
  Section 11. Case Study (Advanced): Principal Component Analysis

Chapter 11. Indispensable Tips for Fast and Simple Big Data Analysis
  Section 1. Speed and Scalability
  Section 2. Fast Operations, Suitable for Big Data, that Every Computer Supports
  Section 3. Fast Correlation Methods
  Section 4. Clustering
  Section 5. Methods for Data Persistence (Without Using a Database)
  Section 6. Back-of-Envelope Computations for Big Data
  Section 7. Fast Data Retrieval for Lists of any Size
  Section 8. Case Study: One-Pass Mean and Standard Deviation
  Section 9. Case Study: Climbing a Classification
  Section 10. Pre-computing Lookup Lists: Google's PageRank
  Section 11. Case Study: A Database Example
  Section 12. NoSQL and other Non-Relational Big Data Databases

Chapter 12. Finding the Clues in Large Collections of Data
  Section 1. Denominators
  Section 2. Frequency Distributions
  Section 3. Multimodality
  Section 4. Outliers and Anomalies
  Section 5. Case Study: Discarding the Noisiest Frequencies in a Data Signal
  Section 6. Case Study: Predicting User Preferences
  Section 7. Case Study: Multimodality in Legacy Data
  Section 8. Case Study: Big and Small Black Holes

Chapter 13. Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size
  Section 1. The Remarkable Utility of (Pseudo)Random Numbers
  Section 2. Resampling and Permutating
  Section 3. Case Study: Sample Size and Power Estimates
  Section 4. Monte Carlo Simulations
  Section 5. Case Study: Monty Hall Problem: Solving What We Cannot Grasp
  Section 6. Case Study: Frequency of Unlikely String of Occurrences
  Section 7. Case Study: The Infamous Birthday Problem
  Section 8. Case Study: A Bayesian Analysis of Insurance Costs

Chapter 14. Special Considerations in Big Data Analysis
  Section 1. Theory in Search of Data
  Section 2. Data in Search of Theory
  Section 3. Overfitting
  Section 4. Bigness Bias
  Section 5. Too Much Data
  Section 6. Fixing Data
  Section 7. Data Subsets in Big Data: Neither Additive nor Transitive
  Section 8. Additional Big Data Pitfalls
  Section 9. Case Study: Curse of Dimensionality

Chapter 15. Big Data Failures and How to Avoid (Some of) Them
  Section 1. Failure is Common
  Section 2. Failed Standards
  Section 3. Blaming Complexity
  Section 4. Perils of Redundancy
  Section 5. Save Time and Money; Don’t Protect Data that Does not Need Protection
  Section 6. An Approach to Big Data that May Work For You
  Section 7. After Failure
  Section 8. Case Study: Cancer Biomedical Informatics Grid, a Bridge too Far
  Section 9. Case Study: The Gaussian Copula Function

Chapter 16. Legalities
  Section 1. Responsibility for the Accuracy and Legitimacy of Data
  Section 2. Rights to Create, Use, and Share the Resource
  Section 3. Copyright and Patent Infringements Incurred by Using Standards
  Section 4. Protections for Individuals
  Section 5. Consent
  Section 6. Unconsented Data
  Section 7. Good Policies are a Good Policy
  Section 8. Case Study: The "Inconclusive" Data Analysis
  Section 9. Case Study: The Havasupai Story
  Section 10. Case Study: Double-edged Sword of the U.S. Data Quality Act

Chapter 17. Data Sharing
  Section 1. What Is Data Sharing, and Why Don't We Do More of It?
  Section 2. Common Complaints
  Section 3. Case Study: Life on Mars
  Section 4. Case Study: Who Shares Their Data
  Section 5. Case Study: National Patient Identifier

Chapter 18. Data Reanalysis: Much More Important than Analysis
  Section 1. First Analysis (Nearly) Always Wrong
  Section 2. Why Reanalysis is More Important than Analysis
  Section 3. Case Study: Reanalysis of Old JADE Collider Data
  Section 4. Case Study: Vindication Through Reanalysis
  Section 5. Case Study: Finding New Planets from Old Data

Chapter 19. Repurposing Big Data
  Section 1. What is Data Repurposing?
  Section 2. Dark Data, Abandoned Data, and Legacy Data
  Section 3. Case Study: From Postal Code to Demographic Keystone
  Section 4. Case Study: Fingerprints and Data-driven Forensics
  Section 5. Scientific Inferencing from a Database of Genetic Sequences
  Section 6. Case Study: Linking Global Warming to High-Intensity Hurricanes
  Section 7. Case Study: Inferring Climate Trends with Geologic Data
  Section 8. Case Study: Old Tidal Data, and the Iceberg that Sank the Titanic
  Section 9. Case Study: Lunar Orbiter Image Recovery Project
  Section 10. Case Study: The Cornucopia of the Natural Sciences

Chapter 20. Societal Issues
  Section 1. How Big Data Is Perceived by the Public
  Section 2. Reducing Costs and Increasing Productivity with Big Data
  Section 3. Public Mistrust
  Section 4. Saving Us from Ourselves
  Section 5. Who is Big Data?
  Section 6. Hubris and Hyperbole
  Section 7. Case Study: The Citizen Scientists
  Section 8. Case Study: 1984, by George Orwell
- Jules Berman