Big Data: A Revolution That Will Transform How We Live, Work, and Think
Viktor Mayer-Schönberger, Kenneth Cukier · 2013
In a sentence
Big data—the ability to analyze vast quantities of information rather than samples—is transforming how we understand the world by privileging correlation over causation, scale over exactitude, and prediction over explanation.
Big Data: A Revolution That Will Transform How We Live, Work, and Think argues that we are entering a new era in which the sheer scale of available data changes not just what we know but how we know it. Mayer-Schönberger and Cukier show how using all the data (N=all) instead of samples, accepting messiness instead of demanding precision, and embracing correlation instead of obsessing over causation unlocks enormous new economic and social value—from predicting flu outbreaks and airfare prices to preventing infrastructure failures and saving premature babies. Through vivid examples (Google Flu Trends, Farecast, Amazon recommendations, exploding manholes), the authors explain the mechanics of datafication—rendering ever more aspects of reality into quantifiable, analyzable form—and reveal how data's value increasingly lies in its reuse and option value. But they also confront the dark side: the erosion of privacy, the menace of punishing people for predicted (not actual) behavior, and the danger of a 'dictatorship of data.' The book offers a framework for governance—shifting from individual consent to data-user accountability, safeguarding human agency, and creating a new profession of 'algorithmists.' It is at once an enthusiastic primer, a business strategy guide, and a cautionary meditation on humanity's place amid a quantifiable world.
The four lenses
- Science
- Statistics
- Systems
- Strategy
The model
A causal framework in which four design levers (using all data, accepting messiness, prioritizing correlation, datafication) and a psychological state (the big-data mindset) drive behavioral patterns (data reuse, prediction-based decision-making). These patterns produce outcomes (economic value, predictive accuracy) and also generate risks (privacy erosion, loss of human agency, dictatorship of data) that governance safeguards moderate.
Using All the Data (N=all) · design lever
The practice of analyzing the comprehensive or near-comprehensive dataset relating to a phenomenon rather than relying on a statistical sample, enabling granular insights into subgroups and outliers that sampling cannot reveal.
Embracing Messiness (Imprecision Tolerance) · design lever
The willingness to accept inexactitude, inconsistency, and lower-quality data in exchange for far larger volumes, trading micro-level accuracy for macro-level insight and the ability to capture unstructured information.
Prioritizing Correlation Over Causation · design lever
The analytic orientation toward identifying statistical associations and useful proxies (knowing what) rather than insisting on understanding underlying causal mechanisms (knowing why), enabling faster and cheaper insights and predictions.
Datafication · design lever
The process of rendering aspects of the world—location, sentiment, posture, relationships, behaviors—into quantified, tabulated, analyzable data formats, distinct from mere digitization, thereby unlocking latent informational value for new uses.
Big-Data Mindset · psychological state
The cognitive orientation that recognizes latent and option value in data, imagines novel secondary uses, and frees itself from conventional thinking about what is feasible, often held by creative outsiders rather than domain incumbents.
Data Reuse and Recombination · behavioral pattern
The behavioral pattern of applying collected data to multiple purposes beyond its primary use—through basic reuse, merging datasets, extensibility, and capturing data exhaust—thereby releasing data's latent option value.
Prediction-Based Decision-Making · behavioral pattern
The behavioral shift toward augmenting or overruling human judgment with data-driven predictions and correlations, replacing intuition and subject-expertise with statistical models in operational and managerial decisions.
Economic Value Creation · outcome metric
The outcome whereby big data becomes a vital economic input and corporate asset, generating new goods, services, business models, productivity gains, and competitive advantage for those who hold and analyze data effectively.
Predictive Accuracy and Insight · outcome metric
The outcome of improved ability to forecast events, detect anomalies, identify trends, and prevent problems—such as flu spread, equipment failure, infection onset, or fire risk—through large-scale correlational analysis.
Privacy Erosion · outcome metric
The risk outcome whereby the scale and reuse of personal data, combined with the failure of anonymization, notice-and-consent, and opting out, exposes individuals to surveillance and re-identification.
Loss of Human Agency (Predictive Punishment) · outcome metric
The risk outcome whereby big-data predictions of future behavior are used to judge and punish individuals for propensities rather than actions, negating free will, individual responsibility, and the presumption of innocence.
Dictatorship of Data · outcome metric
The risk outcome of fetishizing data and predictions—becoming mindlessly bound by analytic output, collecting data for its own sake, or attributing undeserved truth to figures—leading to misuse and impoverished judgment, as exemplified by McNamara's body counts.
Governance Safeguards · contextual condition
The contextual mechanisms—shifting privacy from consent to data-user accountability, protecting human agency, employing algorithmists, and applying antitrust-style regulation—designed to contain big data's risks while enabling its benefits.
How they connect
- use all data → predicts predictive accuracy
- embrace messiness → influences use all data
- prioritize correlation → predicts prediction decision making
- datafication → predicts data reuse
- big data mindset → moderates data reuse
- data reuse → predicts economic value creation
- prediction decision making → predicts economic value creation
- prediction decision making → predicts predictive accuracy
- data reuse → predicts privacy erosion
- prediction decision making → predicts loss of human agency
- prediction decision making → predicts dictatorship of data
- governance safeguards → moderates (−) privacy erosion
- governance safeguards → moderates (−) loss of human agency
- governance safeguards → moderates (−) dictatorship of data
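The connections listed above form a small directed graph, and it can help to see them as a data structure. The following is a minimal sketch, not part of the book: the names `EDGES`, `build_graph`, and `downstream` are hypothetical, and it assumes that only "predicts" and "influences" links carry an effect forward, while "moderates" links adjust a relationship and are left out of path tracing.

```python
from collections import defaultdict

# Edges transcribed from the list above as (source, relation, target).
EDGES = [
    ("use all data", "predicts", "predictive accuracy"),
    ("embrace messiness", "influences", "use all data"),
    ("prioritize correlation", "predicts", "prediction decision making"),
    ("datafication", "predicts", "data reuse"),
    ("big data mindset", "moderates", "data reuse"),
    ("data reuse", "predicts", "economic value creation"),
    ("prediction decision making", "predicts", "economic value creation"),
    ("prediction decision making", "predicts", "predictive accuracy"),
    ("data reuse", "predicts", "privacy erosion"),
    ("prediction decision making", "predicts", "loss of human agency"),
    ("prediction decision making", "predicts", "dictatorship of data"),
    ("governance safeguards", "moderates", "privacy erosion"),
    ("governance safeguards", "moderates", "loss of human agency"),
    ("governance safeguards", "moderates", "dictatorship of data"),
]


def build_graph(edges, relations=("predicts", "influences")):
    """Build an adjacency map, keeping only the chosen relation types.

    Assumption: "moderates" edges are excluded by default because they
    modify a relationship rather than pass an effect along a path.
    """
    graph = defaultdict(set)
    for source, relation, target in edges:
        if relation in relations:
            graph[source].add(target)
    return graph


def downstream(graph, start):
    """Return every node reachable from `start` (depth-first traversal)."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    for lever in ("datafication", "embrace messiness", "prioritize correlation"):
        print(lever, "->", sorted(downstream(GRAPH, lever)))
```

Tracing from a lever makes the model's two-sided claim concrete: under these assumptions, datafication reaches both economic value creation and privacy erosion through data reuse.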
A candidate measure
Big Data: A Revolution That Will Transform How We Live, Work, and Think — derived measurement candidates
Using All the Data (N=all)
ratio of analyzed to total available data; number of subcategories analyzable; frequency of sampling vs. full-dataset analysis
self-report suitability: low
Embracing Messiness (Imprecision Tolerance)
proportion of unstructured data used; stated error-tolerance thresholds; adoption of NoSQL/Hadoop tools
self-report suitability: medium
Prioritizing Correlation Over Causation
share of correlational vs. experimental methods; decision rationales citing 'what' not 'why'; proxy usage frequency
self-report suitability: medium
Datafication
count of datafied domains; volume of quantified records; sensor coverage
self-report suitability: low
Big-Data Mindset
count of new data-reuse ideas; diversity of secondary uses proposed; innovation outputs
self-report suitability: medium
Data Reuse and Recombination
number of distinct uses per dataset; count of dataset merges; revenue attributable to reuse
self-report suitability: medium
Prediction-Based Decision-Making
proportion of decisions model-driven; degree of automation; expert-vs-model override rates
self-report suitability: medium
Economic Value Creation
data-product revenue; productivity differentials (e.g., 6%); market-vs-book value gap
self-report suitability: low
Predictive Accuracy and Insight
prediction hit rate; error margin; lead-time of warning
self-report suitability: none
Privacy Erosion
re-identification incident counts; volume of personal data aggregated; perceived privacy-loss surveys
self-report suitability: medium
Loss of Human Agency (Predictive Punishment)
presence of propensity-based punishment policies; prevalence of pre-crime interventions; share of decisions based on predicted vs. actual acts
self-report suitability: low
Dictatorship of Data
instances of metric fetishism; decisions made despite known data flaws; absence of data-quality scrutiny
self-report suitability: low
Governance Safeguards
presence of accountability rules; openness/certification/disprovability requirements; number of algorithmists; differential-privacy adoption
self-report suitability: low
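Several of the candidates above are simple ratios or averages that could be computed directly from logs. The sketch below illustrates three of them (ratio of analyzed to available data, prediction hit rate, and lead time of warning). The function names, argument shapes, and sample numbers are hypothetical and chosen only for illustration; the book does not prescribe these formulas.

```python
def coverage_ratio(records_analyzed, records_available):
    """Share of the available data actually analyzed (1.0 would be N=all)."""
    if records_available <= 0:
        raise ValueError("records_available must be positive")
    return records_analyzed / records_available


def hit_rate(predicted, actual):
    """Fraction of predictions that matched the observed outcome."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


def mean_lead_time(warning_days, event_days):
    """Average number of days between a warning and the event it preceded."""
    if not warning_days or len(warning_days) != len(event_days):
        raise ValueError("need two non-empty sequences of equal length")
    leads = [event - warning for warning, event in zip(warning_days, event_days)]
    return sum(leads) / len(leads)


if __name__ == "__main__":
    # Made-up numbers, purely to show the calls.
    print(coverage_ratio(750, 1000))
    print(hit_rate([1, 0, 1, 1], [1, 0, 0, 1]))
    print(mean_lead_time([3, 10], [5, 14]))
```

Measures rated low for self-report suitability, such as the coverage ratio, are the ones best taken from system records like these rather than from surveys.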
The story
The reader
A curious manager, technologist, policymaker, or citizen who wants to understand and harness the transformative power of big data while navigating its risks.
External problem
Overwhelming volumes of data are reshaping business, science, and society, yet most people lack a framework for understanding what big data is and how to extract its value.
Internal problem
They feel whiplashed by information overload, uncertain whether big data is hype or revolution, and anxious about its threats to privacy and freedom.
Philosophical problem
It is wrong to cling to outdated assumptions of information scarcity, exactitude, and causality when a new paradigm of abundance, messiness, and correlation offers deeper understanding.
The plan
- Recognize the three shifts: use more data (N=all), embrace messiness, and favor correlation.
- Adopt a big-data mindset: see latent value in data and imagine novel reuses.
- Position yourself or your organization within the data value chain (data, skills, or ideas).
- Collect data with extensibility and option value in mind.
- Establish governance through accountability, protecting human agency, and expert oversight.
Success
- You extract new value and insight from data others discard or overlook.
- You make faster, better-informed decisions by letting data speak.
- You anticipate and prevent problems before they occur.
- You help build governance that captures big data's benefits while safeguarding privacy and freedom.
At stake
- You remain trapped in small-data thinking and miss enormous value.
- Competitors who master data outpace and displace you.
- Society drifts into a dictatorship of data, eroding privacy, free will, and justice.
- Predictive systems punish people for what they might do rather than what they have done.