Big Data: A Very Short Introduction (Very Short Introductions)
In a sentence
A concise introduction to what big data is, how it is collected, stored, and analysed, and how it is transforming medicine, business, security, and society.
Big Data: A Very Short Introduction demystifies one of the defining technological forces of our age, charting how data evolved from notched bones and census tallies to the exabyte-scale streams of the digital universe. Dawn Holmes explains, in plain language and with clear diagrams, what distinguishes big data from traditional 'small data' (volume, variety, velocity, and veracity) and how new storage architectures (Hadoop, NoSQL, the Cloud) and analytic techniques (clustering, classification, MapReduce, Bloom filters, PageRank, recommender systems) extract useful information from massive, messy datasets. Through vivid case studies, including Google Flu Trends, the Ebola and Nepal disaster responses, Amazon, Netflix, the Snowden leaks, WikiLeaks, and the rise of smart homes and cities, the book shows both the immense promise and the real perils of a data-driven world, ending with a call to use big data's power responsibly.
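Of the techniques listed above, the Bloom filter is the easiest to show in a few lines. It answers "have I seen this item before?" using a fixed-size bit array and several hash functions: it never misses an item that was added, but it can occasionally report a false positive. The sketch below is a minimal illustration, not the book's own code; the class name, the 1024-bit size, the three hash functions, and the example URLs are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical sizes chosen for illustration)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive num_hashes independent positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True only if every position is set; an added item is never missed.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for url in ["http://example.com/a", "http://example.com/b"]:
    bf.add(url)

print(bf.might_contain("http://example.com/a"))  # True
print(bf.might_contain("http://example.com/z"))  # usually False; false positives are possible
```

The memory cost stays fixed however many items are added, which is why the trade of occasional false positives can be worthwhile at big-data scale.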
The story it tells the reader
The reader
A curious general reader who wants to understand what big data really is and how it is changing their everyday life and the wider world.
External problem
Big data is everywhere, but explanations are either superficial or buried in mathematical textbooks aimed at graduate students.
Internal problem
The reader feels swamped, intimidated, and uncertain about a powerful technology shaping their privacy, health, and work.
Philosophical problem
Such a transformative force should not be opaque to ordinary people who have a stake in how it is used.
The plan
- Learn how data evolved and what distinguishes big data via the four V's.
- Understand how big data is stored using distributed file systems, NoSQL, and the Cloud.
- See how analytic techniques mine raw data into useful information.
- Explore real applications in medicine, business, security, and society.
- Reflect on the privacy, security, and ethical responsibilities involved.
Success
- The reader can confidently define big data, recognize its techniques in daily life, and critically evaluate its benefits and risks.
- They make more informed choices about their own data, privacy, and the technologies they adopt.
At stake
- The reader remains mystified by big data and passively subject to its uses and abuses.
- They overlook the privacy, security, and ethical stakes of a rapidly data-driven world.
Model of the world · 9 constructs · 11 relations
A causal framework describing how data-generating conditions and design choices (storage architecture, analytic technique, data quality, security measures) drive intermediate processing states (information extraction, predictive accuracy) that produce outcomes such as decision quality, organizational value, and privacy/security risk.
Design levers
- Distributed Storage Architecture
- Analytic Technique Application
- Data Security and Privacy Measures
Intermediate states & behaviors
- Useful Information Extraction
- Predictive Model Accuracy
Outcomes
- Privacy and Security Risk
- Decision Quality and Organizational Value
Moderators / context: Data Volume, Velocity, and Variety · Data Veracity (Quality)
Data Volume, Velocity, and Variety · contextual condition
The scale, speed of generation, and heterogeneity (structured, semi-structured, unstructured) of data produced across search engines, sensors, social media, and other digital sources that together characterize big data.
Data Veracity (Quality) · contextual condition
The accuracy, reliability, and trustworthiness of collected data, recognizing that digital-age data is often imprecise, uncertain, biased, or simply untrue and requires pre-processing for consistency.
Distributed Storage Architecture · design lever
The design choice of scalable storage and management systems, such as the Hadoop Distributed File System (HDFS), NoSQL databases, and Cloud infrastructure, that enable horizontal scalability and fault tolerance for big data.
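Fault tolerance in a distributed file system comes from replication: each block of a file is stored on several machines, so losing a machine does not lose data. HDFS keeps three copies of each block by default and places them with rack awareness; the sketch below swaps in a much simpler hash-based placement rule, and `place_block`, the node names, and the block names are hypothetical.

```python
import hashlib

REPLICATION_FACTOR = 3  # matches HDFS's default; the placement rule below is NOT HDFS's


def place_block(block_id: str, nodes: list[str], replicas: int = REPLICATION_FACTOR) -> list[str]:
    """Pick `replicas` distinct nodes for a block, starting at a hash-derived node."""
    start = int(hashlib.sha256(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def readable_blocks(placement: dict[str, list[str]], failed: set[str]) -> set[str]:
    """Blocks that still have at least one replica on a live node."""
    return {b for b, holders in placement.items() if any(n not in failed for n in holders)}


nodes = [f"node{i}" for i in range(5)]
blocks = [f"file.txt#block{i}" for i in range(8)]
placement = {b: place_block(b, nodes) for b in blocks}

# With three replicas on distinct nodes, any two node failures leave every block readable.
print(len(readable_blocks(placement, {"node0", "node3"})) == len(blocks))  # True
```

Losing all three holders of a block would make it unreadable, which is why real systems also re-replicate blocks when a node fails.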
Analytic Technique Application · design lever
The application of data mining and machine learning methods such as clustering, classification, MapReduce, Bloom filters, PageRank, and recommender algorithms to discover patterns and extract knowledge from big data.
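MapReduce is easiest to grasp through the classic word-count example: a map step emits key-value pairs from each chunk of input, a shuffle step groups the pairs by key, and a reduce step combines each group. The single-process sketch below only imitates the three phases; the function names and input splits are hypothetical and this is not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


splits = ["big data is big", "data about data"]  # hypothetical input splits
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

On a real cluster, the map calls would run in parallel on the machines that hold each split, and the reduce calls would run in parallel per key.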
Useful Information Extraction · behavioral pattern
The intermediate state in which raw, often unstructured data is transformed into meaningful, valuable information and patterns that can inform understanding, prediction, and action.
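PageRank, one of the techniques named above, is a concrete case of turning raw structure (which pages link to which) into useful information (a ranking of importance). The power-iteration sketch below assumes the commonly cited damping factor of 0.85 and spreads the rank of pages with no outgoing links evenly over all pages, which is one common convention. The four-page web is hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # hypothetical link graph
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # C
```

Page C ranks highest because three of the four pages link to it, even though no page links to D.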
Predictive Model Accuracy · behavioral pattern
The degree to which models built from big data correctly forecast outcomes, sensitive to model construction issues such as over-fitting, spurious correlation, and failure to update for changing conditions.
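Spurious correlation is a direct consequence of searching through many variables: test enough unrelated series against a target and some will correlate strongly purely by chance. The sketch below assumes a setup with 1,000 candidate series of pure random noise and only 10 observations each, and reports the strongest correlation found. None of the series has any real connection to the target.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(42)         # fixed seed so the run is repeatable
n_obs, n_candidates = 10, 1000  # few observations, many candidate variables
target = [rng.gauss(0, 1) for _ in range(n_obs)]
# Every candidate is pure noise, generated independently of the target.
candidates = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)]

best = max(abs(pearson(c, target)) for c in candidates)
print(f"strongest |r| among {n_candidates} unrelated series: {best:.2f}")
```

Raising `n_obs` shrinks the strongest chance correlation. This is why a pattern found by trawling big data needs validation on fresh data and human judgement before it is trusted.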
Data Security and Privacy Measures · design lever
The protective controls such as encryption, firewalls, access authentication, and anonymization deployed to safeguard data from theft, tampering, hacking, and unauthorized disclosure.
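One simple protective control is to replace direct identifiers with a keyed hash before data is shared for analysis. Records for the same person still link together, but the identifier cannot be recovered without the key. The sketch uses Python's standard `hmac` module with SHA-256. The key, field names, and records are hypothetical. Strictly speaking, this is pseudonymization rather than full anonymization: the remaining fields can still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets store, never source code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 digest (hex string)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


records = [  # hypothetical patient records
    {"email": "alice@example.com", "diagnosis": "flu"},
    {"email": "bob@example.com", "diagnosis": "asthma"},
    {"email": "alice@example.com", "diagnosis": "flu"},
]

# Drop the raw email and keep only the pseudonym plus the analytic field.
safe = [{"patient": pseudonymize(r["email"]), "diagnosis": r["diagnosis"]} for r in records]
```

A keyed hash is used instead of a plain hash because, without the key, an attacker could hash a list of known email addresses and match them against the output.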
Decision Quality and Organizational Value · outcome metric
The downstream benefits of big data including improved business decisions, targeted marketing, better patient care, cost reduction, scientific discovery, and societal efficiency gains in smart homes and cities.
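Targeted marketing often rests on recommender systems of the "customers who bought this also bought" kind. The sketch below counts how often items appear in the same shopping basket and recommends the most frequent companions. It is a deliberately simple co-occurrence method, not Amazon's or Netflix's actual algorithm, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets: list[list[str]]) -> Counter:
    """Count, for each ordered pair of items, how many baskets contain both."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(item: str, counts: Counter, k: int = 2) -> list[str]:
    """Up to k items most often bought with `item`, strongest first (ties alphabetical)."""
    scores = {b: n for (a, b), n in counts.items() if a == item}
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]


baskets = [  # hypothetical purchase histories
    ["book", "lamp"],
    ["book", "lamp", "mug"],
    ["book", "mug"],
    ["book", "lamp"],
    ["mug", "tea"],
]
counts = co_occurrence(baskets)
print(recommend("book", counts))  # ['lamp', 'mug']
```

Production recommenders typically normalise these counts so that very popular items do not dominate every list, but the underlying idea of mining co-purchase patterns is the same.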
Privacy and Security Risk · outcome metric
The adverse outcome of exposure to data theft, breaches, surveillance, identity theft, and loss of personal privacy arising from large-scale collection and storage of personal data.
How they connect
- Data volume, velocity, and variety → influences useful information extraction
- Distributed storage architecture → predicts useful information extraction
- Analytic technique application → predicts useful information extraction
- Useful information extraction → influences predictive model accuracy
- Data veracity → moderates predictive model accuracy
- Data volume, velocity, and variety → influences predictive model accuracy
- Predictive model accuracy → predicts decision quality and organizational value
- Useful information extraction → predicts decision quality and organizational value
- Data volume, velocity, and variety → predicts privacy and security risk
- Data security and privacy measures − moderates privacy and security risk (negative relation)
- Distributed storage architecture → influences privacy and security risk
Frameworks & instruments in this book
- Big data is only useful if useful information can be extracted from it.
- Correlation does not imply causation, and human interpretation is needed to judge which patterns matter.
- All models are wrong, but some are useful.
- Distributed, redundant storage is necessary because system failure is inevitable at scale.
- With great data power comes a moral responsibility to prevent its abuse.
Several of these are operationalized as tools in the People Analytics Toolbox.
Topics
- ai
- applied statistics
Related in the library
- Networks: A Very Short Introduction (Very Short Introductions) by Guido Caldarelli & Michele Catanzaro · Systems · Statistics
- People Analytics & Text Mining with R by Cedric Ng Mong Shen · Systems · Statistics
- People Analytics For Dummies by Mike West · Systems · Statistics
- Practical Statistics for Data Scientists by Peter Bruce, Andrew Bruce & Peter Gedeck · Systems · Statistics
- Predictive HR Analytics by Cedric Ng Mong Shen · Systems · Statistics
- Predictive HR Analytics, Text Mining & Organizational Network Analysis with Excel by Cedric Ng Mong Shen · Systems · Statistics