library / libf9552b4609cadbd3
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Ralph Kimball, Margy Ross · 2013
In a sentence
The definitive guide to dimensional modeling for data warehousing and business intelligence, teaching practitioners how to design simple, fast, business-driven analytic databases through case-study-driven techniques.
The Data Warehouse Toolkit, Third Edition, by Ralph Kimball and Margy Ross, is the industry's seminal reference for dimensional modeling. Drawing on decades of real-world consulting experience, the authors present a comprehensive, technique-centric methodology for building data warehouse and business intelligence (DW/BI) systems that prioritize two simultaneous goals: business user understandability and fast query performance. Organized around vivid business vignettes spanning retail, inventory, procurement, order management, accounting, CRM, HR, finance, telecom, transportation, education, healthcare, e-commerce, and insurance, the book develops more than 75 dimensional modeling techniques—fact tables and dimension tables, the enterprise data warehouse bus matrix with conformed dimensions, slowly changing dimension techniques, bridge tables, and accumulating snapshots—while dispelling persistent myths about dimensional design. It then extends into the full Kimball Lifecycle, the 34 subsystems of ETL, and emerging big data analytics best practices. Whether you are a designer, modeler, ETL developer, or DW/BI manager, this toolkit equips you to marshal an organization's data and deliver it to business users for better, fact-based decision making.
The four lenses
- Science
- Statistics
- Systems
- Strategy
The model
A causal/framework model expressing how design levers (dimensional modeling discipline, conformed dimensions, atomic grain, ETL subsystem quality) and contextual conditions (business requirements alignment, business sponsorship) drive psychological and behavioral states (user understandability, query performance, agile development) which in turn produce the ultimate outcomes of business acceptance and improved fact-based decision making.
Dimensional Modeling Disciplinedesign lever
The rigorous application of the four-step dimensional design process (selecting the business process, declaring the atomic grain, identifying dimensions, and identifying facts) along with adherence to dimensional best practices and avoidance of common modeling mistakes.
Atomic Grain Data Availabilitydesign lever
The provision of the most detailed, lowest-level atomic measurement data in fact tables, capturing each physical measurement event as a single row, rather than only pre-aggregated summaries which limit dimensionality and flexibility.
Conformed Dimensions and Bus Architecturedesign lever
The use of standardized, shared dimensions with identical keys, attribute names, definitions, and values reused across business-process fact tables, documented and planned via the enterprise data warehouse bus matrix to enable integration via drill-across.
Verbose Descriptive Dimension Attributesdesign lever
The richness, completeness, and legibility of textual descriptive attributes in dimension tables that serve as query constraints, groupings, and report labels, including decoded values in place of cryptic operational codes.
Slowly Changing Dimension Change Handlingdesign lever
The deliberate, business-data-steward-driven selection and correct application of techniques (types 0 through 7) for tracking or overwriting dimension attribute changes over time to accurately represent history as required by the business.
ETL Subsystem Qualitydesign lever
The robustness and completeness of the extract, transformation, and load architecture across its 34 subsystems covering extracting, cleaning and conforming, delivering, and managing the ETL environment.
Business Requirements Alignmentcontextual condition
The degree to which the DW/BI design is driven by gathered business needs, key performance indicators, and decision-making processes rather than solely by source data or specific report requests.
Strong Executive Business Sponsorshipcontextual condition
The presence of an influential, accessible business sponsor with a compelling business motivation who champions the DW/BI initiative and drives organizational consensus and data governance.
User Understandabilitypsychological state
The extent to which business users find the data structures, labels, and dimensional schema intuitive, recognizable, and easy to navigate, mirroring their business vocabulary and thought processes.
Query Performancepsychological state
The speed and efficiency with which the database and BI tools return results to users, enabled by simple symmetric schemas, surrogate keys, aggregates, indexing, and partitioning.
Agile Incremental Development Velocitybehavioral pattern
The pace and manageability with which new business processes can be delivered by reusing conformed dimensions and tackling iterative, realistically scoped increments rather than galactic multiyear projects.
Data Integration and Consistencybehavioral pattern
The degree to which information from separate business processes and source systems can be combined and consistently interpreted across the enterprise via drill-across on shared conformed dimensions and facts.
Business Acceptance and Adoptionoutcome metric
The active embrace and ongoing use of the DW/BI environment by the business community, deemed the ultimate measure of DW/BI success.
Improved Fact-Based Decision Makingoutcome metric
The ultimate organizational outcome of better, more informed decisions and business impact/value attributable to the DW/BI system serving as the authoritative decision support foundation.
How they connect
- dimensional modeling discipline → influences user understandability
- dimensional modeling discipline → influences query performance
- atomic grain data → influences user understandability
- verbose dimension attributes → predicts user understandability
- conformed dimensions → predicts data integration consistency
- conformed dimensions → influences agile incremental development
- scd change handling → influences data integration consistency
- etl subsystem quality → influences data integration consistency
- etl subsystem quality → influences query performance
- user understandability → predicts business acceptance
- query performance → predicts business acceptance
- data integration consistency → influences business acceptance
- business acceptance → predicts improved decision making
- business requirements alignment → moderates business acceptance
- business sponsorship → moderates data integration consistency
- business requirements alignment → influences dimensional modeling discipline
A candidate measure
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling — derived measurement candidates
Dimensional Modeling Discipline
Percent of fact tables with documented grain; Count of grain violations per schema; Technique-checklist compliance score
self-report suitability: medium
Atomic Grain Data Availability
Grain level gap (source vs. warehouse); Proportion of fact tables at atomic grain
self-report suitability: low
Conformed Dimensions and Bus Architecture
Number of conformed dimensions; Percent of fact tables sharing conformed dimensions; Bus matrix coverage ratio
self-report suitability: medium
Verbose Descriptive Dimension Attributes
Average descriptive attributes per dimension; Decode coverage percentage; Attribute value completeness rate
self-report suitability: low
Slowly Changing Dimension Change Handling
Percent of attributes with assigned SCD type; Historical association accuracy rate
self-report suitability: medium
ETL Subsystem Quality
Subsystem coverage count (of 34); Error event rates; ETL job success/restart rates
self-report suitability: medium
Business Requirements Alignment
Requirements-to-design traceability percentage; Stakeholder alignment ratings
self-report suitability: high
Strong Executive Business Sponsorship
Readiness assessment sponsorship rating; Frequency of sponsor engagement
self-report suitability: high
User Understandability
User understandability survey score; Training hours per user; Navigation support ticket rate
self-report suitability: high
Query Performance
Average query response time; 95th percentile response time; Throughput (queries/hour)
self-report suitability: medium
Agile Incremental Development Velocity
Time-to-market per new business process; Dimension reuse ratio per increment
self-report suitability: medium
Data Integration and Consistency
Reconciliation success rate; Number of stovepipe inconsistencies detected
self-report suitability: medium
Business Acceptance and Adoption
Active user count and trend; Query/report volume; User satisfaction index
self-report suitability: high
Improved Fact-Based Decision Making
Documented decisions referencing warehouse data; Revenue/profit improvements attributable to DW/BI
self-report suitability: medium
The story
The reader A DW/BI designer, modeler, ETL developer, or manager who wants to deliver analytic data that business users can easily understand and query quickly to make better, fact-based decisions.
External problem
Organizations collect tons of data scattered across disparate operational systems but can't access, integrate, or analyze it consistently or quickly.
Internal problem
Practitioners feel overwhelmed by complexity and uncertain whether their designs will withstand unpredictable business questions, fearing the dreaded 'it depends' spaghetti-mess of databases.
Philosophical problem
It's just plain wrong to build elegant technical solutions that the business community can't accept and use; a data warehouse exists to serve readers (business users), not to showcase technology.
The plan
- Focus first on business requirements and the most critical, feasible business process.
- Apply the four-step dimensional design process: select the business process, declare the grain, identify dimensions, identify facts.
- Build fact tables at the lowest atomic grain with verbose dimension attributes.
- Integrate processes using conformed dimensions on the enterprise data warehouse bus architecture.
- Handle attribute changes with appropriate slowly changing dimension techniques.
- Design and operate the ETL system across its 34 subsystems.
- Follow the Kimball Lifecycle through deployment, maintenance, and iterative growth.
Success
- Business professionals make better decisions and generate payback on DW/BI investments.
- A coherent, integrated DW/BI environment delivers consistent, understandable, and fast information across the enterprise.
- Development becomes more agile as conformed dimensions are reused, shrinking time-to-market for new business processes.
- The business community embraces the system as the simple and fast source for actionable information.
At stake
- Isolated stovepipe data sets perpetuate incompatible, irreconcilable views of organizational performance.
- Endless meetings arguing about who has the right numbers rather than making decisions.
- Premature summarization traps users behind analytic brick walls when they need detail.
- The business abandons the unaccepted DW/BI system, wasting millions invested to design, build, and staff it.
Related in the library