peopleanalyst

library / libf9552b4609cadbd3

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling

Ralph Kimball, Margy Ross · 2013

In a sentence

The definitive guide to dimensional modeling for data warehousing and business intelligence, teaching practitioners how to design simple, fast, business-driven analytic databases through case-study-driven techniques.

The Data Warehouse Toolkit, Third Edition, by Ralph Kimball and Margy Ross, is the industry's seminal reference for dimensional modeling. Drawing on decades of real-world consulting experience, the authors present a comprehensive, technique-centric methodology for building data warehouse and business intelligence (DW/BI) systems that prioritize two simultaneous goals: business user understandability and fast query performance. Organized around vivid business vignettes spanning retail, inventory, procurement, order management, accounting, CRM, HR, finance, telecom, transportation, education, healthcare, e-commerce, and insurance, the book develops more than 75 dimensional modeling techniques—fact tables and dimension tables, the enterprise data warehouse bus matrix with conformed dimensions, slowly changing dimension techniques, bridge tables, and accumulating snapshots—while dispelling persistent myths about dimensional design. It then extends into the full Kimball Lifecycle, the 34 subsystems of ETL, and emerging big data analytics best practices. Whether you are a designer, modeler, ETL developer, or DW/BI manager, this toolkit equips you to marshal an organization's data and deliver it to business users for better, fact-based decision making.

The four lenses

  • Science
  • Statistics
  • Systems
  • Strategy

The model

A causal/framework model expressing how design levers (dimensional modeling discipline, conformed dimensions, atomic grain, ETL subsystem quality) and contextual conditions (business requirements alignment, business sponsorship) drive psychological and behavioral states (user understandability, query performance, agile development) which in turn produce the ultimate outcomes of business acceptance and improved fact-based decision making.

Dimensional Modeling Disciplinedesign lever

The rigorous application of the four-step dimensional design process (selecting the business process, declaring the atomic grain, identifying dimensions, and identifying facts) along with adherence to dimensional best practices and avoidance of common modeling mistakes.

Atomic Grain Data Availabilitydesign lever

The provision of the most detailed, lowest-level atomic measurement data in fact tables, capturing each physical measurement event as a single row, rather than only pre-aggregated summaries which limit dimensionality and flexibility.

Conformed Dimensions and Bus Architecturedesign lever

The use of standardized, shared dimensions with identical keys, attribute names, definitions, and values reused across business-process fact tables, documented and planned via the enterprise data warehouse bus matrix to enable integration via drill-across.

Verbose Descriptive Dimension Attributesdesign lever

The richness, completeness, and legibility of textual descriptive attributes in dimension tables that serve as query constraints, groupings, and report labels, including decoded values in place of cryptic operational codes.

Slowly Changing Dimension Change Handlingdesign lever

The deliberate, business-data-steward-driven selection and correct application of techniques (types 0 through 7) for tracking or overwriting dimension attribute changes over time to accurately represent history as required by the business.

ETL Subsystem Qualitydesign lever

The robustness and completeness of the extract, transformation, and load architecture across its 34 subsystems covering extracting, cleaning and conforming, delivering, and managing the ETL environment.

Business Requirements Alignmentcontextual condition

The degree to which the DW/BI design is driven by gathered business needs, key performance indicators, and decision-making processes rather than solely by source data or specific report requests.

Strong Executive Business Sponsorshipcontextual condition

The presence of an influential, accessible business sponsor with a compelling business motivation who champions the DW/BI initiative and drives organizational consensus and data governance.

User Understandabilitypsychological state

The extent to which business users find the data structures, labels, and dimensional schema intuitive, recognizable, and easy to navigate, mirroring their business vocabulary and thought processes.

Query Performancepsychological state

The speed and efficiency with which the database and BI tools return results to users, enabled by simple symmetric schemas, surrogate keys, aggregates, indexing, and partitioning.

Agile Incremental Development Velocitybehavioral pattern

The pace and manageability with which new business processes can be delivered by reusing conformed dimensions and tackling iterative, realistically scoped increments rather than galactic multiyear projects.

Data Integration and Consistencybehavioral pattern

The degree to which information from separate business processes and source systems can be combined and consistently interpreted across the enterprise via drill-across on shared conformed dimensions and facts.

Business Acceptance and Adoptionoutcome metric

The active embrace and ongoing use of the DW/BI environment by the business community, deemed the ultimate measure of DW/BI success.

Improved Fact-Based Decision Makingoutcome metric

The ultimate organizational outcome of better, more informed decisions and business impact/value attributable to the DW/BI system serving as the authoritative decision support foundation.

How they connect

  • dimensional modeling discipline influences user understandability
  • dimensional modeling discipline influences query performance
  • atomic grain data influences user understandability
  • verbose dimension attributes predicts user understandability
  • conformed dimensions predicts data integration consistency
  • conformed dimensions influences agile incremental development
  • scd change handling influences data integration consistency
  • etl subsystem quality influences data integration consistency
  • etl subsystem quality influences query performance
  • user understandability predicts business acceptance
  • query performance predicts business acceptance
  • data integration consistency influences business acceptance
  • business acceptance predicts improved decision making
  • business requirements alignment moderates business acceptance
  • business sponsorship moderates data integration consistency
  • business requirements alignment influences dimensional modeling discipline

A candidate measure

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling — derived measurement candidates

Dimensional Modeling Discipline

Percent of fact tables with documented grain; Count of grain violations per schema; Technique-checklist compliance score

self-report suitability: medium

Atomic Grain Data Availability

Grain level gap (source vs. warehouse); Proportion of fact tables at atomic grain

self-report suitability: low

Conformed Dimensions and Bus Architecture

Number of conformed dimensions; Percent of fact tables sharing conformed dimensions; Bus matrix coverage ratio

self-report suitability: medium

Verbose Descriptive Dimension Attributes

Average descriptive attributes per dimension; Decode coverage percentage; Attribute value completeness rate

self-report suitability: low

Slowly Changing Dimension Change Handling

Percent of attributes with assigned SCD type; Historical association accuracy rate

self-report suitability: medium

ETL Subsystem Quality

Subsystem coverage count (of 34); Error event rates; ETL job success/restart rates

self-report suitability: medium

Business Requirements Alignment

Requirements-to-design traceability percentage; Stakeholder alignment ratings

self-report suitability: high

Strong Executive Business Sponsorship

Readiness assessment sponsorship rating; Frequency of sponsor engagement

self-report suitability: high

User Understandability

User understandability survey score; Training hours per user; Navigation support ticket rate

self-report suitability: high

Query Performance

Average query response time; 95th percentile response time; Throughput (queries/hour)

self-report suitability: medium

Agile Incremental Development Velocity

Time-to-market per new business process; Dimension reuse ratio per increment

self-report suitability: medium

Data Integration and Consistency

Reconciliation success rate; Number of stovepipe inconsistencies detected

self-report suitability: medium

Business Acceptance and Adoption

Active user count and trend; Query/report volume; User satisfaction index

self-report suitability: high

Improved Fact-Based Decision Making

Documented decisions referencing warehouse data; Revenue/profit improvements attributable to DW/BI

self-report suitability: medium

Run the assessment

The story

The reader A DW/BI designer, modeler, ETL developer, or manager who wants to deliver analytic data that business users can easily understand and query quickly to make better, fact-based decisions.

External problem

Organizations collect tons of data scattered across disparate operational systems but can't access, integrate, or analyze it consistently or quickly.

Internal problem

Practitioners feel overwhelmed by complexity and uncertain whether their designs will withstand unpredictable business questions, fearing the dreaded 'it depends' spaghetti-mess of databases.

Philosophical problem

It's just plain wrong to build elegant technical solutions that the business community can't accept and use; a data warehouse exists to serve readers (business users), not to showcase technology.

The plan

  1. Focus first on business requirements and the most critical, feasible business process.
  2. Apply the four-step dimensional design process: select the business process, declare the grain, identify dimensions, identify facts.
  3. Build fact tables at the lowest atomic grain with verbose dimension attributes.
  4. Integrate processes using conformed dimensions on the enterprise data warehouse bus architecture.
  5. Handle attribute changes with appropriate slowly changing dimension techniques.
  6. Design and operate the ETL system across its 34 subsystems.
  7. Follow the Kimball Lifecycle through deployment, maintenance, and iterative growth.

Success

  • Business professionals make better decisions and generate payback on DW/BI investments.
  • A coherent, integrated DW/BI environment delivers consistent, understandable, and fast information across the enterprise.
  • Development becomes more agile as conformed dimensions are reused, shrinking time-to-market for new business processes.
  • The business community embraces the system as the simple and fast source for actionable information.

At stake

  • Isolated stovepipe data sets perpetuate incompatible, irreconcilable views of organizational performance.
  • Endless meetings arguing about who has the right numbers rather than making decisions.
  • Premature summarization traps users behind analytic brick walls when they need detail.
  • The business abandons the unaccepted DW/BI system, wasting millions invested to design, build, and staff it.

Related in the library