resource: Prashant talk
Main Objective
- Deliver data that’s understandable to the business users
- Deliver fast query performance
Facts
Facts can be aggregated to different level of details.
- Atomic grain: The event itself (1 row per sale)
- Weekly grain (1 row for sales per that week)
- Monthly, etc
Storing atomic grain will lead us to compute higher levels of grain, so storing atomic grain data can be useful but may not be possible based on the volume of data.
Fact table: Contains lot of fields including the fact data + the relationships (lots of FK)
- Additive, Semi-additive and non-additive facts
- Most facts are additive
- Dont do ratios and then sum, but instead do sum and then ratio.
3 types of facts:
- Transactional, store at atomic grain level (events)
- Snapshot: daily snapshot, etc (inventory in a day)
- Accumulating snapshot
Drilling Down / Drilling Up / Rolling up
- Drilling down: Adding more headers to the report.
- Rolling up: Aggregation (individual SKUs roll up the brands, brands roll up to categories, and categories roll up to departments)
Dimensions
-
Lots of columns (less rows)
-
Primary source of query constraints, groupings and report labels
-
An example can be a Product table
- Product key (PK)
- SKU Number (natural key?)
- product description
- brand name
- category name
- etc
-
Generally not normalised/deduplicated
- Too much normalised schema is hard to understand
- Normalised data can lead to too many joins
-
Dimensions should be very descriptive
-
Flags as textual elements
Star Schema:
- Schema after bringing dimensions and facts together
Data Warehouse Design
- Design by business processes instead of departments.
- Conformed dimensions: Dimensions will appear from different sources. Standardise across sources, used to stitch everything together
- Don’t create schemas as per query pattern
- This is bad because the queries can be changing. So modelling should be done based on business processes.