Brains Up AnalyticsBRAINSUPAnalytics
ModelingData WarehouseSQL

Slowly Changing Dimensions Type 2, without the headache

The essential pattern for tracking history in dimensions — explained with a concrete example and the most common mistakes.

Every time someone asks "what was this customer's address when the sale happened?", you're facing a Slowly Changing Dimension (SCD) problem. And the answer, most of the time, is SCD Type 2.

The problem

Imagine a customer dimension. A customer moves to a new city. If you simply update the row (SCD Type 1), you lose where they were before — and your historical sales now point to the wrong city.

The solution: Type 2

In SCD Type 2, you never update an existing row for a tracked attribute. Instead, you close the current row and insert a new version. Each row gets three control columns:

  • valid_from — when the version became effective
  • valid_to — when it stopped being effective (or NULL/future date if active)
  • is_current — boolean flag for the live version
-- Close the previous version
UPDATE dim_customer
SET valid_to = current_date(), is_current = false
WHERE customer_id = 42 AND is_current = true;

-- Insert the new version
INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
VALUES (42, 'Lisbon', current_date(), NULL, true);

The fact table always references the surrogate key of the correct version at the time of the event — not the customer's natural key.

The most common mistakes

  1. Using the natural key in the fact. Always use the surrogate key of the version valid at the time.
  2. Forgetting idempotency. If the pipeline runs twice, it must not create duplicate versions. Compare attributes before generating a new version.
  3. Tracking too many attributes. Not every field needs history. Consciously decide what is Type 1 and what is Type 2.

In the Lakehouse, Delta Lake's MERGE makes Type 2 far more elegant than it was in the SSIS days. But the concept has been the same for decades — and it remains one of the fundamentals that most separates those who can model data from those who can only load tables.

Related articles

Enjoyed this? Check out the e-books for in-depth content.

E-books