Tracking Mutable Data in Production
- SCD2
- Data Modeling
- Schema Design
Tracking changes over time for mutable records sounds simple until you try to make it work in production. I ran into this recently while building a payment system where some records naturally mutate over time, but we still need to keep their history. One example is business-related people: owners, representatives or directors; basically a person record that keeps getting refined over time.
At first, this looked like normal CRUD. Create a person row when they start onboarding, update it as we collect more data and move on. Simple and clean. The kind of thing anyone can design on a Monday afternoon and ship it by Tuesday morning.
Then I started integrating with external systems (assume KYC, AML etc) and documenting what the system must be able to answer in production. It’s not what should the API look like, rather what do we need to prove later. Two questions surfaced immediately:
- What is the “current” person data?
- What exact person data did we have on “a specific date”?
Question one is trivial. Question two killed the plain UPDATE approach.
If we overwrite a row, the previous state is gone. We can’t reconstruct what we knew last Tuesday. In a payment system like ours, “what we knew last Tuesday” is sometimes the only thing that matters.
This is where SCD2 enters the picture: keep the current view, keep the historical view and make both normal queries instead of special handling.
(I’m still learning this pattern while implementing it, so treat this post as a practical note from the field. Not a reference architecture)
The core issue
A regular UPDATE person SET ... works great when we only care about the latest value. Most apps live here and that’s fine.
In payment systems, it’s rarely that simple. Person data changes over time. Someone corrects a typo in their name. Verification status moves from pending to verified to requires_review. External references get attached days after initial creation. If we’re overwriting rows every time, we lose the trail. And during audit, when someone asks “what data did we send to the KYC provider on March 3rd”, we’re stuck grepping through application logs and hoping our structured logging was actually structured.
So the history requirement becomes load-bearing from day one.
Picking an alternative approach
To be honest, I didn’t start out wanting SCD2. I wasn’t familiar with it and for an MVP my instinct was to reach for something I already knew. So I spent some time looking at alternatives and what they would imply for day-to-day reads and migrations.
-
Audit log table: What if we keep a separate
person_audittable and write a copy of the row before every mutation? This can work, but it splits “current state” and “historical state” across two tables, usually with slightly different schemas over time. Every schema change becomes a two-table migration problem. Reads that need history turn into cross-table queries. It’s (probably?) fine when we only need history for post-mortems, but I needed point-in-time reads as a normal part of the product. -
Event sourcing: It’s a strong model, but it’s also a real architectural commitment. At this stage I didn’t need to replay event streams or model every change as a domain event. I needed versioned snapshots of a person record with clear “current” and “as-of” semantics. Event sourcing felt like more surface area than the problem needed.
-
Soft deletes with a
modified_atcolumn. This gives us a last-updated timestamp, not history. We still end up with one row per person, so we can’t reliably answer “what did this record look like on date X?”
SCD2 fit because it keeps history in the main table with a simple mental model:
- Each meaningful change creates a new row
- The old row gets a
valid_totimestamp, closing its validity window - The new row gets
valid_to = NULL, marking it as current
One table gives me two straightforward queries: “what’s current?” and “what did this look like at time t?”
The schema
Here’s a minimal slice of the schema from our system:
CREATE TABLE person (
id UUID PRIMARY KEY,
identity_id TEXT NOT NULL,
version INTEGER NOT NULL,
valid_from TIMESTAMPTZ NOT NULL,
valid_to TIMESTAMPTZ,
is_current BOOLEAN GENERATED ALWAYS AS (valid_to IS NULL) STORED,
superseded_by UUID REFERENCES person(id),
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT,
verification_status TEXT,
change_reason TEXT,
CONSTRAINT uq_person_identity_version UNIQUE (identity_id, version),
CONSTRAINT chk_person_valid_period CHECK (valid_from < valid_to OR valid_to IS NULL)
);
CREATE UNIQUE INDEX uq_person_current
ON person (identity_id)
WHERE valid_to IS NULL;
The schema may look simple, but a few fields worth talking about:
-
idvsidentity_id: This is the single most important design choice in the schema.identity_idis the stable identifier for a person across all their versions. It answers “who is this”.ididentifies one specific version row. It answers “which snapshot of this person are we looking at”. Every foreign key from a related table points atid, meaning it points at a specific version, not a floating concept of a person. -
version: An integer that increments with each new row for a givenidentity_id. Paired with the unique constraint(identity_id, version), it prevents accidental duplicate versions. It also makes it trivial to answer how many times has this person’s data changed without scanning timestamps. -
valid_fromandvalid_to: These define the time window during which a row was the active version.valid_to = NULLmeans this is the current row. The check constraintvalid_from < valid_to OR valid_to IS NULLprevents nonsensical time ranges where a row expires before it starts. -
is_current: A generated column computed fromvalid_to IS NULL. We could skip this and just filter onvalid_to IS NULLeverywhere, and honestly, that would be fine. I added it because it makes intent explicit in queries and it’s free to maintain since Postgres handles it automatically. -
superseded_by: A forward pointer from the old row to the new row. Not strictly necessary for SCD2 to work. We can always find the next version by querying onidentity_idwith the next higherversion. But having an explicit link makes it easy to walk the version chain in either direction without sorting. -
change_reason: A freeform text field to store the update reason, e.g. “Name corrected by user”, “Verification status updated by KYC provider” or whatever. From my experience, this is the kind of thing we’ll be grateful for six months from now when we’re staring at a version history and trying to figure out why version 4 exists. -
The partial unique index
uq_person_current: This one does real work. It guarantees that for anyidentity_id, there can be at most one row wherevalid_to IS NULL. In other words, there can only be one current version. If a bug in our write path tries to create two active rows for the same person, the database will reject it.
Now, before getting into the sharp edges with related tables, it helps to pin down the two things we do all the time with SCD2: writing changes and reading them back. The reads are the payoff, but they only stay clean if the write path is strict.
Write semantics
The read side of SCD2 is easy. The write side is where we either get it right or spend weeks debugging subtle data corruption.
The rule is simple: never do an in-place update of mutable person data on the current row. Instead, close the current row and open a new one. ALWAYS IN TRANSACTION.
BEGIN;
-- close the current row
UPDATE person
SET valid_to = GREATEST(valid_from + INTERVAL '1 microsecond', $effective_at)
WHERE id = $current_row_id;
-- insert the new version
INSERT INTO person (
id, identity_id, version, valid_from,
first_name, last_name, email, change_reason
)
VALUES (
$new_id, $identity_id, $next_version, $effective_at,
$first_name, $last_name, $email, $reason
);
-- link old row to new row
UPDATE person
SET superseded_by = $new_id
WHERE id = $current_row_id;
COMMIT;
The GREATEST call on step 1 probably looks weird. It did to me too. However, it does solve a very real problem: if two updates happen in rapid succession (say, within the same millisecond), $effective_at could equal valid_from on the current row, which would make valid_from = valid_to. That violates the check constraint and our transaction blows up. The microsecond guard keeps the old row’s validity window non-zero even in that degenerate case. It’s defensive but it avoids a class of this-only-happens-under-load bugs.
All three steps must be in the same transaction. If step 1 commits but step 2 fails, we’ve closed the only current row and now the person has no “current” state at all. Stale data is annoying. No current row is a broken system.
Read semantics
Once the write path is strict, reads are almost boring.
Current state:
SELECT *
FROM person
WHERE identity_id = $1
AND valid_to IS NULL;
Point-in-time lookup (“what did we know on March 3rd?”):
SELECT *
FROM person
WHERE identity_id = $1
AND valid_from <= $2
AND (valid_to > $2 OR valid_to IS NULL);
Full version history:
SELECT *
FROM person
WHERE identity_id = $1
ORDER BY version;
No special handling or joins against audit tables here, instead all our queries are against one table with a time range filter.
The real complexity
The person table itself was actually the easy part. The tricky stuff started once other tables began referencing person.id.
Remember that in this domain, a “person” can be a beneficial owner, a representative, a director etc. So you end up with something like a person_role table:
CREATE TABLE person_role (
id UUID PRIMARY KEY,
person_id UUID NOT NULL REFERENCES person(id),
role_type TEXT NOT NULL,
active_to TIMESTAMPTZ
);
This table reads as “person X has role Y”. The catch is that person_id points to a version row, not the person’s stable identity. So when we create a new person version, existing role rows still point at the old version.
That becomes a problem if our application joins roles to the current person row (where valid_to IS NULL). The role hasn’t been deleted. It’s still in the database. But it now points at a row that’s no longer current so it can disappear from our query results.
This is a real footgun I came across while working with this design.
My fix was straightforward: when a new person version is created, reassign active related rows to the new version ID, in the same transaction.
newPerson, err := personRepo.Update(ctx, params)
if err != nil { return err }
if newPerson.ID != currentPerson.ID {
err = personRepo.ReassignRolePerson(ctx, roleID, newPerson.ID, &actorID)
if err != nil { return err }
}
It’s a small bit of code with a big correctness implication. If we forget this step for even one related table, we get phantom data loss that only shows up in production when someone happens to update a person who also has roles attached.
An alternative is to have related tables reference identity_id instead of id, so they always point to the person concept rather than a specific version. I didn’t go that route because in some cases I actually want to know which version of the person a role was assigned to. But it’s a valid design and arguably simpler if we don’t need version-specific foreign keys.
Wrapping up
SCD2 ended up matching the shape of the problem: we can answer “what’s current?” and “what did we have on date X?” from the same table without keeping a separate audit model in sync.
It does add some overhead (and looks daunting). However, the difference between “works” and “mostly works” came down to being strict about a few database rules: one current row per identity, valid time windows and keeping the whole version-change flow inside one transaction.
Once that’s in place, reads stay simple and the model holds up under the messy reality of onboarding data arriving late and changing over time.