Incremental Identity Resolution: Keeping Your Entity Graph Current as Data Changes

Entity Resolution
April 9, 2026

A one-time entity resolution run is useful for a report. It's not useful for running a business.

In any active organization, entity data is constantly changing. Customers update their addresses. New accounts are opened. Mergers bring new supplier records into the system. Businesses expand to new channels and new source systems come online. The entity graph you build today will be incomplete by next week if there's no mechanism to keep it current.

This is the problem that incremental identity resolution solves.

What Changes, and Why It's Hard

When new or updated records arrive, several things can happen to the existing entity graph:

A new record matches an existing entity. A customer signs up on your website with a slightly different email. Their new record needs to join the existing cluster and inherit the entity's ZINGG_ID.

An updated record no longer matches its original cluster. A record was incorrectly captured and has been corrected. The corrected version no longer resembles the records it was grouped with. The cluster needs to split.

Two previously separate clusters now belong together. An update provides new information — a shared phone number, a corrected SSN — that links what appeared to be two separate entities into one. The clusters need to merge.

A new record matches no existing entity. It stands alone, and a new entity is created with a new ZINGG_ID.

An update reverses an earlier change. Data was modified and then reverted. The entity should end up exactly where it started.

Any combination of these can happen simultaneously across millions of records. Getting the cluster assignments right is complex. Getting the ZINGG_ID continuity right through all of them is even harder.

Why Full Re-Match Doesn't Scale

The conceptually simple answer is to re-run the full matching pipeline every time data changes. Load the complete updated dataset, re-match everything, generate new cluster assignments.

But this breaks down immediately for two reasons.

First, cost. Entity resolution comparisons grow quadratically with data size. A dataset that doubled in size doesn't take twice as long to match — it takes roughly four times as long. For enterprise datasets in the tens or hundreds of millions of records, a full re-match on every update cycle is not feasible.
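To see the quadratic growth concretely, here is a quick back-of-envelope calculation for naive all-pairs matching. (In practice, blocking reduces the constant dramatically, but the growth in candidate pairs is still superlinear.)

```python
# Naive matching compares every record with every other record:
# n * (n - 1) / 2 unordered pairs, so doubling n roughly quadruples the work.

def pairwise_comparisons(n: int) -> int:
    """Number of unordered record pairs for a naive full match."""
    return n * (n - 1) // 2

for n in (1_000_000, 2_000_000):
    print(f"{n:>9,} records -> {pairwise_comparisons(n):,} comparisons")
# 1,000,000 records ->   499,999,500,000 comparisons
# 2,000,000 records -> 1,999,999,000,000 comparisons
```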

Second, ZINGG_ID stability. A full re-match generates fresh cluster assignments. Nothing in a new run knows what the previous cluster IDs were. The ZINGG_IDs that downstream systems, applications, and models have been using become meaningless. Every system that touches the entity graph would need to be updated, retrained, or invalidated. That cost is enormous and largely invisible until you've already built the dependency.

How Zingg's Incremental Flow Works

Zingg Enterprise's runIncremental phase processes only the records that have changed — new additions and updates — against the existing entity graph.

Rather than re-matching the entire dataset, it:

  1. Matches incoming records against existing clusters using the trained model
  2. Determines whether each incoming record merges into an existing cluster, forms a new one, or triggers a split
  3. Updates the cross-reference table to reflect the new state
  4. Preserves existing ZINGG_IDs throughout — merging clusters inherit the appropriate ZINGG_ID, splitting clusters get a new one for the diverging group
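The four steps above can be sketched in miniature. Everything here is an illustrative assumption, not Zingg Enterprise's actual API: `Xref` stands in for the cross-reference table, and step 1's model matching is taken as an input (the set of existing ZINGG_IDs the incoming record matched).

```python
import itertools

class Xref:
    """Toy cross-reference table: record -> ZINGG_ID (illustrative only)."""
    def __init__(self):
        self.by_record = {}

    def add(self, record, zid):
        self.by_record[record] = zid

    def remap(self, old_zid, new_zid):
        # Point every record of an absorbed cluster at the surviving ID.
        for rec, zid in self.by_record.items():
            if zid == old_zid:
                self.by_record[rec] = new_zid

_ids = itertools.count(1)

def assign_incremental(record, matched_zids, xref):
    """Steps 2-4 for one incoming record, given step 1's match result."""
    if not matched_zids:                   # no match -> new entity
        zid = next(_ids)
    elif len(matched_zids) == 1:           # joins one existing cluster
        zid = next(iter(matched_zids))     # inherits that cluster's ZINGG_ID
    else:                                  # record bridges clusters -> merge
        zid = min(matched_zids)            # consistent rule: lowest ID survives
        for absorbed in matched_zids - {zid}:
            xref.remap(absorbed, zid)
    xref.add(record, zid)                  # step 3: update cross-reference
    return zid
```

The key property to notice: in every branch, existing ZINGG_IDs are reused wherever possible, and a fresh ID is minted only for a genuinely new entity.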

The result: your identity graph stays current with the pace of your data, without the compute cost of re-processing everything, and without disrupting the downstream systems that depend on stable entity identifiers.

Cluster Merge and Split Mechanics

To make this concrete, here are the core scenarios Zingg's incremental flow handles:

New record matches an existing cluster: The record is added to the cluster. The ZINGG_ID of the cluster is assigned to the new record in the cross-reference table. No disruption downstream.

New record matches no existing cluster: A new cluster is created. A new ZINGG_ID is generated and assigned. Downstream systems don't know or care — there's simply a new entity they haven't seen before.

New record matches two existing clusters: This is a merge. Both clusters are combined into one. The surviving ZINGG_ID is determined by a consistent rule (the older, or higher-ranked, cluster's ID survives). The cross-reference table is updated to point all records to the surviving ZINGG_ID. Systems holding the old ZINGG_ID for the absorbed cluster will resolve correctly through the cross-reference.

Update causes a record to leave its cluster: The cluster splits. The departing record either joins another cluster or forms a new one with a new ZINGG_ID. The original cluster retains its ZINGG_ID.
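The merge scenario above notes that systems still holding the absorbed cluster's old ZINGG_ID resolve correctly through the cross-reference. One minimal way to sketch that behavior is a redirect map alongside the live assignments; this table design is an assumption for illustration, not Zingg's schema.

```python
# Redirects from absorbed ZINGG_IDs to the surviving one after each merge.
redirects = {}

def merge(absorbed_zid, surviving_zid):
    """Record that absorbed_zid's cluster was folded into surviving_zid."""
    redirects[absorbed_zid] = surviving_zid

def resolve(zid):
    """Follow merge redirects to the current surviving ZINGG_ID.

    A production system would periodically compact these chains,
    but chained lookups keep stale IDs resolving in the meantime.
    """
    while zid in redirects:
        zid = redirects[zid]
    return zid
```

With this in place, a downstream system that cached ZINGG_ID 7 before two successive merges (7 into 3, then 3 into 1) still resolves to the current entity: `resolve(7)` returns 1.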

These aren't edge cases — they happen constantly in live enterprise data. Building a production identity graph means handling all of them reliably.

Full Refresh Without Breaking Downstream

Sometimes a full re-match is unavoidable: a major schema change, a data quality remediation that affects the model's training data, a migration from one platform version to another.

In these cases, Zingg Enterprise's Reassign ZINGG_ID capability reconciles the new cluster assignments with existing ZINGG_IDs, maintaining continuity for downstream systems even after a full pipeline rebuild. The entity identifiers your applications already hold continue to resolve correctly without any changes on their end.
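One plausible way such a reconciliation could work is an overlap vote: each fresh cluster inherits the old ZINGG_ID that most of its records carried before the rebuild, and only clusters with no prior history get a new ID. This heuristic is an assumption for illustration, not Zingg's exact Reassign ZINGG_ID algorithm, and a production version would also need tie-breaking when a split makes two clusters claim the same old ID.

```python
from collections import Counter

def reassign(old_xref, new_clusters, next_id):
    """Map fresh clusters back to prior ZINGG_IDs by record overlap.

    old_xref:     record -> old ZINGG_ID (pre-rebuild cross-reference)
    new_clusters: cluster key -> list of records (post-rebuild assignment)
    next_id:      callable minting a fresh ZINGG_ID
    """
    assigned = {}
    for cluster, records in new_clusters.items():
        overlap = Counter(old_xref[r] for r in records if r in old_xref)
        if overlap:
            # Inherit the dominant prior ID so downstream keys stay valid.
            assigned[cluster] = overlap.most_common(1)[0][0]
        else:
            assigned[cluster] = next_id()  # genuinely new entity
    return assigned
```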

What This Enables

With incremental resolution running on a regular cadence, the ZINGG_ID becomes genuinely useful as operational infrastructure — not just an analytical artifact.

Customer service platforms can look up the full customer context by ZINGG_ID and trust that it's current. Fraud detection models can score entities by ZINGG_ID and catch patterns that span multiple account records. Marketing attribution can follow a customer across channels as their record set evolves. AI agents reasoning about customers, patients, or suppliers are working with entities that reflect the current state of the data.

The identity graph goes from a snapshot to a living system.

Further reading:

- The ZINGG_ID: A Persistent Identifier for Your Entity Graph
- Deterministic and Probabilistic Matching: Why You Need Both
- Zingg Enterprise documentation
- Contact us to discuss your incremental resolution requirements
