The ZINGG_ID: A Persistent, Load-Bearing Identifier for Your Entity Graph

Entity Resolution

April 8, 2026

Most entity resolution tools will tell you which records belong together. What they don't tell you is how to refer to that group tomorrow, after new data has arrived and the matching has run again.

This is the problem the ZINGG_ID solves.

The Gap in Cluster-Based Matching

When Zingg Community matches records, it assigns a Z_CLUSTER field to each output row. Records with the same Z_CLUSTER value represent the same real-world entity. For one-off batch analysis, that is all you need.

But Z_CLUSTER values are not stable across runs. If you re-run Zingg on updated data, the cluster assignments are recomputed. The same group of records that had Z_CLUSTER = 44210 last week might have a different value this week. There's no guarantee of continuity.
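To make the instability concrete, here is a minimal sketch in plain Python. The record ids and cluster values are invented for illustration and are not real Zingg output; the point is only that two runs can agree on which records belong together while sharing no labels at all.

```python
from collections import defaultdict

def group_by_cluster(rows):
    """Map each cluster label to the frozenset of record ids it contains."""
    groups = defaultdict(set)
    for record_id, cluster in rows:
        groups[cluster].add(record_id)
    return {label: frozenset(members) for label, members in groups.items()}

# Hypothetical output of two runs: (record id, Z_CLUSTER value).
run_last_week = [("crm-1", 44210), ("web-7", 44210), ("erp-3", 44211)]
run_this_week = [("crm-1", 90817), ("web-7", 90817), ("erp-3", 90818)]

last = group_by_cluster(run_last_week)
this = group_by_cluster(run_this_week)

# The groupings themselves are identical across the two runs...
same_membership = set(last.values()) == set(this.values())
# ...but no label from last week still exists this week.
shared_labels = set(last) & set(this)

print(same_membership)  # True
print(shared_labels)    # set()
```

Anything that stored 44210 as a key last week now points at nothing, even though the underlying entity has not changed.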

This creates a real problem the moment you try to use entity resolution output in anything downstream.

If a customer service platform stores the cluster ID as a customer reference key, it breaks on the next run. If a fraud detection system uses the cluster to link accounts, that linkage silently resets. If downstream dashboards use the cluster as a join key, they're joining on a value that may no longer mean what it meant when the data was loaded.

What every enterprise eventually asks for is not just "which records match" — but "give me a stable identifier I can use everywhere, permanently."

The ZINGG_ID in Zingg Enterprise

The ZINGG_ID is Zingg Enterprise's answer to this. It is a globally unique, persistent identifier assigned to each resolved entity — and it does not change between runs.

When new records arrive and Zingg processes them incrementally, existing ZINGG_IDs are preserved. If a new record matches an existing entity, it gets associated with that entity's ZINGG_ID. If a cluster splits because updated records no longer match, the surviving cluster keeps the original ZINGG_ID, and the new cluster gets a fresh one. If two clusters merge, the surviving ZINGG_ID is deterministically chosen.
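The split and merge rules above can be sketched in a few lines of Python. This is an illustration, not Zingg Enterprise's implementation: the function names are hypothetical, and two rules are assumptions made for the example. On a merge, the lexicographically smallest ID survives. On a split, the largest part counts as the "surviving cluster". The post only says the choice is deterministic, not how it is made.

```python
import itertools

_fresh = itertools.count(1)

def new_zingg_id():
    """Hypothetical generator; a real system would issue globally unique IDs."""
    return f"Z-NEW-{next(_fresh)}"

def merge_ids(zingg_ids):
    """Assumed tie-break: the lexicographically smallest ID survives a merge."""
    return min(zingg_ids)

def split_ids(original_id, parts):
    """Assumed rule: the largest part keeps the original ID, the rest get
    fresh ones. Ties on size are broken by the smallest record id in a part,
    so the outcome does not depend on input order."""
    ordered = sorted(parts, key=lambda part: (-len(part), min(part)))
    assignment = {}
    for position, part in enumerate(ordered):
        zingg_id = original_id if position == 0 else new_zingg_id()
        for record in part:
            assignment[record] = zingg_id
    return assignment

# Two clusters merge: one existing ID is kept, chosen the same way every time.
merged = merge_ids({"Z-0042", "Z-0017"})

# One cluster splits in two: the larger part keeps "Z-0042".
split = split_ids("Z-0042", [{"erp-3"}, {"crm-1", "web-7"}])

print(merged)                      # Z-0017
print(sorted(split.items()))
```

Whatever the actual rule is, the property that matters downstream is the one shown here: the same inputs always produce the same surviving ID.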

The identity of an entity persists even as its constituent records change.

What a Cross-Reference Table Enables

Alongside the ZINGG_ID, Zingg Enterprise maintains a cross-reference table: a mapping between every source record and the ZINGG_ID of the entity it belongs to.

This cross-reference is the operational heart of your identity graph. It lets you:

Look up the full entity from any source record. A help desk agent pulling up a ticket can retrieve the customer's complete interaction history — purchases, support cases, email signups — by resolving through the cross-reference to the ZINGG_ID and back out across every source system.

Join across systems without shared keys. Two systems that have never shared a customer identifier can be joined through the cross-reference. The ZINGG_ID becomes the universal foreign key that doesn't exist anywhere in your source data but needs to exist in your analytics layer.

Build stable segmentation. Customer segments defined against ZINGG_IDs survive data updates. If a customer's records move from one cluster to another, the cross-reference is updated and their history travels with them.

Power AI and agentic workflows reliably. A RAG system or AI agent querying customer data needs a stable entity to reason about. An entity that looks different in every source, with no persistent identifier, gives the model inconsistent context and invites hallucination. The ZINGG_ID gives AI systems a stable anchor.
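As a rough sketch of the first two uses, the cross-reference can be modeled as a mapping from (source system, source record id) to ZINGG_ID. The system names, record ids, and table layout below are invented for the example; the actual cross-reference schema in Zingg Enterprise may look different.

```python
from collections import defaultdict

# Hypothetical cross-reference: (source system, source record id) -> ZINGG_ID.
xref = {
    ("crm", "c-101"): "Z-0017",
    ("helpdesk", "t-884"): "Z-0017",
    ("shop", "o-5521"): "Z-0017",
    ("crm", "c-102"): "Z-0042",
}

# Reverse index: ZINGG_ID -> every source record that belongs to the entity.
members = defaultdict(list)
for source_key, zingg_id in xref.items():
    members[zingg_id].append(source_key)

def entity_for(source, record_id):
    """Resolve one source record to its ZINGG_ID and all sibling records."""
    zingg_id = xref[(source, record_id)]
    return zingg_id, sorted(members[zingg_id])

# Look up the full entity starting from a help desk ticket.
zingg_id, records = entity_for("helpdesk", "t-884")

# Join two systems that share no key, using the ZINGG_ID as the join key.
tickets = [{"ticket": "t-884", "subject": "Late delivery"}]
orders = [{"order": "o-5521", "total": 129.0}]

orders_by_entity = defaultdict(list)
for order in orders:
    orders_by_entity[xref[("shop", order["order"])]].append(order)

joined = [
    (ticket["ticket"], order["order"])
    for ticket in tickets
    for order in orders_by_entity[xref[("helpdesk", ticket["ticket"])]]
]

print(zingg_id)  # Z-0017
print(records)   # [('crm', 'c-101'), ('helpdesk', 't-884'), ('shop', 'o-5521')]
print(joined)    # [('t-884', 'o-5521')]
```

In a warehouse the same idea is two joins through the cross-reference table; the dictionaries here just keep the example self-contained.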

Persistence Under Change

The hardest part of maintaining a persistent identifier isn't the initial assignment — it's keeping it correct as data changes. Records get updated. New records arrive that match existing entities. Sometimes an entity that appeared to be one person turns out to be two.

Zingg Enterprise's incremental flow handles all of these scenarios without requiring a full re-match. Clusters merge, split, and expand while the ZINGG_ID provides continuity throughout. The incremental flow post goes deeper on the mechanics of how cluster updates are handled.

Reassigning ZINGG_IDs After Full Refresh

There are cases where a full refresh is unavoidable — a schema change, a data quality remediation pass, a major source system migration. After a full re-match, the raw cluster assignments are new, and the existing ZINGG_IDs need to be reconciled with the new clustering.

Zingg Enterprise's Reassign ZINGG_ID capability handles this by mapping new cluster assignments back to existing ZINGG_IDs, preserving the continuity of downstream references even after a full pipeline rebuild. Systems that depend on ZINGG_IDs don't need to be notified or updated — the identifier they already hold continues to resolve correctly.
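One way to picture this reconciliation is as an overlap-matching problem: each new cluster inherits the existing ZINGG_ID it shares the most records with. The sketch below uses a greedy largest-overlap match. It is an assumption chosen for illustration, not a description of how the Reassign ZINGG_ID capability works internally, and the `reassign` function and its parameters are hypothetical.

```python
from collections import Counter
import itertools

def reassign(old_xref, new_clusters, fresh_ids):
    """Map freshly computed clusters back onto existing ZINGG_IDs.

    old_xref:     record id -> ZINGG_ID held before the full refresh
    new_clusters: new cluster label -> set of record ids
    fresh_ids:    iterator yielding unused IDs for clusters with no match
    Returns record id -> ZINGG_ID after reconciliation.
    """
    # Count how many records each new cluster shares with each old ID.
    candidates = []
    for label, records in new_clusters.items():
        votes = Counter(old_xref[r] for r in records if r in old_xref)
        for zingg_id, overlap in votes.items():
            candidates.append((-overlap, zingg_id, str(label), label))
    candidates.sort(key=lambda c: c[:3])  # largest overlap first, then stable ties

    # Greedy match: each old ID is reused by at most one new cluster.
    label_to_id, used = {}, set()
    for _, zingg_id, _, label in candidates:
        if label not in label_to_id and zingg_id not in used:
            label_to_id[label] = zingg_id
            used.add(zingg_id)

    # Clusters with no available old ID get a fresh one.
    for label in sorted(new_clusters, key=str):
        if label not in label_to_id:
            label_to_id[label] = next(fresh_ids)

    return {r: label_to_id[label]
            for label, records in new_clusters.items() for r in records}

old_xref = {"crm-1": "Z-0017", "web-7": "Z-0017",
            "erp-3": "Z-0042", "erp-4": "Z-0042"}
new_clusters = {1: {"crm-1", "web-7", "app-9"}, 2: {"erp-3"}, 3: {"erp-4", "erp-5"}}
fresh = (f"Z-NEW-{n}" for n in itertools.count(1))

result = reassign(old_xref, new_clusters, fresh)
print(sorted(result.items()))
```

In this example the first new cluster keeps Z-0017 and picks up a new record, one half of the old Z-0042 entity keeps that ID, and the other half receives a fresh one. Records that downstream systems already reference continue to resolve wherever an overlap exists.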

The Right Mental Model

Think of the ZINGG_ID the way you'd think of a Social Security Number or a passport number: an identifier that represents a person, not a record. Individual records come and go, get updated, get corrected. The person persists. The ZINGG_ID persists with them.

In a data architecture built for AI — where downstream models, agents, and applications need to reason about entities, not rows — this distinction matters enormously.

Want to see how ZINGG_ID works in your environment? Explore Zingg open source to understand the matching foundation, and contact us to discuss how Zingg Enterprise's persistent identifier fits into your data stack.

