Build vs Buy: Identity Resolution on BigQuery vs a Packaged CDP

Customer Data Platforms
April 20, 2026

In their December 2025 report Martech for 2026, Scott Brinker and Frans Riemersma describe what AI agents need more than anything else: context. Not more data in general, but the right data, unified, accessible, and ready to act on.

“The expectations for personalization are skyrocketing. Customers now expect every interaction to be contextually relevant and responsive to their immediate needs. AI makes this possible, but only if you have the infrastructure to process and act on data in real-time.”
Patrick Harrington
— from Martech for 2026

For GCP-native organizations, the infrastructure to process and act on data in real time is already largely in place. GA4 streams behavioral data directly to BigQuery. Google Ads performance data flows in. Looker queries it natively. Vertex AI trains models on it. The analytics-to-activation chain has never been more coherent.

Except for one foundational missing piece: knowing who your customer actually is. Knowing that the web session in GA4, the email click in your MAP, the purchase in your e-commerce system, and the support ticket in Zendesk all belong to the same person. That is the identity resolution problem, and it is the foundation on which every personalization, analytics, and compliance initiative rests.
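At its core, the problem above is a record-linkage problem: records from different systems that share an identifier must be grouped into one person. The following sketch shows the idea as a tiny union-find over exact shared keys; the sources and fields are illustrative, and real resolution (including Zingg's) uses probabilistic matching rather than exact identifiers.

```python
# Minimal sketch of cross-system identity linking: records that share
# any identifier (email, phone) are grouped transitively via union-find.
# Source names and fields are illustrative, not a real schema.

def resolve_identities(records):
    """Group record indices transitively by any shared identifier."""
    parent = {}

    def find(x):
        root = x
        while parent.setdefault(root, root) != root:
            root = parent[root]
        return root

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every record to each identifier it carries.
    for i, rec in enumerate(records):
        for key in ("email", "phone"):
            if rec.get(key):
                union(("rec", i), (key, rec[key]))

    # Collect record indices by their root cluster.
    clusters = {}
    for i, _ in enumerate(records):
        clusters.setdefault(find(("rec", i)), []).append(i)
    return list(clusters.values())

records = [
    {"source": "ga4",     "email": "a@x.com"},                      # web session
    {"source": "map",     "email": "a@x.com", "phone": "555-0100"}, # email click
    {"source": "orders",  "phone": "555-0100"},                     # purchase
    {"source": "zendesk", "email": "b@y.com"},                      # different person
]
print(resolve_identities(records))  # first three records form one cluster
```

Note the transitivity: the purchase record shares no identifier with the GA4 session, yet they link through the email click. That chain is exactly what breaks when each system resolves identity in isolation.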

The question is where to solve it. In BigQuery, where all the data already lives? Or in a packaged CDP, which requires extracting your data, resolving it in the vendor’s cloud, and importing the results back into the ecosystem you already have?

“People sometimes forget about how important campaign data is when talking about Customer Data Platforms. If you can unify all your customer and campaign data — I sometimes call it the C-squared Data Platform — you can be smarter in your decisions.”
Rick Schultz, CMO, Databricks
— from The New Martech “Stack” for the AI Age

Unifying customer and campaign data into a C-squared platform is exactly what Zingg on BigQuery enables. Before explaining how, let’s be precise about where packaged CDPs break down — because the vendor narrative has obscured the operational reality for too long.

The Real Problems with Packaged CDPs

1. They Are Not Fast to Deploy

The 60–90 day figure in CDP vendor marketing refers to getting a basic pipeline flowing — not a reliable identity graph. Enterprise CDP implementations — Salesforce Data Cloud, Adobe Real-Time CDP, Twilio Segment, mParticle — consistently take 12 to 24 months from contract to production-quality identity resolution. Data modeling alone — mapping your actual source systems to the CDP’s opinionated schema — is a multi-month project. Add schema alignment, data quality remediation, connector configuration, matching model tuning, and validation: you’re well into year two before you have something production-ready.

2. They Are Not Cheap

Enterprise CDP contracts range from $300,000 to over $2 million annually before implementation costs. Systems integrators add $500,000–$1.5 million in year-one professional services. The Martech for 2026 survey found integration remained a top-three challenge for the majority of martech respondents, despite CDPs being architected to solve exactly that. A CDP does not eliminate the integration burden; it moves it from between your internal GCP services to between BigQuery and the CDP's cloud. The friction doesn't disappear; it relocates.

3. Your Data Has to Leave BigQuery

Every packaged CDP requires your customer data to be replicated to the vendor's cloud. That means extraction pipelines out of BigQuery, synchronization overhead as schemas evolve, bi-directional sync of resolved identities back, and a new data residency surface in every compliance conversation. Your BigQuery environment, where your GA4 sessions, Ads data, and Looker models already live, is your source of truth. The CDP becomes a perpetual derivative, always fighting to stay synchronized with it and breaking the Google ecosystem flywheel you've invested in building.

4. Matching Is a Black Box

CDP vendors protect their matching algorithms as proprietary intellectual property, so you must accept their confidence thresholds and merge logic as given. When a match is wrong, and some percentage always will be, you cannot audit the decision, retrain the model, or implement domain-specific rules. For KYC and AML, patient identity, and government use cases, this opacity becomes a compliance liability: a decision you cannot fully explain or defend.
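By way of contrast, here is a minimal sketch of what an auditable match decision looks like when the logic is open: every field contributes a visible similarity score and a weight you control, so a wrong match can be traced to the exact field and threshold responsible. The fields, weights, and 0.8 threshold are illustrative assumptions, not Zingg's or any vendor's actual values.

```python
# Transparent match scoring: per-field evidence plus an inspectable
# weighted score. Weights and threshold are illustrative assumptions.
from difflib import SequenceMatcher

WEIGHTS = {"name": 0.4, "email": 0.4, "city": 0.2}

def explain_match(a, b, threshold=0.8):
    # Per-field string similarity, kept visible so the decision is auditable.
    evidence = {
        field: round(SequenceMatcher(None, a[field], b[field]).ratio(), 3)
        for field in WEIGHTS
    }
    score = sum(WEIGHTS[f] * evidence[f] for f in WEIGHTS)
    return {"evidence": evidence, "score": round(score, 3),
            "matched": score >= threshold}

decision = explain_match(
    {"name": "Jon Smith",  "email": "jsmith@x.com", "city": "Austin"},
    {"name": "John Smith", "email": "jsmith@x.com", "city": "Austin"},
)
print(decision)  # near-identical records match, and you can see why
```

With the evidence dictionary in hand, a compliance reviewer can answer "why did these two records merge?" field by field, which is precisely what a sealed vendor model cannot offer.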

5. Per-Profile Pricing Penalizes Your Growth

CDP pricing scales directly with profiles, events, and destinations. As your customer base grows, your CDP bill grows with it, often non-linearly. BigQuery's compute-based pricing scales with query workload, not customer count.
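The difference in scaling behavior is easy to see with back-of-envelope arithmetic. All rates below are illustrative assumptions, not quotes from any CDP vendor or from Google's price list, and the per-profile model is simplified to linear even though real CDP tiers are often worse.

```python
# Back-of-envelope contrast between the two pricing models.
# All rates are illustrative assumptions, not real price lists.

def cdp_annual_cost(profiles, rate_per_1k=120.0):
    """Per-profile pricing: cost tracks customer count."""
    return profiles / 1_000 * rate_per_1k

def bigquery_annual_cost(tb_scanned_per_year, rate_per_tb=6.25):
    """Compute pricing: cost tracks query workload, not profile count."""
    return tb_scanned_per_year * rate_per_tb

# Doubling the customer base doubles the hypothetical CDP bill...
print(cdp_annual_cost(5_000_000), cdp_annual_cost(10_000_000))
# ...but only changes the BigQuery bill if the workload itself grows.
print(bigquery_annual_cost(400))
```

Under these assumed rates, growing from five to ten million customers doubles the per-profile bill while the compute bill stays flat unless you actually scan more data. The point is the shape of the curves, not the specific numbers.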

6. They Create Lock-In at Your Most Critical Layer

“We want to own the core construct of the data and infrastructure. If we change agencies, all we need to do is flip the activation layer at the top. The foundation stays with us.”
Kumar Ram, VP/Global Head of Marketing Data Sciences, HP
— from The New Martech “Stack” for the AI Age

Your customer identity graph is your most strategic data asset. Once it lives in a CDP’s data model — once your merge history, match confidence scores, and identity linkages are in their infrastructure — switching means re-resolving all your identities, remapping all downstream systems, and losing historical linkages that often cannot be reconstructed. CDP vendors have engineered this lock-in deliberately.

7. They Were Built for B2C — B2B Identity Is a Different Problem

Traditional CDPs were architected for resolving individual consumers across digital channels. B2B identity resolution requires resolving accounts (companies), contacts within accounts, and hierarchical relationships between subsidiaries, parent entities, and buying groups. A single enterprise account appears across Salesforce, ERP, ZoomInfo, Bombora intent, and GA4 sessions, under different names, domain variations, and DUNS numbers. Most B2B organizations running CDPs maintain separate account matching processes outside the CDP because the CDP simply cannot handle it.
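To make the account-matching difficulty concrete, here is a sketch of the canonicalization step that has to happen before two account records can even be compared. The legal-suffix list is illustrative and deliberately incomplete, and the domain helper naively keeps the last two labels, which a real implementation would replace with a public-suffix list.

```python
# Sketch of B2B account canonicalization: strip legal suffixes and
# punctuation from company names, and reduce domains to a naive root.
# Suffix list and examples are illustrative, not production-complete.
import re

LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "ltd", "limited",
                  "corp", "corporation", "gmbh", "co"}

def canonical_name(name):
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def canonical_domain(domain):
    # Naive two-label root; real code needs a public-suffix list
    # to handle multi-part TLDs like .co.uk correctly.
    domain = domain.lower().strip().removeprefix("www.")
    parts = domain.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else domain

# Three spellings of the same account collapse to one key.
variants = ["Acme Corp.", "ACME Corporation", "Acme, Inc."]
print({canonical_name(v) for v in variants})  # {'acme'}
```

And this is only the first step: hierarchy resolution (subsidiary to parent, contact to buying group) still has to happen on top of the canonical keys, which is the part consumer-oriented CDPs were never designed to model.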

Zingg on BigQuery: The Solution

Zingg on BigQuery closes every one of these gaps. It runs on Dataproc Serverless, reads from BigQuery via the BigQuery Storage API, and writes a ZINGG_ID — a persistent, unified identity key — back to BigQuery tables. No extraction. No vendor cloud. No black box. The ZINGG_ID is immediately available to GA4 exports, Google Ads Customer Match, DV360 audiences, Looker, and Vertex AI workflows, activating the full analytics-to-personalization chain that GCP-native organizations have been building.

Your Entire Enterprise Data Estate, In One Place

Zingg on BigQuery draws on your full enterprise data estate: finance records, ERP data, product telemetry, HR org structures, and partner data via BigQuery Data Clean Rooms. The resulting ZINGG_ID reflects a richer picture of each customer than any CDP ingesting only marketing data could produce.

Google’s Infrastructure Provides Data Quality — Zingg Provides Identity

BigQuery’s native column-level security, row-level access policies, and Data Catalog lineage — combined with your existing dbt transformations, Monte Carlo observability, or Great Expectations checks — ensure records are clean before Zingg processes them. Zingg adds the ML-based probabilistic matching layer on top.

The Google Ecosystem Flywheel

GA4 streams behavioral data directly to BigQuery. Google Ads can be targeted from BigQuery exports. Looker queries BigQuery natively. Vertex AI trains ML models on BigQuery data. When the ZINGG_ID lives in BigQuery, every link in this chain — behavioral data, unified identity, analytics, ML, and activation to Google marketing surfaces — operates without an extraction step. The C-squared Data Platform Rick Schultz describes lives natively in BigQuery.
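Once every table carries the ZINGG_ID, "unification" reduces to an ordinary join. The following pure-Python sketch mimics what a BigQuery query over GA4 export, order, and campaign tables would do; the table contents and field names are made up for illustration.

```python
# Simulating unified-profile assembly keyed by ZINGG_ID.
# Equivalent BigQuery SQL would simply JOIN the tables ON zingg_id.
# Table contents and field names are illustrative.
from collections import defaultdict

ga4_sessions = [{"zingg_id": "Z1", "pages": 12},
                {"zingg_id": "Z2", "pages": 3}]
orders       = [{"zingg_id": "Z1", "revenue": 240.0}]
campaigns    = [{"zingg_id": "Z1", "campaign": "spring_promo"}]

def unified_profiles(*tables):
    """Merge rows from every table into one profile per ZINGG_ID."""
    profiles = defaultdict(dict)
    for table in tables:
        for row in table:
            zid = row["zingg_id"]
            profiles[zid].update(
                {k: v for k, v in row.items() if k != "zingg_id"})
    return dict(profiles)

print(unified_profiles(ga4_sessions, orders, campaigns))
```

No extraction pipeline, no sync job, no vendor cloud: the join key already lives next to the data it unifies.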

B2B Account Resolution, Natively

Zingg on BigQuery resolves B2B accounts natively, including account hierarchies, subsidiaries, and buying groups, powering ABM in Adobe Marketo Engage, Salesforce Account Engagement, and 6sense.

The Production Architecture

Customer records live in BigQuery, cataloged with Data Catalog. Zingg runs on Dataproc Serverless and writes the ZINGG_ID back to BigQuery. Cloud Composer orchestrates incremental pipeline runs, and activation to Braze, Iterable, Klaviyo, Salesforce Marketing Cloud, and HubSpot flows through Hightouch or Census.
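A Cloud Composer DAG driving those incremental runs would follow a watermark pattern along these lines. The row shape, timestamps, and the resolve callback are stand-ins for illustration, not Zingg's actual interface.

```python
# Watermark-based incremental processing: each run picks up only rows
# newer than the last watermark, resolves them, and advances the mark.
# Row shape and the resolve callback are illustrative stand-ins.

def incremental_run(rows, last_watermark, resolve):
    """Process rows with updated_at > last_watermark; return new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    if not new_rows:
        return last_watermark, []
    resolved = resolve(new_rows)
    return max(r["updated_at"] for r in new_rows), resolved

rows = [{"id": 1, "updated_at": 100},
        {"id": 2, "updated_at": 205},
        {"id": 3, "updated_at": 230}]

# Identity resolve callback keeps the sketch self-contained.
wm, out = incremental_run(rows, last_watermark=200, resolve=lambda rs: rs)
print(wm, [r["id"] for r in out])  # 230 [2, 3]
```

The watermark itself would live in a small BigQuery state table or an Airflow Variable, so each scheduled run resumes exactly where the last one stopped.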

Conclusion

The Martech for 2026 report is clear: AI personalization at scale requires the infrastructure to process and act on unified, accessible data in real time. GCP-native organizations have invested heavily in exactly that infrastructure. Zingg closes the remaining gap, enabling you to know who your customer is across all those data sources, without extraction, without a vendor in the middle, and without a pricing model that penalizes your growth.

The ZINGG_ID in BigQuery is the identity foundation your composable, AI-powered marketing stack has been waiting for.

🔗 Zingg for BigQuery  |  Talk to the Team  |  Deployment Guides

📄 The New Martech “Stack” for the AI Age — Scott Brinker & Databricks (March 2026)
📄 Martech for 2026 — Scott Brinker & Frans Riemersma (December 2025)
