Build vs Buy: Identity Resolution on AWS vs a Packaged CDP

April 20, 2026

Scott Brinker and Frans Riemersma’s December 2025 report Martech for 2026 opens with a question that cuts to the heart of what every enterprise marketing and data leader is wrestling with: not “which tools should we buy?” but “what is the right architecture for AI-driven marketing?”

The answer they arrive at is consequential for AWS-native organizations. Data, Brinker argues, is no longer something that moves between systems: extracted, transformed, loaded, synced, and reverse-synced. In the third age of martech, data becomes the shared substrate on which everything else runs. Not data as an asset to be moved around, but data as a common foundation that agents, applications, and humans all operate upon together. The natural home for that foundation is the platform where all enterprise data already lives.

For hundreds of large organizations, that platform is AWS. S3 is the de facto storage layer of the modern data stack. Glue catalogs it. EMR and Glue Serverless process it at scale. Redshift warehouses it. Lake Formation governs it. The data gravity in AWS is enormous, and that gravity means every argument for extracting customer data to a CDP is an argument for fighting physics.

“AI is only going to be impactful if you give it the right data resources to leverage. And you need to make sure that those resources are connected in a way that’s pragmatically accessible.”
Rebecca Corliss
— from Martech for 2026

This matters directly for identity resolution. Knowing who your customer is — unifying their records across Salesforce, your e-commerce platform, support systems, and behavioral data in S3 — is the prerequisite for every AI-driven personalization and analytics initiative. The question is whether to do that resolution in AWS, where all the data already lives, or extract it to a CDP vendor’s cloud.

Before explaining how to build this natively on AWS, let’s document exactly where packaged CDPs fall short, because the operational reality has consistently diverged from the vendor narrative.

The Real Problems with Packaged CDPs

1. They Are Not Fast to Deploy

The 60–90 day figure in CDP vendor marketing refers to getting a basic pipeline flowing — not a reliable identity graph you can trust for production decisions. Enterprise CDP implementations — Salesforce Data Cloud, Adobe Real-Time CDP, Twilio Segment — consistently take 12 to 24 months from contract to production-quality identity resolution. Data modeling alone is a multi-month project. Add schema alignment, data quality remediation, connector configuration, matching model tuning, and validation: you’re well into year two before you have something production-ready.

2. They Are Not Cheap

Enterprise CDP contracts range from $300,000 to over $2 million annually before implementation costs. Systems integrators add $500,000 to $1.5 million in year-one professional services. The Martech for 2026 survey found integration remained a Top 3 challenge even after CDP deployment. A CDP does not remove the integration burden; it relocates it. Instead of integrating your internal AWS services with one another, you now integrate S3 and Redshift with the CDP’s cloud, with a vendor in the middle of your most critical data flow.

3. Your Data Has to Leave AWS

Every packaged CDP requires your customer data to be replicated to the vendor’s cloud. That means extraction pipelines out of S3 or Redshift, synchronization overhead as schemas evolve, bi-directional sync of resolved identities back, and a new data residency surface in every compliance conversation. For organizations with strict residency requirements (EU data staying in eu-west, HIPAA data staying in HIPAA-eligible infrastructure), this is a material compliance problem, not an operational inconvenience. And for organizations with mature Lake Formation governance, the CDP sits entirely outside that governance perimeter.

4. Matching Is a Black Box

CDP vendors protect their matching algorithms as proprietary intellectual property. You accept their definitions of what constitutes a match, their confidence thresholds, and their merging logic. When a match is wrong, and some percentage will always be wrong, you cannot audit the decision, retrain the model, or implement domain-specific rules. For KYC/AML, patient identity, and regulated B2B environments, this opacity is a compliance liability: you cannot fully explain or defend individual match decisions to auditors or regulators.

5. Per-Profile Pricing Penalizes Your Growth

CDP pricing scales directly with profiles, events, and destinations. As your customer base grows, which is the entire point of marketing investment, your CDP bill grows at least proportionally, and often faster. Compute-based AWS pricing scales with workload, not customer count. That is a significant and compounding cost advantage as you grow.

6. They Create Lock-In at Your Most Critical Layer

“We want to own the core construct of the data and infrastructure. If we change agencies, all we need to do is flip the activation layer at the top. The foundation stays with us.”
Kumar Ram, VP/Global Head of Marketing Data Sciences, HP
— from The New Martech “Stack” for the AI Age

Your customer identity graph is your most strategic data asset. Once it lives in a CDP’s data model — once your merge history, match confidence scores, and identity linkages are in their infrastructure — switching means re-resolving all your identities, remapping all downstream systems, and losing historical linkages that often cannot be reconstructed. CDP vendors have engineered this lock-in deliberately. It is their business model.

7. They Were Built for B2C — B2B Identity Is a Different Problem

Traditional CDPs were architected for resolving individual consumers across digital channels. B2B identity resolution requires resolving accounts (companies), contacts within accounts, and hierarchical relationships between subsidiaries, parent entities, and buying groups. A single enterprise account appears across Salesforce, SAP, ZoomInfo, Bombora intent, and product usage data, under different names, domain variations, and DUNS numbers. Most B2B organizations running CDPs maintain separate account matching processes outside the CDP because the CDP simply cannot handle it.
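The name and domain variation described above can be made concrete with a small sketch. It uses only the Python standard library and hand-written rules; it is not how Zingg matches records, and the sample records, the suffix list, and the function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical, deliberately short list of legal suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "gmbh", "co"}


def normalize_name(name):
    """Lowercase, drop punctuation, and strip common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def normalize_domain(url_or_domain):
    """Reduce a URL or bare domain to its host without a leading 'www.'."""
    value = url_or_domain.strip().lower()
    host = urlparse(value if "//" in value else "//" + value).hostname or ""
    return host[4:] if host.startswith("www.") else host


# Hypothetical records for the same company as seen in three systems.
records = [
    {"source": "crm", "name": "Acme Corp.", "site": "https://www.acme.com"},
    {"source": "erp", "name": "ACME Corporation", "site": "acme.com"},
    {"source": "usage", "name": "Acme, Inc", "site": "www.acme.com/login"},
]
keys = {(normalize_name(r["name"]), normalize_domain(r["site"])) for r in records}
```

Here all three records collapse to one key. Rules like these only catch the easy cases; subsidiaries trading under other names, rebrands, and parent-child hierarchies need matching that goes beyond string cleanup.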

Zingg on AWS: The Solution

Zingg on AWS closes every one of these gaps. It runs natively on Spark via AWS Glue Serverless or EMR, reads from S3, and writes a ZINGG_ID — a persistent, unified identity key — back to S3 and Redshift. No extraction. No vendor cloud. No black box. Everything stays in your account, under your IAM policies, in your Lake Formation governance perimeter.
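To make the idea of a persistent, unified identity key concrete, here is a minimal sketch of how pairwise match decisions can be collapsed into one key per customer using connected components (union-find). It is an illustration only, not Zingg's internal implementation; the record IDs, the `assign_identity_keys` function, and the key format are hypothetical.

```python
def assign_identity_keys(record_ids, matched_pairs):
    """Group records linked by match decisions and give each group one key."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Walk up to the group's root, shortening the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for left, right in matched_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the smaller root so the result is deterministic.
            keep, drop = sorted((root_left, root_right))
            parent[drop] = keep

    return {rid: f"id-{find(rid)}" for rid in record_ids}


# Hypothetical records from three systems; two pairs were judged to match.
records = ["crm:17", "shop:903", "support:44", "crm:52"]
pairs = [("crm:17", "shop:903"), ("shop:903", "support:44")]
keys = assign_identity_keys(records, pairs)
```

The first three records end up sharing one key even though `crm:17` and `support:44` were never compared directly, while `crm:52` keeps its own. That transitive grouping is what lets a single key stand in for a customer across systems.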

“My view on build vs. buy has really changed. If you had asked me a year ago, I would have talked about a set of off-the-shelf SaaS tools. Now it’s all about what we can do with AI. We have 25 AI-native tools we’re immersed in and training our marketers on, plus 12 agentic AI use cases we’ve created that are unique to our business.”
Meagen Eisenberg, CMO, Samsara
— from The New Martech “Stack” for the AI Age

Building identity resolution natively on AWS, rather than buying a packaged CDP, gives your AI agents exactly what Brinker and Corliss describe: the right data resources, connected in a pragmatically accessible way — without the extraction tax, the vendor lock-in, or the black-box matching that CDPs impose.

Your Entire Enterprise Data Estate, In One Place

Identity resolution draws on your full enterprise data estate — finance records in Redshift, product telemetry in S3, ERP data via Glue, HR org structures, partner data, operational systems. The ZINGG_ID connects a richer, more complete picture than a CDP ingesting only marketing data could achieve.

AWS Provides Data Quality — Zingg Provides Identity

AWS Lake Formation’s fine-grained access control, AWS Glue Data Quality rules, and your existing dbt pipelines, Great Expectations checks, or Monte Carlo observability ensure records are governed and clean before Zingg processes them. Zingg adds the ML-based probabilistic matching layer on top.

Full Auditability for Regulated Decisions

Every Zingg matching decision is traceable to the model’s confidence score, field-level signals, and labeled training examples. Because the ZINGG_ID lives in your account, CloudTrail audit logging, IAM access controls, and Amazon Macie PII detection apply to it as they do to the rest of your data. A CDP’s matching algorithm is a vendor secret. Zingg on AWS is fully auditable.
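As a rough sketch of what acting on confidence scores could look like, the snippet below routes low-confidence clusters to human review instead of accepting them silently. The field names (`zingg_id`, `min_score`), the sample rows, and the 0.8 threshold are assumptions for illustration; the actual output columns depend on your Zingg version and configuration.

```python
def split_for_review(clusters, threshold=0.8):
    """Separate clusters whose weakest link falls below the threshold."""
    accepted, needs_review = [], []
    for cluster in clusters:
        target = accepted if cluster["min_score"] >= threshold else needs_review
        target.append(cluster["zingg_id"])
    return accepted, needs_review


# Hypothetical resolved clusters with the lowest pairwise score in each.
clusters = [
    {"zingg_id": "Z1", "min_score": 0.97, "records": ["crm:17", "shop:903"]},
    {"zingg_id": "Z2", "min_score": 0.61, "records": ["crm:52", "support:8"]},
]
accepted, needs_review = split_for_review(clusters)
```

Because the scores sit in your own tables, the threshold is a decision you own and can justify to an auditor, rather than a vendor default.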

B2B Account Resolution, Natively

Zingg on AWS resolves B2B accounts natively, including account hierarchies, subsidiaries, and buying groups. The resolved accounts power ABM in Adobe Marketo Engage, Salesforce Account Engagement, and 6sense.

The Production Architecture

Customer records sit in S3, cataloged by the Glue Data Catalog and governed by Lake Formation. Zingg runs on AWS Glue Serverless for standard workloads and on EMR for very large datasets. The ZINGG_ID is written back to S3 and Redshift. Step Functions or Airflow on MWAA handles pipeline scheduling. Activation to Braze, Iterable, Klaviyo, Salesforce Marketing Cloud, and HubSpot flows through Hightouch or Census.
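One way the orchestration step could look is sketched below: a Step Functions definition, built as a Python dictionary, that runs a matching job and then a load job in sequence. The Glue job names and state names are hypothetical, and the sketch assumes both steps are packaged as Glue jobs; retries, error handling, and IAM roles are omitted.

```python
import json


def build_state_machine(match_job, load_job):
    """Return an Amazon States Language definition that runs two Glue jobs in order."""

    def glue_task(job_name, next_state=None):
        task = {
            "Type": "Task",
            # The .sync suffix makes Step Functions wait for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job_name},
        }
        if next_state:
            task["Next"] = next_state
        else:
            task["End"] = True
        return task

    return {
        "Comment": "Run identity resolution, then load results into Redshift",
        "StartAt": "ResolveIdentities",
        "States": {
            "ResolveIdentities": glue_task(match_job, "LoadToRedshift"),
            "LoadToRedshift": glue_task(load_job),
        },
    }


# Hypothetical job names; substitute the Glue jobs defined in your account.
definition = build_state_machine("zingg-match", "load-zingg-ids")
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is what you would supply as the state machine definition, for example through the console, infrastructure-as-code, or the boto3 Step Functions client.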

Conclusion

The data gravity in AWS is real. Fighting it — by extracting your customer data to a CDP’s cloud, resolving identities there, and importing the results back — is architecturally expensive, compliance-risky, and strategically shortsighted. The Martech for 2026 report is clear: AI will be impactful only if it has the right data resources, connected in a pragmatically accessible way. Zingg on AWS is that connection. The ZINGG_ID lives in your S3 and Redshift, governed by Lake Formation, enriched by your entire enterprise data estate, owned by your organization.

Work with gravity, not against it. Resolve where your data lives.


📄 The New Martech “Stack” for the AI Age — Scott Brinker & Databricks (March 2026)
📄 Martech for 2026 — Scott Brinker & Frans Riemersma (December 2025)
