When Customers Grow, We Grow: Redica’s Journey with Zingg

Engineering

May 17, 2025

Congratulations to Redica Systems on Their Recent Fundraise

This milestone is a well-deserved recognition of Redica Systems' efforts in bringing intelligence, structure, and clarity to the complex world of regulatory compliance. Redica has been at the forefront of transforming life sciences data, with a consistent focus on quality in insight, engineering, and data management. We are honored to have contributed to this vision.

Redica Systems was among the earliest adopters of Zingg’s open source AI-powered entity resolution engine during its initial development stages. Over the past few years, we have had the privilege to work closely with their team — initially through open source collaboration and now via our enterprise edition on Snowflake. Their growth is especially meaningful to us because when our customers succeed, we succeed.

The User

Redica Systems is a leading data and analytics company serving the highly regulated life sciences sector, including pharmaceutical and MedTech industries. Their mission is to improve product quality, compliance, and regulatory monitoring by analyzing diverse external data sources such as inspection reports, regulatory guidelines, and health agency publications. Redica works with both structured and unstructured data, enabling clients to make informed decisions in an environment where data complexity and regulatory requirements are paramount.

The Problem: Unifying Diverse, Complex Data Sources

Redica pulls data from dozens of sources—global health agencies, inspection bodies, and proprietary regulatory datasets. Each source has its own quirks: formats vary, records are incomplete, and no global ID system exists to unify entities. The task? Deduplicate and reconcile over 10 million raw records to support high-stakes use cases like compliance tracking and vendor risk intelligence.

Challenges at a glance:

- Number of sources: Dozens of global health and regulatory agencies
- Data types: Structured (databases, spreadsheets) and unstructured (inspection reports, guidelines)
- Volume: Over 10 million records requiring normalization and clustering

Key pain points:
- Country-specific data standards and formats
- No universal entity identifiers
- Requirement for extremely high precision to avoid compliance risks
- Hybrid mix of structured and free-text inputs

Why Identity Resolution Matters in This Context

To make this data usable, Redica had to create what is known as a “golden record”—a single, trustworthy profile for each entity, assigned a unique Redica ID. This global identifier powers:

- Smarter risk intelligence for suppliers and vendors
- Enhanced audit readiness and quality control
- Continuous monitoring of regulatory compliance
- Post-market and pre-market surveillance for medical devices

Put simply: solving identity resolution meant Redica’s customers could make smarter, faster, and more compliant decisions.
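To make the golden record idea concrete, here is a minimal sketch of how pairwise matches can be turned into clusters that share one identifier. It uses a plain union-find structure from the Python standard library; it is not Zingg's implementation, and the record IDs, the `assign_entity_ids` helper, and the `RID-000001` format are hypothetical stand-ins for illustration.

```python
from collections import defaultdict


def assign_entity_ids(record_ids, matched_pairs, prefix="RID"):
    """Group records connected by matches and give each group one shared ID."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the cluster root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller root so the result is deterministic.
            lo, hi = sorted((root_a, root_b))
            parent[hi] = lo

    clusters = defaultdict(list)
    for rid in record_ids:
        clusters[find(rid)].append(rid)

    # Number clusters in a stable order; the ID format is hypothetical.
    assignments = {}
    for n, root in enumerate(sorted(clusters), start=1):
        for rid in clusters[root]:
            assignments[rid] = f"{prefix}-{n:06d}"
    return assignments


# Hypothetical source records: three refer to the same site, one does not.
records = ["fda-1", "ema-7", "who-3", "fda-9"]
matches = [("fda-1", "ema-7"), ("ema-7", "who-3")]
ids = assign_entity_ids(records, matches)
print(ids)
```

Because matches are transitive in this sketch, `fda-1` and `who-3` end up under the same identifier even though they were never compared directly. A production system would typically add a validation step before trusting such chains.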

How Zingg Made It Possible

Redica chose Zingg as the backbone of their identity resolution engine—and the results were transformative.

1. Handling Scale and Redundancy

From an initial 10 million site records, Redica used a multi-phase pipeline to arrive at just 330,000 clusters—each representing a unique, validated entity:

- Preprocessing and deduplication: Reduced to 1 million
- Zingg-based resolution: Further condensed to ~400,000 clusters
- Domain-based post-processing: Finalized to 330,000 golden records
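The stage counts above imply how much each phase contributes. The short calculation below works that out from the published figures only; the 400,000 value is approximate in the source, so the derived percentages are approximate too.

```python
# Record counts per stage, taken from the figures quoted in this post.
stages = [
    ("raw site records", 10_000_000),
    ("after preprocessing and deduplication", 1_000_000),
    ("after Zingg-based resolution (approximate)", 400_000),
    ("after domain-based post-processing", 330_000),
]


def stage_reductions(stages):
    """Return (stage name, fraction of records removed vs. the previous stage)."""
    out = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        out.append((name, 1 - after / before))
    return out


reductions = stage_reductions(stages)
overall = 1 - stages[-1][1] / stages[0][1]
records_per_entity = stages[0][1] / stages[-1][1]

for name, frac in reductions:
    print(f"{name}: {frac:.1%} fewer records")
print(f"overall: {overall:.1%} fewer records")
print(f"about {records_per_entity:.0f} raw records per golden record")
```

On these numbers, preprocessing removes 90% of the raw records, resolution removes about 60% of what remains, and post-processing removes a further 17.5%, for an overall reduction of roughly 97%, or about 30 raw records per golden record.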

2. Tackling Mixed Data Formats

Zingg’s flexible architecture allowed Redica to work with both structured fields (like address, organization name) and fuzzy, inconsistent free-text entries pulled from PDFs and inspection reports.

3. Adapting to Global Diversity

With global data comes edge cases—misspelled names, overlapping address formats, and varying agency terminology. Zingg’s support for fuzzy matching, normalization, and domain-specific rules made it easy to unify these edge cases reliably.
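As a rough illustration of what normalization plus fuzzy matching means for names like these, the sketch below uses only the Python standard library. It is a simplified stand-in, not Zingg's API or model: the suffix list, the `normalize_name` and `similarity` helpers, and the example company names are all hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical suffix list, for illustration only.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "co", "corp", "limited"}


def normalize_name(name):
    """Lowercase, strip accents and punctuation, drop common legal suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    tokens = [t for t in text.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


# Accent, punctuation, and suffix differences disappear after normalization.
formatting_only = similarity("Société Pharma, GmbH.", "Societe Pharma GMBH")
# A misspelling survives normalization but still scores high.
misspelled = similarity("Acme Laboratories Ltd", "Acme Labratories Limited")
# Unrelated names score low.
unrelated = similarity("Acme Laboratories", "Zenith Biotech")

print(formatting_only, misspelled, unrelated)
```

A fixed string-similarity score like this is brittle on its own, which is one reason a learned matcher combined with domain rules is attractive for data of this kind.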

4. Accuracy Meets Automation

Compared to legacy rule-based systems, Zingg achieved over 90% improvement in match accuracy. Automation handled the bulk of record matching, while a human-in-the-loop workflow addressed ambiguous cases.
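One common way to combine automation with human review is to route each scored pair by confidence. The sketch below assumes an upstream model has already produced a match score per pair; the thresholds, the `ScoredPair` type, and the `route` function are hypothetical and do not describe how Redica or Zingg configure this.

```python
from dataclasses import dataclass

# Hypothetical thresholds; in practice they would be tuned on labeled data.
AUTO_MATCH = 0.90
AUTO_REJECT = 0.40


@dataclass
class ScoredPair:
    left_id: str
    right_id: str
    score: float  # match probability in [0, 1] from an upstream model


def route(pair):
    """Decide whether a scored pair is merged, rejected, or sent to a reviewer."""
    if pair.score >= AUTO_MATCH:
        return "auto_match"
    if pair.score <= AUTO_REJECT:
        return "auto_reject"
    return "needs_review"


pairs = [
    ScoredPair("a1", "b1", 0.97),
    ScoredPair("a2", "b2", 0.62),
    ScoredPair("a3", "b3", 0.12),
]
decisions = {(p.left_id, p.right_id): route(p) for p in pairs}
review_queue = [p for p in pairs if route(p) == "needs_review"]
print(decisions)
```

Widening the band between the two thresholds sends more pairs to reviewers and raises precision at the cost of manual effort, which is the trade-off a high-precision compliance use case has to manage.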

5. Scalable Deployment

Running on AWS EMR with Snowflake, Zingg now powers identity resolution at scale. What used to take 5–6 hours per run is now completed in under 45 minutes, with incremental processing and event-driven workflows in the pipeline.
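For a sense of scale, the runtime figures above translate into a speedup factor as follows. Since the new runtime is quoted as "under 45 minutes", 45 minutes is used as an upper bound and the results are lower bounds.

```python
OLD_RUN_HOURS = (5.0, 6.0)  # "5–6 hours per run"
NEW_RUN_HOURS = 45 / 60     # upper bound for "under 45 minutes"


def min_speedup(old_hours, new_hours):
    """Lower bound on the speedup, given an upper bound on the new runtime."""
    return old_hours / new_hours


low, high = (min_speedup(h, NEW_RUN_HOURS) for h in OLD_RUN_HOURS)
print(f"speedup of at least {low:.1f}x to {high:.1f}x")
```

That works out to a speedup of at least roughly 6.7x to 8x per run.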

Technical Stack Behind the Scenes

- Data Warehouse: Snowflake
- Object Storage: AWS S3
- Transformations: dbt
- Visualization: Sigma, GoodData
- Application Stack: React frontend, migrating backend from PHP to Python

Why Zingg Was the Right Choice

This wasn’t a textbook Customer 360 problem. Redica’s data spanned industries, languages, and formats. It demanded a solution that was:

- Open source and customizable
- ML-powered yet domain-aware
- Capable of scaling to millions of records
- Built for messy, real-world data

As Redica CTO Arijit Saha put it:

“Zingg helped us create a global identifier, the Redica ID, which is crucial for unifying data and managing risk in life sciences. The problem’s complexity only highlights how powerful Zingg’s technology is.”

The Outcome: Faster, Reliable, Explainable Resolution

The Zingg-powered pipeline runs in under 45 minutes, twice a week, integrated into Redica’s Snowflake and AWS stack. It is now expanding beyond sites to include investigators and medical devices — bringing clarity to regulatory data and helping Redica deliver the single source of truth their customers depend on.

We worked with Redica to resolve these entities in a way that was scalable, explainable, and trustworthy, all of which are critical in regulated industries. As their platform evolved, Redica moved to the enterprise edition of Zingg on Snowflake, continuing to rely on the same AI-powered resolution foundation at a larger scale.

What Stays With Us

One of Redica’s engineering leaders recently shared:

“Thank you Sonal Goyal and the Zingg team for building such a clean solution to a very complex data problem.
Open source Zingg helped Redica Systems lay a solid foundation for solving our key entity resolution problems a few years back, and now the Enterprise edition is helping us scale the solution to the next level.”
—Ayan Ghosh, Director of Engineering, AI, Redica Systems

Words like these are what we build for: not just adoption or expansion, but real trust.

Rajesh Pyne, Senior Data Engineer - II at Redica Systems, has also shared technical insights into Redica’s data journey in this excellent Medium post, outlining the challenges of regulatory data and the architecture behind their scalable data pipeline. In it, he highlights how using Zingg for entity resolution allowed Redica to unify fragmented data, automate high-precision clustering, and lay the groundwork for a global regulatory intelligence platform.

Beyond Users: A Collaborative Partnership

From the early days of testing models on fuzzy data to building integrations in Snowflake, the Redica team has been a true partner: clear in their goals, open in collaboration, and relentless about quality. We have learned a lot from working with them.

Arijit Saha, CTO of Redica Systems, shared how Zingg became a core part of their data journey—from open source to scaling with our enterprise edition. Read the full case study →

To Arijit Saha, Ayan Ghosh, Rajesh Pyne, and everyone behind Redica’s growth — congratulations. You have shown what is possible when foundational data work is done right. We are honored that Zingg is a part of that foundation.

And to Our Broader Community

Every customer who chooses Zingg — open source or enterprise — is building something bigger. You are creating systems that rely on accuracy and truth, often behind the scenes. We are just glad to be there with you.

Want to Learn More?

Zingg is open source, built for scale, and designed for teams that need clarity from chaos. Whether you are wrangling product catalogs, supplier data, or healthcare registries, Zingg helps you resolve entities with confidence.

Explore Zingg on GitHub or get in touch with us to see how we can help with your identity resolution needs.

