Most open source projects obsess over GitHub stars. We don't think much about them.
Not because stars don't matter at all — but because they're a very noisy signal for what we actually care about: understanding whether Zingg is helping people solve a real problem, which kinds of teams are using it, and what to build next.
This post describes how we think about measuring open source adoption for a product like Zingg, and the lightweight stack we use to get useful signals without a dedicated analytics team.
Commercial SaaS products have an advantage: users authenticate, sessions are tracked, and product analytics tools can tell you what features people actually use. You know who your users are.
Open source is harder. Zingg runs on-premise, in private cloud environments, in Databricks clusters, in Snowflake accounts, in air-gapped enterprise networks. Many deployments happen behind VPNs that block outbound telemetry. Some users run Zingg without ever joining the community Slack or emailing us.
This means traditional product analytics approaches either don't work or collect data that doesn't represent the full picture. A pull-based approach — measuring where Zingg is downloaded, who joins the community, who reaches out — is more reliable than assuming you can track all usage directly.
Our Slack community is the richest signal we have. When someone joins the Slack, asks a question, or describes what they're building, we learn more from that one conversation than from weeks of anonymous telemetry. The quality of the questions tells us where people are in the evaluation process. The use cases people describe — Customer 360, AML, KYC, healthcare record linkage, supplier master data — shape our roadmap more directly than any usage dashboard.
We pay close attention to:

- New Slack joins and how they heard about Zingg
- The problems people bring to the community
- Where users get stuck (repeated questions signal documentation gaps or UX friction)
- Which platforms and stack combinations come up most often
Docker Hub pulls, PyPI downloads for the Python API, and GitHub releases give us a rough proxy for active evaluation. These numbers don't tell us about successful deployments, but significant upticks correlate with content that lands well or conference appearances that drive discovery.
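Both of these sources expose public JSON endpoints (Docker Hub's `/v2/repositories/<org>/<repo>/` and pypistats.org's `/api/packages/<pkg>/recent`), so pulling the headline numbers into a tracking sheet takes only a few lines. A minimal sketch of extracting the counts from those responses; the payload values below are invented for illustration, and the repository/package names are assumptions, not our exact endpoints:

```python
def docker_pull_count(payload: dict) -> int:
    """Cumulative pulls from a Docker Hub /v2/repositories/<org>/<repo>/ response."""
    return payload["pull_count"]

def pypi_recent_downloads(payload: dict) -> int:
    """Last-month downloads from a pypistats.org /api/packages/<pkg>/recent response."""
    return payload["data"]["last_month"]

# Payloads shaped like the real API responses (values made up):
docker_resp = {"name": "zingg", "pull_count": 125000}
pypi_resp = {"package": "zingg", "data": {"last_day": 40, "last_week": 300, "last_month": 1200}}

print(docker_pull_count(docker_resp))    # 125000
print(pypi_recent_downloads(pypi_resp))  # 1200
```

In practice the payloads would come from an HTTP GET against those endpoints; keeping the extraction separate from the fetch makes the snapshot script trivial to test.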
Some of our most valuable signals come from people who email us to describe what they've built. These unsolicited updates — "we're using Zingg to resolve 80 million supplier records on Databricks, here's what we found" — are gold. They tell us what's working in production, what edge cases people are hitting, and what the quality of our matching looks like in the wild.
We ask every person who reaches out where they heard about Zingg. Over time, this creates a reasonable picture of which content and communities are actually driving discovery.
We run standard Google Analytics for website traffic, with particular attention to which documentation pages and guides drive the most engagement. Pages that get high traffic but short dwell time usually mean the content isn't answering the question people arrived with — a signal to improve rather than just publish more.
GitHub stars. We track them, but they're a weak proxy for the things we actually care about. A star is not an install, not a user, not a customer. Projects can accumulate stars from people who will never run the software. We care more about the 700+ community members having real conversations than the star count.
Raw download numbers without context. A jump in Docker pulls could mean a viral Reddit thread from people who will never use Zingg seriously, or it could mean a Fortune 500 evaluation team doing parallel tests across five tools. The numbers look the same. The conversations that follow tell you which one it was.
Usage telemetry without explicit opt-in. We've thought about this, and for now the privacy-first approach wins. Many of our users are in regulated industries. They chose an on-premise, warehouse-native tool partly because data doesn't leave their environment. Shipping silent telemetry would be a strange choice for a product whose value proposition includes not doing that.
Every few weeks, we manually update a simple tracking sheet with the key numbers: community members, recent downloads, GitHub activity, new customer conversations. We chart the trend more than the absolute value — growth rate matters more than current size at this stage.
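The "trend over absolute value" habit is easy to automate once the snapshots are in a sheet. A sketch of period-over-period growth from a series of check-in counts; the member numbers here are invented for illustration:

```python
def growth_rates(snapshots: list[int]) -> list[float]:
    """Period-over-period growth rate between consecutive metric snapshots."""
    return [
        (curr - prev) / prev
        for prev, curr in zip(snapshots, snapshots[1:])
    ]

# e.g. community member counts from four consecutive check-ins (made up):
members = [520, 560, 610, 700]
print([f"{r:.1%}" for r in growth_rates(members)])  # ['7.7%', '8.9%', '14.8%']
```

An accelerating series like this one says more about product-market fit than any single absolute count does.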
More importantly, we synthesize the qualitative signals into roadmap decisions. When five different community members in the same quarter ask about Zingg's behavior on data with high NULL rates, that's not a coincidence — it's a signal to improve how we handle NULL values in matching. When enterprise prospects keep asking the same question about incremental flows during demos, that's a feature that needs to be sharper in the product and clearer in the documentation.
We're transparent about the gaps. Dark adoption — enterprise teams running Zingg in air-gapped environments who never surface publicly — is real and underrepresented in our numbers. Some deployments resolving significant data volumes simply don't appear in any of our signals until someone decides to reach out.
This means our community-based metrics probably undercount actual usage. We're comfortable with that tradeoff. The signals we do have are high-quality and actionable. A metric we can't trust is worse than no metric.
The pattern we've arrived at: weight conversation-based signals over quantitative signals, and weight the trend over the absolute number. A community that's growing and asking increasingly sophisticated questions is a healthier signal than a large but quiet community.
The most valuable investment isn't in the analytics stack — it's in making sure that when someone has a problem with Zingg, they know where to go and get a response quickly. Everything else follows from that.
Using Zingg? We genuinely want to hear about it. Join the Zingg community on Slack, or reach out directly. Understanding how people use Zingg in production is the best product research we have.