Understanding household and person relationships using entity resolution on Snowflake

The Problem

Understanding donors and household relationships across multiple custom CRMs and application systems to power campaigns, data governance and compliance.

The User

The Orthodox Union (OU) is one of the largest Orthodox Jewish organizations in the United States. Founded in 1898, the OU supports a network of synagogues, youth programs, Jewish and Religious Zionist advocacy programs, programs for the disabled, localized religious study programs, and international units with locations in Israel and formerly in Ukraine. The OU maintains a kosher certification service, whose circled-U hechsher symbol is found on the labels of many kosher commercial and consumer food products.

Thank you, Shelomo, for agreeing to do this. It's been a pleasure working with you over the past few months. For the benefit of the audience, if you can please tell us a bit more about yourself and your background and what led you to data and analytics.

Sure. And it's been a real pleasure working with you and Vikas on this project.
It's been a lot of fun working together, as we'll discuss later. My background is really interesting. I started my career in 2000 with a degree in Visual Communications, which was around the time I started looking at website design. So I got in there playing with Netscape Navigator (you know, only us folks remember what that is) and that led to front-end development :-)

I've always had an interest in the development world, and the combination of business, UX and development led me to the position I am in today with the Orthodox Union as the Director of Product Development.

Something that's a little different about working within a Jewish nonprofit is that, because we are a smaller team but acting like a technology startup, all facets of the development world are under me. So, we have DevOps, UX, frontend and backend development, mobile development and data analytics. This gives me a unique opportunity to be involved with the full flow and really see how all the pieces fit together, which is not something that everyone gets to do in this position.

Can you please share more details about Orthodox Union and the kind of applications you have?

The Orthodox Union is an umbrella organization that services the Jewish communities from age 0 till the end, and we go across all facets of the Jewish community and Jewish life. One of the main things that Orthodox Union is known for is kosher food certification. So all of our activities feed back into supporting the communities. Within Orthodox Union, every large part of the organization and every department has a custom CRM that we've built for them. We run over 40 websites. We have 5 mobile applications deployed. We really, really work like a technology startup.

Orthodox Union has a lot of good causes and a lot of data flowing in. Can you please share the data initiatives you have, especially the warehousing initiative that you have in place right now?

Sure. Actually, data initiatives started like a pipe dream about 10 years ago for me while we were building system two and three out. We were trying to think how we could start to bring some of this data together. Our first attempt back then was a centralized user database, which didn't work out, and we had to decouple because we were overriding users' data and there were unique needs. There was a lot of thought over the next three years on how we could achieve something, but the team just wasn't there yet. I think the technology wasn't there either in terms of something that would address all our needs. With the current EVPs at the Orthodox Union, Josh Joseph made a huge push for data and being data-driven and he's really giving us the support needed to push this through. So we reinvigorated the data warehousing project.

Our goal is to take the pieces from all our different CRMs and from all the different websites and build out our golden records, and that's the start of anyone's journey when it comes to working with data.


What kind of challenges prompted you to search for entity resolution, and what really is the driver behind the entity resolution project?

We have these different data sources and we wanted to be able to match people, find the common records against different applications, find who belongs with whom and how. We also have a very strong connection for households, not just individual people. Doing that across all systems and keeping the system still decoupled was always a challenge and we were looking at different tools – even some reverse ETL tools, but we didn't really find anything that fit.

At first we were going to start by writing our own matching logic, and that's when we found Zingg open source. That was our first contact with your team during this research phase, and so here we are.

Coming to Zingg - we've got the implementation going and there is a production flow in place. What's your experience been and what does the overall architecture look like? How are you planning to consume this data?

We are taking a multi-strategy approach. We have Airbyte, which is bringing our application data from our databases like Postgres and Mongo into Snowflake. We also have some external data coming in through fat pipe connections. Once everything is in Snowflake, we have a medallion process with the bronze, silver and gold data. We are doing our cleansing with dbt.

Before we come to the output, that's where Zingg sits. Initially we thought that we would use Zingg Open Source which would take over whatever we couldn't match ourselves.

When we saw that you started offering an enterprise version sitting on top of Snowflake, we took a step back and we put Zingg first and you are the engine that's powering building out our golden records.

It's been quite a journey and a lot of building with all the feature requests and things that we've built together. Would you like to talk a bit about some of the cool things that you've driven us to build?

You know I love this. I love that I was able to take part in the project. When we work with people, I think it's really important to build those connections, and when you are able to build something out together, it's really a win-win for everyone. Some of the challenges along the way were within Snowflake, just making sure the flow was optimized. Our first challenge was taking the runtime from like 12 hours to about 3 during the initial load. That went great.

And I guess the biggest piece of all this really was working on the incremental piece. This was key. What we want is to run Zingg more than once a day and we don't want to have to look through everything every time we make a change to one of the data sets. Being able to batch 4 to 6 to 8 times a day and have those changes update the records is crucial and I think it's a crucial feature for Zingg. It's really going to set you apart and it's really exciting that we got here. We really did a lot of stress testing on it, but we're in an unbelievable place now because of it.
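Editor's note: the idea of batching only what changed since the last run can be illustrated with a simple watermark filter. This is a minimal sketch and not Zingg's incremental implementation; the `updated_at` field, the `select_changed` helper and the sample records are all hypothetical.

```python
from datetime import datetime

def select_changed(records, last_run):
    """Return only the records modified after the previous batch run."""
    return [r for r in records if r["updated_at"] > last_run]

# Hypothetical source records; field names are assumptions for illustration.
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 1, 11, 15)},
]

last_run = datetime(2024, 1, 1, 9, 0)  # watermark stored after the previous batch
changed = select_changed(records, last_run)
changed_ids = [r["id"] for r in changed]

# Only the changed records would be handed to the matching step;
# the watermark then advances to the newest timestamp seen.
new_watermark = max(r["updated_at"] for r in changed) if changed else last_run
```

With several batches a day, each run then touches only the records edited since the previous one rather than the full data set.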

It's been an absolute thrill ride building this along with you - a lot of learning on the way as well, and the product is definitely far more mature than when we started. So thanks a lot for all that feedback and all the requests and ideas. Coming back to the problem, how are you planning to consume the Zingg results going forward in your operational systems?

So the plan is, once we have our production tables from Zingg, there's a couple of places we want to ship it to. We're shipping Zingg results to our BI tools. We want to do a reverse ETL into our existing systems. We'll set up a subscription-based model where tool X could say, OK, I see something changed with that golden ID. If it's crucial, then we'll push it into the system. If it's something not crucial, we will ask the consumer: do you want to update this entity with this piece of information? So we're going to leave a lot of choice in the hands of the businesses. We're also planning on outputting to OpenSearch, so we could have an enterprise-wide search that could be embedded into the application without having to build it out. It could also be peppered with data from each of the systems alongside that golden record.

So a lot, yeah, quite a lot.
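Editor's note: the subscription-based routing described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the OU's actual design: the set of "crucial" fields, the system names and the `route_change` helper are all hypothetical.

```python
# Hypothetical set of fields whose changes are pushed without asking.
CRUCIAL_FIELDS = {"email", "address"}

def route_change(change, subscribers):
    """Decide, per subscribed system, whether a golden-record change is
    pushed automatically ("push") or offered for approval ("ask")."""
    mode = "push" if change["field"] in CRUCIAL_FIELDS else "ask"
    return [(system, mode) for system in subscribers.get(change["golden_id"], [])]

# Systems subscribed to each golden ID (names are made up for the example).
subscribers = {"G-100": ["crm_youth", "crm_donations"]}

crucial = route_change({"golden_id": "G-100", "field": "email"}, subscribers)
optional = route_change({"golden_id": "G-100", "field": "nickname"}, subscribers)
unsubscribed = route_change({"golden_id": "G-999", "field": "email"}, subscribers)
```

Keeping the decision per subscriber leaves the source systems decoupled: each one sees only the golden IDs it cares about, and non-crucial updates stay in the hands of the business that owns the system.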

Glad that things fit in so beautifully, with Zingg being a critical part of your data architecture. Coming to households - that's something very interesting and exciting as we built this along with you. Wanted to hear a bit more from your side about that.

After we built out the individual person matching model, we started working on households, and a lot of people are always interested in more than just the nuclear household. They're asking about grandparents, cousins, and so on. So when we built that first model and we were reviewing it, we saw households with 30 people and we started looking - is this correct? We had these huge family trees built by Zingg, so it worked a little better than I expected it to. Now we are working on restricting it a little bit to get only the nuclear households.

But really, it just shows how powerful Zingg really is and it's just amazing what we're able to find there and the quality of the search.

Right, we could have constructed family trees out of models that we have created. Also wanted to get your views on the person model and the accuracy we've achieved there.

Oh yeah. When I went to present some of the findings, I was really surprised. We thought earlier that after Zingg runs and we have all our matches, we would be looking at anything under a score of about 0.8 and would have to do a manual review of that data set. I thought, because of the type of data we have and some of it being older and messier, that we'd be in the thousands or the tens of thousands of records to manually review. But with Zingg results, even matches with scores between 0.65 and 0.8 are really clean. We can actually pull back only what is under 0.6, and the number of records to manually review is somewhere in the 1000 range.

I was very surprised about how well we matched and how good the results are. It's just unbelievable. It far exceeded our expectations.
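Editor's note: the effect of lowering the manual-review cutoff from 0.8 to 0.6 can be shown with a small triage function. This is a sketch only; the `score` and `pair_id` field names and the sample values are assumptions for illustration, not Zingg's output schema.

```python
def triage(matches, review_below=0.6):
    """Split matches into auto-accepted pairs and pairs needing manual review."""
    auto_accept, manual_review = [], []
    for m in matches:
        target = manual_review if m["score"] < review_below else auto_accept
        target.append(m["pair_id"])
    return auto_accept, manual_review

# Made-up match scores for illustration.
matches = [
    {"pair_id": "p1", "score": 0.95},
    {"pair_id": "p2", "score": 0.70},
    {"pair_id": "p3", "score": 0.55},
]

strict = triage(matches, review_below=0.8)   # the cutoff expected at the outset
relaxed = triage(matches, review_below=0.6)  # the cutoff the results allowed
```

Here the pair scored 0.70 moves out of the review queue once the cutoff drops to 0.6, which is the same mechanism that shrank the manual review workload described above.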

I love this journey. I love the ride, and we've talked about other collaborations in terms of UI work. I think it would be great for us to keep working together in this capacity, and I'm really excited and invested in seeing where Zingg goes from here.