Canadian Football League Builds Fan 360 Using AI-Based Entity Resolution

The Problem

Building a single source of truth for fan data so that marketing teams can deliver consistent, personalized messages

The User

The Canadian Football League is a professional sports league in Canada. The CFL is the highest level of competition in Canadian football. The league consists of nine teams, each located in a city in Canada. They are divided into two divisions: four teams in the East Division and five teams in the West Division. As of 2024, it features a 21-week regular season in which each team plays 18 games with three bye weeks. This season traditionally runs from mid-June to early November. Following the regular season, six teams compete in the league's three-week playoffs, which culminate in the Grey Cup championship game in late November.

The Grey Cup is one of Canada's largest annual sports and television events.

Can you please tell us about yourself and your current role?

My name is Dave Musambi. I am the Senior Director of Business Intelligence at the Canadian Football League. For us, business intelligence is a bit of a catch-all for everything data-related. I oversee a team of data engineers and analytics engineers, and at times we masquerade as data scientists as well, all as part of the journey of modernizing how we think about data and technology across the entire Canadian Football League.

The Canadian Football League is the second-largest football league in the world.
We comprise nine teams, almost coast to coast from BC all the way to Montreal, and hopefully a tenth team.

In my role, I oversee our league-wide data platform, which includes all of our important data assets, including customer data from all nine of our teams as well as the CFL league office. So that's a little bit about my role and about the Canadian Football League, which is a very fun, exciting football league here in Canada.

CFL is definitely one of the most interesting users we see around Zingg, and we are very happy to be a part of this journey with you. Could you please tell us about your current projects: what are the key things you're working on, and what problems are you trying to solve with data right now?

Overall, this year has been a very interesting one for us. Last year we spent a lot of time just getting our platform up and running. We're on Snowflake, we also work in AWS, and last year was a lot of data engineering work just to get data into our platform. This year the biggest focus and shift was really around activation: how do we start to use our first-party data to better understand our fans, better segment our fans, and better personalize their experience?

With that initiative, some work needed to be done to make our data actionable, accessible across the organization, and trustworthy. So our largest project this year has definitely been our customer 360 initiative, thought about from a league-wide standpoint. Like I said, there are nine different teams, plus the league office. You can buy tickets to one team, and you might buy tickets for another team as well. You might also interact with the league's digital products, some of which are built for fan engagement, in addition to buying merch.

There are a lot of touch points in a sports environment, and we're very lucky to have fans and not just customers. The difference is that, traditionally, a customer might just transact with you. Our fans paint their faces and show up in uniform, in a jersey; they are very avid. That's great from a data perspective because you learn a lot about your fans. The challenge is stitching all of that information together across the various data sources it comes from: ticketing, email engagement, e-commerce, and our fan engagement products.

With so many touch points and so many ways into the funnel, it is very difficult to consolidate that information and to really understand, measure, and follow the customer journey. So our largest project has really been our customer 360: how do we better understand our fans? And we have a bias towards open source tools to do that.

So, if I'm biased, I would say our largest project today is our customer 360 initiative.

Love the distinction you made between a customer and a fan! Can you please share who the users of this data are? Who are the consumers of this data?

Primarily it's marketing: understanding the contents of our data, what data we have available, and who our fans are, and then being able to deliver custom messages to them. Our marketing teams need to better understand our fans and better reach out to them. A fan who has been around the league for the last 30 years and a casual fan who is just being introduced to the league for the first time are completely different personas, right? And we want to better understand what serves one group versus the other. I often see the counterexample in my own inbox, from organizations that maybe don't do this to the best degree. I've got my Gmail account.
I also have my CFL account. I also have another account, and at times you get three different offers or three different messages in your inbox because, well, humans are unpredictable.
You never know which account you're going to use in any given instance. There is no rhyme or reason as to why I might use my Gmail account or my previous school account.

So I think delivering a consistent message across all platforms is very important, and ensuring that we have the right data for our marketing team to make informed decisions is really what's driving our use case for data.

Can you please share more details about the data stack at CFL?

Our data stack has definitely gone through an evolution, but at its core, Snowflake has really been our data platform. From an ELT perspective, we've done a little bit of everything: we've used Stitch, we've used Fivetran, and this year we started migrating over to an open source solution via Meltano. That was a very exciting project for us to complete earlier in the year. In addition, we use RudderStack as a customer data platform. From a reporting standpoint, we use Tableau, and our cloud platform is AWS.

So we have a hybrid data platform. We think of our data science use cases as something we could potentially run in AWS, in our data lake in S3, or in Snowflake.

But we definitely wanted to have that flexibility once those more advanced use cases for data do come to fruition in the years to come.

We have been in close communication for more than a year now. Can you please share your Zingg journey with us?

We definitely go back a while. I often find new technologies on Reddit, and I believe I was looking for a solution that would offer customer 360 when I stumbled across a Reddit thread that mentioned Zingg AI.
It fit what I like in terms of optionality: we could build a completely open source solution or, obviously, go through you with your cloud product.

And I knew that we required a very specific fuzzy-logic-based implementation. Deterministic matching would do well for us, but people in the sports industry, and especially in the ticketing industry, understand that there is a lot of dirty data, and to fully understand our fans we need a bit more sophistication in what we implement for a customer 360 model. So our journey has been very interesting. We've started using Zingg AI in conjunction with some other products, combining deterministic matching with more fuzzy matching.
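To make the deterministic-versus-fuzzy distinction concrete, here is a minimal Python sketch. It is not how Zingg AI works internally (Zingg trains its matching model on record pairs that users label); it simply pairs an exact rule on normalized email with a hand-written string-similarity rule built on the standard library's `difflib`. The record fields (`email`, `first_name`, `last_name`, `postal_code`) and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def normalize_email(email: str) -> str:
    # Trim surrounding whitespace and lowercase so trivially different emails compare equal.
    return email.strip().lower()


def normalize_postal(code: str) -> str:
    # Canadian postal codes are often typed with or without the middle space.
    return code.replace(" ", "").upper()


def name_similarity(a: str, b: str) -> float:
    # SequenceMatcher.ratio() returns a score in [0, 1]; 1.0 means identical strings.
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_match(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    # Deterministic rule: identical normalized email means the same person.
    if normalize_email(r1["email"]) == normalize_email(r2["email"]):
        return True
    # Fuzzy rule (hypothetical): same postal code and a near-identical full name,
    # which can link the same fan across different email addresses.
    if normalize_postal(r1["postal_code"]) != normalize_postal(r2["postal_code"]):
        return False
    full1 = f"{r1['first_name']} {r1['last_name']}"
    full2 = f"{r2['first_name']} {r2['last_name']}"
    return name_similarity(full1, full2) >= threshold


gmail = {"email": "Jane.Doe@Gmail.com ", "first_name": "Jane", "last_name": "Doe", "postal_code": "M5V 1J1"}
school = {"email": "jdoe@school.ca", "first_name": "Jayne", "last_name": "Doe", "postal_code": "m5v1j1"}
print(is_match(gmail, school))  # prints True (name similarity is 16/17, about 0.94)
```

Hand-tuned rules like the fuzzy one above tend to get brittle on messy ticketing data, which is one motivation for using a learned matcher instead.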

Some of the results we've seen so far are very promising: profiles that otherwise wouldn't consolidate into one are now consolidating into one. We have a lot more confidence in the results when leveraging something like Zingg AI than when we don't.

As a rough estimate, we've looked at the number of rows that consolidate into one profile despite having differing emails, and that's at least 10 to 15% of our records. Those are 10 to 15% of records that would otherwise go unmatched, which is very exciting to see in the early results with Zingg AI.
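One way to compute an estimate like this, assuming the entity-resolution output is a table of (cluster ID, email) rows, is to count the rows that sit in a cluster containing more than one distinct normalized email. This is a sketch of one possible definition of the metric, not the CFL's actual query; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def share_in_multi_email_clusters(rows):
    """Fraction of rows whose resolved cluster contains more than one distinct email.

    rows: iterable of (cluster_id, email) pairs from an entity-resolution run.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    emails_by_cluster = defaultdict(set)
    for cluster_id, email in rows:
        # Normalize first so case or whitespace differences don't count as distinct emails.
        emails_by_cluster[cluster_id].add(email.strip().lower())
    multi_email = {cid for cid, emails in emails_by_cluster.items() if len(emails) > 1}
    return sum(1 for cid, _ in rows if cid in multi_email) / len(rows)


resolved = [
    (1, "a@example.com"),
    (1, " A@Example.com"),   # same email after normalization
    (2, "b@example.com"),
    (2, "b.old@school.ca"),  # same fan, different email: exact matching alone would miss this link
    (3, "c@example.com"),
]
print(share_in_multi_email_clusters(resolved))  # prints 0.4
```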

My biggest concern is always around simplicity, cost, and scalability. I read the article you released on LinkedIn not too long ago about the journey of optimizing Zingg AI, computing directly on Snowflake, from over 24 hours to less than 20 minutes, and we've seen those positive results as well.

That has been a key milestone in our journey because, as I said, cost is very important and scalability is very important, and with a lot of that being managed by Zingg AI, simplicity gets ticked off as well.

What are some things that you would like to see in Zingg AI?

Yeah, the journey with Zingg so far has been fun; we're seeing a lot of improvements in new releases over time.

The thing we're trying to crack at the moment is persistence: persistence of our IDs, especially in incremental runs. How do we ensure that new rows of data and new users coming into the ecosystem are matched to a persistent ID, so that we can leverage that ID across all of our marketing tools?

One of the tools I forgot to mention is Braze. Braze is our customer engagement platform; it sits on top of Snowflake and RudderStack, and having that persistent ID across all of our activation channels is something we're looking to crack.

And of course we always welcome Zingg AI's support in cracking that code, because a big challenge in customer 360 is ensuring the consistency and persistence of the IDs being generated.
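One common way to approach ID persistence, sketched here as an illustration rather than as a Zingg AI feature or the CFL's implementation, is to carry IDs forward from the previous run by majority vote: each new cluster inherits the persistent ID held by most of its previously seen records, and a cluster of entirely new records gets a freshly minted ID. The function, its parameters, and the UUID-based IDs are assumptions.

```python
import uuid
from collections import Counter


def assign_persistent_ids(previous, clusters, new_id=lambda: str(uuid.uuid4())):
    """Carry persistent IDs forward across incremental entity-resolution runs.

    previous: dict mapping record_key -> persistent_id from the last run.
    clusters: dict mapping this run's cluster_id -> list of record_keys.
    Returns a dict mapping every record_key in `clusters` to a persistent ID.
    """
    result = {}
    claimed = set()
    # Bigger clusters choose first, so when a profile splits, the larger part keeps the ID.
    # Sorting by str(cluster_id) as a tie-breaker keeps the output deterministic.
    for cluster_id in sorted(clusters, key=lambda c: (-len(clusters[c]), str(c))):
        members = clusters[cluster_id]
        votes = Counter(previous[key] for key in members if key in previous)
        # Most-voted old ID first; ties broken by the ID itself for determinism.
        ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        unclaimed = [pid for pid, _ in ranked if pid not in claimed]
        pid = unclaimed[0] if unclaimed else new_id()
        claimed.add(pid)
        for key in members:
            result[key] = pid
    return result


previous = {"r1": "P1", "r2": "P1", "r3": "P2"}
clusters = {"c10": ["r1", "r2", "r4"], "c11": ["r3"], "c12": ["r5"]}
ids = iter(["NEW-1"])
result = assign_persistent_ids(previous, clusters, new_id=lambda: next(ids))
print(result)  # prints {'r1': 'P1', 'r2': 'P1', 'r4': 'P1', 'r3': 'P2', 'r5': 'NEW-1'}
```

When two old profiles merge, the minority ID is retired, so downstream tools would also need a mapping from retired IDs to surviving ones; this sketch does not produce that mapping.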

But thus far, our earliest win has been around performance, and that's what we're very happy with.

As a data leader, if you meet other data leaders who are thinking about customer 360, what are some key pieces from your journey that you would like to share as advice, lessons or just feedback?

Yeah, absolutely. For us, our infrastructure was really set up to leverage something like Zingg AI.

When we think about the different layers in our Snowflake data warehouse, we have our ingest layer, which holds all our raw data, untouched.

We've got our clean layer, where we do a lot of our data cleaning. When we're dealing with emails, for example, we trim white space and make emails lowercase, and we capitalize first names. A lot of the work we did directly within that clean layer helped prepare us for something like Zingg AI.
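Since the CFL's clean layer lives in Snowflake, the real transformations are presumably written in SQL; the Python sketch below only illustrates the rules described above (trim whitespace, lowercase emails, capitalize first names) on a single row. The column names and helper functions are hypothetical.

```python
def squash(value):
    # Trim the ends and collapse internal runs of whitespace; treat blank strings as missing.
    if value is None:
        return None
    value = " ".join(str(value).split())
    return value or None


def clean_contact(raw: dict) -> dict:
    """Standardize one contact row before it reaches entity resolution (hypothetical columns)."""
    email = squash(raw.get("email"))
    first_name = squash(raw.get("first_name"))
    last_name = squash(raw.get("last_name"))
    return {
        "email": email.lower() if email else None,
        # str.capitalize() uppercases the first character and lowercases the rest.
        "first_name": first_name.capitalize() if first_name else None,
        "last_name": last_name,
    }


print(clean_contact({"email": "  Jane.Doe@Gmail.COM ", "first_name": "  jANE ", "last_name": " Doe  "}))
# prints {'email': 'jane.doe@gmail.com', 'first_name': 'Jane', 'last_name': 'Doe'}
```

Note that `capitalize()` turns "JEAN-LUC" into "Jean-luc", so hyphenated or multi-part names may need a more careful rule.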

There is some data cleaning that comes out of the box with Zingg AI, but standardizing your data as much as you possibly can in a clean (or similar) layer is very important. So when thinking about implementing customer 360, definitely think about data consistency as much as you possibly can.

Think about applying validation rules directly in the layer where they apply, because that will only improve your results when you come to implementation.
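As a minimal sketch of what a validation rule in a clean layer might check, the function below flags rows that would weaken matching. The rule names and the deliberately loose email pattern are assumptions, not the CFL's actual rules.

```python
import re

# Deliberately loose pattern: no whitespace, exactly one "@", and a dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validation_errors(row: dict) -> list[str]:
    """Return the names of the (hypothetical) rules this row violates; empty means it passes."""
    errors = []
    email = row.get("email")
    if not email:
        errors.append("missing_email")
    elif not EMAIL_PATTERN.match(email):
        errors.append("invalid_email")
    elif email != email.strip().lower():
        # Cleaning should already have normalized the email; flag rows where it didn't.
        errors.append("email_not_normalized")
    if not row.get("first_name") and not row.get("last_name"):
        errors.append("missing_name")
    return errors


print(validation_errors({"email": "Jane@Gmail.com"}))  # prints ['email_not_normalized', 'missing_name']
```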

There's a lot of discussion about composable CDPs versus off-the-shelf CDPs, and you didn't choose a packaged CDP in this case. As a data person who's not a vendor, what is your take on this topic: composable, or a legacy or packaged CDP?

Well, I get excited about building things, so I naturally have a bias toward a composable CDP.

But from a CFL-specific standpoint, the reason we went composable rather than using some of the providers that have been around for a while is that a lot of the services offered by CDPs were redundant for us. Our setup is very unique in the sense that we have nine different organizations, our member clubs, with very different ownership structures.
So as an example, you might have one organization that is a single privately owned club.
You might have an organization that is owned by the community, and you might have an organization that, in addition to owning a CFL team, is multi-property: they might have an NBA team or an NHL team, etc.

Now, ELT is one of the services offered by these CDP providers: you've got data in Shopify, so let's pull everything out. But our team is very data engineering focused, almost by default. A lot of out-of-the-box ELT tools just don't work for our use cases because, when I'm dealing with a multi-property organization, I can't just pull everything out of their systems; I can't pull everything out of Shopify. One of our most complex engineering efforts was our multi-tenant Shopify integration.

You might have an order that contains CFL-related products but also NBA-related ones. Before that data lands in our data lake in AWS, we remove the non-CFL products and readjust the order's total value, total quantity, tax rate, etc., for security reasons.

We want to protect that information as best we can. You can't do that in an off-the-shelf CDP, right?

You're either pulling everything or pulling nothing.
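The per-order filtering described above can be sketched in Python as follows. This is a hypothetical illustration, not the CFL's integration: the `LineItem` shape and its `league` tag are assumptions (how products are mapped to a property isn't described), and the tax rate is recomputed as an effective rate from the retained lines' tax amounts.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    sku: str
    league: str          # hypothetical tag for which property the product belongs to
    unit_price: Decimal
    quantity: int
    tax: Decimal         # tax charged on this line


def filter_order(line_items, keep_league="CFL"):
    """Drop other properties' line items and recompute the order totals from what remains."""
    kept = [li for li in line_items if li.league == keep_league]
    subtotal = sum((li.unit_price * li.quantity for li in kept), Decimal("0"))
    total_tax = sum((li.tax for li in kept), Decimal("0"))
    # Effective rate from the retained lines only; guard against an order with no CFL items.
    rate = (total_tax / subtotal).quantize(Decimal("0.0001")) if subtotal else Decimal("0")
    return {
        "line_items": kept,
        "subtotal": subtotal,
        "total_quantity": sum(li.quantity for li in kept),
        "total_tax": total_tax,
        "effective_tax_rate": rate,
        "total": subtotal + total_tax,
    }


order = [
    LineItem("JERSEY-1", "CFL", Decimal("120.00"), 1, Decimal("15.60")),
    LineItem("HAT-2", "CFL", Decimal("30.00"), 2, Decimal("7.80")),
    LineItem("NBA-CAP", "NBA", Decimal("40.00"), 1, Decimal("5.20")),
]
out = filter_order(order)
print(out["total"], out["effective_tax_rate"])  # prints 203.40 0.1300
```

Using `Decimal` rather than floats keeps currency arithmetic exact when totals are recomputed.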

So by default, our single source of truth is our Snowflake warehouse. The conversations we had at the time about trying to make an off-the-shelf CDP work gave me real concerns about scalability, about the real-time aspect, and about keeping everything in sync. So for organizations that have already invested in data engineering and have their data in a single repository, be that Snowflake, BigQuery, Redshift, etc., a natural extension of that infrastructure is to own it further, go composable, and use tools like Zingg AI for customer 360 to better leverage the data they have.

Ownership, scalability, and avoiding a black box, which a packaged CDP can often be, are very important for us. CDPs have worked for a very long time, and there's a place for them. If you don't have specialized data engineering talent and you want to get up and running, they still have a place in the market.

But for organizations that have taken the path of owning their infrastructure, composable is the route I'd recommend.

That is really great advice on how to go about things, and I really hope it helps other people who are thinking along similar lines and facing the same problems.

I'm really excited to be here to chat with you, and hopefully this has been a good opportunity for other people to hear about the CFL's journey.

We've moved very fast and Zingg AI is a very exciting product that we're excited to integrate directly into our platform.

So thank you for having me, and as always, I hope to hear from you soon and chat with you in the days to come.