Q&A: Reelgood CEO David Sanderson Talks About Streaming Metadata & A New Deal With Philo

Reelgood is an example of how data drives much of what happens in the world of streaming television and movies. 

If you are familiar with Reelgood, it's likely because of its app and website, which allow consumers to enter all of the streaming services they subscribe to and then navigate a combined catalog of titles. That lets you search for a title and know precisely where you can watch it.

As you might imagine, it requires a lot of data to make a service like that work, and as a result Reelgood also has a growing streaming metadata offering that it licenses to nearly all of the major (and minor) streaming services. The live TV streaming service Philo just announced a metadata deal with Reelgood, and that was the impetus for my conversation Tuesday with CEO David Sanderson.

But the conversation touched on a number of related topics, including a history of the company, the challenges of putting together an accurate set of metadata and how even deals such as Tubi's ChatGPT integration rely on data provided by Reelgood.

The following conversation has been lightly edited for clarity.

I'm curious, at what point in the evolution of Reelgood did it occur to you that the data you were collecting was as profitable as, or more profitable than, the consumer-facing part of the business?

David Sanderson: It happened very organically. I can give you a bit of background and how we evolved to it. So when I started the company, we just set out to be this consumer streaming guide. Putting all your streaming services into one place. 

I vastly underestimated the data side of all of this. I thought that we would just focus on the front end. There's got to be great data providers in this market. That won't be a problem. I was completely wrong.

We started out with Gracenote. They were kind of the name brand. The data quality we were getting from them was absolutely killing our user trust and retention numbers, because we would tell users that this title's on Netflix, and then the user would click play, and it had left a month ago.

Or the opposite. There would be a massive title that was well known to be on HBO Max. That streaming availability wouldn't be there.

And then there were issues with duplicates and missing data fields. There were just a lot of problems that we ran into. And we realized that if we don't solve this problem, the company is going to die, because no feature we could build would outweigh presenting bad data to our customers. Because they would click play, oh, it's not on Netflix, this app sucks, and then they were gone forever. 

So we tried every other data vendor in the market, found everyone was kind of at or below Gracenote's level, and then, necessity being the mother of invention, we built out our own data set, because we needed it. I had cut my teeth in the early days at Facebook and scaled some stuff using machine learning there, and took the same approach to this problem.

We built out our own machine learning model, invested probably $20 million in it, with over 100 million users on our platform helping train that model over eight years. And then we finally realized, great, we have the data set that we need for a consumer product. And right around that time, some of the biggest tech companies in the world were running into the same problem we did.

The big search engines and other big consumer tech companies ran into the same problems, came to us, and asked who we licensed our data from. They couldn't solve it, and after we explained that we built our own, they offered some sizeable checks we couldn't ignore.

And that is how we stumbled into the data licensing business. Since then, it's just grown organically, just all inbound requests.

That's how we've stumbled into a few different verticals. One market is consumer tech companies, the kind of big FANG companies, and now some of the AI companies that need accurate real-time streaming availability data and metadata. And the other is the media and entertainment verticals, like Warner Brothers, like the new deal with Philo, the various streaming services. They all use our data for a variety of use cases.

Let's talk about the data a bit. Generally speaking, how many data fields might be included for a single title? How deeply can you dig into that data and assemble whatever dataset a customer is hoping to develop?

David Sanderson: The way we think about our data is that obviously, A, it's the database we wish had existed when we started out, but it's kind of a one-stop shop. And because we ingest data from just about everywhere, for any given title, we are going to have a lot of duplicated data. In some fields, we may have up to 12 layers of redundancy. Our system can figure out "this is the best one to use."

This is the best synopsis of all the ones that we have. We'll pick out the best one and store that as a master entry. The metadata, as you can imagine, it's very rich, and we continue to enrich it as time goes on.

That moves a little bit more slowly when you compare it to streaming availability. So for that title, where it's available to stream, titles are coming and going from all the streaming services as we speak right now. So, that data would be changing in effectively real time, following where that piece of content is available to stream.
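To make the idea of picking a "master entry" from redundant copies concrete, here is a minimal sketch in Python. It is illustrative only: Sanderson describes a machine learning model whose details are not given in this conversation, and the simple scoring rule, the source names, and the trust weights below are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    source: str    # hypothetical provider label
    synopsis: str  # the synopsis text that provider supplied


# Hypothetical trust weights per source; not Reelgood's actual values.
SOURCE_TRUST = {"studio_feed": 1.0, "vendor_a": 0.7, "web_crawl": 0.4}


def score(candidate: Candidate) -> float:
    """Score a synopsis by source trust and a capped length-based completeness."""
    text = candidate.synopsis.strip()
    if not text:
        return 0.0
    # Cap at 200 characters so sheer length cannot dominate the score.
    completeness = min(len(text), 200) / 200
    # Unknown sources get a low default trust of 0.1 (an assumption).
    return SOURCE_TRUST.get(candidate.source, 0.1) * completeness


def pick_master(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the highest-scoring non-empty candidate, or None if there is none."""
    usable = [c for c in candidates if score(c) > 0]
    return max(usable, key=score, default=None)
```

Calling `pick_master` on the redundant copies for one field yields the single entry to store as the master. A production system would presumably weigh far more signals than source and length.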

I spoke with someone recently at Fox, and their ad-supported streamer Tubi is the first streamer to have an app inside ChatGPT that allows viewers to find movies or shows to watch by using conversational phrases. Is that competition for you or do you see that as a complementary offering?

David Sanderson: Fox and Tubi have been some of our oldest customers. And I think they're awesome flagship customers for us, because what they do with our data is very cool.

So that feature is coming out of the chat experience. But the base layer you need for that to work is again, accurate data. You need to have rich taxonomy for that data, things that describe, "is it about a coming-of-age story?"

It is about having detailed descriptions for that piece of data, accurate data about the synopsis, accurate cast and crew information, all of that stuff. That is the foundational layer to these types of experiences.

I guess if someone had unlimited money to throw at you, you could put together data for them that would allow their subscribers to parse the content however they wanted to parse it, or within reason, anyway.

David Sanderson: Yeah, that's exactly how it works. We give them the raw ingredients of the whole full data set, every show and movie and all the metadata you could ever need about it in a very clean package. And then you can go and build these types of experiences.

And we'll also be working for some of the big name AI companies. We'll be announcing one coming up in a bit. But they rely on this data because even those companies can't get this type of data.

There was one example we always talk about internally. It was some movie about Las Vegas. And when you asked one of the LLMs what the film was about, you were told something like "Oh, everyone is going to have a wild time in Las Vegas."

But actually, the movie was about the shooting that happened in Las Vegas. So that's a good example. You need accurate data behind it.

And the other thing is, with AI or chat experiences, if you want to know where a piece of content is available, a lot of their data is compiled from crawling the web. And they'll crawl a bunch of articles that say "Hey, this film is on Hulu."

But sites don't write a follow-up article that says "This film is now off Hulu." And that can create incorrect data that the LLMs are presenting. We just did a study on this topic and found that when you use a standard issue LLM for questions like this, about half the time they're presenting incorrect data.

How has the FAST explosion changed your business? Because the feedback I get most often from readers is, "I like watching FAST channels, but I never know what the hell's on any of the channels or where I can watch something."

David Sanderson: Yeah. There's so, so many. We can at least arm these companies with good metadata so that they can surface titles within their FAST, which is part of what this partnership with Philo is about. Providing the rich data, matching that with what your users are watching. And very detailed viewing data. Not just "Oh, they like comedies." That's kind of a shotgun approach. But providing more of a laser-focused approach. For instance, do they like coming-of-age movies from the medieval era that feature dragons? You can get very specific, and that can help surface the right content or the right channels from this ever-growing list of FAST content.

One of the ways that content discovery happens now is basically machine learning and in a recent piece I gave Netflix as an example. I watch a lot of non-English speaking content on their service. And over the last few months, the recommendation engine has learned that and now about 90 percent of what they offer me falls into that category. The problem is that there are plenty of other things I like to watch as well. But the system doesn't have a way of differentiating that. And I suspect that's just an inherent problem with machine learning. That people are unpredictable and that makes it difficult to surface the content they really want to watch at that moment.

David Sanderson: I think that's something that goes into the recommendation algorithm or model that all these companies are using. And what timeframe do they look at? The one that I always think about for this is YouTube.

I think YouTube is at the far end of that timeframe. Where I feel like if you watch one piece of content about one topic, then all of a sudden your whole YouTube just changes over to that thing. And you're like, well, no, I'm still interested in the other subjects or things that I like. So yeah, I admittedly don't have much data on that, but I've always been curious to look under the hood and see what that recency bias for these recommendation algorithms might be.

The only other thing I'd add to that is I actually think Spotify has always been a good sort of North Star for people to look at. I remember talking with their team years ago and they have a deep understanding that who you are as an individual changes throughout the day. So the music you want to listen to when you're driving to work versus coming home versus lunchtime versus nighttime.

Those are almost different personas and then having their algorithm understand that and present different stuff is key. I think to your point, if you're watching this one type of content, if they can understand there's different elements of Rick, how can we serve all of the elements of Rick's interests?

I wanted to talk specifically a bit about the deal with Philo. I'm wondering, when you first had conversations with them, how did you make the case that "We can help with these problems. This is something we can do for you that would be helpful"?

David Sanderson: What was awesome about it is they came to us, and this has been, funnily enough, one of our biggest growth... I won't say levers, it's not even a strategy, but it's people that use our data at one company, they go to the next company and the first thing they do is they'll say "We need to get Reelgood data here." And that's what happened here.

They're a great example of the myriad of use cases that streaming services use us for. On the product side, they want to do machine learning and help recommend the right content from their catalog to their customers, as well as just make sure that they're presenting rich metadata on the consumer experience. So that's one part.

I think the problem that a lot of these companies run into is that they're using a data provider, or maybe a handful of data providers. They're getting feeds from the services or from the companies that they're licensing content from.

They're having to sort of mash it together. Then it's still got holes, they've got to manually clean stuff, they've got to check it, there are errors in it. I think what they liked about us in their previous role, and obviously now, is that we're a one-stop shop where they don't have to worry about any of that. The data is just nice, filled, clean, and matched in one nice package. They can go work on higher-level stuff.

The analogy I always give for that is it's like with a chef, your chef shouldn't be peeling onions. You should let a chef work their magic. What we do is we've sourced the best quality ingredients for you. We have them chopped and cleaned and sitting on your counter so you can go do cool stuff with them. So that's kind of what we're doing with Philo.

But then the other side of this deal is another part of Philo, which is more on the licensing and strategy side. And with our data, you can see everything from how does our catalog compare to our competitors' catalogs? What genres are our competitors selling? Should we invest in or divest from them? Or if you're on the marketing team at Philo, do we have the most horror content? That's something we should talk about, because that's unique to our catalog compared to our competitors.

And then on the licensing side of things, there's obviously a high level strategy of thinking about what type of genres or tags or content should we be buying? And then also where has that content been available historically. That is a huge thing.

Because if you're looking to license a title and say it's been on Netflix for the past five years, that's going to be different than something where maybe you could be the first AVOD or first SVOD to bring that title to market or it's not available anywhere else. And you could sort of be the one that holds that title.

As a journalist, it's interesting trying to cover a business like Reelgood, because, number one, you have a lot of granular information that would be useful to have. But also, I think readers have this idea that since the streaming business inherently is built on data, the streamers have so much useful data on people. But as you know, there is just an amazing amount of garbage data out there.

Whether it's determining who's seen an ad or where it's running or why people are watching, you are one of the few places that I've talked to where I feel like, "OK, this is the useful data that's really helpful." And it's, as you said, a one-stop shop. And the industry needs more concrete data. A lot of times it feels as if streamers are making content decisions based on vibes. Along the lines of "This feels like it would be a good FAST channel for us to watch or add, but we can't quite articulate the reason why."

David Sanderson: I'm always happy to talk to people that sort of understand this. Because it's funny sometimes you talk to someone that's maybe new in their career in the space or they're a new entrant as a company into the space. And they make the same mistake I did. They're thinking "Oh, this has got to be simple."

And then you get your hands on the data and you see that you can have two different streaming services with the same title. Not only do they have their own unique ID for it, but you would think the runtime would be the same. It's not.

The release year is off by plus or minus a few years, and the cast is about 60% the same. It's shocking how just disparate and sort of messy and disconnected the data is.
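The matching problem described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Reelgood's method: the `Listing` fields, the tolerance values, and the use of Jaccard overlap for comparing casts are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Listing:
    service_id: str       # each service's own unique ID for the title
    title: str
    year: int             # release year as that service reports it
    runtime_min: int      # runtime in minutes as that service reports it
    cast: FrozenSet[str]  # cast names as that service reports them


def normalize(title: str) -> str:
    """Lowercase the title and collapse punctuation so minor styling differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def cast_overlap(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap: shared names divided by all names across both casts."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_same_title(a: Listing, b: Listing, year_tol: int = 2,
                      runtime_tol: int = 10, min_cast: float = 0.5) -> bool:
    """Tolerant comparison; every threshold here is an illustrative assumption."""
    return (
        normalize(a.title) == normalize(b.title)
        and abs(a.year - b.year) <= year_tol
        and abs(a.runtime_min - b.runtime_min) <= runtime_tol
        and cast_overlap(a.cast, b.cast) >= min_cast
    )
```

Exact equality on any single field would miss these pairs, which is why the sketch compares each field with a tolerance and ignores the service-specific IDs entirely.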

You're able to do a lot right now with the metadata. Your customers are able to do a lot with the metadata. Is there something that, in the back of your mind, you're thinking, "You know, five years from now, it would be nice to be able to do this"?

David Sanderson: I mean, global expansion has been a big thing for us over the past year. And we're continuing to roll that out.

That came up with one of our big customers. They wanted to go into India. And if you think the data is messy and disparate in the US, just go and start to look across the world. India is a huge market with a huge amount of data, but it's not organized and it's kind of disparate.

We were curious to see if our ML model could handle India. Luckily, it did with flying colors. And that unlocked for us where we realized "Oh, OK, it can sort of make light work of one of the most complicated markets."

We've since been running through a lot of other countries and expanding. So I'd say just an overall global expansion for our offering. And then the other thing that we're looking at is, you know, for a lot of our customers, once they use our data, we are everything under the on-demand umbrella.

AVOD, SVOD, TVOD, anything on demand we have and seem to be fully global for that. But as you mentioned, things like FAST and sports and other things, we're getting a lot of requests from our customers looking for us to also put FAST and sports into a neat little package for them.

So that's another thing we're just kind of eyeballing right now and scoping out how could we serve our customers so that truly, no matter what type of content it is, they have like a one-stop shop solution.