Hi HN,
We’re Ahmed, Cedric, Matt, and Mike from Narrator (https://www.narrator.ai).
We’ve built a data platform that transforms all data in a data warehouse into a single 11-column data model and provides tools for analysts to quickly build any table for BI, reporting, and analysis on top of that model.
Narrator initially grew out of our experience building a data platform for a team of 40 analysts and data scientists. The data warehouse, modeled as a star schema, grew to over 700 data models from 3000+ raw production tables. Every change or new analysis took forever because we had to manage the complexity of those 700 models. With all those layers of dependencies and stakeholders constantly demanding more data, we ended up making lots of mistakes (e.g. dashboard metrics not matching). Those mistakes eroded trust, and soon our stakeholders were off buying tools (Heap, Mixpanel, Amplitude, Wave Analytics, etc.) to do their own analysis.
With a star schema (also core to the recently IPO-ed Snowflake), you build the tables you need for reporting and BI on top of fact tables (what you want to measure, e.g. leads, sales) and dimension tables (how you want to slice your data, e.g. gender, company, contract size). With this approach, the number of fact and dimension tables grows in proportion to the number of questions, datasets, and metrics the business needs answered. Over time the rate of new questions increases rapidly, and data teams spend more time updating models and debugging mismatched numbers than answering data questions.
What if instead of using the hundreds of fact and dimension tables in a star schema, we could use one table with all your customer data modeled as a collection of core customer actions (each a single source of truth), and combine them together to assemble any table at the moment the data analyst needs that table? Numbers would always match (single source of truth), any new question could be answered immediately without waiting on data engineering to build new fact and dimension tables (assembled when the data analyst needs it), and investigating issues would be easy (no nested dependencies of fact and dimension tables that depend on other tables). After several iterations, Narrator was born.
Narrator uses a single 11-column table called the Activity Stream to represent all the data in your data warehouse. It’s built from SQL transformations that turn a set of raw production tables (for example, Zendesk data) into activities (ticket opened, ticket closed, etc.). Each row of the Activity Stream has a customer, a timestamp, an activity name, a unique identifier, and a bit of metadata describing it.
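As a rough sketch, here’s what one such transformation could look like (simplified to four of the eleven columns; the table and column names here are illustrative, not our exact schema), using SQLite so it’s easy to run:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A raw production table, e.g. a Zendesk-style tickets table.
cur.execute("""
CREATE TABLE tickets (
    id INTEGER PRIMARY KEY,
    requester_email TEXT,
    opened_at TEXT,
    closed_at TEXT
)""")
cur.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?, ?)",
    [(1, "ada@example.com", "2021-01-05 09:00", "2021-01-06 12:00"),
     (2, "bob@example.com", "2021-01-07 10:30", None)],
)

# A simplified activity stream: unique id, customer, activity name,
# timestamp. (The real one has 11 columns; these names are assumptions.)
cur.execute("""
CREATE TABLE activity_stream (
    activity_id TEXT,
    customer TEXT,
    activity TEXT,
    ts TEXT
)""")

# One SQL transformation per activity: each ticket row becomes an
# "opened_ticket" row, and a "closed_ticket" row if it was closed.
cur.execute("""
INSERT INTO activity_stream
SELECT 'ticket_' || id || '_opened', requester_email, 'opened_ticket', opened_at
FROM tickets
UNION ALL
SELECT 'ticket_' || id || '_closed', requester_email, 'closed_ticket', closed_at
FROM tickets
WHERE closed_at IS NOT NULL
""")

rows = cur.execute(
    "SELECT customer, activity, ts FROM activity_stream ORDER BY ts"
).fetchall()
for row in rows:
    print(row)
```

Every source system gets a handful of small transformations like this, and they all land in the same table.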
It’s hard to imagine building an arbitrary table from a single model made up of activities that don’t obviously relate to each other. Unlike a star schema, we don’t use foreign keys (the direct relationships in relational databases that connect objects, like employee.company_id → company.id) because they don’t always exist when you’re dealing with data from multiple systems.
Instead each activity has a customer identifier which we use, along with time, to automatically join within the single table to generate datasets.
As an example, imagine you were investigating a single customer who called support. Did they visit the web site before that call? You’d look at that customer’s first web visit, and see if that person called before their next web visit.
Now imagine finding all customers who behaved this way per month -- you’d have to take a drastically different approach with your current data tools. Narrator, by contrast, always joins data in terms of behavior. The same approach you take to investigate a single customer applies to all of them. For the above example you’d ask Narrator’s Dataset tool to show all users who visited the website and called before the next visit, grouped by month.
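In SQL terms, that behavioral join looks roughly like this (a simplified sketch against a toy three-column activity stream; the activity and column names are illustrative, and SQLite stands in for your warehouse):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # needs SQLite >= 3.25 for window functions
cur = con.cursor()
cur.execute("CREATE TABLE activity_stream (customer TEXT, activity TEXT, ts TEXT)")
cur.executemany("INSERT INTO activity_stream VALUES (?, ?, ?)", [
    ("ada", "visited_website", "2021-01-03"),
    ("ada", "called_support",  "2021-01-05"),  # call lands between ada's visits
    ("ada", "visited_website", "2021-01-10"),
    ("bob", "visited_website", "2021-01-04"),  # bob never calls
])

# For each web visit, find that customer's next visit (LEAD), then check
# whether a support call falls in between. The join key is the customer
# identifier plus time -- no foreign keys needed. A call after a
# customer's last visit (next_ts IS NULL) also counts as "before the
# next visit" here.
rows = cur.execute("""
WITH visits AS (
    SELECT customer, ts,
           LEAD(ts) OVER (PARTITION BY customer ORDER BY ts) AS next_ts
    FROM activity_stream
    WHERE activity = 'visited_website'
)
SELECT strftime('%Y-%m', v.ts) AS month,
       COUNT(DISTINCT v.customer) AS customers_who_called
FROM visits v
JOIN activity_stream c
  ON c.customer = v.customer
 AND c.activity = 'called_support'
 AND c.ts > v.ts
 AND (v.next_ts IS NULL OR c.ts < v.next_ts)
GROUP BY month
""").fetchall()
print(rows)
```

In Narrator you don’t write this SQL by hand; the Dataset tool generates the equivalent query from the behavioral question.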
We started as a consultancy to build out the approach and prove that this was possible. We supported eight companies per Narrator data analyst, and now we’re excited for more data folks to get their hands on it so y’all can experience the same benefits.
We’d love to hear any feedback or answer any questions about our approach. We’ve been using it ourselves in production for three years, but only launched it to the public last week. We’ll answer any comments on this thread and can also set up a video chat for anyone who wants to go more in-depth.
My unsolicited $0.02 - I think your approach is spot on.
As a company, you will never have one consistent data set and metrics if you keep building an individual model for each user / use case / etc. And I've seen the explosion of tables and models in real-time. They just keep growing. And how do you even know that the question you're asking in your dashboard is pulling the information from the correct table? I've yet to see a data team that didn't have to deal with drift. Plus, there's a real cost of storing all these stale tables that nobody is looking at anymore.
What your product is doing is what I see companies already trying to accomplish themselves [somewhat]. For the leading companies when it comes to working with data, the warehouse today is already the source of truth, with one dimension table that points back to the SaaS tool / dashboard via an S3 bucket. So the SaaS tool itself is really only the last mile and visualization layer. Run the model, create the table, offload the table to S3, point the tool to the S3 bucket with the table. Update every 4 hours, etc.
dbt wins in that world. (and I assume you're using something like dbt under the hood of narrator.ai?)
That approach is already commoditizing the SaaS tool down to the visualization layer and the opinionated way of displaying data. But that still means there's at least one model per tool, use case, etc. with one table - and you still don't see the entire journey of the user, that's something you either have to create for a single specific use case, or cobble it together ad-hoc. If instead you have one table that has it all - you can move soooo much faster with data, and take out all the friction that comes from having disparate data sets.
narrator.ai wins in that world.
Blinkist in Berlin is following a very similar approach to what you guys have built. This deck is a few years old, but I think the approach described will resonate with you:
https://www.slideshare.net/SebastianSchleicher/tracking-and-...
If I had to look into my Crystal Ball, I think one of your GTM challenges will be to convince existing data teams that everything they've built is somewhat redundant. On the flipside, I can see the same data teams say "OMG, finally!". I'm curious to hear the customer reactions so far.
I'm very excited about this product! I wouldn't be a direct user with my current role, but FWIW, I can share the bruises I got from working in this market.
Would love to hear more!