The Product is Literally Debugging Itself

The Accidentally Self-Healing Product

One of our employees ended up creating, completely by accident, a self-healing product. We figured it was a use case worth writing about, so I want to explain how it works, what it does, and what the actual benefits have been. My hope is that this is just the start of how LLMs can work in the background, improving a product and making tweaks little by little.

Before I get into the recipe, let me give you the ingredients.

Every company has product analytics tools (PostHog, Amplitude, and Hotjar are good ones) attached to both its landing page AND its product. These track everything: what users clicked on, how long they spent on each page, what errors they saw, and more. In practice, product managers are supposed to look at this data. But there is so much of it, across thousands of sessions an hour, that it's pretty impractical to do much more than build usage funnels and find drop-off points. The data gets used in aggregate, but the individual sessions, the "sauce", actually have a ton of value.

In the customer experience world, teams have a way of understanding how the business is performing, and it's completely different from looking at how the customer views the product. Oftentimes, CX teams will use NPS (Net Promoter Score) for overall product sentiment and CSAT (Customer Satisfaction Score) for scoring individual interactions. Both work by surveying the customer — asking how they feel about the product as a whole, or how a specific conversation with support went.

We took that concept and thought, what if we could use the data we were already getting from PostHog and create an implied NPS score for each session? Effectively "guessing" what the response would be if we asked the user the question: "Did you get what you wanted out of our product? Would you come back again?"

This is where our self-healing product begins to take shape.

Implied NPS

Every session a user has on Cotera gets dumped into the data warehouse. Every error, click, how long they spent on each page, all the standard analytics. But instead of having a human look at that data, we start by passing the session into an LLM with a large prompt that describes the underlying data and how to read it.

The LLM looks at the session, understands the routes the user took, and grades the output. If they signed up, had a long session with our AI agent, built a workflow, and spent more than 15 minutes, that is a 9/10. If they hit an error with a tool, got frustrated, and left, that is a 1 or a 2.

The trick here is that we never actually asked the user how they felt about the product. We did not reach out to them. We did not give them a banner on the page saying, "would you recommend this to a friend?" Everything is done in the background.

But at the core, because we understand the route that the user came in through, we actually already understand the intent. We can have the LLM figure out, one, was Cotera actually the right product for them? And two, if it was but it did not work properly, what actually happened?

So the LLM reads every single user session, scores the output, and then writes a blurb on what went right and what went wrong. And that gets stored back in the data warehouse:

An example:

{
  "User_intent": "signed up to score accounts that are in hubspot",
  "User_success": "spent 15 minutes on the platform. Successfully connected hubspot, but API wasn't configured correctly, leading to a 403 error.",
  "Improvement": "improve API docs to explain what scopes are and how to permit them",
  "Department": "sales",
  "Score": 5
}
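Since this record is written by an LLM, it is worth checking before it lands in the warehouse. Here is a minimal sketch of that check in Python. The field names come from the example above; the SessionReview class, the parse_review function, and the 0-10 score range are assumptions for illustration, not a description of our actual pipeline.

```python
import json
from dataclasses import dataclass

# Field names copied from the example record above.
REQUIRED_FIELDS = ("User_intent", "User_success", "Improvement", "Department", "Score")


@dataclass(frozen=True)
class SessionReview:
    """Hypothetical typed view of one LLM-written session review."""
    user_intent: str
    user_success: str
    improvement: str
    department: str
    score: int


def parse_review(raw: str) -> SessionReview:
    """Parse one review and raise ValueError if it is malformed."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("review must be a JSON object")
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    score = data["Score"]
    # Assumption: scores are whole numbers on the usual 0-10 NPS scale.
    # bool is excluded explicitly because it is a subclass of int in Python.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError(f"invalid score: {score!r}")
    return SessionReview(
        user_intent=str(data["User_intent"]),
        user_success=str(data["User_success"]),
        improvement=str(data["Improvement"]),
        department=str(data["Department"]),
        score=score,
    )
```

Rejecting a bad record at this point is cheaper than discovering later that the weekly rollup was built on half-formed rows.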

We call this layer of agents the context gatherers, or the CX AI agents. Their job is to observe, understand, and report.

The Product Managers

The data is great, but we don't have a self-healing loop yet, and that's where we want to get to. We've got the raw information, but there's no "action layer". Great, we know what customers are complaining about. What can we actually do about it? This is the common pitfall of most survey-driven feedback. The company might spend time understanding the problem, but because product and CX sit under different organizations, there's no pathway to fix the issues. They just know what the issues are.

How do we solve for that? We have another agent at the end of the week that takes the outputs from the first AI, reads them all (every single session) and makes tickets - yes, it actually makes tickets in Linear for stuff we need to fix. It looks at the failures, groups them into patterns, checks our codebase on GitHub, and decides the best possible route for the implementation fix. If 40 sessions failed on the same issue, that gets a ticket, with a high priority. If two people had a weird edge case, it gets flagged but not ticketed.
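The "ticket it or just flag it" decision can be sketched in a few lines of Python. This is a simplification under stated assumptions: in our setup the grouping into patterns is done by the agent itself, whereas this sketch assumes every failed session already carries an issue key. The two thresholds, the Decision class, and the triage function are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; the real cut-offs are a judgment call.
TICKET_THRESHOLD = 5
HIGH_PRIORITY_THRESHOLD = 20


@dataclass(frozen=True)
class Decision:
    issue: str
    sessions: int
    action: str              # "ticket" or "flag"
    priority: Optional[str]  # set only when a ticket is created


def triage(issue_keys: List[str]) -> List[Decision]:
    """Count failed sessions per issue and decide what to do with each."""
    decisions = []
    # most_common() yields issues from most to least frequent.
    for issue, count in Counter(issue_keys).most_common():
        if count >= HIGH_PRIORITY_THRESHOLD:
            decisions.append(Decision(issue, count, "ticket", "high"))
        elif count >= TICKET_THRESHOLD:
            decisions.append(Decision(issue, count, "ticket", "normal"))
        else:
            # Rare edge cases are recorded but do not become tickets.
            decisions.append(Decision(issue, count, "flag", None))
    return decisions
```

With these made-up thresholds, 40 sessions failing on the same issue become a high-priority ticket, while an edge case seen twice is only flagged.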

Also - the tickets are not just bugs. Because every session record carries an improvement suggestion, some tickets are genuine, higher-level improvements, like: "users are trying to upload large CSVs and programmatically update them, but the site runs out of memory". These aren't immediately "fixable" and require further changes - but our team looks at them and tries to figure out what to prioritize.

The first one is basically a support employee. The second one is a product manager. And then the third one is an engineer.

Skynet

The last layer is the software engineers, or what we call Skynet. We call it Skynet affectionately because it is borderline insane. Skynet runs over the weekend.

This layer takes the tickets from the product managers and figures out which ones are actual bugs. Not feature requests, not "rethink the flow," just bugs. A tool that keeps timing out because it takes longer than the standard limit, or a JSON response that does not validate against the schema. These tickets are then automatically labeled "Skynet", which means they can be fixed automatically. A Cursor or a Devin agent picks them up, tries to fix them, and just like a junior engineer would, sends the PR to someone for review.

The second kind of ticket is more interesting - these are improvement suggestions, and we don't want AI fixing them - mostly because some of the "issues" are there by design, and it's impossible to give any AI 100% context. It can only be as good as the data itself. These "improvements" are looked at by the team weekly and used when we prioritize what to build and ship next.

For the bug tickets, Skynet assigns the ticket to itself, writes the code, and submits the PR for one of our human engineers to review.
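The split between what a coding agent may pick up and what stays with humans can be sketched as a simple routing step. This Python sketch is illustrative only: the Ticket and Kind types and the route function are hypothetical, and it assumes each ticket has already been classified as a bug or an improvement. The only detail taken from the process described here is the "Skynet" label.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Kind(Enum):
    BUG = "bug"
    IMPROVEMENT = "improvement"


@dataclass
class Ticket:
    title: str
    kind: Kind
    labels: List[str] = field(default_factory=list)


def route(tickets: List[Ticket]) -> Tuple[List[Ticket], List[Ticket]]:
    """Split tickets into (agent queue, human review queue)."""
    agent_queue: List[Ticket] = []
    human_queue: List[Ticket] = []
    for ticket in tickets:
        if ticket.kind is Kind.BUG:
            # Only plain bugs are labeled for automatic fixing.
            if "Skynet" not in ticket.labels:
                ticket.labels.append("Skynet")
            agent_queue.append(ticket)
        else:
            # Improvements may be intentional design; humans decide.
            human_queue.append(ticket)
    return agent_queue, human_queue
```

Keeping the rule this blunt is deliberate: anything that is not clearly a bug defaults to the human queue, and every agent fix still goes through a reviewed PR.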

Monday Morning

So: we have the three layers. We have our context-gathering support agents that watch all of the user sessions, we have our product managers that put that into context, and we have our software engineering layer that goes through and actually fixes the bugs.

Every Monday morning, our team sits down and reviews everything. What did last week look like? What were the bugs? The expensive, thought-requiring tasks from the product management layer get assigned to real humans to investigate. And the AI agents that worked through the weekend and fixed bugs? We review those PRs over coffee.

We step into the week with a ton of momentum and data, and with a real win already in hand. It also ensures that our human employees only spend time on the highest-leverage work for our customers. If AI can do it, let AI do it.

Why Would You Not Do This?

Why would you not do this? The problem is in the architecture - it's hard to create true, qualified action points from a mixture of quantitative and qualitative data, difficult to parse that into specific tasks, and we don't have the luxury of 20 engineers with spare time on their hands.

I'll say this - Cotera is a great product to run this automation. All of the data for both the company AND the product is already in the data warehouse, the tools are hooked into the platform - all you have to do is give the system context and wire it up.

The future for this "self healing product" is to turn it into a "self healing business". Product is only one half of the story. We're going to attach this same feedback loop to both our customer success and sales teams:

What are Sales reps saying and doing that is working well? What isn't? For weekly 1:1s, can we give the Sales manager the top 1-2 things to give the rep to improve on?
What customers are upset? Which accounts are having the largest problems with both the product and their CSM? Who's at risk of churn?
And lastly - outside signals. Which current customers have new users signing up from parts of the organization we're not in yet? Which customers have had a layoff?

All of the above teams have the same problem - there's a lot of data, and it's hard to turn it into actionable intelligence.

We've already seen a 200% productivity gain from Skynet. What can we do if we expand this same sort of thinking to the rest of the business?
