Intercom Monitors: Completing the Observability Loop for AI Support

Intercom's Monitors feature closes the Analyze phase of the Fin flywheel — bringing structured, repeatable QA to AI-powered support at scale. Here's what it means for your operation.

Chris

March 30, 2026 · 10 min read

Opening the AI black box — Intercom Monitors brings structured QA to every Fin conversation

The headline of Intercom's announcement says it plainly: Opening the AI black box. That framing isn't marketing — it's a product philosophy, and Monitors is its clearest expression yet.

Most AI agent vendors keep the tuning inside the box. If something isn't performing, you file a ticket and someone on their team adjusts it for you. Intercom has taken the opposite position from the start: customers should be able to open the system themselves, understand what's happening, and change it on their own terms. As Paul Adams put it at Fin Labs Paris: "We want people — all of our customers — to be able to open up the box and actually change it per their requirements."

That philosophy underpins the Fin flywheel — a continuous Train → Test → Deploy → Analyze loop that the entire product is architected around. The idea is straightforward: every improvement cycle compounds the next. You train Fin on better content and clearer guidance, test changes before they reach customers, deploy to specific segments or channels, then analyze what's actually happening at scale. Rinse and repeat.

If you're unfamiliar with how Fin 3 works and why its architecture matters, our deep dive on the future of customer experience with Fin 3 covers the full picture — from workflows to AI-first design.

Monitors completes the Analyze layer. Over the past year, Intercom had already shipped Insights (CX Score, Topics, and Trends) and Recommendations, which surfaces one-click fixes for content gaps and configuration issues. Monitors, announced on March 25th at Fin Labs Paris, is the third and final piece: a structured, repeatable QA system that closes the loop between what you observe and what you actually fix.

Watch the full Fin Labs Paris announcement on Intercom Monitors — or explore the Analyze keynote on fin.ai

What Was Actually Missing

The distinction Intercom draws is precise and worth understanding. CX Score tells you how customers felt about a conversation. Monitors with Custom Scorecards tells you whether the conversation met your standards. These are different questions, and until now there was no scalable system to answer the second one.

Teams were still doing QA the old way: spreadsheets, manual sampling, spot checks against a tiny fraction of volume. As Fin scales — 8,000 customers, a 67% average resolution rate, 2 million queries resolved every single week including complex cases in regulated industries — knowing what Fin is actually doing behind the scenes becomes operationally critical.

The old infrastructure can't help: CSAT surveys cover roughly 8% of conversations, and ad-hoc sampling doesn't reliably surface your highest-risk edge cases or where quality is quietly starting to drift.
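To make the sampling problem concrete, here is a rough back-of-the-envelope sketch. It assumes a uniform random sample, and the volumes are invented for illustration (10,000 conversations, 5 of them high-risk); the function name is ours, not anything from Intercom.

```python
from math import comb


def chance_of_catching(total, risky, sample_size):
    """Probability that a uniform random sample of `sample_size`
    conversations contains at least one of the `risky` ones."""
    # Ways to pick a sample that avoids every risky conversation,
    # divided by all possible samples of that size.
    missed_all = comb(total - risky, sample_size) / comb(total, sample_size)
    return 1 - missed_all


# Illustrative numbers only: 10,000 conversations, 5 high-risk ones.
for share in (0.05, 0.08, 1.0):
    n = int(10_000 * share)
    print(f"{share:.0%} sample: {chance_of_catching(10_000, 5, n):.2f}")
```

With these made-up numbers, a 5% sample catches at least one of the five risky conversations only about 23% of the time. Real sampling is rarely uniform, so treat this as intuition rather than a measurement.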

This is where many teams hit a wall — the gap between knowing something is off and knowing exactly what to fix. If you're building knowledge bases for AI-powered support, the quality of your content directly determines whether Fin's answers pass or fail a scorecard. Monitors makes that connection visible.

How Monitors Works

Monitors has two interlocking components: the monitor itself, which defines which conversations get reviewed, and Custom Scorecards, which define how each conversation is evaluated.

Conversation Selection

For conversation selection, you combine structured filters — channel, topic, resolution status, customer region, Fin-specific metrics — with natural language signals for the nuance that hard filters can't catch. A healthcare provider might target any conversation where a patient shows signs of financial distress. A SaaS company launching a new feature might create a time-bounded monitor to track how Fin handles questions about it in the first two weeks.

You can also run broad, consistent weekly samples simply to benchmark quality over time. The two approaches aren't mutually exclusive.
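As a mental model, a monitor's selection logic can be sketched as hard filters combined with a softer signal. This is a hypothetical illustration, not Intercom's API: the `Conversation` and `Monitor` classes, the field names, and the keyword check standing in for a natural language signal are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Conversation:
    channel: str
    topic: str
    resolved: bool
    region: str
    transcript: str


def mentions_financial_distress(text: str) -> bool:
    # Keyword stand-in for a natural language signal; a real system
    # would presumably rely on a language model, not substrings.
    phrases = ("can't afford", "cannot afford", "payment plan", "overdue")
    lowered = text.lower()
    return any(p in lowered for p in phrases)


@dataclass
class Monitor:
    name: str
    # Structured filters: attribute name -> required value.
    filters: dict = field(default_factory=dict)
    # Optional signal applied to the transcript; None means "no signal".
    signal: Optional[Callable[[str], bool]] = None

    def selects(self, conv: Conversation) -> bool:
        hard_match = all(getattr(conv, k) == v for k, v in self.filters.items())
        soft_match = self.signal is None or self.signal(conv.transcript)
        return hard_match and soft_match


monitor = Monitor(
    name="billing-distress",
    filters={"channel": "chat", "topic": "billing"},
    signal=mentions_financial_distress,
)
conv = Conversation(
    "chat", "billing", True, "UK",
    "I can't afford this invoice, is there a payment plan?",
)
print(monitor.selects(conv))
```

A monitor with no filters and no signal selects everything, which is roughly the broad weekly benchmark case.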

Custom Scorecards

Custom Scorecards let you define what quality means for your business specifically: criteria, weighting, and scoring thresholds. Each criterion can be scored automatically by AI or assigned to a human reviewer, and a single scorecard can mix both.

Intercom is explicit that this isn't about choosing between scale and judgment. A financial services company might have AI score for empathy and policy adherence, while keeping a human reviewer on accuracy for regulated topics. You weight what matters most, and can mark specific criteria as critical — a single failure on those fails the entire evaluation.
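The weighting and critical-criterion behaviour described above can be sketched in a few lines. This is a minimal illustration under our own assumptions: the `Criterion` class, the pass/fail result per criterion, the weights, and the 0.8 pass threshold are hypothetical and do not reflect how Intercom computes scores internally.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: float
    critical: bool = False
    scorer: str = "ai"  # "ai" or "human"; informational in this sketch


def evaluate(criteria, results, pass_threshold=0.8):
    """`results` maps criterion name -> True (met) or False (not met)."""
    total = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if results[c.name])
    score = earned / total
    # Any failed critical criterion fails the whole evaluation,
    # regardless of the weighted score.
    critical_failed = [
        c.name for c in criteria if c.critical and not results[c.name]
    ]
    passed = score >= pass_threshold and not critical_failed
    return {"score": score, "passed": passed, "critical_failed": critical_failed}


criteria = [
    Criterion("empathy", weight=1),
    Criterion("policy_adherence", weight=2),
    Criterion("accuracy", weight=3, critical=True, scorer="human"),
]

print(evaluate(criteria, {"empathy": True, "policy_adherence": True, "accuracy": False}))
print(evaluate(criteria, {"empathy": False, "policy_adherence": True, "accuracy": True}))
```

In the first call the evaluation fails on the critical accuracy criterion. In the second, a miss on the low-weight empathy criterion still clears the assumed threshold.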

The Review Queue

When a conversation fails, it moves through a structured Review Queue: flagged, assigned, marked for a fix, tracked through to resolution. The loop that used to end at a score and get lost in a Slack thread or a spreadsheet export now ends at a documented improvement.
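One way to picture the Review Queue is as a small state machine. The sketch below is illustrative only: the status names mirror the four steps in the paragraph above, but the `ReviewItem` class and its strictly linear transitions are our assumptions, not Intercom's data model.

```python
from enum import Enum


class Status(Enum):
    FLAGGED = "flagged"
    ASSIGNED = "assigned"
    FIX_MARKED = "marked_for_fix"
    RESOLVED = "resolved"


# Each status has exactly one successor; RESOLVED is terminal.
NEXT = {
    Status.FLAGGED: Status.ASSIGNED,
    Status.ASSIGNED: Status.FIX_MARKED,
    Status.FIX_MARKED: Status.RESOLVED,
}


class ReviewItem:
    def __init__(self, conversation_id):
        self.conversation_id = conversation_id
        self.status = Status.FLAGGED
        self.history = [Status.FLAGGED]  # audit trail of every step

    def advance(self):
        if self.status not in NEXT:
            raise ValueError("item is already resolved")
        self.status = NEXT[self.status]
        self.history.append(self.status)
        return self.status


item = ReviewItem("conv-123")  # hypothetical conversation ID
while item.status is not Status.RESOLVED:
    item.advance()
print([s.value for s in item.history])
```

The history list is the point: every failed conversation leaves a trail from flag to fix, instead of ending at a score.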

This is where Monitors ties directly into what we call proactive support with AI — the shift from reactive firefighting to systematic, anticipatory quality management.

The Real-World Validation

Two customer examples from the Paris panel illustrate why the coverage piece matters as much as the scoring.

🏥

Newman — Digital Healthcare

Newman, a UK digital healthcare company, started by manually reviewing 100% of Fin interactions to satisfy clinical governance requirements. Over time, as confidence grew, they brought that down to a 5% monthly sample — still a significant manual overhead. Rydian from their team was direct: Monitors gives them a path back to 100% coverage without the manual cost, while still satisfying the safety and compliance requirements that matter in a regulated environment.

🚀

Glean — Enterprise AI

Glean went from 41% to 100% Fin involvement rate while holding resolution rates above 80%. Their product cycle moves fast, and they support industries including healthcare, public sector, and financial services. At that scale and velocity, the ability to set up targeted monitors per product launch or policy change, with a structured view of quality across all of it, is exactly the kind of operational control they need.

The Reporting Layer

Everything flows into Intercom's custom reporting. QA scores from Monitors sit alongside CX Score, resolution rate, and involvement rate in one place. The patterns this unlocks are genuinely new:

  • A quality dip that correlates with a recent knowledge base change
  • A specific topic that consistently underperforms
  • A team whose scores are improving week on week

QA stops being a periodic exercise in a separate tool and becomes a continuous signal integrated into how the whole operation is monitored.
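If you export QA results, the first pattern in the list above (a quality dip around a knowledge base change) can be checked with a simple before/after comparison. This is a rough sketch on invented data; the `(date, passed)` record shape and the function name are assumptions, and a difference in pass rates suggests a correlation, not a cause.

```python
from datetime import date


def pass_rate_shift(evaluations, change_date):
    """Pass rate after `change_date` minus the pass rate before it.

    `evaluations` is a list of (date, passed) pairs. Returns None when
    either side of the change has no data.
    """
    before = [passed for day, passed in evaluations if day < change_date]
    after = [passed for day, passed in evaluations if day >= change_date]
    if not before or not after:
        return None
    # Booleans sum as 0/1, so sum/len gives the share that passed.
    return sum(after) / len(after) - sum(before) / len(before)


# Invented sample data: four evaluations on each side of the change.
evaluations = [
    (date(2026, 3, 1), True), (date(2026, 3, 2), True),
    (date(2026, 3, 3), True), (date(2026, 3, 4), True),
    (date(2026, 3, 10), True), (date(2026, 3, 11), False),
    (date(2026, 3, 12), False), (date(2026, 3, 13), True),
]
print(pass_rate_shift(evaluations, change_date=date(2026, 3, 8)))
```

A negative result means the pass rate dropped after the change date, which is a cue to look at what the knowledge base edit touched.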

For teams exploring how to connect these signals with broader automation, our guide on SaaS automation workflows covers how platforms like Intercom fit into a larger operational stack — and how to avoid the common traps.

What's Coming Next

Monitors currently covers Fin conversations. What's on the roadmap extends its reach significantly:

  • Human agent QA — extending the same scorecard framework to your human team, giving one unified quality system across the entire support operation
  • Real-time alerts — for high-risk conversations as they happen, not after the fact
  • Knowledge base evaluation — scoring responses directly against your latest policies and documentation, with clear rationale linked to the relevant source
  • CX Score benchmarking — showing your score relative to other companies in your industry, giving teams still calibrating what "good" looks like much-needed context

The knowledge base evaluation piece is particularly interesting if you're already investing in optimising how Fin handles quick questions — Monitors will soon tell you exactly where your content is falling short, per conversation, with evidence.

The Practical Implication

The flywheel only works if every phase works. Training and Testing have had significant investment recently. Monitors completes Analyze.

With Insights giving you the why behind performance, Recommendations telling you what to fix, and Monitors telling you whether your standards are being met conversation by conversation at full scale, the loop is genuinely closed.

The question worth asking now isn't whether to set up Monitors — it's which monitors to build first.

If you need help designing a QA strategy for your AI support operation — from scorecard design to knowledge base structure to the monitoring cadence that fits your risk profile — that's exactly what we do at dot2.solutions.

The AI black box is open. The question is what you do with what you find inside.

Need help setting up Monitors and QA for your Fin instance?

We design monitoring strategies tailored to your risk profile — from custom scorecards to knowledge base audits to escalation workflows. If you want Fin's quality to match your brand standards, we can help.

Explore Our Intercom Services →
Tags: Intercom, AI Agents, Fin 3, Quality Assurance, Customer Experience, Observability, CX Score, Automation, Support Operations

Need Help with Intercom & Fin?

As an Intercom Silver Partner, we specialise in Fin deployment, knowledge base architecture, and AI support automation for Swiss SMEs.

No commitment required • Free 30-minute consultation • Expert guidance