Changelog

What we shipped

A timeline of features, improvements, and integrations across the Converra platform.

June 2026

improvement Jun 17

Conversation Insights: every finding shows its receipts

Conversation Insights now grounds every finding in evidence. Each issue is checked against what the agent actually ran — the exact prompt and runtime configuration, the specific transcript turns it's based on, and deterministic URL checks — and the judge marks a claim unverified instead of asserting it when the source isn't in the captured trace. The result is a step-change in accuracy over the past two weeks: fabricated and misattributed findings are largely gone, and every surfaced issue links to the precise step and rule behind it.

improvement Jun 17

Conversation Insights, redesigned

The conversation view now leads with a plain-language summary and unifies every issue into one set of finding cards, grouped by the output field they affect. Open any finding to see its evidence — the transcript turns, the prompt rule, and the runtime context it rests on — so the judgment shows its work inline. Cleaner density, local-time dates, and the analyzed date on every insight.

improvement Jun 16

Hardened Conversation Insights generation

Conversation Insights now generates completely and reliably. Behavioral and absence findings — “skipped qualification”, “exceeded the length cap”, “ignored prior context” — that don't map to a single verbatim quote now ground against a window of transcript turns instead of being dropped, and the summary and root cause keep every finding they can support rather than collapsing to a generic fallback when one reference doesn't resolve. Over-length or partially malformed model output is repaired and recorded instead of discarded, and batch generation is provider-neutral — so an insight is no longer lost over a few characters or a single unresolved reference.

feature Jun 14

Mark a finding wrong, and it stays fixed

You can now flag any finding as a false positive or as by-design behavior straight from the conversation view. The verdict is recorded against that finding and carries into how the agent's future conversations are judged — so a correction sticks instead of resurfacing.

improvement Jun 8

Readable agent and prompt names

Agents and prompts now show human-readable names across the app instead of hash-qualified identifiers. Logical role labels — orchestrator, responder, and named agents — replace raw content hashes in conversation views, scorecards, and finding context, and known brand/prompt names are surfaced where we can resolve them, so you can tell at a glance which agent a finding is about.

integration Jun 5

LangChain tool calls captured from LangSmith

Tool-calling analysis now works for LangChain agents imported via LangSmith. A LangChain trace serializes tool calls differently from raw OpenAI or Anthropic, so a tool-using agent previously imported with zero tool calls — silently hiding its tool usage. Converra now reads LangChain's message format, capturing both the tool calls and the bound tool definitions, so tool-call grading lights up automatically for LangSmith customers.
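
To illustrate why a format mismatch can surface as zero tool calls, here is a minimal sketch of normalizing tool calls across the three message shapes. The dict layouts follow the commonly documented OpenAI, Anthropic, and LangChain message formats. The function name `extract_tool_calls` and the normalized output shape are hypothetical, and this is not Converra's actual importer.

```python
import json
from typing import Any


def extract_tool_calls(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Normalize tool calls from OpenAI, Anthropic, or LangChain message dicts."""
    # Serialized LangChain messages keep their fields under "kwargs";
    # raw provider messages are flat dicts.
    body = message.get("kwargs", message)
    calls: list[dict[str, Any]] = []
    seen_ids: set[str] = set()

    def add(call_id, name, args):
        # A LangChain message can carry the same call twice
        # (in tool_calls and again as a content block), so dedupe by id.
        if call_id is not None and call_id in seen_ids:
            return
        if call_id is not None:
            seen_ids.add(call_id)
        calls.append({"id": call_id, "name": name, "args": args})

    for call in body.get("tool_calls") or []:
        if "function" in call:
            # OpenAI shape: arguments are a JSON-encoded string.
            fn = call["function"]
            add(call.get("id"), fn["name"], json.loads(fn.get("arguments") or "{}"))
        else:
            # LangChain shape: name and args sit directly on the call.
            add(call.get("id"), call["name"], call.get("args") or {})

    content = body.get("content")
    if isinstance(content, list):
        # Anthropic shape: tool calls are "tool_use" content blocks.
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                add(block.get("id"), block["name"], block.get("input") or {})
    return calls
```

A reader that only understands the OpenAI branch would return an empty list for the LangChain shape, which matches the silent "zero tool calls" symptom described above.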

improvement Jun 4

PR Impact is honest about confounded results

PR Impact no longer credits a fix when its effect can't be isolated. A deployment whose change landed alongside other changes — so the before/after is confounded — is flagged and held back from the impact rollup instead of being reported as a clean win. PR Impact is now also available over the MCP API, so you can pull a deployment's measured effect programmatically.
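
As a rough illustration of holding confounded deployments out of a rollup, the sketch below flags a deployment when another change to the same agent landed close to it in time. The field names, the 3-day window, the PR numbers, and the summed score delta are all assumptions made for the example; the changelog does not describe how Converra detects confounding.

```python
from datetime import datetime, timedelta


def flag_confounded(deployments, window=timedelta(days=3)):
    """Mark a deployment confounded if another change hit the same agent nearby in time."""
    flagged = []
    for d in deployments:
        overlapping = any(
            other is not d
            and other["agent"] == d["agent"]
            and abs(other["deployed_at"] - d["deployed_at"]) <= window
            for other in deployments
        )
        flagged.append({**d, "confounded": overlapping})
    return flagged


def impact_rollup(deployments):
    """Sum score deltas over clean deployments only; list the rest as held back."""
    clean = [d for d in deployments if not d["confounded"]]
    return {
        "score_delta": sum(d["score_delta"] for d in clean),
        "counted": len(clean),
        "held_back": [d["pr"] for d in deployments if d["confounded"]],
    }


# Hypothetical data: PRs 101 and 102 touch the same agent one day apart.
deployments = flag_confounded([
    {"pr": 101, "agent": "support", "deployed_at": datetime(2026, 5, 1), "score_delta": 4},
    {"pr": 102, "agent": "support", "deployed_at": datetime(2026, 5, 2), "score_delta": 3},
    {"pr": 103, "agent": "billing", "deployed_at": datetime(2026, 5, 10), "score_delta": 2},
])
rollup = impact_rollup(deployments)
```

Here only PR 103 contributes to the rollup, because the effects of 101 and 102 cannot be separated from each other.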

improvement Jun 3

Success score measures execution, not outcome

The conversation success score now grades how well the agent executed — did it answer accurately and advance or qualify the conversation — rather than whether the user happened to convert. An agent isn't penalized for an outcome it doesn't control, but a passive “answers the question and stops” agent still scores low, because answering isn't advancing. Scores were recalibrated under a stamped rubric version.

feature Jun 2

State of the Agent

The agent page now opens with a State of the Agent verdict — a plain-language read on how the agent is performing, what's working, what isn't, and the top fixes to do next, sitting right above the scorecard tiles. It surfaces the same agent-level analysis Converra already generates, promoted out of a collapsed panel so the verdict is the first thing you see.

feature Jun 2

Output Quality Rollups

Output Quality now rolls up from conversation-level artifact checks into a per-agent scorecard on the agent page — output score, issue and concern counts, affected conversations, weakest dimensions, and a direct conversation receipt for each surfaced example.

feature Jun 2

Tool Calling Scorecard

A new Tool Calling page scores how well each agent actually uses its tools — tool success rate, argument validity, missed tools, and per-tool health across description, parameters, and response — then surfaces the exact conversations where a tool call went wrong. Import an agent's tool definitions (paste an OpenAI or Anthropic tools array) to unlock argument-validity and schema-quality grading; or skip the paste entirely — the Converra AI SDK middleware captures them automatically, and the Node and Python SDKs now forward them too.
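
For a sense of what argument-validity grading needs from a tool definition, here is a minimal sketch that checks one call's arguments against the parameter schema in an OpenAI-style or Anthropic-style tool entry. The helper names are hypothetical, only top-level `required`, `type`, and `additionalProperties` are checked, and Converra's own grading is not described in this changelog.

```python
from typing import Any

# JSON Schema type names mapped to the Python types json.loads produces.
_JSON_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}


def tool_schema(tool: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (name, parameter schema) for an OpenAI- or Anthropic-style tool."""
    if "function" in tool:  # OpenAI: {"type": "function", "function": {...}}
        fn = tool["function"]
        return fn["name"], fn.get("parameters", {})
    return tool["name"], tool.get("input_schema", {})  # Anthropic


def argument_errors(tool: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """List problems with a tool call's arguments; an empty list means valid."""
    _, schema = tool_schema(tool)
    props = schema.get("properties", {})
    errors = [f"missing required argument: {key}"
              for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        if key not in props:
            # JSON Schema allows extra keys unless additionalProperties is false.
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected argument: {key}")
            continue
        type_name = props[key].get("type")
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if expected and (is_stray_bool or not isinstance(value, expected)):
            errors.append(f"{key} should be {type_name}")
    return errors
```

Without the tool definitions there is no schema to compare against, which is why this kind of check only becomes possible once they are imported or captured.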

integration Jun 1

SDKs Forward Tool Definitions

The Converra AI SDK middleware (v0.2.1) now captures the tools array you pass to the model — including AI SDK v5/v6 `params.tools` — and forwards it as tool definitions, with matching support in the Python SDK (v0.3.4). Tool-calling grading — argument validity and description / parameter quality — lights up automatically for instrumented agents, with no manual import.

improvement Jun 1

Unified, Evidence-Backed Conversation Findings

Every issue flagged for a conversation — insight, tool-call, and output-quality findings — now lives in one Conversation Findings panel instead of scattered sections. Each finding anchors to the exact transcript turns it's based on, and cited URLs are checked against the captured runtime context, so the judgment shows its receipts inline. Cleaner row layout, no claim without evidence.

improvement Jun 1

Pattern Prevalence in PR Impact

The PR Impact view now shows how prevalent each targeted failure pattern was across a deployment's conversations, with a drilldown into the specific patterns — so you can see not just that a fix shipped, but how much of the fleet it actually touched.

improvement Jun 1

Faster Agents & Prompts Pages

Agent and prompt list views now read each agent's current production score from a maintained value instead of recomputing every agent's 30-day rolling score on every page load. The score refreshes the moment a conversation is scored, with a nightly pass for window aging — so large fleets that previously took several seconds to load now render quickly, with no change to how scores are calculated.
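
The general pattern here (maintain the value on write, read it in constant time, age the window separately) can be sketched as follows. The class name is hypothetical, and the plain mean is only an assumption for the example, since the entry says the scoring calculation itself is unchanged. The sketch also assumes conversations are recorded in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)


class MaintainedScore:
    """Keeps a rolling mean up to date so reads never rescan the window."""

    def __init__(self):
        self._events = deque()  # (scored_at, score), oldest first
        self._total = 0.0

    def record(self, scored_at: datetime, score: float) -> None:
        # Called the moment a conversation is scored.
        self._events.append((scored_at, score))
        self._total += score

    def age(self, now: datetime) -> None:
        # Called by a periodic (for example nightly) pass to drop expired scores.
        cutoff = now - WINDOW
        while self._events and self._events[0][0] < cutoff:
            _, expired = self._events.popleft()
            self._total -= expired

    @property
    def value(self):
        # O(1) read for list pages; None when the window is empty.
        if not self._events:
            return None
        return self._total / len(self._events)
```

A list page then reads `value` for each agent instead of re-aggregating 30 days of conversations per row.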

May 2026

feature May 31

Cost Efficiency Lens

A new Cost page — and a lens on the Agents view — tracks spend, token usage, and cost efficiency per agent, including cost per conversation and per resolution, ranked by measured spend. A model-switch simulator projects what moving an agent to a cheaper model would save, so you can weigh the trade-off before changing anything. Powered by token usage captured from LangSmith, Langfuse, or the API.

improvement May 31

90-Day Time Window

Fleet and agent views now offer a Last 90 days option in the time filter, alongside 24h / 7d / 30d / all — useful for slower-moving trends and seasonality that a 30-day window misses.

improvement May 29

On-Page Nav for Agent Analysis

The agent analysis view (`/agents/:id`) now has a sticky "On this page" nav that groups its sections — Overview, Trace Analysis, Content Analysis, Tools, Optimization & Versions — with scrollspy highlighting and hash deep-links. The page was previously one long scroll with no way to jump between analysis types.

improvement May 29

Look Up Conversations by Session ID in MCP

The `get_conversation` MCP tool now accepts an external session id — your LangSmith, Langfuse, or Converra SDK `sessionId` — as an alternative to a Converra `conversationId`, resolving it through the canonical by-session route. Coding agents can pull a conversation straight from the id they already hold.

feature May 29

Runtime Trace Lineage

Conversations now carry end-to-end trace lineage from your runtime. A new `GET /api/v1/traces/[traceId]/lineage` endpoint resolves the full chain behind a conversation, and the SDK surfaces it — so a Converra conversation links back to the exact trace that produced it.

feature May 28

Tool Execution Analysis

Tool and function calls in a conversation now render in an always-visible Tool Execution panel — tools available vs. called, uncalled tools, and a per-call input→result drilldown. Tool execution is normalized into a single redacted read model, projected to the v1 API and MCP, and the SDK exposes nested tool traces. Tool-using conversations ingested via SDK/API no longer show empty tool data.

feature May 28

Agent Output Scoring

Evals now detect the outputs an agent produces — structured output or substantive tool-call results — and score them against the agent's own instructions across structure and methodology. Generic by design, so it works for outlines, plans, form specs, configs, or queries. Shown as Output Quality in the conversation view, and on `ConversationInsights.artifacts` via the v1 insights API and the `get_conversation_insights` MCP tool.

feature May 28

Filter Conversations by User & Metadata

The conversations list can now be filtered by end-user and by any metadata you attach at ingest, so you can slice to a single customer or cohort without exporting.

improvement May 28

Claude Opus 4.8 in Model Catalog

Claude Opus 4.8 is now available across the model catalog, runtime registry, and per-model guidance — 1M context, 128k output, and meaningfully more reliable at catching code flaws. Opus 4.7 is retained for stored-document compatibility; no defaults were changed.

improvement May 28

Sharper Conversation Diagnosis

Transcript diagnosis is now anchored to the insights LLM's parsed root cause instead of running blind, and it diagnoses clean user/assistant turns rather than a system-prompt-padded slice — so the diagnosis matches the rest of the insight. Step diagnosis is also overwritten on regeneration instead of preserving stale results.

improvement May 27

User-Facing Conversation Summary

Insights can now generate an optional plain-language summary aimed at your end users, configured per customer via the v1 settings API or the `update_settings` MCP tool. Off by default; when enabled it's surfaced everywhere the technical insight is — auth + v1 routes, MCP, the `insights.generated` webhook, the SSE stream, and the conversation view.

feature May 25

PR Impact View

New `/fleet/pr-impact` page — also embedded inline in Fleet — tracks each merged PR's outcome on production: conversations counted, score delta, verdict, and a per-PR drilldown into the affected agents.

feature May 25

Session-ID Conversation Lookup

New `GET /api/v1/conversations/by-session/[sessionId]` resolves a Converra conversation directly from your own session id — no MCP roundtrip required. Appending a message to an unknown session id auto-creates the conversation.
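
A minimal sketch of building that request with the Python standard library is shown below. The route comes from this entry and the base URL from the May 21 SDK entry. The bearer-token `Authorization` header and the helper names are assumptions, so check the API reference for the real authentication scheme.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://converra.ai/api/v1"  # canonical base URL named in this changelog


def by_session_url(session_id: str) -> str:
    """Build the by-session lookup URL, percent-encoding the caller's session id."""
    return f"{BASE_URL}/conversations/by-session/{quote(session_id, safe='')}"


def build_request(session_id: str, api_key: str) -> Request:
    """Prepare (but do not send) the GET request for a session lookup."""
    # Assumption: bearer-token auth. The changelog does not state the scheme.
    return Request(
        by_session_url(session_id),
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Encoding with `safe=''` matters because external session ids may contain slashes or spaces.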

improvement May 25

Claim Evidence on Conversation Findings

Every LLM-judged finding in the Conversation Findings panel now anchors its claims to the specific transcript turns it references. The judgment shows its receipts inline instead of asking you to trust it.

improvement May 24

Runtime URL Evidence Checks

Insights now validate every URL they cite against the captured runtime context. Fabricated link claims are flagged at generation time instead of slipping into reports.

integration May 21

SDKs Republished — Canonical Host + Envelope Unwrap

Node SDK 0.5.0, Python SDK 0.3.1, and the Vercel AI SDK middleware 0.1.2 now all point at the canonical `https://converra.ai/api/v1` base URL. The Node SDK now unwraps the response envelope by default, so `conv.id` reads directly from single-resource gets instead of `response.data.id`. The middleware now flushes each event immediately on capture instead of batching up to 10.

improvement May 21

Agent Metrics Surfaced in MCP

`get_agent` and `list_agents` now return real performance scores, conversation counts, and last-analyzed timestamps from PromptMetrics — not fake-zero placeholders. Missing scores render as null, matching how `get_fleet_overview` reports them.

feature May 20

Fleet Movement in MCP

`get_fleet_overview` now answers "is the fleet getting better or worse, and where?" — agents and patterns are tagged with movement direction and confidence, so coding agents can triage what changed without a separate query.

improvement May 19

Deployment Verification — Verdict First

Optimization detail now leads with the deployment verdict and surfaces a before/after diff of the prompt change, so the outcome and what shipped are visible without scrolling.

improvement May 18

Production A/B Test SDK Hardening

Node SDK 0.4.1 closes traffic-assignment gaps, preserves SDK-side test assignments across rollouts, and hardens the production A/B test rollout path.

feature May 15

Prompt Conformance Gate for Generated Variants

Generated variants must conform to the prompt contract before entering simulation. Rejections are recovered as warnings and fed back into the next generation pass instead of failing the optimization.

improvement May 15

Head-to-Head Selection Integrity

Optimization selection, evidence, and insights now flow only through head-to-head pairs between baseline and each variant. Sub-1pp lifts are preserved, diagnosis-first variant generation is enforced, and intent constraints are aligned across every trigger path.

feature May 13

Fleet Intelligence Redesign

Status-led triage rows lead with an impact-magnitude delta chip, action patterns replace the attention card, and rows drill into pattern evidence. The new triage model splits `needs_fix` into `untriaged` and `fix_ready`, with auto-demotion on regression. Start fix opens inline from the row.

improvement May 11

Regression Gate Overhaul

Regression gating now uses score delta with variance-derived slack instead of pass-rate flips, eliminating false-positive demotions. This release also adds an aggregate demotion safety net, `llmParametersHash` in semantic-cache provenance with re-qualification on mismatch, and a configurable `maxRegressions` limit.
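
One way to read "score delta with variance-derived slack" is sketched below: a candidate counts as a regression only when its mean score drops by more than the noise in the two samples can explain. The function name, the standard-error formula, and the multiplier `z=2.0` are assumptions for illustration, not the gate's actual parameters.

```python
from math import sqrt
from statistics import mean, variance


def is_regression(baseline: list[float], candidate: list[float], z: float = 2.0) -> bool:
    """Flag a regression only when the score drop exceeds variance-derived slack.

    Each list needs at least two scores so a sample variance exists.
    """
    delta = mean(candidate) - mean(baseline)
    # Standard error of the difference between two independent sample means.
    standard_error = sqrt(
        variance(baseline) / len(baseline) + variance(candidate) / len(candidate)
    )
    slack = z * standard_error
    return delta < -slack


baseline = [70.0, 72.0, 68.0, 71.0, 69.0]
noisy_dip = [69.0, 71.0, 67.0, 70.0, 70.0]   # mean drops 0.6, within the noise
real_drop = [60.0, 62.0, 58.0, 61.0, 59.0]   # mean drops 10, well outside it
```

With these samples `is_regression(baseline, noisy_dip)` is false and `is_regression(baseline, real_drop)` is true. A gate based on pass-rate flips could demote the first case if a single scenario crossed its pass threshold.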

feature May 7

Production A/B Tests for Agent Fixes

Test a recommended fix on a controlled slice of live SDK traffic before full rollout. Configure traffic, compare baseline vs. candidate fix on scored production conversations, then promote or stop based on lift and confidence.

feature May 6

Agentic Readiness on Audit Reports

Audit reports now test how an agent handles AI-mediated traffic: identity acknowledgement, structured intent, multi-step delegation, and bot-block transparency.

improvement May 4

Fixed Findings on Re-Audits

Re-audits now highlight issues that fired in the previous run but no longer reproduce, with validated/likely-fixed confidence labels and a dedicated Fixed tab.

improvement May 4

Weekly Agent Report

Weekly emails now summarize what changed across the fleet: verified wins, regressions, unfixed patterns, recent PRs, fleet score, failure rate, and conversations analyzed.

infrastructure May 4

Verified Public Audit Queue

Public audit submissions now use bot checks, daily capacity limits, and email verification before execution, keeping free audits reliable without requiring a full account upfront.

feature May 3

Public Agent Audit

Paste any AI chat agent's URL and get a graded audit at a shareable token URL in under a minute. No account required. Save the report to a Converra account to track changes over time, or share it publicly to settle 'is this agent any good?' debates.

feature May 3

Pricing — 4-Tier Ladder with PAYG

New pricing structure: Free, Pay-as-you-go ($9 per audit), Pro ($299/mo with 15 audits + overage), Enterprise. Annual/monthly toggle. PAYG runs through a single Stripe-hosted checkout — no contract, no commitment.

feature May 3

Custom Metrics on Audit Reports

Audit reports include an agent-specific custom metrics section — a medical intake bot is graded against different criteria than a returns assistant. The default rubric still applies; custom metrics layer on top.

feature May 2

Before/After Diff on Re-Audits

Re-audit a saved report and see what actually changed: score deltas per dimension, scenarios that flipped pass/fail, and a verdict shift banner at the top. Tracks agent improvement over time without manual comparison.

feature May 1

Magic-Link Auth + Save Flow

Save any audit report to a Converra account with a single magic-link click — no password setup. Anonymous reports you ran before signing up automatically attach to the new account by browser attribution.

improvement May 1

Eval Set Surfaced on Reports

Reports now show the eval set used for scoring. A 73 on Eval Set v3 is comparable across re-audits — the rubric isn't silently shifting underneath you.

April 2026

feature Apr 29

Optimization Confidence, Reasoning, and Per-Scenario Regression in MCP

Three new fields on optimization results explain why a winner was picked: confidence score, plain-language selection reasoning, and per-scenario regression results, so you can see how each variant performed against each test case. There is also a new `validate_variant` MCP tool for running ad-hoc validations.

improvement Apr 29

Optimization Status SSOT

Optimization process status now flows from a single derived source — eliminating cases where the UI showed 'running' after the backend had finished, and where auto-restart fired on terminal failures. The status displayed in dashboards and PRs is always the canonical one.

feature Apr 27

Role-Family Awareness for Multi-Tenant Fleets

If you run the same agent (Discovery, Support, etc.) across multiple end-customers, Converra now treats the whole role as a family. PRs surface a 'Role family' section listing every customer the change touches, and API and MCP responses include `agentType`, `customer`, and `instanceCount`, so any consumer immediately sees the 5-agents-across-30-customers shape instead of a flat list.

feature Apr 24

Worst-Conversation Landing with One-Click Fix

Jump straight to the conversation that's hurting your agent the most, see the diagnosis, and ship the fix in one click. No digging through dashboards to figure out where to start.

integration Apr 24

Slack Notifications for Flagged Conversations

Get a Slack ping the moment Converra flags a bad conversation — with a link straight to the diagnosis. Stop finding out about quality issues from your customers.

feature Apr 24

Incremental Conversation Streaming

New append endpoint lets you stream turns into an existing conversation instead of waiting for it to end and posting the full transcript. Lower memory, real-time insights, and better fit for long-running voice and chat sessions.

feature Apr 20

Team Invite Links

Share a single link to bring teammates into your Converra account. No more one-by-one email invites or waiting for approvals.

improvement Apr 20

Critical Issues Surfaced First

Pending-deployment and template aggregate pages now lead with critical issues instead of burying them under noise. The things that actually break your agent show up at the top.

improvement Apr 20

Quiet-Period Scoring for Low-Volume Agents

The live score window has widened from 7 to 30 days, and the minimum sample size has dropped from 15 to 5. Agents with modest traffic now keep a stable, meaningful score instead of going dark between bursts.
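
The effect of the two settings can be seen in a small sketch. The window lengths and sample minimums are the ones quoted above. The function name, the plain mean, and the sample data are assumptions for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def live_score(scored, now, window_days=30, min_samples=5):
    """Mean score over the window, or None when there are too few conversations."""
    cutoff = now - timedelta(days=window_days)
    recent = [score for scored_at, score in scored if scored_at >= cutoff]
    return mean(recent) if len(recent) >= min_samples else None


# Hypothetical low-volume agent: six scored conversations spread over 25 days.
now = datetime(2026, 4, 20)
scored = [
    (now - timedelta(days=25), 70.0),
    (now - timedelta(days=21), 74.0),
    (now - timedelta(days=16), 78.0),
    (now - timedelta(days=12), 72.0),
    (now - timedelta(days=9), 76.0),
    (now - timedelta(days=2), 80.0),
]

old_settings = live_score(scored, now, window_days=7, min_samples=15)  # None
new_settings = live_score(scored, now)                                 # 75.0
```

Under the old settings this agent has one conversation in the window and shows no score. Under the new ones all six count.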

feature Apr 18

converra.send() — Simplified Conversation Logging

One method to log conversations. Pass your messages and Converra handles the rest — create a new conversation or append turns to an existing one. Each message can optionally carry model, tool calls, token usage, and latency so you get trace-level detail without a tracing pipeline. Available in both Node.js and Python SDKs.

improvement Apr 17

Accurate Per-Agent Diagnosis in Multi-Agent Prompts

When a prompt orchestrates multiple sub-agents (greeter, researcher, closer), failures are now attributed to the specific agent that caused them instead of defaulting to the primary. Cleaner diagnosis, targeted fixes.

feature Apr 9

MCP Performance & Impact Toolkit

Six new MCP tools that answer 'how is Converra helping my agents?' Fleet overview with real fleet score and failure rate. Daily score timeline with deployment markers. Cumulative impact summary. Verification evidence — the actual conversations behind every claim. Fixability breakdown showing what Converra can fix vs what needs engineering.

feature Apr 9

Verification Evidence Trail

Every verification claim now has receipts. When Converra says '82% reduction,' you can drill into the actual pre-fix and post-fix conversations that were counted. Evidence is captured at verification time and retrievable via MCP or API.

improvement Apr 9

Fixability Classification in Failure Diagnosis

Step failure aggregation now shows what percentage of failures Converra can fix autonomously vs what needs engineering work. Matches the Failure Triage card on the Fleet page.

feature Apr 7

Cascade Detection

When a fix resolves one failure, conversations survive longer and may hit new issues downstream. Converra now detects these cascade effects and surfaces them on the fleet card — so you know the fix worked even when the overall score doesn't move.

feature Apr 5

Production Fix Verification

Deployed fixes are now verified against production conversations. Fleet cards show before/after failure rates with 'Fixed — X% reduction' badges. Verification is graduated: monitoring → likely fixed → verified — so you see early signals before waiting for full statistical confidence.
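
A graduated verdict of this kind could be sketched as follows. The three stage names come from the entry above. Every threshold (`likely_at`, `verified_at`, `min_reduction`) and both function names are invented for illustration, since the changelog does not publish the real criteria.

```python
def reduction_pct(before_rate: float, after_rate: float) -> float:
    """Percentage drop in failure rate from before the fix to after it."""
    if before_rate <= 0:
        return 0.0
    return round(100 * (before_rate - after_rate) / before_rate, 1)


def verification_stage(
    post_fix_conversations: int,
    reduction: float,
    likely_at: int = 20,        # assumed sample size for an early signal
    verified_at: int = 100,     # assumed sample size for full confidence
    min_reduction: float = 50.0,  # assumed minimum drop to call it fixed
) -> str:
    """Map evidence so far onto monitoring, likely fixed, or verified."""
    if reduction >= min_reduction:
        if post_fix_conversations >= verified_at:
            return "verified"
        if post_fix_conversations >= likely_at:
            return "likely fixed"
    return "monitoring"


reduction = reduction_pct(before_rate=0.22, after_rate=0.04)
early = verification_stage(8, reduction)     # "monitoring"
middle = verification_stage(35, reduction)   # "likely fixed"
late = verification_stage(140, reduction)    # "verified"
```

The same measured reduction moves through the stages as more post-fix conversations arrive, which is how an early signal can be shown before the evidence is conclusive.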

feature Apr 3

Converra on Converra

When an optimization doesn't find a winner, Converra now classifies why it failed, learns from the outcome, and automatically restarts with an adapted strategy. The same improvement loop we run on your agents now runs on ours.

feature Apr 3

Agent Versioning & Sibling Lineage

Agent instructions now have full version history with lineage tracking. When one agent in a sibling group gets optimized, the others show staleness indicators so you know which agents are falling behind.

feature Apr 1

PR Preview Dialog

Preview the full pull request before creating it — the diff, metrics comparison, diagnosed issues, regression results, and conversation replay. Review everything in one place before committing to the PR.

feature Apr 1

Claude Code-Ready PRs

Optimization PRs are now structured for AI code reviewers. Full context, metrics, evidence, and before/after comparisons are embedded inline so Claude Code, Copilot, and other AI reviewers can evaluate the change without needing access to Converra.

March 2026

feature Mar 31

Before/After Conversation Replay

Every optimization now shows side-by-side comparisons of how the agent responded before and after the change. See the actual improvement in context, not just a score delta.

improvement Mar 31

Agent-Level Alerts

Instead of a separate email for every diagnosed conversation, you now get one agent-level alert that summarizes all issues across conversations. Less noise, same signal.

improvement Mar 31

Copy Fix with Full Context

The copy fix button now includes testing methodology, regression results, real production conversation quotes, and review guidance — everything a reviewer needs without opening Converra.

feature Mar 30

Organization Evaluation Rules

Define evaluation rules at the organization level that apply across all your agents. Custom rules are plumbed directly into evaluation prompts so every optimization and insight respects your business-specific quality standards.

feature Mar 29

Multi-Agent Secondary Agent Insights

Each agent in a multi-agent conversation now gets its own scoped insights. Secondary agents surface their own failure patterns and performance metrics instead of being rolled into the primary agent's analysis.

feature Mar 28

Business Impact Analysis

Agent and fleet insights now show behavior-specific failure patterns with real conversation counts — not generic buckets. Each pattern links directly to the affected conversations. Cost and upside cards estimate the business impact of fixing the top issues.

feature Mar 26

Unified Issue Intelligence

Agent issues are now deep business insights, not metric labels. Each issue includes a headline, evidence from diagnosed conversations, a recommended fix, the team that owns it, and whether Converra can auto-fix it via prompt optimization.

integration Mar 24

GitHub Integration — Dependabot for AI Agents

Converra now creates pull requests in your GitHub repos when optimizations find improvements. Connect your GitHub, and optimization winners are automatically sent as PRs with metrics, evidence, and a one-click merge path. Supports auto-PR on completion, manual PR creation, merge-back sync, and Python agent file detection.

feature Mar 24

Auto-PR for Model Benchmarks

When a benchmark finds a better model, Converra automatically opens a GitHub PR to switch the model config in your repo — complete with a comparison table showing quality score, cost, and latency across difficulty levels.

feature Mar 24

Dedicated Benchmarks Page

Model benchmarks now live on their own page with a dedicated nav entry. Browse all benchmark runs, view inline conversation scores, and launch new comparisons without leaving context.

feature Mar 22

Fleet Page

See every agent's health at a glance. A single dashboard with optimization progress over time, a scoreboard ranking agents by performance, failure distribution, improvement potential, and pending deploys — everything you need to decide what to work on next.

improvement Mar 22

Onboarding Redesign

The guided tour now starts with the Fleet page and walks through connecting your first agent. A faster path from signup to value.

feature Mar 20

Zero-Code Instrumentation

Import `converra/auto` and every LLM call in your app is captured automatically, with no wrapper functions and no code changes. Works with OpenAI, Anthropic, and Vercel AI SDK.

integration Mar 20

Langfuse & OTel Integration Parity

Langfuse and OpenTelemetry integrations now match LangSmith feature-for-feature — async sync triggers, pre-flight validation, usage limit checks, and import metrics.

feature Mar 20

SDK v0.2.0 — LLM Client Wrapping

Wrap your OpenAI, Anthropic, or Vercel AI SDK client with one line. Every conversation is captured automatically. Multi-agent tracing links orchestrator and sub-agent calls into a single execution graph. A/B variant swapping tests optimized prompts against real traffic.

feature Mar 20

Python SDK Published to PyPI

Install with `pip install converra`. The Python SDK supports sync, async, and streaming calls for OpenAI and Anthropic, and includes a LangChain callback handler.

infrastructure Mar 19

SDK Server Endpoints

New API endpoints for SDK integration: prompt matching by content hash, active variant lookup for A/B testing, and a bulk SDK configuration endpoint. A testing mode setting (proxy or simulation) was added to the dashboard.

integration Mar 19

Generic Trace API & SDK

Send traces directly to Converra via the SDK (`converra.traces.create`), with no LangSmith, Langfuse, or OTel pipeline required. This is the fastest path from your agent to Converra.

feature Mar 19

Human & Agent Feedback in Optimization

Give the optimizer direct feedback. Thumbs up/down from the UI or programmatic feedback via MCP tools — both feed into the optimization agent's planning so it learns from your judgment, not just metrics.

improvement Mar 18

Benchmark Score Visibility

Benchmark comparisons now show actual per-conversation scores inline. See exactly how each model performed, not just a summary.

improvement Mar 18

Results Page Redesign

Redesigned optimization results with an activity card and deploy banner. Clearer post-optimization experience so you can review and deploy faster.

feature Mar 16

In-App Notifications

Get notified when optimizations complete or conversation syncs finish. Real-time alerts in the app so you never miss a result.

feature Mar 10

Intent-Aware Optimization

Focus optimization on what matters most. Choose from 24 built-in focus areas or define custom goals; simulations, evaluations, and variant generation all align to your intent.

feature Mar 7

Diagnosis for All Conversations

Step-level failure diagnosis now runs on every conversation, not just low-scoring multi-agent traces. Every agent gets actionable root cause analysis regardless of score or architecture.

feature Mar 6

Re-Optimization with Issue Carry-Forward

Re-optimize agents that are already in monitoring state. Unresolved issues from prior runs carry forward automatically so the optimizer picks up where it left off.

feature Mar 5

Open Signup

Waitlist removed. Sign up with email or Google and start connecting agents immediately.

feature Mar 2

Free Tier

Start using Converra at no cost. The free tier includes conversation imports, insights, and a limited number of optimizations so you can evaluate before committing.

feature Mar 1

Root Cause Categories

Failures across your agents are now grouped by root cause category in the Systems view. Quickly spot whether issues stem from hallucinations, instruction gaps, tool errors, or context limits.

February 2026

feature Feb 28

Per-Prompt Failure Patterns

See recurring failure patterns for individual prompts. Identify which failure types affect each agent so you can prioritize the highest-impact fixes.

improvement Feb 26

Conversation Messages in Step Diagnosis

Step-level failure diagnosis now shows the actual conversation messages exchanged during the failing step, giving you full context without leaving the diagnosis view.

integration Feb 24

OpenTelemetry Integration

Import production conversations from any OpenTelemetry-compatible tracing pipeline. Connect Axiom or other OTel backends to automatically sync your agent's traces.

feature Feb 24

Source Feedback in Evaluation

Production user feedback is now surfaced in conversation insights and factored into evaluation scores. See what real users thought alongside AI analysis.

feature Feb 22

Auto-Trigger & Auto-Deploy Optimization

Optimization automatically triggers when step diagnosis detects fixable failures. Winners can auto-deploy with settings-gated controls — no manual intervention needed.

feature Feb 20

Automatic Regression Testing for Agents

Winners are automatically tested against a golden set of scenarios before deployment. Catch regressions before they reach production.

feature Feb 19

Custom Evaluation Criteria

Define business-specific metrics beyond the built-in evaluation suite. Measure what matters most for your agent's domain.

feature Feb 19

LLM Benchmark Comparisons

Compare model performance side-by-side. Run the same scenarios across different LLMs to find the best fit for your agent.

improvement Feb 19

Conversation Insights Redesign

Redesigned conversation insights with above-the-fold metrics, prompt links, and consolidated qualitative sections.

feature Feb 18

Orchestration-Aware Simulations

Multi-agent simulations now inject synthetic orchestrator context for higher fidelity. Simulated conversations reflect how your agents actually interact in production.

feature Feb 17

Fix-Targeted Optimizations

Optimizations now target the specific agent step responsible for diagnosed failures, instead of optimizing blindly.

feature Feb 17

Agent Failure Patterns

See recurring failure types across all your agents at a glance. Spot systemic issues before they become customer-facing problems.

feature Feb 16

Platform Launch

Converra launches. Autonomous agent optimization with simulation testing, real-time performance tracking, and continuous prompt improvement.

feature Feb 14

Step-Level Agent Failure Diagnosis

Pinpoints which step in a multi-agent conversation caused a failure. See the execution flow, identify the responsible agent, and get actionable fix recommendations.

feature Feb 4

Prompt Structure Optimization

The optimizer now detects and resolves contradictions, redundancy, and formatting issues in your agent's instructions — not just metric-driven changes.

January 2026

feature Jan 20

Variables & Sibling Deployment

Extract variables from agent instructions and deploy optimized variants across sibling agents that share the same structure.

feature Jan 15

MCP Server for AI-Native Integration

Converra is now accessible as an MCP server — manage agents, run simulations, and trigger optimizations from any MCP-compatible client.

feature Jan 12

Segment Metrics

Break down agent performance by segment. See which parts of your agent's instructions contribute most to success or failure.

improvement Jan 10

Protected Zones in Variant Generation

Mark sections of your agent's instructions as protected so the optimizer preserves them during variant generation.

integration Jan 2

Langfuse Integration

Import production conversations from Langfuse with continuous sync. Supports self-hosted instances and multi-agent trace detection.

December 2025

feature Dec 20

Head-to-Head Winner Selection

Variant selection now uses persona-level head-to-head comparisons as the single source of truth, eliminating false positives from aggregated scores.
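
A small worked example shows how an aggregated score can produce the kind of false positive this entry refers to. The persona names, the scores, and the helper are invented for illustration, and the changelog does not specify the exact comparison rule.

```python
from statistics import mean


def head_to_head(baseline: dict[str, float], variant: dict[str, float]) -> tuple[int, int]:
    """Count personas where the variant beats, or loses to, the baseline."""
    wins = sum(1 for persona in baseline if variant[persona] > baseline[persona])
    losses = sum(1 for persona in baseline if variant[persona] < baseline[persona])
    return wins, losses


# Hypothetical per-persona scores for the same scenarios.
baseline = {"skeptic": 70.0, "rushed": 72.0, "novice": 68.0, "expert": 71.0}
variant = {"skeptic": 68.0, "rushed": 70.0, "novice": 95.0, "expert": 69.0}

aggregate_lift = mean(variant.values()) - mean(baseline.values())  # 5.25
wins, losses = head_to_head(baseline, variant)                     # 1 win, 3 losses
```

The aggregate says the variant is 5.25 points better, but that lift comes from a single persona. Paired against the baseline persona by persona, the variant loses three comparisons out of four.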

integration Dec 15

LangSmith Integration

Import production conversations directly from LangSmith. Connect your existing tracing pipeline to Converra without code changes.

feature Dec 10

Multi-Agent Discovery

Automatically detect multi-agent architectures from ingested conversations. See which agents participate and how they hand off.

feature Dec 1

Bulk Conversation Management

Archive and delete conversations in bulk. Filter, select, and clean up your conversation data at scale.
