Changelog

What we shipped

A timeline of features, improvements, and integrations across the Converra platform.

May 2026

feature · May 20

Fleet Movement in MCP

`get_fleet_overview` now answers "is the fleet getting better or worse, and where?" — agents and patterns are tagged with movement direction and confidence, so coding agents can triage what changed without a separate query.

improvement · May 19

Deployment Verification — Verdict First

Optimization detail now leads with the deployment verdict and surfaces a before/after diff of the prompt change, so the outcome and what shipped are visible without scrolling.

improvement · May 18

Production A/B Test SDK Hardening

Node SDK 0.4.1 closes traffic-assignment gaps, preserves SDK-side test assignments across rollouts, and hardens the production A/B test rollout path.

feature · May 15

Prompt Conformance Gate for Generated Variants

Generated variants must conform to the prompt contract before entering simulation. Rejections are recovered as warnings and fed back into the next generation pass instead of failing the optimization.
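
As a rough illustration of this kind of gate, the sketch below filters variants against a contract and turns each rejection into a warning that a later generation pass could consume. The contract shape (required placeholders plus a length cap) and every name in the code are hypothetical; this entry does not describe Converra's actual contract format.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    # Hypothetical contract: placeholders a variant must keep, plus a size cap.
    required_placeholders: list[str]
    max_chars: int


@dataclass
class GateResult:
    accepted: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def conformance_gate(variants: list[str], contract: Contract) -> GateResult:
    """Keep conforming variants; record each rejection as a warning."""
    result = GateResult()
    for index, text in enumerate(variants):
        missing = [p for p in contract.required_placeholders if p not in text]
        if missing:
            result.warnings.append(f"variant {index}: missing {', '.join(missing)}")
        elif len(text) > contract.max_chars:
            result.warnings.append(f"variant {index}: exceeds {contract.max_chars} chars")
        else:
            result.accepted.append(text)
    return result
```

Only `accepted` would move on to simulation; `warnings` is returned rather than raised, so a single bad variant does not fail the whole run.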

improvement · May 15

Head-to-Head Selection Integrity

Optimization selection, evidence, and insights now flow only through head-to-head pairs between baseline and each variant. Sub-1pp lifts are preserved, diagnosis-first variant generation is enforced, and intent constraints are aligned across every trigger path.

feature · May 13

Fleet Intelligence Redesign

Status-led triage rows lead with an impact-magnitude delta chip, action patterns replace the attention card, and rows drill into pattern evidence. The new triage model splits needs_fix into untriaged and fix_ready, with auto-demotion on regression. The Start fix action opens inline from the row.

improvement · May 11

Regression Gate Overhaul

Regression gating now uses score-delta with variance-derived slack instead of pass-rate flips, eliminating false-positive demotions. This release also adds an aggregate demotion safety net, llmParametersHash in semantic-cache provenance (with re-qualification on mismatch), and a configurable maxRegressions limit.
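
To make the idea of a score-delta gate with variance-derived slack concrete, here is a minimal sketch. The slack formula (k standard errors of the baseline scores), the function names, and the data shapes are assumptions chosen for illustration; only the `maxRegressions` concept comes from this entry.

```python
from statistics import mean, stdev


def is_regression(baseline: list[float], candidate: list[float], k: float = 2.0) -> bool:
    """Flag a regression only when the mean score drop exceeds the slack.

    Slack is k standard errors of the baseline scores (an assumed formula).
    Needs at least two baseline scores, because stdev requires them.
    """
    slack = k * stdev(baseline) / len(baseline) ** 0.5
    delta = mean(candidate) - mean(baseline)
    return delta < -slack


def gate(scenarios: dict[str, tuple[list[float], list[float]]], max_regressions: int = 0) -> bool:
    """Pass when the number of regressed scenarios stays within max_regressions."""
    regressed = [
        name for name, (base, cand) in scenarios.items() if is_regression(base, cand)
    ]
    return len(regressed) <= max_regressions
```

Under this sketch a small dip inside the noise band does not count as a regression, which is the behavior a pass-rate flip cannot express.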

feature · May 7

Production A/B Tests for Agent Fixes

Test a recommended fix on a controlled slice of live SDK traffic before full rollout. Configure traffic, compare baseline vs. candidate fix on scored production conversations, then promote or stop based on lift and confidence.
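
One common way to route a controlled slice of traffic while keeping assignments sticky is to hash a stable ID into a bucket. The sketch below shows that general technique only; it is not the SDK's implementation, and `assign_arm`, its parameters, and the ID formats are hypothetical.

```python
import hashlib


def assign_arm(conversation_id: str, test_id: str, candidate_share: float) -> str:
    """Deterministically map a conversation to 'baseline' or 'candidate'."""
    digest = hashlib.sha256(f"{test_id}:{conversation_id}".encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled into [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "candidate" if bucket < candidate_share else "baseline"
```

Because the arm depends only on the two IDs and the share, the same conversation lands in the same arm on every call, and raising `candidate_share` only moves conversations from baseline to candidate, never the other way.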

feature · May 6

Agentic Readiness on Audit Reports

Audit reports now test how an agent handles AI-mediated traffic: identity acknowledgement, structured intent, multi-step delegation, and bot-block transparency.

improvement · May 4

Fixed Findings on Re-Audits

Re-audits now highlight issues that fired in the previous run but no longer reproduce, with validated/likely-fixed confidence labels and a dedicated Fixed tab.

improvement · May 4

Weekly Agent Report

Weekly emails now summarize what changed across the fleet: verified wins, regressions, unfixed patterns, recent PRs, fleet score, failure rate, and conversations analyzed.

infrastructure · May 4

Verified Public Audit Queue

Public audit submissions now use bot checks, daily capacity limits, and email verification before execution, keeping free audits reliable without requiring a full account upfront.

feature · May 3

Public Agent Audit

Paste any AI chat agent's URL and get a graded audit at a shareable token URL in under a minute. No account required. Save the report to a Converra account to track changes over time, or share it publicly to settle 'is this agent any good?' debates.

feature · May 3

Pricing — 4-Tier Ladder with PAYG

New pricing structure: Free, Pay-as-you-go ($9 per audit), Pro ($299/mo with 15 audits + overage), Enterprise. Annual/monthly toggle. PAYG runs through a single Stripe-hosted checkout — no contract, no commitment.

feature · May 3

Custom Metrics on Audit Reports

Audit reports include an agent-specific custom metrics section — a medical intake bot is graded against different criteria than a returns assistant. The default rubric still applies; custom metrics layer on top.

feature · May 2

Before/After Diff on Re-Audits

Re-audit a saved report and see what actually changed: score deltas per dimension, scenarios that flipped pass/fail, and a verdict shift banner at the top. Tracks agent improvement over time without manual comparison.
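
The comparison described here can be sketched as a small diff over two reports. The report shape used below (`scores` by dimension, `scenarios` mapped to a pass flag) is a hypothetical stand-in, not Converra's actual report schema.

```python
def diff_audits(before: dict, after: dict) -> dict:
    """Compare two audit reports shaped like
    {"scores": {dimension: int}, "scenarios": {name: passed_bool}}."""
    # Per-dimension change, for dimensions present in both reports.
    score_deltas = {
        dim: after["scores"][dim] - score
        for dim, score in before["scores"].items()
        if dim in after["scores"]
    }
    # Scenarios whose pass/fail outcome changed, labeled with the new outcome.
    flipped = {
        name: ("pass" if passed else "fail")
        for name, passed in after["scenarios"].items()
        if name in before["scenarios"] and before["scenarios"][name] != passed
    }
    return {"score_deltas": score_deltas, "flipped": flipped}
```

Dimensions or scenarios that exist in only one of the two reports are skipped in this sketch, since there is nothing to compare them against.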

feature · May 1

Magic-Link Auth + Save Flow

Save any audit report to a Converra account with a single magic-link click — no password setup. Anonymous reports you ran before signing up automatically attach to the new account by browser attribution.

improvement · May 1

Eval Set Surfaced on Reports

Reports now show the eval set used for scoring. A 73 on Eval Set v3 is comparable across re-audits — the rubric isn't silently shifting underneath you.

April 2026

feature · Apr 29

Optimization Confidence, Reasoning, and Per-Scenario Regression in MCP

Three new fields on optimization results explain why a winner was picked: confidence score, plain-language selection reasoning, and per-scenario regression results — so you can see how each variant performed against each test case. Plus a new validate_variant MCP tool for running ad-hoc validations.

improvement · Apr 29

Optimization Status SSOT

Optimization process status now flows from a single derived source — eliminating cases where the UI showed 'running' after the backend had finished, and where auto-restart fired on terminal failures. The status displayed in dashboards and PRs is always the canonical one.

feature · Apr 27

Role-Family Awareness for Multi-Tenant Fleets

If you run the same agent (Discovery, Support, etc.) across multiple end-customers, Converra now treats the whole role as a family. PRs surface a 'Role family' section listing every customer the change touches, and API + MCP responses include agentType, customer, and instanceCount so any consumer immediately sees the 5-agents-across-30-customers shape instead of a flat list.

feature · Apr 24

Worst-Conversation Landing with One-Click Fix

Jump straight to the conversation that's hurting your agent the most, see the diagnosis, and ship the fix in one click. No digging through dashboards to figure out where to start.

integration · Apr 24

Slack Notifications for Flagged Conversations

Get a Slack ping the moment Converra flags a bad conversation — with a link straight to the diagnosis. Stop finding out about quality issues from your customers.

feature · Apr 24

Incremental Conversation Streaming

New append endpoint lets you stream turns into an existing conversation instead of waiting for it to end and posting the full transcript. Lower memory, real-time insights, and better fit for long-running voice and chat sessions.

feature · Apr 20

Team Invite Links

Share a single link to bring teammates into your Converra account. No more one-by-one email invites or waiting for approvals.

improvement · Apr 20

Critical Issues Surfaced First

Pending-deployment and template aggregate pages now lead with critical issues instead of burying them under noise. The things that actually break your agent show up at the top.

improvement · Apr 20

Quiet-Period Scoring for Low-Volume Agents

Live score window widened from 7 to 30 days, and the minimum sample size dropped from 15 to 5. Agents with modest traffic now keep a stable, meaningful score instead of going dark between bursts.
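
A minimal sketch of a windowed score with a sample floor, using the two numbers from this entry (30 days, 5 samples). Averaging the scores and returning `None` below the floor are assumptions; the entry does not say how the live score is aggregated.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional

WINDOW_DAYS = 30  # window size stated in this entry
MIN_SAMPLES = 5   # minimum sample size stated in this entry


def live_score(scored: list[tuple[datetime, float]], now: datetime) -> Optional[float]:
    """Average the scores inside the window, or None when there are too few."""
    cutoff = now - timedelta(days=WINDOW_DAYS)
    recent = [score for timestamp, score in scored if timestamp >= cutoff]
    if len(recent) < MIN_SAMPLES:
        return None  # not enough conversations for a meaningful score
    return mean(recent)
```

With the earlier values (7 days, 15 samples), an agent that handles a few conversations a week would return `None` most of the time, which is the "going dark" case this entry describes.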

feature · Apr 18

converra.send() — Simplified Conversation Logging

One method to log conversations. Pass your messages and Converra handles the rest — create a new conversation or append turns to an existing one. Each message can optionally carry model, tool calls, token usage, and latency so you get trace-level detail without a tracing pipeline. Available in both Node.js and Python SDKs.

improvement · Apr 17

Accurate Per-Agent Diagnosis in Multi-Agent Prompts

When a prompt orchestrates multiple sub-agents (greeter, researcher, closer), failures are now attributed to the specific agent that caused them instead of defaulting to the primary. Cleaner diagnosis, targeted fixes.

feature · Apr 9

MCP Performance & Impact Toolkit

Six new MCP tools that answer 'how is Converra helping my agents?' Fleet overview with real fleet score and failure rate. Daily score timeline with deployment markers. Cumulative impact summary. Verification evidence — the actual conversations behind every claim. Fixability breakdown showing what Converra can fix vs what needs engineering.

feature · Apr 9

Verification Evidence Trail

Every verification claim now has receipts. When Converra says '82% reduction,' you can drill into the actual pre-fix and post-fix conversations that were counted. Evidence is captured at verification time and retrievable via MCP or API.

improvement · Apr 9

Fixability Classification in Failure Diagnosis

Step failure aggregation now shows what percentage of failures Converra can fix autonomously vs what needs engineering work. Matches the Failure Triage card on the Fleet page.

feature · Apr 7

Cascade Detection

When a fix resolves one failure, conversations survive longer and may hit new issues downstream. Converra now detects these cascade effects and surfaces them on the fleet card — so you know the fix worked even when the overall score doesn't move.

feature · Apr 5

Production Fix Verification

Deployed fixes are now verified against production conversations. Fleet cards show before/after failure rates with 'Fixed — X% reduction' badges. Verification is graduated: monitoring → likely fixed → verified — so you see early signals before waiting for full statistical confidence.
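
The graduated stages can be pictured as a function of how much post-fix evidence has accumulated. In the sketch below, the sample-size thresholds (`likely_min`, `verified_min`) and the rule that any non-positive reduction stays in monitoring are invented for illustration; Converra's real criteria are not stated in this entry.

```python
def verification_stage(
    pre_failures: int,
    pre_total: int,
    post_failures: int,
    post_total: int,
    likely_min: int = 10,
    verified_min: int = 50,
) -> tuple[str, float]:
    """Return (stage, relative reduction in failure rate) for a deployed fix.

    likely_min and verified_min are illustrative thresholds, not real ones.
    """
    # Too little data on either side: keep monitoring and report no reduction.
    if pre_total == 0 or pre_failures == 0 or post_total < likely_min:
        return "monitoring", 0.0
    pre_rate = pre_failures / pre_total
    post_rate = post_failures / post_total
    reduction = (pre_rate - post_rate) / pre_rate
    if reduction <= 0:
        return "monitoring", reduction
    stage = "verified" if post_total >= verified_min else "likely fixed"
    return stage, reduction
```

For example, 50 failures in 100 conversations before the fix and 9 in 100 after gives a relative reduction of about 0.82, the figure a badge would show as a percentage.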

feature · Apr 3

Converra on Converra

When an optimization doesn't find a winner, Converra now classifies why it failed, learns from the outcome, and automatically restarts with an adapted strategy. The same improvement loop we run on your agents now runs on ours.

feature · Apr 3

Agent Versioning & Sibling Lineage

Agent instructions now have full version history with lineage tracking. When one agent in a sibling group gets optimized, the others show staleness indicators so you know which agents are falling behind.

feature · Apr 1

PR Preview Dialog

Preview the full pull request before creating it — the diff, metrics comparison, diagnosed issues, regression results, and conversation replay. Review everything in one place before committing to the PR.

feature · Apr 1

Claude Code-Ready PRs

Optimization PRs are now structured for AI code reviewers. Full context, metrics, evidence, and before/after comparisons are embedded inline so Claude Code, Copilot, and other AI reviewers can evaluate the change without needing access to Converra.

March 2026

feature · Mar 31

Before/After Conversation Replay

Every optimization now shows side-by-side comparisons of how the agent responded before and after the change. See the actual improvement in context, not just a score delta.

improvement · Mar 31

Agent-Level Alerts

Instead of a separate email for every diagnosed conversation, you now get one agent-level alert that summarizes all issues across conversations. Less noise, same signal.

improvement · Mar 31

Copy Fix with Full Context

The copy fix button now includes testing methodology, regression results, real production conversation quotes, and review guidance — everything a reviewer needs without opening Converra.

feature · Mar 30

Organization Evaluation Rules

Define evaluation rules at the organization level that apply across all your agents. Custom rules are plumbed directly into evaluation prompts so every optimization and insight respects your business-specific quality standards.

feature · Mar 29

Multi-Agent Secondary Agent Insights

Each agent in a multi-agent conversation now gets its own scoped insights. Secondary agents surface their own failure patterns and performance metrics instead of being rolled into the primary agent's analysis.

feature · Mar 28

Business Impact Analysis

Agent and fleet insights now show behavior-specific failure patterns with real conversation counts — not generic buckets. Each pattern links directly to the affected conversations. Cost and upside cards estimate the business impact of fixing the top issues.

feature · Mar 26

Unified Issue Intelligence

Agent issues are now deep business insights, not metric labels. Each issue includes a headline, evidence from diagnosed conversations, a recommended fix, the team that owns it, and whether Converra can auto-fix it via prompt optimization.

integration · Mar 24

GitHub Integration — Dependabot for AI Agents

Converra now creates pull requests in your GitHub repos when optimizations find improvements. Connect your GitHub, and optimization winners are automatically sent as PRs with metrics, evidence, and a one-click merge path. Supports auto-PR on completion, manual PR creation, merge-back sync, and Python agent file detection.

feature · Mar 24

Auto-PR for Model Benchmarks

When a benchmark finds a better model, Converra automatically opens a GitHub PR to switch the model config in your repo — complete with a comparison table showing quality score, cost, and latency across difficulty levels.

feature · Mar 24

Dedicated Benchmarks Page

Model benchmarks now live on their own page with a dedicated nav entry. Browse all benchmark runs, view inline conversation scores, and launch new comparisons without leaving context.

feature · Mar 22

Fleet Page

See every agent's health at a glance. A single dashboard with optimization progress over time, a scoreboard ranking agents by performance, failure distribution, improvement potential, and pending deploys — everything you need to decide what to work on next.

improvement · Mar 22

Onboarding Redesign

The guided tour now starts with the Fleet page and walks through connecting your first agent. A faster path from signup to value.

feature · Mar 20

Zero-Code Instrumentation

Import converra/auto and every LLM call in your app is captured automatically — no wrapper functions, no code changes. Works with OpenAI, Anthropic, and Vercel AI SDK.

integration · Mar 20

Langfuse & OTel Integration Parity

Langfuse and OpenTelemetry integrations now match LangSmith feature-for-feature — async sync triggers, pre-flight validation, usage limit checks, and import metrics.

feature · Mar 20

SDK v0.2.0 — LLM Client Wrapping

Wrap your OpenAI, Anthropic, or Vercel AI SDK client with one line. Every conversation is captured automatically. Multi-agent tracing links orchestrator and sub-agent calls into a single execution graph. A/B variant swapping tests optimized prompts against real traffic.

feature · Mar 20

Python SDK Published to PyPI

pip install converra — Python SDK with sync/async/streaming support for OpenAI and Anthropic. LangChain callback handler included.

infrastructure · Mar 19

SDK Server Endpoints

New API endpoints for SDK integration: prompt matching by content hash, active variant lookup for A/B testing, and a bulk SDK configuration endpoint. A testing mode setting (proxy/simulation) was added to the dashboard.
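
Matching by content hash generally means deriving a stable digest from the prompt text and using it as a lookup key. The sketch below illustrates that idea; the choice of SHA-256, the whitespace normalization, and the in-memory `registry` are assumptions, not a description of Converra's endpoint.

```python
import hashlib
from typing import Optional


def prompt_hash(prompt: str) -> str:
    """SHA-256 of the prompt after light whitespace normalization (an assumption)."""
    lines = [line.rstrip() for line in prompt.strip().splitlines()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def match_prompt(prompt: str, registry: dict[str, str]) -> Optional[str]:
    """Look up a known prompt id by content hash; None when nothing matches."""
    return registry.get(prompt_hash(prompt))
```

Normalizing trailing whitespace before hashing keeps cosmetic edits from breaking the match, while any change to the wording produces a different hash.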

integration · Mar 19

Generic Trace API & SDK

Send traces directly to Converra via the SDK (converra.traces.create) — no LangSmith, Langfuse, or OTel pipeline required. The fastest path from your agent to Converra.

feature · Mar 19

Human & Agent Feedback in Optimization

Give the optimizer direct feedback. Thumbs up/down from the UI or programmatic feedback via MCP tools — both feed into the optimization agent's planning so it learns from your judgment, not just metrics.

improvement · Mar 18

Benchmark Score Visibility

Benchmark comparisons now show actual per-conversation scores inline. See exactly how each model performed, not just a summary.

improvement · Mar 18

Results Page Redesign

Redesigned optimization results with an activity card and deploy banner. Clearer post-optimization experience so you can review and deploy faster.

feature · Mar 16

In-App Notifications

Get notified when optimizations complete or conversation syncs finish. Real-time alerts in the app so you never miss a result.

feature · Mar 7

Diagnosis for All Conversations

Step-level failure diagnosis now runs on every conversation — not just low-scoring multi-agent traces. Every agent gets actionable root cause analysis regardless of score or architecture.

feature · Mar 5

Open Signup

Waitlist removed. Sign up with email or Google and start connecting agents immediately.

feature · Mar 2

Free Tier

Start using Converra at no cost. The free tier includes conversation imports, insights, and a limited number of optimizations so you can evaluate before committing.

feature · Mar 10

Intent-Aware Optimization

Focus optimization on what matters most. Choose from 24 built-in focus areas or define custom goals — simulations, evaluations, and variant generation all align to your intent.

feature · Mar 6

Re-Optimization with Issue Carry-Forward

Re-optimize agents that are already in monitoring state. Unresolved issues from prior runs carry forward automatically so the optimizer picks up where it left off.

feature · Mar 1

Root Cause Categories

Failures across your agents are now grouped by root cause category in the Systems view. Quickly spot whether issues stem from hallucinations, instruction gaps, tool errors, or context limits.

February 2026

feature · Feb 28

Per-Prompt Failure Patterns

See recurring failure patterns for individual prompts. Identify which failure types affect each agent so you can prioritize the highest-impact fixes.

improvement · Feb 26

Conversation Messages in Step Diagnosis

Step-level failure diagnosis now shows the actual conversation messages exchanged during the failing step, giving you full context without leaving the diagnosis view.

integration · Feb 24

OpenTelemetry Integration

Import production conversations from any OpenTelemetry-compatible tracing pipeline. Connect Axiom or other OTel backends to automatically sync your agent's traces.

feature · Feb 24

Source Feedback in Evaluation

Production user feedback is now surfaced in conversation insights and factored into evaluation scores. See what real users thought alongside AI analysis.

feature · Feb 22

Auto-Trigger & Auto-Deploy Optimization

Optimization automatically triggers when step diagnosis detects fixable failures. Winners can auto-deploy with settings-gated controls — no manual intervention needed.

feature · Feb 20

Automatic Regression Testing for Agents

Winners are automatically tested against a golden set of scenarios before deployment. Catch regressions before they reach production.

feature · Feb 19

Custom Evaluation Criteria

Define business-specific metrics beyond the built-in evaluation suite. Measure what matters most for your agent's domain.

feature · Feb 19

LLM Benchmark Comparisons

Compare model performance side-by-side. Run the same scenarios across different LLMs to find the best fit for your agent.

improvement · Feb 19

Conversation Insights Redesign

Redesigned conversation insights with above-the-fold metrics, prompt links, and consolidated qualitative sections.

feature · Feb 18

Orchestration-Aware Simulations

Multi-agent simulations now inject synthetic orchestrator context for higher fidelity. Simulated conversations reflect how your agents actually interact in production.

feature · Feb 17

Fix-Targeted Optimizations

Optimizations now target the specific agent step responsible for diagnosed failures, instead of optimizing blindly.

feature · Feb 17

Agent Failure Patterns

See recurring failure types across all your agents at a glance. Spot systemic issues before they become customer-facing problems.

feature · Feb 16

Platform Launch

Converra launches. Autonomous agent optimization with simulation testing, real-time performance tracking, and continuous prompt improvement.

feature · Feb 14

Step-Level Agent Failure Diagnosis

Pinpoints which step in a multi-agent conversation caused a failure. See the execution flow, identify the responsible agent, and get actionable fix recommendations.

feature · Feb 4

Prompt Structure Optimization

The optimizer now detects and resolves contradictions, redundancy, and formatting issues in your agent's instructions — not just metric-driven changes.

January 2026

feature · Jan 20

Variables & Sibling Deployment

Extract variables from agent instructions and deploy optimized variants across sibling agents that share the same structure.

feature · Jan 15

MCP Server for AI-Native Integration

Converra is now accessible as an MCP server — manage agents, run simulations, and trigger optimizations from any MCP-compatible client.

feature · Jan 12

Segment Metrics

Break down agent performance by segment. See which parts of your agent's instructions contribute most to success or failure.

improvement · Jan 10

Protected Zones in Variant Generation

Mark sections of your agent's instructions as protected so the optimizer preserves them during variant generation.
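
One simple way to check that a variant preserved protected text is a verbatim containment test, sketched below. How zones are actually marked in Converra is not described here, so passing them as a plain list of strings is a hypothetical simplification.

```python
def missing_protected_zones(variant: str, protected: list[str]) -> list[str]:
    """Return every protected snippet that no longer appears verbatim in the variant."""
    return [zone for zone in protected if zone not in variant]
```

An empty result means every protected snippet survived unchanged; anything else lists the sections the variant altered or dropped.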

integration · Jan 2

Langfuse Integration

Import production conversations from Langfuse with continuous sync. Supports self-hosted instances and multi-agent trace detection.

December 2025

feature · Dec 20

Head-to-Head Winner Selection

Variant selection now uses persona-level head-to-head comparisons as the single source of truth, eliminating false positives from aggregated scores.
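
The gap between aggregated scores and head-to-head comparisons is easy to show with a toy example. The persona names and scores below are made up, and `head_to_head` is a hypothetical helper rather than Converra's selection code.

```python
from statistics import mean


def head_to_head(baseline: dict[str, float], variant: dict[str, float]) -> dict[str, int]:
    """Count persona-level wins, losses, and ties for variant vs. baseline."""
    tally = {"wins": 0, "losses": 0, "ties": 0}
    for persona, base_score in baseline.items():
        delta = variant[persona] - base_score
        if delta > 0:
            tally["wins"] += 1
        elif delta < 0:
            tally["losses"] += 1
        else:
            tally["ties"] += 1
    return tally


baseline = {"rushed": 70.0, "skeptical": 70.0, "novice": 70.0, "expert": 70.0}
variant = {"rushed": 69.0, "skeptical": 69.0, "novice": 69.0, "expert": 95.0}

print(mean(variant.values()) > mean(baseline.values()))  # True: the average favors the variant
print(head_to_head(baseline, variant))  # {'wins': 1, 'losses': 3, 'ties': 0}
```

Here one outlier persona lifts the variant's average even though it loses three of four pairings, which is the kind of false positive an aggregate score can produce.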

integration · Dec 15

LangSmith Integration

Import production conversations directly from LangSmith. Connect your existing tracing pipeline to Converra without code changes.

feature · Dec 10

Multi-Agent Discovery

Automatically detect multi-agent architectures from ingested conversations. See which agents participate and how they hand off.

feature · Dec 1

Bulk Conversation Management

Archive and delete conversations in bulk. Filter, select, and clean up your conversation data at scale.
