A timeline of features, improvements, and integrations across the Converra platform.
`get_fleet_overview` now answers "is the fleet getting better or worse, and where?" — agents and patterns are tagged with movement direction and confidence, so coding agents can triage what changed without a separate query.
Optimization detail now leads with the deployment verdict and surfaces a before/after diff of the prompt change, so the outcome and what shipped are visible without scrolling.
Node SDK 0.4.1 closes traffic-assignment gaps, preserves SDK-side test assignments across rollouts, and hardens the production A/B test rollout path.
Generated variants must conform to the prompt contract before entering simulation. Rejections are recovered as warnings and fed back into the next generation pass instead of failing the optimization.
Optimization selection, evidence, and insights now flow only through head-to-head pairs between baseline and each variant. Sub-1pp lifts are preserved, diagnosis-first variant generation is enforced, and intent constraints are aligned across every trigger path.
Status-led triage rows lead with an impact-magnitude delta chip, action patterns replace the attention card, and rows drill into pattern evidence. New triage model splits needs_fix into untriaged and fix_ready, with auto-demotion on regression. Start fix opens inline from the row.
Regression gating now uses score-delta with variance-derived slack instead of pass-rate flips, eliminating false-positive demotions. This release also adds an aggregate demotion safety net, llmParametersHash in semantic-cache provenance with re-qualification on mismatch, and a configurable maxRegressions limit.
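A minimal sketch of score-delta gating, assuming the slack is a multiple of the standard error of the difference in mean scores. The function name, the `z` multiplier, and the sample scores are hypothetical illustrations, not Converra's actual implementation.

```python
from math import sqrt
from statistics import mean, variance


def is_regression(baseline_scores, candidate_scores, z=2.0):
    """Return True when the candidate's mean score drops by more than the slack.

    Hypothetical rule: slack = z * standard error of the difference in means,
    so noisy scenarios tolerate larger swings than stable ones.
    Both lists need at least two scores for the sample variance.
    """
    delta = mean(candidate_scores) - mean(baseline_scores)
    standard_error = sqrt(
        variance(baseline_scores) / len(baseline_scores)
        + variance(candidate_scores) / len(candidate_scores)
    )
    slack = z * standard_error
    return delta < -slack


# A small dip inside the noise band is not flagged; a large drop is.
small_dip = is_regression([80, 82, 78, 81], [79, 81, 80, 80])
large_drop = is_regression([80, 82, 78, 81], [60, 62, 61, 59])
```

Unlike a pass-rate flip, this check ignores a scenario that crosses a pass threshold by a fraction of a point when the run-to-run variance is larger than the change.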
Test a recommended fix on a controlled slice of live SDK traffic before full rollout. Configure traffic, compare baseline vs. candidate fix on scored production conversations, then promote or stop based on lift and confidence.
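One common way to carve out a controlled traffic slice is deterministic hash bucketing, sketched below. This is an assumption about how such a split could work, not the SDK's documented behavior; `assign_arm`, `candidate_share`, and the conversation IDs are hypothetical names.

```python
import hashlib


def assign_arm(conversation_id: str, candidate_share: float) -> str:
    """Map a conversation ID to 'candidate' or 'baseline' deterministically.

    The first 4 bytes of a SHA-256 digest give a bucket in [0, 1), so the same
    conversation always lands in the same arm across processes and restarts.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "candidate" if bucket < candidate_share else "baseline"


# Route roughly 20% of conversations to the candidate fix.
arms = [assign_arm(f"conv-{i}", 0.2) for i in range(1000)]
candidate_count = arms.count("candidate")
```

Hashing the ID rather than drawing a random number keeps assignments stable, which matters when a conversation spans several requests.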
Audit reports now test how an agent handles AI-mediated traffic: identity acknowledgement, structured intent, multi-step delegation, and bot-block transparency.
Re-audits now highlight issues that fired in the previous run but no longer reproduce, with validated/likely-fixed confidence labels and a dedicated Fixed tab.
Weekly emails now summarize what changed across the fleet: verified wins, regressions, unfixed patterns, recent PRs, fleet score, failure rate, and conversations analyzed.
Public audit submissions now use bot checks, daily capacity limits, and email verification before execution, keeping free audits reliable without requiring a full account upfront.
Paste any AI chat agent's URL, get a graded audit at a shareable token URL in under a minute. No account required. Save the report to a Converra account to track changes over time, or share it publicly to settle 'is this agent any good?' debates.
New pricing structure: Free, Pay-as-you-go ($9 per audit), Pro ($299/mo with 15 audits + overage), Enterprise. Annual/monthly toggle. PAYG runs through a single Stripe-hosted checkout, with no contract and no commitment.
Audit reports include an agent-specific custom metrics section: a medical intake bot is graded against different criteria than a returns assistant. The default rubric still applies; custom metrics layer on top.
Re-audit a saved report and see what actually changed: score deltas per dimension, scenarios that flipped pass/fail, and a verdict shift banner at the top. Tracks agent improvement over time without manual comparison.
Save any audit report to a Converra account with a single magic-link click — no password setup. Anonymous reports you ran before signing up automatically attach to the new account by browser attribution.
Reports now show the eval set used for scoring. A 73 on Eval Set v3 is comparable across re-audits — the rubric isn't silently shifting underneath you.
Three new fields on optimization results explain why a winner was picked: confidence score, plain-language selection reasoning, and per-scenario regression results — so you can see how each variant performed against each test case. Plus a new validate_variant MCP tool for running ad-hoc validations.
Optimization process status now flows from a single derived source — eliminating cases where the UI showed 'running' after the backend had finished, and where auto-restart fired on terminal failures. The status displayed in dashboards and PRs is always the canonical one.
If you run the same agent (Discovery, Support, etc.) across multiple end-customers, Converra now treats the whole role as a family. PRs surface a 'Role family' section listing every customer the change touches, and API + MCP responses include agentType, customer, and instanceCount so any consumer immediately sees the 5-agents-across-30-customers shape instead of a flat list.
Jump straight to the conversation that's hurting your agent the most, see the diagnosis, and ship the fix in one click. No digging through dashboards to figure out where to start.
Get a Slack ping the moment Converra flags a bad conversation — with a link straight to the diagnosis. Stop finding out about quality issues from your customers.
New append endpoint lets you stream turns into an existing conversation instead of waiting for it to end and posting the full transcript. Lower memory, real-time insights, and better fit for long-running voice and chat sessions.
Share a single link to bring teammates into your Converra account. No more one-by-one email invites or waiting for approvals.
Pending-deployment and template aggregate pages now lead with critical issues instead of burying them under noise. The things that actually break your agent show up at the top.
Live score window widened from 7 to 30 days, and the minimum sample size dropped from 15 to 5. Agents with modest traffic now keep a stable, meaningful score instead of going dark between bursts.
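The window and sample-size rule can be sketched as follows. Only the 30-day window and the minimum of 5 samples come from the entry above; the function name, the plain mean, and returning `None` when there is too little data are assumptions.

```python
from datetime import datetime, timedelta


def live_score(scored_conversations, now, window_days=30, min_samples=5):
    """Average the scores inside the window, or return None below the minimum.

    scored_conversations is a list of (timestamp, score) pairs.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [score for timestamp, score in scored_conversations if timestamp >= cutoff]
    if len(recent) < min_samples:
        return None  # not enough recent data for a meaningful score
    return sum(recent) / len(recent)


now = datetime(2025, 1, 31)
history = [
    (now - timedelta(days=days_ago), score)
    for days_ago, score in [(1, 80), (5, 90), (10, 70), (20, 60), (29, 100), (45, 0)]
]
current = live_score(history, now)                     # five scores in window
narrow = live_score(history, now, window_days=7, min_samples=5)  # only two in window
```

With the older 7-day window the same agent would have no score at all, which is the "going dark between bursts" case described above.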
One method to log conversations. Pass your messages and Converra handles the rest — create a new conversation or append turns to an existing one. Each message can optionally carry model, tool calls, token usage, and latency so you get trace-level detail without a tracing pipeline. Available in both Node.js and Python SDKs.
When a prompt orchestrates multiple sub-agents (greeter, researcher, closer), failures are now attributed to the specific agent that caused them instead of defaulting to the primary. Cleaner diagnosis, targeted fixes.
Six new MCP tools that answer 'how is Converra helping my agents?' Fleet overview with real fleet score and failure rate. Daily score timeline with deployment markers. Cumulative impact summary. Verification evidence — the actual conversations behind every claim. Fixability breakdown showing what Converra can fix vs what needs engineering.
Every verification claim now has receipts. When Converra says '82% reduction,' you can drill into the actual pre-fix and post-fix conversations that were counted. Evidence is captured at verification time and retrievable via MCP or API.
Step failure aggregation now shows what percentage of failures Converra can fix autonomously vs what needs engineering work. Matches the Failure Triage card on the Fleet page.
When a fix resolves one failure, conversations survive longer and may hit new issues downstream. Converra now detects these cascade effects and surfaces them on the fleet card — so you know the fix worked even when the overall score doesn't move.
Deployed fixes are now verified against production conversations. Fleet cards show before/after failure rates with 'Fixed — X% reduction' badges. Verification is graduated: monitoring → likely fixed → verified — so you see early signals before waiting for full statistical confidence.
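A rough sketch of how a reduction badge and a graduated stage could be derived. The stage names come from the entry above; the sample-size thresholds (`likely_min`, `verified_min`) and both function names are hypothetical, since the actual criteria are not stated here.

```python
def failure_reduction(before_rate: float, after_rate: float) -> float:
    """Relative drop in failure rate, e.g. 0.50 -> 0.09 is an 82% reduction."""
    if before_rate <= 0:
        raise ValueError("before_rate must be positive")
    return (before_rate - after_rate) / before_rate


def verification_stage(post_fix_conversations: int, reduction: float,
                       likely_min: int = 20, verified_min: int = 100) -> str:
    """Graduate from monitoring to verified as post-fix evidence accumulates.

    Thresholds are illustrative assumptions, not Converra's real cutoffs.
    """
    if reduction <= 0 or post_fix_conversations < likely_min:
        return "monitoring"
    if post_fix_conversations < verified_min:
        return "likely fixed"
    return "verified"


reduction = failure_reduction(0.50, 0.09)
badge = f"Fixed: {round(reduction * 100)}% reduction"
stage = verification_stage(post_fix_conversations=40, reduction=reduction)
```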
When an optimization doesn't find a winner, Converra now classifies why it failed, learns from the outcome, and automatically restarts with an adapted strategy. The same improvement loop we run on your agents now runs on ours.
Agent instructions now have full version history with lineage tracking. When one agent in a sibling group gets optimized, the others show staleness indicators so you know which agents are falling behind.
Preview the full pull request before creating it — the diff, metrics comparison, diagnosed issues, regression results, and conversation replay. Review everything in one place before committing to the PR.
Optimization PRs are now structured for AI code reviewers. Full context, metrics, evidence, and before/after comparisons are embedded inline so Claude Code, Copilot, and other AI reviewers can evaluate the change without needing access to Converra.
Every optimization now shows side-by-side comparisons of how the agent responded before and after the change. See the actual improvement in context, not just a score delta.
Instead of a separate email for every diagnosed conversation, you now get one agent-level alert that summarizes all issues across conversations. Less noise, same signal.
The copy fix button now includes testing methodology, regression results, real production conversation quotes, and review guidance — everything a reviewer needs without opening Converra.
Define evaluation rules at the organization level that apply across all your agents. Custom rules are plumbed directly into evaluation prompts so every optimization and insight respects your business-specific quality standards.
Each agent in a multi-agent conversation now gets its own scoped insights. Secondary agents surface their own failure patterns and performance metrics instead of being rolled into the primary agent's analysis.
Agent and fleet insights now show behavior-specific failure patterns with real conversation counts — not generic buckets. Each pattern links directly to the affected conversations. Cost and upside cards estimate the business impact of fixing the top issues.
Agent issues are now deep business insights, not metric labels. Each issue includes a headline, evidence from diagnosed conversations, a recommended fix, the team that owns it, and whether Converra can auto-fix it via prompt optimization.
Converra now creates pull requests in your GitHub repos when optimizations find improvements. Connect your GitHub, and optimization winners are automatically sent as PRs with metrics, evidence, and a one-click merge path. Supports auto-PR on completion, manual PR creation, merge-back sync, and Python agent file detection.
When a benchmark finds a better model, Converra automatically opens a GitHub PR to switch the model config in your repo, complete with a comparison table showing quality score, cost, and latency across difficulty levels.
Model benchmarks now live on their own page with a dedicated nav entry. Browse all benchmark runs, view inline conversation scores, and launch new comparisons without leaving context.
See every agent's health at a glance. A single dashboard with optimization progress over time, a scoreboard ranking agents by performance, failure distribution, improvement potential, and pending deploys — everything you need to decide what to work on next.
The guided tour now starts with the Fleet page and walks through connecting your first agent. A faster path from signup to value.
Import converra/auto and every LLM call in your app is captured automatically — no wrapper functions, no code changes. Works with OpenAI, Anthropic, and Vercel AI SDK.
Langfuse and OpenTelemetry integrations now match LangSmith feature-for-feature — async sync triggers, pre-flight validation, usage limit checks, and import metrics.
Wrap your OpenAI, Anthropic, or Vercel AI SDK client with one line. Every conversation is captured automatically. Multi-agent tracing links orchestrator and sub-agent calls into a single execution graph. A/B variant swapping tests optimized prompts against real traffic.
`pip install converra` installs the Python SDK, with sync, async, and streaming support for OpenAI and Anthropic. A LangChain callback handler is included.
New API endpoints for SDK integration: prompt matching by content hash, active variant lookup for A/B testing, and a bulk SDK configuration endpoint. A testing mode setting (proxy/simulation) has been added to the dashboard.
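Matching a prompt by content hash can be sketched like this. The SHA-256 choice, the whitespace normalization, and the in-memory `registry` are assumptions made for illustration; the actual endpoint and hashing scheme are not described in this entry.

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """Hash a prompt after collapsing whitespace so formatting noise does not matter."""
    normalized = " ".join(prompt.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_prompt(prompt: str, registry: dict):
    """Look up a known prompt ID by content hash; returns None when unknown."""
    return registry.get(prompt_hash(prompt))


# Hypothetical registry mapping content hashes to prompt IDs.
registry = {prompt_hash("You are a helpful support agent."): "prompt-support-v1"}
found = match_prompt("You are a helpful   support agent.\n", registry)
missing = match_prompt("You are a sales agent.", registry)
```

Hashing the content lets an SDK recognize which registered prompt a call is using without the application passing an explicit ID.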
Send traces directly to Converra via the SDK (converra.traces.create) — no LangSmith, Langfuse, or OTel pipeline required. The fastest path from your agent to Converra.
Give the optimizer direct feedback. Thumbs up/down from the UI and programmatic feedback via MCP tools both feed into the optimization agent's planning, so it learns from your judgment, not just metrics.
Benchmark comparisons now show actual per-conversation scores inline. See exactly how each model performed, not just a summary.
Redesigned optimization results with an activity card and deploy banner. Clearer post-optimization experience so you can review and deploy faster.
Get notified when optimizations complete or conversation syncs finish. Real-time alerts in the app so you never miss a result.
Step-level failure diagnosis now runs on every conversation — not just low-scoring multi-agent traces. Every agent gets actionable root cause analysis regardless of score or architecture.
Waitlist removed. Sign up with email or Google and start connecting agents immediately.
Start using Converra at no cost. The free tier includes conversation imports, insights, and a limited number of optimizations so you can evaluate before committing.
Focus optimization on what matters most. Choose from 24 built-in focus areas or define custom goals — simulations, evaluations, and variant generation all align to your intent.
Re-optimize agents that are already in monitoring state. Unresolved issues from prior runs carry forward automatically so the optimizer picks up where it left off.
Failures across your agents are now grouped by root cause category in the Systems view. Quickly spot whether issues stem from hallucinations, instruction gaps, tool errors, or context limits.
See recurring failure patterns for individual prompts. Identify which failure types affect each agent so you can prioritize the highest-impact fixes.
Step-level failure diagnosis now shows the actual conversation messages exchanged during the failing step, giving you full context without leaving the diagnosis view.
Import production conversations from any OpenTelemetry-compatible tracing pipeline. Connect Axiom or other OTel backends to automatically sync your agent's traces.
Production user feedback is now surfaced in conversation insights and factored into evaluation scores. See what real users thought alongside AI analysis.
Optimization automatically triggers when step diagnosis detects fixable failures. Winners can auto-deploy with settings-gated controls — no manual intervention needed.
Winners are automatically tested against a golden set of scenarios before deployment. Catch regressions before they reach production.
Define business-specific metrics beyond the built-in evaluation suite. Measure what matters most for your agent's domain.
Compare model performance side-by-side. Run the same scenarios across different LLMs to find the best fit for your agent.
Redesigned conversation insights with above-the-fold metrics, prompt links, and consolidated qualitative sections.
Multi-agent simulations now inject synthetic orchestrator context for higher fidelity. Simulated conversations reflect how your agents actually interact in production.
Optimizations now target the specific agent step responsible for diagnosed failures, instead of optimizing blindly.
See recurring failure types across all your agents at a glance. Spot systemic issues before they become customer-facing problems.
Converra launches. Autonomous agent optimization with simulation testing, real-time performance tracking, and continuous prompt improvement.
Pinpoints which step in a multi-agent conversation caused a failure. See the execution flow, identify the responsible agent, and get actionable fix recommendations.
The optimizer now detects and resolves contradictions, redundancy, and formatting issues in your agent's instructions, going beyond metric-driven changes.
Extract variables from agent instructions and deploy optimized variants across sibling agents that share the same structure.
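A small sketch of variable extraction and re-rendering, assuming instructions use `{{name}}` placeholders. That placeholder syntax, the function names, and the sample template are hypothetical; the entry above does not specify how variables are marked.

```python
import re

# Assumed placeholder syntax: {{ variable_name }}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def extract_variables(template: str) -> list:
    """Return placeholder names in first-seen order, without duplicates."""
    names = []
    for name in PLACEHOLDER.findall(template):
        if name not in names:
            names.append(name)
    return names


def render(template: str, values: dict) -> str:
    """Fill every placeholder from values; raises KeyError on a missing name."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


template = "You support {{ customer }} users of {{product}}. Always mention {{customer}}."
variables = extract_variables(template)
acme = render(template, {"customer": "Acme", "product": "Rocket"})
```

Once the shared structure is a template, an optimized variant of the template can be rendered with each sibling agent's own values.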
Converra is now accessible as an MCP server — manage agents, run simulations, and trigger optimizations from any MCP-compatible client.
Break down agent performance by segment. See which parts of your agent's instructions contribute most to success or failure.
Mark sections of your agent's instructions as protected so the optimizer preserves them during variant generation.
Import production conversations from Langfuse with continuous sync. Supports self-hosted instances and multi-agent trace detection.
Variant selection now uses persona-level head-to-head comparisons as the single source of truth, eliminating false positives from aggregated scores.
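To illustrate why aggregated scores can mislead, here is a sketch that compares a variant to the baseline persona by persona. The win/loss rule, the function names, and the scores are hypothetical and only show the general idea of a head-to-head comparison.

```python
from statistics import mean


def head_to_head(baseline: dict, variant: dict):
    """Count personas where the variant beats or trails the baseline."""
    wins = sum(1 for persona in baseline if variant[persona] > baseline[persona])
    losses = sum(1 for persona in baseline if variant[persona] < baseline[persona])
    return wins, losses


def variant_is_better(baseline: dict, variant: dict) -> bool:
    """Assumed rule: the variant must win more personas than it loses."""
    wins, losses = head_to_head(baseline, variant)
    return wins > losses


baseline = {"new_user": 70, "power_user": 70, "frustrated_user": 70}
variant = {"new_user": 100, "power_user": 68, "frustrated_user": 69}

# The average favors the variant, but it loses on two of three personas.
average_says_better = mean(variant.values()) > mean(baseline.values())
pairs_say_better = variant_is_better(baseline, variant)
```

In this made-up data one large gain on a single persona inflates the average, which is the kind of false positive a per-persona comparison is meant to catch.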
Import production conversations directly from LangSmith. Connect your existing tracing pipeline to Converra without code changes.
Automatically detect multi-agent architectures from ingested conversations. See which agents participate and how they hand off.
Archive and delete conversations in bulk. Filter, select, and clean up your conversation data at scale.