How Optimization Works

Converra generates targeted fixes for your agents and proves every change in simulation before it ships — connecting directly to where your agents live. For multi-step workflows, it evaluates and improves agent behavior in context.

The Full Lifecycle

                            CONVERRA
    ┌───────────────────────────────────────────────────────────────────────┐
    │                                                                       │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
    │  │ Analyze  │->│ Generate │->│ Simulate │->│Regression│->│  Select  │ │
    │  │  Prompt  │  │ Variants │  │          │  │   Test   │  │  Winner  │ │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
    │                                                                       │
    └───────────────────────────────────────────────────────────────────────┘
           ↑                                             │
           │                                             ↓
    ┌──────┴──────┐                               ┌──────┴──────┐
    │  IMPORT     │                               │   DEPLOY    │
    │  prompts    │                               │   winner    │
    └──────┬──────┘                               └──────┬──────┘
           │                                             │
           ↑                                             ↓
╔══════════╧═════════════════════════════════════════════╧══════════╗
║                       YOUR PRODUCTION STACK                       ║
║                                                                   ║
║   ┌───────────┐    ┌───────────┐    ┌───────────┐                 ║
║   │Observabil-│    │  Manual   │    │  Custom   │                 ║
║   │ity tools  │    │  paste    │    │   API     │                 ║
║   │(LangSmith,│    └───────────┘    └───────────┘                 ║
║   │ Langfuse) │                                                   ║
║   └───────────┘                                                   ║
║                                                                   ║
╚═══════════════════════════════════════════════════════════════════╝

The key insight: Your agents' prompts don't live in Converra — they live in your production systems. Converra connects to where they already are, improves them, and puts the proven versions back.

Agent Systems (V3)

If your production workflow uses multiple prompts (for example, a router handing off to specialists), Converra can discover an agent system from imported traces and evaluate prompts in system context.

What changes compared to single-prompt optimization:

  • Simulations include realistic handoff context from earlier steps.
  • Results can be grouped by path (the prompt sequence taken) for fair comparisons.
  • System metrics are diagnostic; winner selection stays apples-to-apples within comparable paths.
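In code, grouping results by path might look like the following sketch. The record shape and field names here are illustrative, not Converra's actual data model:

```python
from collections import defaultdict

def group_results_by_path(results):
    """Group simulation results by the prompt sequence (path) taken,
    so a variant is only compared against runs that exercised the
    same handoff chain."""
    by_path = defaultdict(list)
    for r in results:
        # A path is the ordered tuple of prompts the run passed
        # through, e.g. ("router", "billing-specialist").
        path = tuple(r["path"])
        by_path[path].append(r)
    return dict(by_path)

results = [
    {"path": ["router", "billing"], "variant": "A", "completed": True},
    {"path": ["router", "billing"], "variant": "B", "completed": False},
    {"path": ["router", "tech"],    "variant": "A", "completed": True},
]
grouped = group_results_by_path(results)
```

Comparing variants only within the same path keeps a router change from silently skewing a specialist's numbers.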

Import: Where Agents Come From

Converra pulls agents from where they already live:

Source                 How It Works
--------------------   ----------------------------------------------------------------------
LangSmith / Langfuse   Import agents + conversation traces from your observability data
SDK                    Wrap your LLM client — agents and conversations captured automatically
API                    Push agents programmatically from your deployment pipeline
Manual                 Paste prompts directly for quick testing
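As a rough illustration of the SDK approach, a capture wrapper can intercept client calls without changing call sites. Everything here (the FakeLLMClient, wrap_client, and the capture callback) is a hypothetical sketch, not the actual Converra SDK:

```python
class FakeLLMClient:
    """Stand-in for a real LLM client, for demonstration only."""
    def complete(self, prompt, **kwargs):
        return f"echo: {prompt}"

def wrap_client(client, capture):
    """Replace the client's complete() with a version that records
    each prompt/response pair, then passes the response through."""
    original = client.complete

    def wrapped(prompt, **kwargs):
        response = original(prompt, **kwargs)
        # In a real SDK this record would be sent to the service.
        capture({"prompt": prompt, "response": response})
        return response

    client.complete = wrapped
    return client

captured = []
client = wrap_client(FakeLLMClient(), captured.append)
client.complete("How do I reset my password?")
```

The call site stays `client.complete(...)`; capture happens as a side effect.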

See Integrations for setup details.

The Optimization Loop

1. Analyze Agent

Converra analyzes your agent's prompt to understand:

  • Structure and formatting
  • Goals and constraints
  • Potential improvement areas

2. Generate Variants

AI creates alternative versions of your agent's prompt:

  • Each variant targets specific improvements
  • Variants maintain your core requirements
  • Typically 3-5 variants are tested

3. Simulate

Each variant is tested against simulated personas:

  • Diverse user types (frustrated, technical, new, etc.)
  • Multiple conversation scenarios
  • Realistic interaction patterns
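Conceptually, the simulation stage is a cross product of variants, personas, and scenarios. A minimal sketch (the persona and scenario names are illustrative):

```python
import itertools

PERSONAS = ["frustrated", "technical", "new_user"]
SCENARIOS = ["refund_request", "login_issue"]

def build_simulation_matrix(variants, personas=PERSONAS, scenarios=SCENARIOS):
    """Every variant is run against every persona/scenario pair."""
    return [
        {"variant": v, "persona": p, "scenario": s}
        for v, p, s in itertools.product(variants, personas, scenarios)
    ]

runs = build_simulation_matrix(["baseline", "variant_1"])
# 2 variants x 3 personas x 2 scenarios = 12 simulated conversations
```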

4. Regression Test

When a leading variant emerges, the system automatically tests it against a "golden set" of scenarios:

  • Golden set: Scenarios your baseline agent handles reliably (auto-generated)
  • Short exchanges: 2-3 turns per scenario for fast validation
  • Pass/fail: Each scenario must maintain baseline performance

If regressions are found, you see the tradeoff: "Improved X but regressed on Y. Apply anyway?"
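The gate itself is simple to picture: compare each golden-set scenario's score against the baseline and flag any drop. A minimal sketch with hypothetical scores:

```python
def regression_check(variant_scores, baseline_scores, tolerance=0.0):
    """Return the golden-set scenarios where the variant scored
    below the baseline (minus an optional tolerance)."""
    regressions = []
    for scenario, base in baseline_scores.items():
        if variant_scores.get(scenario, 0.0) < base - tolerance:
            regressions.append(scenario)
    return regressions

baseline = {"password_reset": 0.95, "refund_policy": 0.90}
variant  = {"password_reset": 0.97, "refund_policy": 0.82}
failed = regression_check(variant, baseline)
# failed == ["refund_policy"]: improved on one scenario, regressed on the other
```

A non-empty result is what triggers the "Improved X but regressed on Y" tradeoff prompt.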

See Regression Testing for details.

5. Select Winner

Performance is evaluated across metrics:

  • Task completion rate
  • Response quality
  • User sentiment
  • Goal achievement
  • Regression test results

The best-performing variant is identified.
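One way to picture winner selection is a weighted sum over metrics, with regression failures acting as a gate. The weights and scores below are illustrative, not Converra's actual formula:

```python
WEIGHTS = {  # illustrative weights, not Converra's actual ones
    "task_completion":  0.35,
    "response_quality": 0.25,
    "user_sentiment":   0.20,
    "goal_achievement": 0.20,
}

def overall_score(metrics, passed_regression):
    """Weighted aggregate; in this sketch, a regression failure
    disqualifies a variant outright."""
    if not passed_regression:
        return 0.0
    return sum(WEIGHTS[m] * metrics[m] for m in WEIGHTS)

candidates = {
    "baseline":  ({"task_completion": 0.70, "response_quality": 0.75,
                   "user_sentiment": 0.68, "goal_achievement": 0.72}, True),
    "variant_2": ({"task_completion": 0.86, "response_quality": 0.80,
                   "user_sentiment": 0.79, "goal_achievement": 0.84}, True),
}
winner = max(candidates, key=lambda name: overall_score(*candidates[name]))
```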

Deploy: CI/CD for Agent Improvements

Deployment works like CI/CD for your agents. Every fix is simulation-tested and regression-checked before it ships — the same way code passes tests before it merges. Deploy automatically, via GitHub PR, or with manual review. Most teams start with PRs and move to auto-deploy within a week.

Destination   How It Works
-----------   -------------------------------------------------------------------------------------------------
Auto-deploy   Winners that pass regression testing ship automatically. Instant rollback if any metric regresses
GitHub PR     Auto-creates a PR with the improved prompt, metrics, and evidence. Merge to deploy
API/Webhook   Notify your systems to pull the new version automatically
Dashboard     Review and apply winners manually

The goal is a closed loop: agents flow from your stack → through optimization → back to your stack. GitHub PR integration works like Dependabot for AI agents — connect your repo, and Converra creates PRs when improvements are proven.
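For the API/Webhook destination, a receiving service typically verifies a signature and pulls the new prompt. A minimal sketch; the payload fields and signing scheme are assumptions, not Converra's documented webhook format:

```python
import hashlib
import hmac
import json

def handle_deploy_webhook(body: bytes, signature: str, secret: str):
    """Verify an HMAC-SHA256 signature over the raw body, then
    return the agent id and winning prompt to deploy."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad webhook signature")
    event = json.loads(body)
    return event["agent_id"], event["winning_prompt"]

# Simulated delivery: sign a payload the way the sender would.
secret = "shared-secret"
payload = json.dumps({"agent_id": "support-bot",
                      "winning_prompt": "You are a support agent..."}).encode()
sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
agent_id, prompt = handle_deploy_webhook(payload, sig, secret)
```

Signing over the raw body and comparing with `compare_digest` avoids timing leaks; check the actual Converra webhook docs for the real header and field names.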

What Gets Optimized

Aspect         Example Improvement
------------   --------------------------------------
Clarity        Clearer instructions, better structure
Tone           More appropriate formality level
Efficiency     Shorter responses that still work
Completeness   Better coverage of edge cases
Consistency    More predictable behavior

Optimization Modes

Exploratory Mode

Best for: Finding improvements quickly

  • Fewer simulations per variant
  • Faster results (minutes)
  • Good for iteration

Validation Mode

Best for: Production decisions

  • More simulations per variant
  • Statistical confidence
  • Takes longer but more reliable
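Why more simulations buy statistical confidence: with a two-proportion z-test, the same observed lift only becomes significant once the sample is large enough. A standalone sketch of the arithmetic (the run counts are made up):

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """z-statistic for 'variant B's completion rate beats A's'."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Exploratory-scale sample: 20 runs per variant, 60% vs 80% completion.
z_small = two_proportion_z(12, 20, 16, 20)
# Validation-scale sample: 200 runs per variant, same rates.
z_large = two_proportion_z(120, 200, 160, 200)
# Identical effect size, but only the larger sample clears z > 1.96
# (roughly 95% confidence, two-sided).
```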

Replay Mode

Best for: Verifying fixes on real failures

  • Tests variants against imported production traces (offline)
  • Confirms fixes work on the exact cases that failed
  • Available when you've imported traces from LangSmith

What Stays the Same

Converra preserves your:

  • Core purpose and role
  • Key constraints and boundaries
  • Required output formats
  • Brand voice fundamentals

Simulation Personas

Your agents are tested against diverse simulated users:

Persona               Tests
-------------------   ------------------------
Frustrated Customer   De-escalation, empathy
Technical User        Accuracy, depth
New User              Clarity, onboarding
Impatient User        Conciseness
Confused User         Patience, explanation

You can also create custom personas matching your actual users.
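A custom persona is essentially a structured description of a user type. A hypothetical definition (the field names are illustrative, not Converra's actual persona schema):

```python
# Illustrative persona definition; real field names may differ.
custom_persona = {
    "name": "Enterprise Admin",
    "description": ("Manages 500+ seats; precise and time-constrained; "
                    "expects answers about SSO and audit logs."),
    "traits": ["technical", "impatient"],
    "tests": ["accuracy", "conciseness"],
    "opening_messages": [
        "Our SAML login broke after the last update. What changed?",
    ],
}
```

The closer a custom persona mirrors your real traffic, the more predictive the simulation results are.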

Metrics Evaluated

Primary Metrics

  • Task Completion - Did the AI help the user achieve their goal?
  • Response Quality - Was the response accurate and helpful?
  • User Sentiment - How would the user feel about the interaction?

Secondary Metrics

  • Conciseness - Appropriate length for the context
  • Consistency - Similar situations handled similarly
  • Safety - Stayed within appropriate boundaries

Example Optimization

Original Prompt:

You are a customer support agent. Help users with their questions.

Optimized Variant (Winner):

You are a customer support agent for TechCorp. Your goal is to
resolve issues quickly while maintaining a friendly tone.

When helping users:
1. Acknowledge their issue
2. Ask clarifying questions if needed
3. Provide a clear solution
4. Confirm the issue is resolved

If you can't resolve an issue, offer to escalate to a specialist.

Improvement: +34% task completion, +28% user satisfaction

Next Steps