How Optimization Works

Converra generates targeted fixes for your agents and proves every change in simulation before it ships — connecting directly to where your agents live. For multi-step workflows, it evaluates and improves agent behavior in context.

The Full Lifecycle

                            CONVERRA
    ┌───────────────────────────────────────────────────────────────────────┐
    │                                                                       │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐ │
    │  │ Analyze  │->│ Generate │->│ Simulate │->│Regression│->│  Select  │ │
    │  │  Prompt  │  │ Variants │  │          │  │   Test   │  │  Winner  │ │
    │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────┘ │
    │                                                                       │
    └───────────────────────────────────────────────────────────────────────┘
           ↑                                             │
           │                                             ↓
    ┌──────┴──────┐                               ┌──────┴──────┐
    │  IMPORT     │                               │   DEPLOY    │
    │  prompts    │                               │   winner    │
    └──────┬──────┘                               └──────┬──────┘
           │                                             │
           ↑                                             ↓
╔══════════╧═════════════════════════════════════════════╧══════════╗
║                       YOUR PRODUCTION STACK                       ║
║                                                                   ║
║   ┌───────────┐    ┌───────────┐    ┌───────────┐                 ║
║   │Observabil-│    │  Manual   │    │  Custom   │                 ║
║   │ity tools  │    │  paste    │    │   API     │                 ║
║   │(LangSmith,│    └───────────┘    └───────────┘                 ║
║   │ Langfuse) │                                                   ║
║   └───────────┘                                                   ║
║                                                                   ║
╚═══════════════════════════════════════════════════════════════════╝

The key insight: Your agents' prompts don't live in Converra — they live in your production systems. Converra connects to where they already are, improves them, and puts the proven versions back.

Agent Systems (V3)

If your production workflow uses multiple prompts (for example, a router handing off to specialists), Converra can discover an agent system from imported traces and evaluate prompts in system context.

What changes compared to single-prompt optimization:

  • Simulations include realistic handoff context from earlier steps.
  • Results can be grouped by path (the prompt sequence taken) for fair comparisons.
  • System metrics are diagnostic; winner selection stays apples-to-apples within comparable paths.
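In code, grouping results by path might look like the following sketch. The record shape and field names here are illustrative, not Converra's actual data model:

```python
from collections import defaultdict

def group_results_by_path(results):
    """Group simulation results by the prompt sequence (path) taken,
    so a variant is only compared against runs that exercised the
    same handoff chain."""
    by_path = defaultdict(list)
    for r in results:
        # A path is the ordered tuple of prompts the run passed
        # through, e.g. ("router", "billing-specialist").
        path = tuple(r["path"])
        by_path[path].append(r)
    return dict(by_path)

results = [
    {"path": ["router", "billing"], "variant": "A", "completed": True},
    {"path": ["router", "billing"], "variant": "B", "completed": False},
    {"path": ["router", "tech"],    "variant": "A", "completed": True},
]
grouped = group_results_by_path(results)
```

Comparing variants only within the same path keeps a router change from silently skewing a specialist's numbers.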

Import: Where Agents Come From

Converra pulls agents from where they already live:

Source                 How It Works
--------------------   ----------------------------------------------------------------------
LangSmith / Langfuse   Import agents + conversation traces from your observability data
SDK                    Wrap your LLM client — agents and conversations captured automatically
API                    Push agents programmatically from your deployment pipeline
Manual                 Paste prompts directly for quick testing
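As a rough illustration of the SDK approach, a capture wrapper can intercept client calls without changing call sites. Everything here (the FakeLLMClient, wrap_client, and the capture callback) is a hypothetical sketch, not the actual Converra SDK:

```python
class FakeLLMClient:
    """Stand-in for a real LLM client, for demonstration only."""
    def complete(self, prompt, **kwargs):
        return f"echo: {prompt}"

def wrap_client(client, capture):
    """Replace the client's complete() with a version that records
    each prompt/response pair, then passes the response through."""
    original = client.complete

    def wrapped(prompt, **kwargs):
        response = original(prompt, **kwargs)
        # In a real SDK this record would be sent to the service.
        capture({"prompt": prompt, "response": response})
        return response

    client.complete = wrapped
    return client

captured = []
client = wrap_client(FakeLLMClient(), captured.append)
client.complete("How do I reset my password?")
```

The call site stays `client.complete(...)`; capture happens as a side effect.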

See Integrations for setup details.

The Optimization Loop

1. Analyze Agent

Converra analyzes your agent's prompt to understand:

  • Structure and formatting
  • Goals and constraints
  • Potential improvement areas

2. Generate Variants

AI creates alternative versions of your agent's prompt:

  • Each variant targets specific improvements
  • Variants maintain your core requirements
  • Typically 3-5 variants are tested

3. Simulate

Each variant is tested against simulated personas:

  • Diverse user types (frustrated, technical, new, etc.)
  • Multiple conversation scenarios
  • Realistic interaction patterns
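Conceptually, the simulation stage is a cross product of variants, personas, and scenarios. A minimal sketch (the persona and scenario names are illustrative):

```python
import itertools

PERSONAS = ["frustrated", "technical", "new_user"]
SCENARIOS = ["refund_request", "login_issue"]

def build_simulation_matrix(variants, personas=PERSONAS, scenarios=SCENARIOS):
    """Every variant is run against every persona/scenario pair."""
    return [
        {"variant": v, "persona": p, "scenario": s}
        for v, p, s in itertools.product(variants, personas, scenarios)
    ]

runs = build_simulation_matrix(["baseline", "variant_1"])
# 2 variants x 3 personas x 2 scenarios = 12 simulated conversations
```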

4. Regression Test

When a leading variant emerges, the system automatically tests it against a "golden set" of scenarios:

  • Golden set: Scenarios your baseline agent handles reliably (auto-generated)
  • Short exchanges: 2-3 turns per scenario for fast validation
  • Pass/fail: Each scenario must maintain baseline performance

If regressions are found, you see the tradeoff: "Improved X but regressed on Y. Apply anyway?"
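The gate itself is simple to picture: compare each golden-set scenario's score against the baseline and flag any drop. A minimal sketch with hypothetical scores:

```python
def regression_check(variant_scores, baseline_scores, tolerance=0.0):
    """Return the golden-set scenarios where the variant scored
    below the baseline (minus an optional tolerance)."""
    regressions = []
    for scenario, base in baseline_scores.items():
        if variant_scores.get(scenario, 0.0) < base - tolerance:
            regressions.append(scenario)
    return regressions

baseline = {"password_reset": 0.95, "refund_policy": 0.90}
variant  = {"password_reset": 0.97, "refund_policy": 0.82}
failed = regression_check(variant, baseline)
# failed == ["refund_policy"]: improved on one scenario, regressed on the other
```

A non-empty result is what triggers the "Improved X but regressed on Y" tradeoff prompt.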

See Regression Testing for details.

5. Select Winner

Performance is evaluated across metrics:

  • Task completion rate
  • Response quality
  • User sentiment
  • Goal achievement
  • Regression test results

The best-performing variant is identified.
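One way to picture winner selection is a weighted sum over metrics, with regression failures acting as a gate. The weights and scores below are illustrative, not Converra's actual formula:

```python
WEIGHTS = {  # illustrative weights, not Converra's actual ones
    "task_completion":  0.35,
    "response_quality": 0.25,
    "user_sentiment":   0.20,
    "goal_achievement": 0.20,
}

def overall_score(metrics, passed_regression):
    """Weighted aggregate; in this sketch, a regression failure
    disqualifies a variant outright."""
    if not passed_regression:
        return 0.0
    return sum(WEIGHTS[m] * metrics[m] for m in WEIGHTS)

candidates = {
    "baseline":  ({"task_completion": 0.70, "response_quality": 0.75,
                   "user_sentiment": 0.68, "goal_achievement": 0.72}, True),
    "variant_2": ({"task_completion": 0.86, "response_quality": 0.80,
                   "user_sentiment": 0.79, "goal_achievement": 0.84}, True),
}
winner = max(candidates, key=lambda name: overall_score(*candidates[name]))
```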

Deploy: CI/CD for Agent Improvements

Deployment works like CI/CD for your agents. Every fix is simulation-tested and regression-checked before it ships — the same way code passes tests before it merges. Deploy automatically, via GitHub PR, or with manual review. Most teams start with PRs and move to auto-deploy within a week.

Destination   How It Works
-----------   -------------------------------------------------------------------------------------------------
Auto-deploy   Winners that pass regression testing ship automatically. Instant rollback if any metric regresses
GitHub PR     Auto-creates a PR with the improved prompt, metrics, and evidence. Merge to deploy
API/Webhook   Notify your systems to pull the new version automatically
Dashboard     Review and apply winners manually

The goal is a closed loop: agents flow from your stack → through optimization → back to your stack. GitHub PR integration works like Dependabot for AI agents — connect your repo, and Converra creates PRs when improvements are proven.
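For the API/Webhook destination, a receiving service typically verifies a signature and pulls the new prompt. A minimal sketch; the payload fields and signing scheme are assumptions, not Converra's documented webhook format:

```python
import hashlib
import hmac
import json

def handle_deploy_webhook(body: bytes, signature: str, secret: str):
    """Verify an HMAC-SHA256 signature over the raw body, then
    return the agent id and winning prompt to deploy."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad webhook signature")
    event = json.loads(body)
    return event["agent_id"], event["winning_prompt"]

# Simulated delivery: sign a payload the way the sender would.
secret = "shared-secret"
payload = json.dumps({"agent_id": "support-bot",
                      "winning_prompt": "You are a support agent..."}).encode()
sig = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
agent_id, prompt = handle_deploy_webhook(payload, sig, secret)
```

Signing over the raw body and comparing with `compare_digest` avoids timing leaks; check the actual Converra webhook docs for the real header and field names.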

What Gets Optimized

Aspect         Example Improvement
------------   --------------------------------------
Clarity        Clearer instructions, better structure
Tone           More appropriate formality level
Efficiency     Shorter responses that still work
Completeness   Better coverage of edge cases
Consistency    More predictable behavior

Optimization Modes

Exploratory Mode

Best for: Finding improvements quickly

  • Fewer simulations per variant
  • Faster results (minutes)
  • Good for iteration

Validation Mode

Best for: Production decisions

  • More simulations per variant
  • Statistical confidence
  • Takes longer but more reliable
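Why more simulations buy statistical confidence: with a two-proportion z-test, the same observed lift only becomes significant once the sample is large enough. A standalone sketch of the arithmetic (the run counts are made up):

```python
import math

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """z-statistic for 'variant B's completion rate beats A's'."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Exploratory-scale sample: 20 runs per variant, 60% vs 80% completion.
z_small = two_proportion_z(12, 20, 16, 20)
# Validation-scale sample: 200 runs per variant, same rates.
z_large = two_proportion_z(120, 200, 160, 200)
# Identical effect size, but only the larger sample clears z > 1.96
# (roughly 95% confidence, two-sided).
```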

Replay Mode

Best for: Verifying fixes on real failures

  • Tests variants against imported production traces (offline)
  • Confirms fixes work on the exact cases that failed
  • Available when you've imported traces from LangSmith

What Stays the Same

Converra preserves your:

  • Core purpose and role
  • Key constraints and boundaries
  • Required output formats
  • Brand voice fundamentals

Simulation Personas

Your agents are tested against diverse simulated users:

Persona               Tests
-------------------   ------------------------
Frustrated Customer   De-escalation, empathy
Technical User        Accuracy, depth
New User              Clarity, onboarding
Impatient User        Conciseness
Confused User         Patience, explanation

You can also create custom personas matching your actual users.
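A custom persona is essentially a structured description of a user type. A hypothetical definition (the field names are illustrative, not Converra's actual persona schema):

```python
# Illustrative persona definition; real field names may differ.
custom_persona = {
    "name": "Enterprise Admin",
    "description": ("Manages 500+ seats; precise and time-constrained; "
                    "expects answers about SSO and audit logs."),
    "traits": ["technical", "impatient"],
    "tests": ["accuracy", "conciseness"],
    "opening_messages": [
        "Our SAML login broke after the last update. What changed?",
    ],
}
```

The closer a custom persona mirrors your real traffic, the more predictive the simulation results are.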

Metrics Evaluated

Primary Metrics

  • Task Completion - Did the AI help the user achieve their goal?
  • Response Quality - Was the response accurate and helpful?
  • User Sentiment - How would the user feel about the interaction?

Secondary Metrics

  • Conciseness - Appropriate length for the context
  • Consistency - Similar situations handled similarly
  • Safety - Stayed within appropriate boundaries

Example Optimization

Original Prompt:

You are a customer support agent. Help users with their questions.

Optimized Variant (Winner):

You are a customer support agent for TechCorp. Your goal is to
resolve issues quickly while maintaining a friendly tone.

When helping users:
1. Acknowledge their issue
2. Ask clarifying questions if needed
3. Provide a clear solution
4. Confirm the issue is resolved

If you can't resolve an issue, offer to escalate to a specialist.

Improvement: +34% task completion, +28% user satisfaction

Next Steps