
Score each agent's contribution

In multi-agent systems, the agent that produced the bad output is often not the agent that caused the failure. Contribution scoring attributes each failure to the node where it originated, so you fix the real cause, not the symptom.

Why end-to-end eval is not enough

A single end-to-end pass/fail tells you the conversation broke. It does not tell you which agent broke it. In a chain of four agents, the failure surfaces at the last agent, but the cause is often two agents upstream, where context was lost or misrouted.

Without per-agent attribution, teams fix the wrong agent, and the failure rate stays the same. The agent that “looks wrong” keeps getting prompt-tweaked while the actual cause (for example, Agent A's incomplete handoff at turn 3) goes untouched.

What contribution scoring measures

Four dimensions, scored independently per agent and per turn.

Per-turn contribution

Every turn an agent takes is scored against the local objective — did this turn move the conversation forward? Did it preserve context received from upstream? Did it produce output the next agent could use?
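
As a rough illustration, the three questions above can be treated as yes/no checks and averaged into a per-turn score. This is a minimal sketch, not Converra's scoring method: the `TurnChecks` fields, the `turn_score` function, and the equal weighting are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TurnChecks:
    """Hypothetical yes/no answers to the three per-turn questions."""
    moved_forward: bool      # did the turn move the conversation forward?
    preserved_context: bool  # did it preserve context received from upstream?
    usable_output: bool      # could the next agent use the output?


def turn_score(checks: TurnChecks) -> float:
    """Fraction of checks passed. Equal weights are an assumption."""
    answers = [checks.moved_forward, checks.preserved_context, checks.usable_output]
    return sum(answers) / len(answers)


# A turn that advanced the conversation but dropped upstream context.
score = turn_score(TurnChecks(moved_forward=True, preserved_context=False, usable_output=True))
```

In practice each check would come from a grader (a rubric, a model, or a human label) rather than a hard-coded boolean.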

Handoff quality

When an agent passes control to the next agent, the handoff itself is scored. Was the payload complete? Did the passed context give the receiving agent what its prompt needs? Bad handoffs are a leading cause of multi-agent failures.
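
One simple way to picture the "was the payload complete?" check is to compare the handoff payload against the fields the receiving agent needs. The sketch below assumes the handoff is a plain dictionary and that the required fields are known in advance; `handoff_completeness` and the field names are hypothetical.

```python
def handoff_completeness(payload: dict, required_fields: list[str]) -> tuple[float, list[str]]:
    """Return (score, missing_fields) for one handoff.

    A field counts as missing if it is absent or empty. The list of
    required fields is an assumption supplied by the caller.
    """
    missing = [f for f in required_fields if payload.get(f) in (None, "", [], {})]
    if not required_fields:
        return 1.0, missing
    return 1.0 - len(missing) / len(required_fields), missing


# Hypothetical handoff from a triage agent to a refund agent.
payload = {"customer_id": "c-42", "intent": "refund", "order_id": ""}
required = ["customer_id", "intent", "order_id", "refund_reason"]
score, missing = handoff_completeness(payload, required)
```

Here two of the four required fields are empty or absent, so the handoff scores 0.5 and the missing fields are reported by name.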

Upstream causation

When a downstream agent produces a bad output, contribution scoring traces back through the chain. The question shifts from "which agent failed?" to "which agent caused the failure?" — those are often different.
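
The trace-back can be sketched as a walk up the chain: as long as the agent under suspicion received a poor handoff, the blame moves one step upstream. This is an illustrative heuristic under stated assumptions, not a description of Converra's algorithm; `AgentStep`, `find_originator`, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentStep:
    agent: str
    turn_score: float        # quality of this agent's own turn, 0..1
    handoff_in_score: float  # quality of the context it received, 0..1


def find_originator(chain: list[AgentStep], failed_index: int, threshold: float = 0.5) -> str:
    """Walk upstream from the failing agent while its input was bad."""
    i = failed_index
    while i > 0 and chain[i].handoff_in_score < threshold:
        i -= 1  # this agent was fed bad context, so look one step upstream
    return chain[i].agent


chain = [
    AgentStep("A", turn_score=0.3, handoff_in_score=1.0),
    AgentStep("B", turn_score=0.8, handoff_in_score=0.4),
    AgentStep("C", turn_score=0.9, handoff_in_score=0.45),
    AgentStep("D", turn_score=0.2, handoff_in_score=0.3),
]
originator = find_originator(chain, failed_index=3)
```

Agent D produced the bad output, but every agent after A received poor context, so the walk stops at A.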

Cascading impact

An early misstep that compounds through three more agents looks like a final-agent failure. Contribution scoring quantifies how much each upstream decision influenced the final outcome, so the fix lands on the originator.
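
To make "how much each upstream decision influenced the final outcome" concrete, one simple heuristic is to measure the quality lost at each step and express it as a share of the total loss. This is only a sketch of one possible attribution rule, assumed for illustration; the `impact_shares` function and the quality numbers are hypothetical.

```python
def impact_shares(quality_after: dict[str, float], initial_quality: float = 1.0) -> dict[str, float]:
    """Share of the total quality loss introduced by each agent.

    `quality_after` maps each agent, in chain order, to the conversation
    quality (0..1) measured after its turn. Proportional loss attribution
    is an assumption, not a standard method.
    """
    losses = {}
    previous = initial_quality
    for agent, quality in quality_after.items():  # dicts keep insertion order
        losses[agent] = max(0.0, previous - quality)
        previous = quality
    total = sum(losses.values())
    if total == 0:
        return {agent: 0.0 for agent in losses}
    return {agent: loss / total for agent, loss in losses.items()}


shares = impact_shares({"A": 0.6, "B": 0.5, "C": 0.45, "D": 0.4})
largest = max(shares, key=shares.get)
```

With these numbers, Agent A accounts for roughly two thirds of the total loss even though the failure is only visible after Agent D.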

Traditional eval vs contribution scoring

Dimension | Traditional eval | Contribution scoring
What gets scored | Final output only | Each agent's contribution per turn, plus each handoff
Failure attribution | "The agent failed" | "Agent A's context handoff failed at turn 3"
Where you focus the fix | The last agent in the chain (often wrong) | The originating agent (the actual cause)
Handoff visibility | Invisible | Scored independently
Cascading errors | Misattributed downstream | Traced to root cause upstream

Frequently asked questions

What is agent contribution scoring?

Agent contribution scoring assigns an independent score to each agent in a multi-agent pipeline based on what that agent contributed to the conversation — including the quality of context it produced for downstream agents. It replaces single end-to-end scores with per-agent attribution.

Why do single-agent evals miss multi-agent failures?

Single-agent evals grade the final output. In a chain of agents, the final agent is often responding correctly to bad input from upstream. Single-agent evaluation says "the conversation failed" but can't tell you which agent caused it. Contribution scoring traces back through the chain to find the originator.

How does contribution scoring work for AI agents?

Every agent turn is scored against the local objective (did it produce useful output?) and the handoff is scored independently (was the context passed downstream complete and well-formed?). When a downstream agent fails, contribution scoring traces causation upstream, identifying whether the failure originated in that agent or in what it received.

What does "scores each agent's contribution" actually measure?

Per-turn contribution, handoff quality, upstream causation, and cascading impact on the final outcome. Together these dimensions let you say "Agent B's response was poor because Agent A handed off incomplete context at turn 3" and target the fix at Agent A, not Agent B.

Can contribution scoring run on existing multi-agent systems?

Yes. Contribution scoring works against conversation traces — it does not require rewriting your agents. Connect your existing traces (LangSmith, Langfuse, OpenTelemetry, or your own logging) and Converra applies per-agent scoring over them.
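
Because scoring runs over traces, the main preparation step is getting each record tagged with the agent and turn it belongs to. The sketch below groups generic trace records by agent; the `agent`, `turn`, and `output` keys are assumed field names for illustration and do not reflect the actual LangSmith, Langfuse, or OpenTelemetry schemas.

```python
from collections import defaultdict


def group_turns_by_agent(records: list[dict]) -> dict[str, list[dict]]:
    """Group trace records by agent, ordered by turn number.

    Each record is assumed to carry hypothetical "agent" and "turn" keys.
    """
    by_agent = defaultdict(list)
    for record in sorted(records, key=lambda r: r["turn"]):
        by_agent[record["agent"]].append(record)
    return dict(by_agent)


records = [
    {"agent": "B", "turn": 2, "output": "drafted reply"},
    {"agent": "A", "turn": 1, "output": "classified intent"},
    {"agent": "A", "turn": 3, "output": "handed off to B"},
]
turns = group_turns_by_agent(records)
```

Once records are grouped this way, per-turn and handoff scores can be attached to each agent without changing the agents themselves.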

See per-agent attribution on your conversations

Connect your traces and Converra scores each agent's contribution, including handoff quality. Find the real cause of the failure, not just where it surfaced.
