I was responsible for shipping AI into production at Totango. At scale, every release felt like a gamble and a time sink.
The process was familiar: adjust a prompt, manually test expected cases, scan logs, deploy, then wait for real users to surface edge cases. I could validate what I anticipated—but production behavior was always messier, more diverse, and harder to predict.
Learning happened after deployment, not before it. Improvements were inferred from downstream signals, not proven upfront. Confidence came only after issues appeared—or didn't.
After leaving Totango, I advised a founder building an AI product already seeing real usage. Different team, different stack, same failure mode. Manual testing. Log-driven analysis. Post-hoc fixes. The pattern repeated.
That's when it became clear this wasn't a team problem or a tooling gap.
It was structural.
As systems scale, we stop relying on intuition.
But production AI agents—systems expected to reason, adapt, and interact with humans—are still improved through manual iteration and hindsight.
That approach doesn't scale.
Every AI team now has observability—logs, traces, dashboards. Most have evals too. LangSmith, Langfuse, and Braintrust have all added evaluation capabilities. Visibility isn't the gap anymore.
The gap is: who does the work?
Teams see the problem (dashboards), measure the problem (evals), then manually hypothesize fixes, manually test, manually decide what ships. The loop is still open. Improvement happens in sprints, when someone has time, when there's a fire.
The teams with autonomous optimization will compound faster. Everyone else will fall behind.
This isn't about fixing what's broken. It's about getting better every day—quality, speed, cost—without manual effort. The company plugged into an autonomous loop improves continuously. The company doing it by hand improves occasionally.
Over time, that gap becomes insurmountable.
Seeing and measuring the problem matters, but it isn't enough. The bottleneck is acting on the answers: generating the next variant, validating it, and deciding whether it's safe to deploy.
Frameworks help generate variants. Observability explains outcomes. Evals measure quality. Converra closes the loop—it decides which changes should ship.
Converra is the performance layer for production AI agents—built for teams shipping to real users. It closes the loop between simulation, evaluation, and deployment so agent behavior can evolve safely, deliberately, and with confidence.
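To make "closing the loop" concrete, here is a minimal, purely illustrative sketch of that cycle. Every name in it (generate_variants, simulate_and_score, ship_threshold) is a hypothetical stand-in, not Converra's API; the point is only that variant generation, simulated evaluation, and the ship decision run as one automated pass instead of three manual steps.

```python
# Illustrative sketch of a closed optimization loop, under assumed names.
# Nothing here is Converra's actual API; it shows the shape of the idea:
# generate -> simulate -> evaluate -> decide, all in one automated pass.

from dataclasses import dataclass


@dataclass
class Candidate:
    prompt: str
    score: float = 0.0


def generate_variants(current_prompt: str) -> list[Candidate]:
    """Propose alternative prompt variants (stand-in for a real generator)."""
    edits = ["", " Answer concisely.", " Ask a clarifying question if the request is ambiguous."]
    return [Candidate(prompt=current_prompt + e) for e in edits]


def simulate_and_score(candidate: Candidate, scenarios: list[str]) -> float:
    """Run the candidate against simulated user scenarios and grade the transcripts.
    A real system would call the agent and an eval suite; this is a placeholder."""
    return 0.5  # placeholder score


def closed_loop(current_prompt: str, scenarios: list[str], ship_threshold: float = 0.8) -> str:
    """One automated pass: the ship/no-ship decision is part of the loop, not a manual afterthought."""
    candidates = generate_variants(current_prompt)
    for c in candidates:
        c.score = simulate_and_score(c, scenarios)
    best = max(candidates, key=lambda c: c.score)
    baseline = simulate_and_score(Candidate(prompt=current_prompt), scenarios)
    # Ship only when the best variant clears the bar and beats current behavior.
    if best.score >= ship_threshold and best.score > baseline:
        return best.prompt  # deploy the improved variant
    return current_prompt   # otherwise keep what's already in production
```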
The goal isn't more prompts.
It's fewer wrong decisions.
One founder. Full-stack ownership. Every decision—architecture, design, product, UX—ships immediately.
AI-augmented development changes what's possible. Direct control over every layer means faster iteration, tighter feedback loops, and no translation loss between insight and implementation.
The result is a system designed with end-to-end coherence—where the evaluation layer, simulation engine, and deployment logic all evolve together.
Previously founded Buildup, a construction tech company later acquired by Stanley Black & Decker.
As VP Product Growth at Totango, owned AI end-to-end—design through production—while re-accelerating ARR growth from ~10% to 43% YoY.
Computer Science, Technion. Based in New York City.
Ready to see how Converra can help your team ship with confidence?