29 October 2026 11:00 - 11:30
Taming non-determinism: A framework for evaluation and observability in autonomous agent trajectories
Deploying agentic AI in production introduces a unique engineering challenge: debugging non-deterministic execution paths.
Unlike traditional software, an agent's "code" is a dynamic interplay of prompt context, model weights, and external tool outputs. This talk presents a rigorous engineering methodology for agents, focusing on quantifying agent performance and analyzing trajectories.
Key takeaways:
→ Trajectory evaluations: Analyzing the "Reasoning Trace" using a secondary judge model to detect hallucinated logic steps or tool misuse.
→ Cost-latency trade-offs: Optimizing token usage via dynamic context compression and speculative execution of tool calls.
→ Sandboxing & side-effects: Technical implementation of ephemeral execution environments to safely contain agentic code execution.
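As a rough illustration of the trajectory-evaluation idea, the sketch below walks a recorded trajectory step by step and asks a judge to flag problems. It is not the speaker's framework: the `Step` record, the `ALLOWED_TOOLS` set, and both function names are hypothetical, and a simple rule-based function stands in for the secondary judge model, which in practice would be a model call with the same signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical trajectory record: one reasoning step plus an optional tool call.
@dataclass
class Step:
    thought: str
    tool: Optional[str] = None

# Assumed tool registry for this illustration only.
ALLOWED_TOOLS = {"search", "calculator"}

def rule_based_judge(step: Step) -> List[str]:
    """Stand-in for a secondary judge model: returns the issues found in one step."""
    issues = []
    if not step.thought.strip():
        issues.append("empty reasoning step")
    if step.tool is not None and step.tool not in ALLOWED_TOOLS:
        issues.append(f"unknown tool: {step.tool}")
    return issues

def evaluate_trajectory(
    steps: List[Step],
    judge: Callable[[Step], List[str]] = rule_based_judge,
) -> Dict[str, object]:
    """Run the judge over every step and summarize which steps were flagged."""
    flagged = {}
    for index, step in enumerate(steps):
        issues = judge(step)
        if issues:
            flagged[index] = issues
    # An empty trajectory has nothing to flag, so it counts as fully passing.
    pass_rate = 1.0 if not steps else 1 - len(flagged) / len(steps)
    return {"steps": len(steps), "flagged": flagged, "pass_rate": pass_rate}
```

Because the judge is passed in as a plain callable, the same loop can be run with a cheap rule-based check in unit tests and with a model-backed judge in offline evaluation.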