DRAFT — placeholder data. Replace with a real agent-tested trace before publishing.
Sample API (red)
We dropped the response after a payment, exactly like a flaky network would. The agent retried. We got charged three times.
The five tasks
Getting started from the docs aloneHalf-nailed it
Fixing its own mistake after an errorChoked
Following a multi-step flowHalf-nailed it
Handling an unclear edge caseChoked
Not double-charging on a retryChoked
Here’s the receipt — what actually happened, not our summary of it.
> POST /v1/charge { "amount": 4900 } (response dropped)
> POST /v1/charge { "amount": 4900 } (retry — no idempotency key honored)
> POST /v1/charge { "amount": 4900 } (retry)
< 3 charges created: ch_01, ch_02, ch_03 — total $147.00See everything the AI did (3 steps)
09:14:02 GPT POST /v1/charge {amount:4900} → — (dropped)
09:14:05 GPT POST /v1/charge {amount:4900} → 200 ch_01
09:14:07 GPT POST /v1/charge {amount:4900} → 200 ch_02Tested 2026-06-19 with Claude, GPT, Gemini agents · request a re-test ↗