Benchmarks
Claude Haiku 4.5 logo

anthropic

Claude Haiku 4.5

4/5 with a faceplant on the easiest benchmark — wrote "I'll help you research..." then logged off.

4/5
benchmarks passed
$0.941
spent in total
17
tool calls

What it is.

Anthropic's Haiku 4.5 — the smallest, fastest, cheapest Claude. Pitched as a workhorse for high-volume agent loops where Sonnet's bill is too high.

What it does well.

  • Where it didn't fail, it was fine. Reddit, CX, Stripe, and Apollo all passed cleanly with reasonable cost.
  • Most economical Coding pass in the Anthropic family — Opus and Sonnet both spent more than Haiku to write the same webhook verifier.
  • Total spend was $0.94 — competitive with the cheaper 5/5 models on price, if you can route around the one failure mode.

How it broke.

  • Failed Sales by writing "I'll help you research Hightouch..." as the entire response. No tool calls, no JSON, no answer — just an opening line. Cost $0.007 to produce nothing.
  • This is a known small-Claude pattern: when the system prompt is polite-helper-shaped, Haiku occasionally treats it like a customer-service exchange instead of an agent loop. The rubric punished it; production would too.
  • Stripe cost $0.632 — anomalously expensive for a Haiku, more than every Coding pass except Sonnet and Opus. The model went thinking-heavy on a task it should have breezed.

Results by agent

Five real jobs.

SalesSales Research Analyst

Get financials on a company

Failed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

✗ The faceplant: 1 tool call, $0.007. Wrote a single sentence of intent then stopped before doing anything. The whole benchmark in 10 words of useless preamble.

$0.0071 tool call
MarketingMarketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

4 tool calls, $0.125. Clean pass once Haiku decided to engage.

$0.1254 tool calls
Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

4 tool calls, $0.120. Pulled reviews, grouped complaints, returned three with verbatim quotes.

$0.1194 tool calls
CodingSenior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

5 tool calls, $0.632. Anomalously expensive Haiku run. Final code passed the rubric but you paid for thinking that didn't help.

$0.6325 tool calls
Web ScrapingSales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

3 tool calls, $0.058. All four Apollo tiers, clean.

$0.0583 tool calls