anthropic

Claude Haiku 4.5

4/5 with a faceplant on the easiest benchmark — wrote "I'll help you research..." then logged off.

4/5

benchmarks passed

$0.941

spent in total

tool calls

What it is.

Anthropic's Haiku 4.5 — the smallest, fastest, cheapest Claude. Pitched as a workhorse for high-volume agent loops where Sonnet's bill is too high.

What it does well.

Where it didn't fail, it was fine. Reddit, CX, Stripe, and Apollo all passed cleanly with reasonable cost.
Most economical Coding pass in the Anthropic family — Opus and Sonnet both spent more than Haiku to write the same webhook verifier.
Total spend was $0.94 — competitive with the cheaper 5/5 models on price, if you can route around the one failure mode.

How it broke.

Failed Sales by writing "I'll help you research Hightouch..." as the entire response. No tool calls, no JSON, no answer — just an opening line. Cost $0.007 to produce nothing.
This is a known small-Claude pattern: when the system prompt is polite-helper-shaped, Haiku occasionally treats it like a customer-service exchange instead of an agent loop. The rubric punished it; production would too.
Stripe cost $0.632 — anomalously expensive for a Haiku, more than every Coding pass except Sonnet and Opus. The model went thinking-heavy on a task it should have breezed.

Results by agent

Five real jobs.

SalesSales Research Analyst

Get financials on a company

Failed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

✗ The faceplant: 1 tool call, $0.007. Wrote a single sentence of intent then stopped before doing anything. The whole benchmark in 10 words of useless preamble.

MarketingMarketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

4 tool calls, $0.125. Clean pass once Haiku decided to engage.

CXCustomer Insights Analyst

Find what customers are complaining about

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

4 tool calls, $0.120. Pulled reviews, grouped complaints, returned three with verbatim quotes.

CodingSenior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

5 tool calls, $0.632. Anomalously expensive Haiku run. Final code passed the rubric but you paid for thinking that didn't help.

Web ScrapingSales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

3 tool calls, $0.058. All four Apollo tiers, clean.

All models Run Claude Haiku 4.5 on your data