Get financials on a company
Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.
6 tool calls, $0.100. Standard Sonnet shape: clean retrieval, careful verification, no detours.
anthropic
5/5 every time, billed like a senior engineer. Worth it when correctness is non-negotiable; brutal at scale.
Anthropic's Sonnet 4.6 is the mid-tier model in the Claude 4.X family, sitting between Haiku and Opus. It is the Anthropic crowd's default daily driver for production agent work.
Results by agent
Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.
$1.82, the most expensive single run in the benchmark matrix on a task that passed. 5 tool calls; the cost comes from reasoning plus long-context input.
Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.
4 tool calls, $0.723. Pulled reviews, grouped them by recurrence, and returned three complaints with verbatim quotes of at least 10 characters.
Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.
10 tool calls, $1.11. The most thorough Stripe-docs read in the matrix; the resulting code is also the most explicit.
Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.
2 tool calls, $0.084. The one task where Sonnet's bill was reasonable.
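To make the "brutal at scale" point concrete, the per-task figures above can be tallied in a few lines. This is a rough sketch, not part of the benchmark harness: the short task labels and the `projectedCost` helper are made up for illustration, and the volume passed to it is hypothetical.

```typescript
// Per-task figures copied from the results above; task labels are shortened for illustration.
interface TaskRun {
  task: string;
  toolCalls: number;
  costUsd: number;
}

const runs: TaskRun[] = [
  { task: "Crunchbase financials", toolCalls: 6, costUsd: 0.1 },
  { task: "r/sales thread ranking", toolCalls: 5, costUsd: 1.82 },
  { task: "Google Shopping reviews", toolCalls: 4, costUsd: 0.723 },
  { task: "Stripe webhook verification", toolCalls: 10, costUsd: 1.11 },
  { task: "Apollo.io pricing", toolCalls: 2, costUsd: 0.084 },
];

// Totals for one pass over all five tasks.
const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
const totalCalls = runs.reduce((sum, r) => sum + r.toolCalls, 0);

// Hypothetical helper: cost of repeating the whole five-task pass `passes` times,
// assuming every pass costs the same as the single benchmark run (an assumption).
function projectedCost(passes: number): number {
  return totalCost * passes;
}

console.log(`one pass: $${totalCost.toFixed(3)} over ${totalCalls} tool calls`);
// prints "one pass: $3.837 over 27 tool calls"
console.log(`1,000 passes: $${projectedCost(1000).toFixed(0)}`);
// prints "1,000 passes: $3837"
```

Two of the five tasks (the r/sales thread and the Stripe docs) account for $2.93 of the $3.837 total, which is where a per-run budget would bite first.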