Get financials on a company
Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.
3 tool calls, $0.086. Found the slug, hit Crunchbase, returned JSON. Pro behaving like Flash, in a good way.
Pro-tier Gemini got the answer every time. CX cost $1.48 on a benchmark Mistral nailed for $0.066.
Google's Gemini 3.1 Pro Preview — the larger, reasoning-heavy sibling of Flash. Marketed as a frontier reasoning model; priced like one.
Results by agent
Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.
3 tool calls, $0.086. Found the slug, hit Crunchbase, returned JSON. Pro behaving like Flash, in a good way.
Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.
5 tool calls, $0.666. The reasoning step is where the budget goes; the retrieval is normal.
Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.
15 tool calls, $1.48. Pro decided the AirPods complaints task warranted reading nearly every review on the page. Output was correct.
Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.
7 tool calls, $0.388. Solid, working webhook code with the right HMAC + timing-safe primitives.
Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.
2 tool calls, $0.108. All four Apollo tiers extracted.