Google

Gemini 3.1 Pro

Pro-tier Gemini got the answer every time. CX cost $1.48 on a benchmark Mistral nailed for $0.066.

5/5 benchmarks passed

$2.73 spent in total

32 tool calls

What it is.

Google's Gemini 3.1 Pro Preview — the larger, reasoning-heavy sibling of Flash. Marketed as a frontier reasoning model; priced like one.

5/5 passes with no schema drift, no JSON-envelope leakage, and no "### Output:" prefacing.
The most disciplined Marketing run from the Gemini family: 5 tool calls, with all five ranked tools returned alongside their mention counts.
Crunchbase was $0.086 — the second-cheapest Sales pass in the matrix.

CX cost $1.48, more than every other 5/5 model and 22x what Mistral spent on the same task. Pro Gemini's budget went into 15 tool calls and nearly every review on the page; the answer didn't need it.
Marketing was $0.666, about 1.5x GPT-5.5's $0.435 for the same correct output. You're paying for reasoning the rubric didn't ask for.
Total spend was $2.73, among the priciest 5/5 runs in the matrix: only Sonnet and Opus cost more. Reach for Pro when you've measured that the median Flash answer isn't good enough.

Results by agent

Sales: Sales Research Analyst

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

3 tool calls, $0.086. Found the slug, hit Crunchbase, returned JSON. Pro behaving like Flash, in a good way.

Marketing: Marketing Research Analyst

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

5 tool calls, $0.666. The reasoning step is where the budget goes; the retrieval is normal.
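The Marketing task boils down to counting recommendations per tool and sorting. A minimal sketch of that ranking step, assuming the thread has already been fetched into a hypothetical `ThreadComment[]` shape (author plus body); the tool names and comments are placeholders, not data from the run. It counts distinct commenters per tool, since the task asks how many people recommended each one rather than how often a name appears.

```typescript
interface ThreadComment {
  author: string; // hypothetical: commenter's username
  body: string; // comment text
}

interface Ranked {
  tool: string;
  people: number; // distinct commenters who mention the tool
}

// Rank tools by how many distinct commenters mention them, keeping the top N.
// Matching is case-insensitive substring search; it does not distinguish a
// recommendation from a complaint.
function rankByMentions(comments: ThreadComment[], tools: string[], topN = 5): Ranked[] {
  const ranked: Ranked[] = tools.map((tool) => {
    const needle = tool.toLowerCase();
    const authors = new Set<string>();
    for (const c of comments) {
      if (c.body.toLowerCase().includes(needle)) authors.add(c.author);
    }
    return { tool, people: authors.size };
  });
  return ranked
    .filter((r) => r.people > 0)
    // Most-mentioned first; ties broken alphabetically so output is stable.
    .sort((a, b) => b.people - a.people || a.tool.localeCompare(b.tool))
    .slice(0, topN);
}

// Placeholder thread: u1 mentions ToolA twice but is counted once.
const thread: ThreadComment[] = [
  { author: "u1", body: "ToolA has the best coverage, ToolB is fine" },
  { author: "u2", body: "+1 for ToolA" },
  { author: "u1", body: "seriously, ToolA" },
  { author: "u3", body: "We switched to ToolC from ToolB" },
];

console.log(rankByMentions(thread, ["ToolA", "ToolB", "ToolC", "ToolD"]));
// → ToolA (2 people), ToolB (2), ToolC (1); ToolD is dropped with 0
```

Plain substring matching can't tell praise from a complaint, so a stricter rubric would need an intent check on top of the tally.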

CX: Customer Insights Analyst

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

15 tool calls, $1.48. Pro decided the AirPods complaints task warranted reading nearly every review on the page. Output was correct.

Coding: Senior Engineer Assistant

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

7 tool calls, $0.388. Solid, working webhook code with the right HMAC + timing-safe primitives.
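For reference, Stripe's documented scheme signs the string `<timestamp>.<raw body>` with HMAC-SHA256 using the endpoint's signing secret and sends the result in a `Stripe-Signature` header as `t=<timestamp>,v1=<hex signature>`. A minimal Node sketch of the verification, not the code Gemini produced: `verifyStripeSignature` and its parameters are our own names, the 300-second tolerance mirrors the default in Stripe's libraries, and the secret and event in the usage lines are placeholders. In production, the official `stripe` package's `stripe.webhooks.constructEvent` does this for you.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch: verify a Stripe-Signature header ("t=<unix secs>,v1=<hex>[,v1=...]")
// against the raw request body. Names and defaults here are illustrative.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000),
): boolean {
  let tRaw: string | undefined;
  const v1Sigs: string[] = [];
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const key = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (key === "t") tRaw = value;
    else if (key === "v1") v1Sigs.push(value);
  }
  const t = Number(tRaw);
  if (tRaw === undefined || !Number.isFinite(t) || v1Sigs.length === 0) return false;

  // Reject events outside the tolerance window to limit replay attacks.
  if (Math.abs(nowSec - t) > toleranceSec) return false;

  // Sign "<timestamp>.<raw body>" exactly as received; re-serialized JSON won't match.
  const expected = createHmac("sha256", secret)
    .update(`${tRaw}.${rawBody}`, "utf8")
    .digest();

  // Constant-time comparison; timingSafeEqual throws on length mismatch,
  // so compare lengths first.
  return v1Sigs.some((sig) => {
    const given = Buffer.from(sig, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}

// Usage with a placeholder secret and a self-signed header.
const secret = "whsec_placeholder";
const body = '{"id":"evt_placeholder","type":"payment_intent.succeeded"}';
const ts = 1700000000;
const sig = createHmac("sha256", secret).update(`${ts}.${body}`, "utf8").digest("hex");
console.log(verifyStripeSignature(body, `t=${ts},v1=${sig}`, secret, 300, ts + 10)); // true
console.log(verifyStripeSignature(`${body} `, `t=${ts},v1=${sig}`, secret, 300, ts + 10)); // false
```

The HMAC over the raw body and the constant-time comparison are the two primitives that matter: the first catches any tampering with the payload, and the second keeps the check from leaking how many leading bytes of a forged signature matched.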

Web Scraping: Sales Outreach Specialist

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

2 tool calls, $0.108. All four Apollo tiers extracted.
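As a sanity check, the five per-agent results add up to the headline figures. A quick sketch with the numbers copied from the results above; Mistral's $0.066 CX cost comes from the summary at the top.

```typescript
// Per-agent figures copied from the results above.
const runs = [
  { agent: "Sales", toolCalls: 3, costUsd: 0.086 },
  { agent: "Marketing", toolCalls: 5, costUsd: 0.666 },
  { agent: "CX", toolCalls: 15, costUsd: 1.48 },
  { agent: "Coding", toolCalls: 7, costUsd: 0.388 },
  { agent: "Web Scraping", toolCalls: 2, costUsd: 0.108 },
];

const totalCalls = runs.reduce((sum, r) => sum + r.toolCalls, 0);
const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
const cx = runs.find((r) => r.agent === "CX")!;

console.log(totalCalls); // 32
console.log(totalCost.toFixed(2)); // 2.73
console.log(((cx.costUsd / totalCost) * 100).toFixed(0)); // 54 (percent of spend on CX)
console.log((cx.costUsd / 0.066).toFixed(1)); // 22.4 (CX cost vs Mistral's CX run)
```

CX alone accounts for 54% of the total spend.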