Benchmarks

google

Gemini 3.1 Pro

Pro-tier Gemini got the answer every time. CX cost $1.48 on a benchmark Mistral nailed for $0.066.

5/5 benchmarks passed
$2.73 spent in total
32 tool calls

What it is.

Google's Gemini 3.1 Pro Preview — the larger, reasoning-heavy sibling of Flash. Marketed as a frontier reasoning model; priced like one.

What it does well.

  • 5/5 pass with no schema drift, no JSON-envelope leakage, no "### Output:" prefacing.
  • Most disciplined Marketing run from the Gemini family: 5 tool calls, and all five ranked enrichment tools came back with mention counts.
  • Crunchbase was $0.086 — the second-cheapest Sales pass in the matrix.

Where to be careful.

  • CX cost $1.48, more than any other 5/5 model spent on that task and about 22x what Mistral spent ($0.066). Pro Gemini's thinking budget went somewhere; the answer didn't need it.
  • Marketing was $0.666, roughly 1.5x what GPT-5.5 spent ($0.435) for the same correct output. You're paying for reasoning the rubric didn't ask for.
  • Total spend was $2.73, among the most expensive 5/5 models in the matrix, behind only Sonnet and Opus. Reach for Pro only once you've measured that the median Flash answer isn't good enough.
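The cost multiples quoted in this list can be re-derived from the dollar figures on this page. The snippet below is a minimal sketch of that arithmetic; the variable and function names are illustrative and not part of any benchmark tooling.

```typescript
// Dollar figures copied from this page (USD per benchmark run).
const cxCostGeminiPro = 1.48;
const cxCostMistral = 0.066;
const marketingCostGeminiPro = 0.666;
const marketingCostGpt55 = 0.435;

// How many times more expensive `cost` is than `baseline`.
function costRatio(cost: number, baseline: number): number {
  return cost / baseline;
}

const cxRatio = costRatio(cxCostGeminiPro, cxCostMistral);
const marketingRatio = costRatio(marketingCostGeminiPro, marketingCostGpt55);

console.log(`CX: ${cxRatio.toFixed(1)}x Mistral`);               // CX: 22.4x Mistral
console.log(`Marketing: ${marketingRatio.toFixed(2)}x GPT-5.5`); // Marketing: 1.53x GPT-5.5
```

Both models passed in each comparison, so the ratio measures extra spend for an equivalent result.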

Results by agent

Five real jobs.

Sales: Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

3 tool calls, $0.086. Found the slug, hit Crunchbase, returned JSON. Pro behaving like Flash, in a good way.

$0.086 · 3 tool calls
Marketing: Marketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

5 tool calls, $0.666. The reasoning step is where the budget goes; the retrieval is normal.

$0.665 · 5 tool calls
CX

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

15 tool calls, $1.48. Pro decided the AirPods complaints task warranted reading nearly every review on the page. Output was correct.

$1.48 · 15 tool calls
Coding: Senior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

7 tool calls, $0.388. Solid, working webhook code with the right HMAC + timing-safe primitives.

$0.388 · 7 tool calls
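For readers unfamiliar with what "the right HMAC + timing-safe primitives" means here, the following is a minimal sketch of Stripe-style webhook signature checking in TypeScript on Node.js. It is not the model's actual output, which this page does not reproduce. The function name `verifyStripeSignature` and its parameters are hypothetical. It assumes the documented `Stripe-Signature` header format (`t=<timestamp>,v1=<hex signature>`) and a five-minute replay tolerance.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper for illustration only.
// rawBody must be the exact request body as received, before any JSON parsing.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Header looks like "t=1700000000,v1=<hex>" and may carry several v1 entries.
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    else if (key === "v1") candidates.push(value);
  }
  if (!/^\d+$/.test(timestamp) || candidates.length === 0) return false;

  // Reject timestamps outside the tolerance window to limit replay attacks.
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  // The signed payload is "<timestamp>.<raw body>", HMAC-SHA256 with the endpoint secret.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The constant-time comparison is the point of the "timing-safe" remark: a plain `===` on hex strings can leak how many leading characters matched. In production code, Stripe's official library (`stripe.webhooks.constructEvent`) is the usual choice over a hand-rolled check.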
Web Scraping: Sales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

2 tool calls, $0.108. All four Apollo tiers extracted.

$0.107 · 2 tool calls
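As a sanity check, the per-run costs and tool-call counts quoted in the run notes above add up to the headline totals. The sketch below shows the tally; the `RunResult` type and variable names are illustrative, and the figures are the ones given in each run's one-line summary.

```typescript
interface RunResult {
  agent: string;
  costUsd: number;
  toolCalls: number;
}

// Figures as quoted in the run notes on this page.
const runs: RunResult[] = [
  { agent: "Sales", costUsd: 0.086, toolCalls: 3 },
  { agent: "Marketing", costUsd: 0.666, toolCalls: 5 },
  { agent: "CX", costUsd: 1.48, toolCalls: 15 },
  { agent: "Coding", costUsd: 0.388, toolCalls: 7 },
  { agent: "Web Scraping", costUsd: 0.108, toolCalls: 2 },
];

const totalCostUsd = runs.reduce((sum, r) => sum + r.costUsd, 0);
const totalToolCalls = runs.reduce((sum, r) => sum + r.toolCalls, 0);

// Share of total spend attributable to the CX run alone.
const cxShare = 1.48 / totalCostUsd;

console.log(`$${totalCostUsd.toFixed(2)} across ${totalToolCalls} tool calls`); // $2.73 across 32 tool calls
console.log(`CX share of spend: ${(cxShare * 100).toFixed(0)}%`);               // CX share of spend: 54%
```

The CX run alone accounts for a little over half of the total spend, which is why it dominates the cost discussion.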