google

Gemini 3.5 Flash

Newer Flash, almost double the bill of the model it replaced. We'd still pick the older one.

5/5 benchmarks passed
$1.41 spent in total
58 tool calls

What it is.

Google's Gemini 3.5 Flash — the December 2026 update to the Flash family. Pitched as smarter than Gemini 3 Flash; in our matrix it was slower on the same tasks and ~2x the cost.

What it does well.

  • Five for five, like its older sibling. Schema compliance was perfect.
  • Marketing benchmark was actually cheaper here than on Gemini 3 Flash ($0.118 vs $0.218). The newer model is better at deciding when to stop reading Reddit.

Where to be careful.

  • 36 tool calls on Coding. Read that again. 3.5 Flash decided the Stripe webhook task was a research project and kept poking at the docs until $0.487 ticked over.
  • CX was $0.623 — nearly 2x the previous-gen Flash. The model is more thorough; the rubric didn't ask for more thorough.
  • Total spend was $1.41 — more than GPT-5.5 and within striking distance of Gemini 3 Pro Preview. Use the cheaper Flash unless you've measured a quality gap on your workload.

Results by agent

Five real jobs.

Sales
Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

10 tool calls, $0.148 — 5x more tool calls than 3 Flash for an equivalent answer. The 'verify everything twice' instinct showed up here first.

$0.148 · 10 tool calls
Marketing
Marketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

5 tool calls, $0.118. The one benchmark where 3.5 Flash genuinely improved on the previous generation.

$0.118 · 5 tool calls
CX
Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

5 tool calls, $0.623 — nearly 2x the 3 Flash cost for the same correct answer.

$0.623 · 5 tool calls
Coding
Senior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

36 tool calls, $0.487. Read every docs page it could find. Final code was correct but you could have written it yourself in less time.

$0.486 · 36 tool calls
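
For a sense of the task's scope, here is a minimal sketch of what a webhook-verification function of this kind can look like. It is not the model's output. It assumes Stripe's documented signature scheme (a `Stripe-Signature` header carrying `t=` and `v1=` fields, with an HMAC-SHA256 computed over `timestamp.rawBody`), and the names `verifyStripeSignature` and `parseSignatureHeader` are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Split a header such as "t=1700000000,v1=abc123" into its timestamp and v1 signatures.
function parseSignatureHeader(header: string): { timestampRaw: string; signatures: string[] } {
  let timestampRaw = "";
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestampRaw = value;
    else if (key === "v1") signatures.push(value);
  }
  return { timestampRaw, signatures };
}

// Returns true only if the header carries a fresh timestamp and a matching v1 signature.
// rawBody must be the exact request body as received, before any JSON parsing.
function verifyStripeSignature(
  rawBody: string,
  signatureHeader: string,
  endpointSecret: string,
  toleranceSeconds: number = 300, // assumed tolerance window of five minutes
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const { timestampRaw, signatures } = parseSignatureHeader(signatureHeader);
  const timestamp = Number(timestampRaw);
  if (timestampRaw === "" || !Number.isFinite(timestamp) || signatures.length === 0) {
    return false;
  }
  // Reject events whose timestamp is too far from the current time (replay protection).
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = createHmac("sha256", endpointSecret)
    .update(`${timestampRaw}.${rawBody}`, "utf8")
    .digest();

  // Compare in constant time; a length mismatch (for example, invalid hex) fails early.
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig, "hex");
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

In a real service you would more likely call `stripe.webhooks.constructEvent` from Stripe's official Node library, which performs the same check and also parses the event. The hand-rolled version is shown only to indicate how small the core of the task is.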
Web Scraping
Sales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

Two tool calls, $0.035 — the model's most disciplined moment.

$0.035 · 2 tool calls
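
The headline totals can be cross-checked against the five per-benchmark figures. The sketch below does that arithmetic and also derives each benchmark's share of spend and cost per tool call. The numbers are copied from this review; the type and function names are illustrative. Coding uses the $0.487 quoted in the write-up (its summary line shows $0.486; the total rounds to $1.41 either way).

```typescript
// Per-benchmark figures from this review, stored in thousandths of a dollar
// so that the sums stay in exact integer arithmetic.
interface BenchmarkRun {
  name: string;
  costMilliUsd: number;
  toolCalls: number;
}

const runs: BenchmarkRun[] = [
  { name: "Sales", costMilliUsd: 148, toolCalls: 10 },
  { name: "Marketing", costMilliUsd: 118, toolCalls: 5 },
  { name: "CX", costMilliUsd: 623, toolCalls: 5 },
  { name: "Coding", costMilliUsd: 487, toolCalls: 36 }, // $0.487 per the write-up
  { name: "Web Scraping", costMilliUsd: 35, toolCalls: 2 },
];

const totalMilliUsd = runs.reduce((sum, r) => sum + r.costMilliUsd, 0);
const totalToolCalls = runs.reduce((sum, r) => sum + r.toolCalls, 0);

// Fraction of total spend attributable to one benchmark.
function shareOfSpend(run: BenchmarkRun): number {
  return run.costMilliUsd / totalMilliUsd;
}

// Average dollars spent per tool call on one benchmark.
function costPerToolCallUsd(run: BenchmarkRun): number {
  return run.costMilliUsd / 1000 / run.toolCalls;
}

console.log(`Total: $${(totalMilliUsd / 1000).toFixed(2)} across ${totalToolCalls} tool calls`);
for (const r of runs) {
  const share = (shareOfSpend(r) * 100).toFixed(1);
  const perCall = costPerToolCallUsd(r).toFixed(3);
  console.log(`${r.name}: ${share}% of spend, $${perCall} per tool call`);
}
```

Splitting cost per tool call out this way makes one pattern easier to see: CX is the most expensive benchmark with only 5 tool calls, while Coding reaches a similar bill through 36 of them.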