Google

Gemini 3 Flash

Cheap on the easy ones, ate $0.34 on AirPods complaints. Flash that occasionally remembers it's Pro.

5/5 benchmarks passed
$0.758 spent in total
25 tool calls

What it is.

Google's Gemini 3 Flash is the cost-tier sibling of Gemini 3 Pro. Optimized for latency and price; reasoning depth is supposed to be the tradeoff. Hosted on Vertex AI.

What it does well.

  • Three tool calls and $0.032 on the Crunchbase brief — the cheapest pass on Sales in the entire matrix.
  • Five for five, no JSON drama. The native Gemini structured-output mode handles the schema constraint without prompting acrobatics.
  • Total spend was $0.76 — still well below every Claude model and both GPT-5.5 and Gemini 3.5 Flash.

Where to be careful.

  • Pricing wobbled. The CX benchmark hit $0.34 (13 tool calls), roughly 10x the Crunchbase run ($0.032). Flash's instinct to over-call when the task is open-ended is real.
  • Marketing ($0.22) cost more than 5x Mistral's run. Reddit threads with long comment chains burn a lot of input tokens, and that adds up even at Flash's low per-token rate.
  • We didn't see any reasoning failures, but the model is documented to drop schema fields under high-context load. Worth verifying before production.
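
The last caution can be turned into a cheap guard before production. This is a minimal sketch, assuming the Crunchbase brief returns a JSON object with three fields. The field names and the `missing_fields` helper are hypothetical; this review does not show the harness's real schema.

```python
import json

# Hypothetical field names for the Crunchbase brief; the actual schema is not shown in this review.
REQUIRED_FIELDS = ("total_funding", "last_round_type", "description")


def missing_fields(raw_response: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent or null in a JSON response."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        # Unparseable output counts as every field missing.
        return list(required)
    if not isinstance(payload, dict):
        return list(required)
    return [name for name in required if payload.get(name) is None]
```

Running a check like this on long-context prompts, and logging any non-empty result, is one way to see whether fields actually get dropped in your own workload.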

Results by agent

Five real jobs.

Sales
Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

Three tool calls, $0.032 — cheapest Sales pass in the matrix. Found the slug, hit Crunchbase, returned JSON. No second-guessing.

$0.032, 3 tool calls

Marketing
Marketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

Five tool calls, $0.22. Reddit threads bloat input context; Flash's per-token price is low but you still feel it on long threads.

$0.217, 5 tool calls

CX

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

13 tool calls, $0.34 — the splurge of the run. Flash decided to read more reviews than the rubric required. Result was correct.

$0.339, 13 tool calls

Coding
Senior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

Two tool calls, $0.093, clean implementation. Right-sized.

$0.092, 2 tool calls

Web Scraping
Sales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

Two tool calls, $0.077. All four pricing tiers, three features each, in order.

$0.077, 2 tool calls
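
The spend figures can be cross-checked with a little arithmetic. This sketch uses the per-card costs and tool-call counts listed for each job; the "CX" label for the AirPods job comes from the cautions earlier in the review, and the dictionary and variable names are illustrative. The Coding card shows $0.092 while its summary says $0.093, so the sum of the rounded card values lands a tenth of a cent under the reported $0.758.

```python
# Per-benchmark figures as listed on each card: (cost in USD, tool calls).
RUNS = {
    "Sales": (0.032, 3),
    "Marketing": (0.217, 5),
    "CX": (0.339, 13),
    "Coding": (0.092, 2),
    "Web Scraping": (0.077, 2),
}

# Totals across all five jobs.
total_cost = sum(cost for cost, _ in RUNS.values())
total_calls = sum(calls for _, calls in RUNS.values())

# Average cost of a single tool call within each job.
cost_per_call = {name: cost / calls for name, (cost, calls) in RUNS.items()}

# How much more the CX run cost than the Sales (Crunchbase) run.
cx_vs_sales = RUNS["CX"][0] / RUNS["Sales"][0]

print(f"total: ${total_cost:.3f} over {total_calls} tool calls")
print(f"CX vs Sales: {cx_vs_sales:.1f}x")
for name, value in cost_per_call.items():
    print(f"{name}: ${value:.4f} per call")
```

On these rounded figures, CX comes out at about 10.6x the Sales run in total, yet its per-call cost (about $0.026) is lower than Marketing's (about $0.043). That fits the review's reading: CX was expensive because of how many calls Flash made, Marketing because of how much input each call carried.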