Benchmarks
openai

GPT-5 Mini

GPT-5 Mini quietly cleared the whole matrix for $0.38. Bring receipts the next time someone says "mini is too dumb."

5/5 benchmarks passed
$0.376 spent in total
23 tool calls

What it is.

OpenAI's mini-tier in the GPT-5 family. Pitched as the cost-sensible option when full GPT-5 is overkill; in our matrix it earned that pitch.

What it does well.

  • 5/5 pass at $0.38 total — the second-cheapest perfect score in the matrix.
  • Aggressive but productive tool use on the Crunchbase brief: 10 tool calls, every one chained to the next, no thrashing.
  • Coding and Web Scraping rubrics passed under $0.04 each — the cheap-and-correct quadrant most production agents should live in.

Where to be careful.

  • Don't confuse this with full GPT-5. The mini variant trades raw reasoning for cost; tasks that need long planning chains will hit a wall before they hit budget.
  • We didn't observe any failures. That said, of the models in our suite, it has the least public documentation on context-window behavior over multi-hour chats.
  • All the same caveats as the rest of the matrix: short-horizon tasks, no adversarial inputs, no multi-tenant stress.

Results by agent

Five real jobs.

Sales · Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

Ten tool calls — the most of any 5/5 model — but every one chained productively. Looked up Hightouch via Google, paged through Crunchbase results, verified against the company page. Got it right for $0.146.

$0.146 · 10 tool calls

Marketing · Marketing Research Analyst

Find what customers are recommending on Reddit

Passed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.

Four tool calls and out: one Google query, one Reddit thread pull, then summarized. Clean.

$0.056 · 4 tool calls

Passed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

Returned all three complaints with verbatim quotes for $0.108. Reasonable middle of the pack.

$0.108 · 4 tool calls

Coding · Senior Engineer Assistant

Read API docs and write working code

Passed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

$0.037 — second-cheapest Stripe pass behind Mistral. Read docs, wrote the function, moved on.

$0.037 · 2 tool calls

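For readers who want a feel for what the Coding job asks for, here is a minimal sketch of Stripe webhook verification in TypeScript. It is not the code GPT-5 Mini produced in the run. It assumes the manual scheme described in Stripe's docs: the `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<hex signature>` entries, and the signature is an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with the endpoint secret. The function name `verifyStripeSignature`, its parameters, and the 300-second default tolerance are our own illustrative choices.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper for illustration only; not the benchmark output.
// Returns true when the header contains a fresh, matching v1 signature.
function verifyStripeSignature(
  payload: string,          // raw request body, exactly as received
  header: string,           // value of the Stripe-Signature header
  secret: string,           // endpoint signing secret
  toleranceSeconds = 300,   // assumed replay window
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  let timestamp = "";
  const signatures: string[] = [];

  // Header looks like: t=1700000000,v1=abc...,v1=def...
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    else if (key === "v1") signatures.push(value);
  }

  if (!/^\d+$/.test(timestamp) || signatures.length === 0) return false;

  // Reject events outside the tolerance window to limit replay attacks.
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${payload}`, "utf8")
    .digest();

  // Constant-time comparison against every v1 candidate.
  return signatures.some((sig) => {
    if (!/^[0-9a-f]{64}$/i.test(sig)) return false;
    const candidate = Buffer.from(sig, "hex");
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

In production code the official `stripe` Node library's `stripe.webhooks.constructEvent` is usually the safer choice; the hand-rolled version above is only meant to show the moving parts.
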
Web Scraping · Sales Outreach Specialist

Scrape a competitor's pricing page

Passed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

Three tool calls, all four Apollo tiers extracted, $0.029. Nearly as cheap as Mistral.

$0.029 · 3 tool calls
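
The headline totals can be checked against the per-job figures. The sketch below copies each job's cost and tool-call count from the result cards and sums them; the `JobResult` shape and the short job labels are our own shorthand, not names from the benchmark harness.

```typescript
// Per-job figures copied from the result cards above.
// The interface and job labels are illustrative shorthand.
interface JobResult {
  job: string;
  costUsd: number;
  toolCalls: number;
  passed: boolean;
}

const results: JobResult[] = [
  { job: "Crunchbase financials", costUsd: 0.146, toolCalls: 10, passed: true },
  { job: "Reddit recommendations", costUsd: 0.056, toolCalls: 4, passed: true },
  { job: "Google Shopping reviews", costUsd: 0.108, toolCalls: 4, passed: true },
  { job: "Stripe webhook code", costUsd: 0.037, toolCalls: 2, passed: true },
  { job: "Apollo.io pricing scrape", costUsd: 0.029, toolCalls: 3, passed: true },
];

// Round to three decimals so floating-point noise does not leak into the total.
const totalCostUsd = Number(
  results.reduce((sum, r) => sum + r.costUsd, 0).toFixed(3),
);
const totalToolCalls = results.reduce((sum, r) => sum + r.toolCalls, 0);
const passedCount = results.filter((r) => r.passed).length;

console.log(
  `${passedCount}/${results.length} passed, $${totalCostUsd.toFixed(3)} total, ${totalToolCalls} tool calls`,
);
// prints "5/5 passed, $0.376 total, 23 tool calls"
```

The sums match the summary at the top of the page: $0.376 across 23 tool calls, with the Crunchbase job alone accounting for more than a third of both.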