
xAI

Grok 4.1 Fast

Calls the tools, reads the responses, and then forgets to write anything down. 1/5.

1/5 benchmarks passed
$0.089 spent in total
22 tool calls
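As a quick check, the headline totals reconcile with the per-run figures reported under Results by agent. The following is a minimal Python sketch. It assumes the rounded per-run costs shown on each card, and it takes Marketing at $0.052 (the card value) rather than the $0.053 in its write-up. The dictionary layout is illustrative only.

```python
# Per-run figures as printed on this page (costs rounded to $0.001).
# Costs are stored as integer thousandths of a dollar to avoid float rounding.
runs = {
    "Sales":     {"cost_mills": 35, "tool_calls": 6, "passed": True},
    "Marketing": {"cost_mills": 52, "tool_calls": 7, "passed": False},
    "CX":        {"cost_mills": 0,  "tool_calls": 9, "passed": False},
    "Coding":    {"cost_mills": 0,  "tool_calls": 0, "passed": False},
    "Apollo":    {"cost_mills": 2,  "tool_calls": 0, "passed": False},
}

total_mills = sum(r["cost_mills"] for r in runs.values())
total_tools = sum(r["tool_calls"] for r in runs.values())
passed = sum(r["passed"] for r in runs.values())  # True counts as 1

print(f"{passed}/{len(runs)} passed, ${total_mills / 1000:.3f}, {total_tools} tool calls")
# prints "1/5 passed, $0.089, 22 tool calls"
```

Because the card costs are rounded, the CX run shows $0.000 despite making 9 tool calls.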

What it is.

xAI's Grok 4.1 Fast — the cost-tier variant in the Grok 4.1 family. Tested via xAI's first-party API.

What it does well.

  • Sales benchmark passed cleanly: 6 tool calls, $0.035. Lowest-cost Sales pass in the matrix that involved actual verification.

How it broke.

  • Empty final message on Marketing (7 tool calls), CX (9 tool calls), and Apollo (0 tool calls). On Marketing and CX the pattern is the same: Grok runs the agent loop and processes the tool responses, then closes the run without composing the final answer.
  • Coding was 0 tool calls and no output — Grok didn't even attempt to engage with the task.
  • Total spend was about $0.09, yet four of the five runs produced nothing. You're paying for tool retrieval the model couldn't be bothered to write up.
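The failure modes above can be distinguished mechanically using only a run's tool-call count and its final message. The sketch below is a hypothetical grading helper, not the harness this benchmark used. The `Run` type and the outcome labels are illustrative assumptions. "answered" only means the final message is non-empty. It does not mean the answer was correct.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical per-run summary: tool-call count plus the final message."""
    tool_calls: int
    final_message: str = ""


def classify(run: Run) -> str:
    """Label a run with the outcome categories described on this page."""
    if run.final_message.strip():
        # Some final answer was written; grading it is a separate step.
        return "answered"
    if run.tool_calls > 0:
        # Marketing / CX pattern: tools ran, results came back, no write-up.
        return "empty final after tool calls"
    # Coding / Apollo pattern: no tool calls and no output at all.
    return "no engagement"


# Tool-call counts from the results on this page; message text is a placeholder.
print(classify(Run(tool_calls=7)))  # Marketing: prints "empty final after tool calls"
print(classify(Run(tool_calls=0)))  # Coding: prints "no engagement"
print(classify(Run(tool_calls=6, final_message="(final JSON answer)")))  # Sales: prints "answered"
```

A whitespace-only final message is treated as empty, because that is what a reader would see.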

Results by agent

Five real jobs.

Sales · Sales Research Analyst

Get financials on a company

Passed

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

6 tool calls, $0.035. The one Grok pass: it pulled Hightouch's funding number out of Crunchbase and returned JSON. Possibly the most productive thing Grok did in the entire benchmark.

$0.035 · 6 tool calls

Marketing · Marketing Research Analyst

Find what customers are recommending on Reddit

Failed

Read a real r/sales thread on enrichment tools and rank the top 5 by how many people recommended them.
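The ranking this task asks for amounts to counting distinct recommenders per tool and keeping the top five. The following is a minimal sketch. It assumes the thread has already been reduced to (commenter, tool) pairs. That input format and the alphabetical tie-breaking rule are assumptions, not part of the benchmark spec.

```python
from collections import defaultdict


def top_recommended(mentions: list[tuple[str, str]], k: int = 5) -> list[tuple[str, int]]:
    """Rank tools by how many distinct people recommended them.

    `mentions` is a hypothetical list of (commenter, tool) pairs; a person
    who recommends the same tool twice is counted once.
    """
    recommenders: dict[str, set[str]] = defaultdict(set)
    for commenter, tool in mentions:
        recommenders[tool].add(commenter)
    ranked = [(tool, len(people)) for tool, people in recommenders.items()]
    # Most recommenders first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Placeholder names, not data from the actual r/sales thread.
mentions = [
    ("user1", "ToolA"), ("user2", "ToolA"), ("user1", "ToolA"),
    ("user3", "ToolB"), ("user4", "ToolB"), ("user2", "ToolC"),
]
print(top_recommended(mentions))
# prints [('ToolA', 2), ('ToolB', 2), ('ToolC', 1)]
```

Counting people rather than raw mentions keeps one enthusiastic commenter from inflating a tool's rank. This sketch uses that reading of "how many people recommended them."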

✗ 7 tool calls, $0.052. Read Reddit successfully, then closed the run without writing the JSON: the "stops without writing the answer" pattern.

$0.052 · 7 tool calls

CX

Failed

Pull Google Shopping reviews for AirPods Pro 2 (USB-C) and return the top 3 recurring complaints, with verbatim quotes.

✗ 9 tool calls, $0.000. The busiest empty-final run in this set, with the most tool calls: it read every Google Shopping result and wrote nothing.

$0.000 · 9 tool calls

Coding · Senior Engineer Assistant

Read API docs and write working code

Failed

Read Stripe's official docs and write a real, working webhook-verification function in TypeScript.

✗ 0 tool calls, $0.000. Grok didn't even attempt the task.

$0.000 · 0 tool calls

Web Scraping · Sales Outreach Specialist

Scrape a competitor's pricing page

Failed

Scrape Apollo.io's pricing page and return every tier (Free, Basic, Professional, Organization) with name, price, and top 3 features.

✗ 0 tool calls, $0.002. Same as Coding: no engagement, empty final.

$0.002 · 0 tool calls