Sales·Sales Research Analyst

Get financials on a company

We asked an AI to get funding data for a sales prospect. 18 of 24 models found the right number.

18/24 models passed

$0.022 cheapest pass

$0.151 avg cost of passes

$1.91 costliest fail

Why this benchmark exists.

Before a discovery call, every sales rep needs the same three facts: what the company does, how much money it has raised, and who invested. The data lives in Crunchbase, but models often fall back on their own training data and hallucinate the wrong answer. Researching prospects before calls is the cheapest and most common agent job our customers run.

What we asked it.

Find Hightouch on Crunchbase and return their total funding, last round type, and a one-line description.

System prompt

You are a sales research analyst. Your job is to prepare research briefs for sales reps before customer calls. ALWAYS use the available tools to verify facts; your training data is stale and may hallucinate funding amounts. Prefer Crunchbase for funding data; use Google search to first locate the right slug. If a tool call returns no data, retry once with a different identifier before giving up.

Tools available

Google Search
Site Scraper
Crunchbase

How it's graded

Did it find the right funding amount?
Did it get the right round (Series E)?
Did it return a link to real Crunchbase data, not made up from memory?
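The three checks above can be sketched as a small grader. This is an illustration only, not the benchmark's actual grading code: the field names `totalFundingUsd` and `crunchbaseUrl` are hypothetical (only `lastRoundType` appears in the results), the expected values are passed in rather than hard-coded, and a URL check can confirm that a link points at Crunchbase but not that the agent really fetched it.

```python
from urllib.parse import urlparse


def grade(answer: dict, expected_funding_usd: int,
          expected_round: str, company_slug: str) -> dict:
    """Return one boolean per rubric question plus an overall verdict."""
    url = urlparse(str(answer.get("crunchbaseUrl", "")))  # hypothetical field
    host = url.hostname or ""
    checks = {
        # Exact comparison: no tolerance and no format normalisation.
        "funding": answer.get("totalFundingUsd") == expected_funding_usd,
        "round": answer.get("lastRoundType") == expected_round,
        # The link must sit on crunchbase.com and mention the company.
        "source": (host == "crunchbase.com" or host.endswith(".crunchbase.com"))
        and company_slug in url.path.lower(),
    }
    checks["passed"] = all(checks.values())
    return checks
```

Because the round comparison is exact, an answer that carries the right round in a different format (snake_case, for instance) fails the second check.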

What we saw.

We asked a sales research analyst agent to look up a company called Hightouch and return its total funding, last round type, and a one-line description. The rubric checks for a real funding number, the actual round raised, a real description, and a Crunchbase URL that mentions the company.

18 models passed. The six failures are the interesting part: two announced the work and then stopped, one answered from memory, one burned a lot of money thinking without producing anything, one returned the round type in the wrong format, and one emitted its tool call as plain text.

What worked.

The cheapest passes used between two and seven tool calls. Gemini, Grok, GLM, and Mistral all followed the same path: find Hightouch's Crunchbase entry with one Google search, call the Crunchbase tool, and return cleanly. This is the ideal run.
The more expensive runs went on to cross-check the number against other sources. That is good practice, but every extra call costs money, especially when the underlying credits are fairly expensive.
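The short chain described above (one search, one Crunchbase lookup, return) can be sketched as follows. The `search` and `crunchbase` callables are hypothetical stand-ins for the real tools, whose interfaces are not shown on this page, and the `/organization/<slug>` URL pattern is an assumption about how Crunchbase profile links look. The single retry mirrors the system prompt's instruction to try a different identifier once before giving up.

```python
import re
from typing import Callable, Optional


def slug_from_url(url: str) -> Optional[str]:
    """Pull the organization slug out of a Crunchbase profile URL."""
    match = re.search(r"crunchbase\.com/organization/([a-z0-9-]+)", url.lower())
    return match.group(1) if match else None


def research(company: str,
             search: Callable[[str], list],
             crunchbase: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Search once for the slug, look it up, retry once with another identifier."""
    urls = search(f"{company} crunchbase")
    slug = next((s for s in map(slug_from_url, urls) if s), None)

    # At most two lookups: the slug found by search, then the lowercased name.
    candidates = []
    for identifier in (slug, company.lower()):
        if identifier and identifier not in candidates:
            candidates.append(identifier)

    for identifier in candidates:
        record = crunchbase(identifier)
        if record:
            return {"slug": identifier, **record}
    return None  # report the miss instead of answering from memory
```

Returning `None` on a miss is a deliberate choice in this sketch: the caller can say that no data was found, which is safer than letting the model fill the gap from memory.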

How it broke.

Several models said they were going to do the work and then never did it. They may not be capable enough, or they may have been waiting for a follow-up message before continuing.
Llama 4 made no tool calls at all. It hallucinated the funding amount from its own training memory.
Qwen 3.6 spent heavily on reasoning and made 9 tool calls, but never returned any output.
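One way to triage runs like these is to bucket each one by how many tool calls it made and whether its final message parses as a JSON object. The sketch below is a rough heuristic, not the benchmark's own logic, and the bucket names are invented for illustration. It cannot tell "announced the work and stopped" apart from "answered from memory", because both look like prose with zero tool calls; that distinction still needs someone to read the output.

```python
import json


def classify_run(tool_calls: int, final_text: str) -> str:
    """Sort a finished run into a coarse outcome bucket."""
    text = (final_text or "").strip()
    try:
        has_json = isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        has_json = False

    if has_json:
        return "structured_answer"  # still has to be graded for correctness
    if not text:
        return "no_output"          # nothing was returned at all
    if tool_calls == 0:
        return "no_tools_prose"     # announced the work or answered from memory
    return "tools_then_prose"       # used tools but never produced JSON
```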

Results by model

24 models, ranked.

Passes first, sorted cheap → expensive. Failures last, sorted by how much budget they burned producing nothing.
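The ordering rule and the headline numbers can be reproduced from the per-model rows. The sketch below is illustrative: the tuples are the cost, tool-call count, and verdict as displayed in the list that follows, so the average is computed from rounded figures and could differ slightly from one computed on exact costs. The function names are hypothetical.

```python
# (cost_usd, tool_calls, passed) as displayed in the results list
RUNS = [
    (0.022, 2, True), (0.030, 2, True), (0.032, 3, True), (0.035, 6, True),
    (0.036, 2, True), (0.045, 4, True), (0.048, 3, True), (0.058, 7, True),
    (0.086, 3, True), (0.089, 3, True), (0.100, 6, True), (0.104, 5, True),
    (0.104, 7, True), (0.146, 10, True), (0.148, 10, True), (0.163, 3, True),
    (0.254, 20, True), (1.22, 17, True),
    (1.91, 9, False), (0.032, 3, False), (0.012, 1, False),
    (0.007, 1, False), (0.001, 0, False), (0.001, 0, False),
]


def rank(runs):
    """Passes first, cheap to expensive; failures last, most expensive first."""
    return sorted(runs, key=lambda r: (not r[2], r[0] if r[2] else -r[0]))


def headline(runs):
    """Recompute the four summary figures shown at the top of the page."""
    passes = [cost for cost, _, ok in runs if ok]
    fails = [cost for cost, _, ok in runs if not ok]
    return {
        "passed": len(passes),
        "total": len(runs),
        "cheapest_pass": min(passes),
        "avg_pass_cost": round(sum(passes) / len(passes), 3),
        "costliest_fail": max(fails),
    }
```

Sorting on a two-part key keeps the rule in one place: the first element separates passes from failures, and the sign flip on cost reverses the order within the failures only.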

GPT-5.6 Terra

$0.022 · 2 tools · Passed

GLM 5.2

$0.030 · 2 tools · Passed

Gemini 3 Flash

$0.032 · 3 tools · Passed

3 tool calls, $0.032. Among the cheapest passes. Google → Crunchbase → JSON. The minimal-viable agent chain.

Grok 4.1 Fast

$0.035 · 6 tools · Passed

GPT-5.6 Sol

$0.036 · 2 tools · Passed

Nemotron 3 Super 120B

$0.045 · 4 tools · Passed

GLM 5.1

$0.048 · 3 tools · Passed

Mistral Large 3

$0.058 · 7 tools · Passed

7 tool calls, $0.058. Used Google to find the Crunchbase slug, hit Crunchbase, scraped the page to verify the funding number. The reference shape for what "good" looks like on this task.

$0.086 · 3 tools · Passed

$0.089 · 3 tools · Passed

$0.100 · 6 tools · Passed

$0.104 · 5 tools · Passed

$0.104 · 7 tools · Passed

$0.146 · 10 tools · Passed

10 tool calls, $0.146. The most thorough successful run; every call chained productively without thrashing.

$0.148 · 10 tools · Passed

$0.163 · 3 tools · Passed

$0.254 · 20 tools · Passed

$1.22 · 17 tools · Passed

17 tool calls, $1.22. Opus cross-checked Crunchbase against the Hightouch blog and a press release before committing. Correct answer at the highest passing price.

Qwen 3.6 Plus

$1.91 · 9 tools · Failed

✗ 9 tool calls, $1.91. The single most expensive failure across the entire matrix. Qwen engaged, retrieved data, thought about it for $1.91 worth of tokens, then closed the run with no JSON.

Nemotron 3 Nano 30B

$0.032 · 3 tools · Failed

Right data, wrong format — returned lastRoundType "series_d" instead of "Series D".

Nemotron 3 Ultra 550B

$0.012 · 1 tool · Failed

Emitted a tool call as plain text instead of a real call; the agent never got a final answer.

Claude Haiku 4.5

$0.007 · 1 tool · Failed

✗ Wrote "I'll help you research Hightouch..." as the entire output. 1 tool call, $0.007. The polite-customer-service failure mode in plain view.

DeepSeek V3.1

$0.001 · 0 tools · Failed

✗ 0 tool calls, $0.0005. "Let me search for..." — and nothing else. DeepSeek bails on Sales when the system prompt is persona-heavy.

Llama 4 Maverick

$0.001 · 0 tools · Failed

✗ 0 tool calls, $0.0005. Wrote a prose paragraph about Hightouch's funding from training memory. The hallucination case the rubric is built to catch.
