Firecrawl

Authentication Type: API Key
Description: Extract structured data from websites using AI. Turn any website into LLM-ready data with a single API call.


Extract

Extract structured data from web pages using advanced AI models.

Synchronous Extract

Extract structured data from one or more URLs synchronously. Ideal for single pages or small datasets.

Operation Type: Mutation (Write)

Parameters:

  • urls array of strings (required): URLs to extract from. For best results, pass a single URL. Supports wildcards with /*
  • prompt string (required): Natural language prompt describing the data you want to extract. Pass an empty string if no prompt is needed
  • enableWebSearch boolean (required): When true, extraction can follow links outside the specified domain. The parameter is required, so pass false explicitly to keep the default behavior
  • agent object (nullable): Agent configuration for complex extraction tasks. Pass null if not needed
    • model string: The AI agent model to use for complex extraction tasks. Options: "FIRE-1"

Returns:

  • The extracted structured data (flexible JSON structure based on your prompt)

Example Usage:

{
  "urls": ["https://techcrunch.com/2024/01/15/ai-startup-funding"],
  "prompt": "Extract the article title, author, publication date, main content, and any mentioned company names with their funding amounts",
  "enableWebSearch": false,
  "agent": {
    "model": "FIRE-1"
  }
}
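
The parameter object above can also be assembled in code. The following is a minimal sketch in Python, assuming a hypothetical helper named build_extract_params; it only mirrors the documented parameter list and does not call Firecrawl. Rejecting an empty urls list is an assumption of this sketch, not documented behavior.

```python
import json
from typing import Any, Optional


def build_extract_params(
    urls: list[str],
    prompt: str = "",
    enable_web_search: bool = False,
    agent_model: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the parameter object for Synchronous Extract.

    Hypothetical helper: it mirrors the documented parameters only.
    """
    # Assumption of this sketch: an empty URL list is treated as a caller error.
    if not urls:
        raise ValueError("urls must contain at least one URL")
    return {
        "urls": list(urls),
        "prompt": prompt,  # empty string when no prompt is needed
        "enableWebSearch": enable_web_search,
        # The agent object is nullable: None serializes to JSON null.
        "agent": {"model": agent_model} if agent_model else None,
    }


if __name__ == "__main__":
    params = build_extract_params(
        ["https://techcrunch.com/2024/01/15/ai-startup-funding"],
        prompt="Extract the article title, author, and publication date",
        agent_model="FIRE-1",
    )
    print(json.dumps(params, indent=2))
```

Keeping all four keys present in every call, with null for an unused agent, matches the way the parameters are listed above and avoids relying on implicit defaults.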

Asynchronous Extract

Start an asynchronous extraction job for large datasets or multiple URLs. Returns a job ID for tracking progress.

Operation Type: Mutation (Write)

Parameters:

  • prompt string (required): Natural language prompt describing the data you want to extract. Pass an empty string if no prompt is needed
  • enableWebSearch boolean (required): When true, extraction can follow links outside the specified domain. The parameter is required, so pass false explicitly to keep the default behavior
  • agent object (nullable): Agent configuration for complex extraction tasks. Pass null if not needed
    • model string: The AI agent model to use for complex extraction tasks. Options: "FIRE-1"
  • urls array of strings (required): URLs to extract from. For best results, pass a single URL. Supports wildcards with /*

Returns:

  • id string: Job ID for tracking the extraction
  • status string: Current job status. Options: "completed", "processing", "failed", "cancelled"
  • expiresAt string: ISO timestamp when the job expires

Example Usage:

{
  "prompt": "Extract all product names, prices, descriptions, and availability status from e-commerce product pages",
  "enableWebSearch": false,
  "agent": {
    "model": "FIRE-1"
  },
  "urls": ["https://example-store.com/products/*"]
}
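
Because the job returned here carries an expiresAt timestamp, a caller may want to check whether a stored job ID is still worth querying. The sketch below assumes the timestamp is ISO 8601, possibly with a trailing "Z", and that a timestamp without an offset means UTC; both are assumptions, and is_expired is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Optional


def is_expired(expires_at: str, now: Optional[datetime] = None) -> bool:
    """Return True if the job's expiresAt timestamp is not in the future."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as an explicit UTC offset first.
    if expires_at.endswith("Z"):
        expires_at = expires_at[:-1] + "+00:00"
    expiry = datetime.fromisoformat(expires_at)
    if expiry.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= expiry
```

Passing now explicitly keeps the check deterministic in tests; in normal use it can be omitted so the current UTC time is used.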

Check Extraction Status

Check the status and retrieve results of an asynchronous extraction job using the job ID.

Operation Type: Query (Read)

Parameters:

  • jobId string (required): The job ID returned from async extraction

Returns:

  • success boolean: Whether the job completed successfully
  • data any (nullable): The extracted structured data (if job completed)
  • status string: Current job status. Options: "completed", "processing", "failed", "cancelled"
  • expiresAt string: ISO timestamp when the job expires
  • error string (nullable): Error message if job failed
  • warning string (nullable): Warning message if applicable

Example Usage:

{
  "jobId": "job_abc123def456"
}
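
An asynchronous job is typically polled until it leaves the "processing" state. The following is a minimal polling sketch, assuming a caller-supplied check_status function that performs the Check Extraction Status operation and returns the fields listed above as a dict. The function name wait_for_extraction, the polling interval, and the attempt limit are illustrative assumptions, not part of the integration.

```python
import time
from typing import Any, Callable

# Statuses after which the job will not change again, per the list above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_extraction(
    check_status: Callable[[str], dict[str, Any]],
    job_id: str,
    poll_seconds: float = 2.0,
    max_attempts: int = 30,
) -> dict[str, Any]:
    """Poll a job until it reaches a terminal status or attempts run out.

    check_status is a hypothetical callable supplied by the caller; this
    sketch does not talk to Firecrawl itself.
    """
    for attempt in range(max_attempts):
        result = check_status(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Sleep between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(
        f"job {job_id} still processing after {max_attempts} checks"
    )
```

The returned dict still needs inspecting: a "failed" status is expected to come with an error message, and only a "completed" job carries data.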

Extract from Prompt

Extract data using only a natural language prompt without specific URLs. Firecrawl will find and extract from relevant pages.

Operation Type: Mutation (Write)

Parameters:

  • prompt string (required): Natural language prompt describing what to find and extract. No URLs are supplied
  • enableWebSearch boolean (required): When true, extraction can search for relevant URLs. The parameter is required, so pass false explicitly to keep the default behavior
  • agent object (nullable): Agent configuration for complex extraction tasks. Pass null if not needed
    • model string: The AI agent model to use for complex extraction tasks. Options: "FIRE-1"

Returns:

  • success boolean: Whether the extraction was successful
  • data any: The extracted structured data

Example Usage:

{
  "prompt": "Find the latest earnings reports from major tech companies and extract revenue, profit, and growth metrics for Q4 2023",
  "enableWebSearch": true,
  "agent": {
    "model": "FIRE-1"
  }
}
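
Since data is a flexible structure shaped by the prompt, it can help to check the result before using it. The sketch below assumes the result arrives as a dict with the success and data fields listed above; unwrap_extraction and the expected field names are hypothetical and chosen only for illustration.

```python
from typing import Any, Iterable


def unwrap_extraction(
    result: dict[str, Any],
    expected_fields: Iterable[str] = (),
) -> Any:
    """Return result["data"] after basic sanity checks.

    Hypothetical helper: the field names to expect depend entirely on the
    prompt, so the caller supplies them.
    """
    if not result.get("success"):
        raise RuntimeError("extraction reported success=false")
    data = result.get("data")
    # Field checks only make sense when the data is a JSON object.
    if isinstance(data, dict):
        missing = [name for name in expected_fields if name not in data]
        if missing:
            raise KeyError(f"extracted data is missing fields: {missing}")
    return data
```

For the earnings prompt above, a caller might pass expected_fields=("revenue", "profit"), assuming the model returns an object with those keys; nothing in the operation guarantees that shape.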

Common Use Cases

Market Research and Competitive Analysis:

  • Extract product information, pricing, and specifications from competitor websites for market analysis
  • Monitor news articles and press releases for industry trends and company announcements
  • Gather customer reviews and ratings from multiple platforms for sentiment analysis

Content Aggregation and Monitoring:

  • Extract articles, blog posts, and news content for content curation and research purposes
  • Monitor specific websites for changes in pricing, product availability, or policy updates
  • Aggregate job listings from multiple career sites with detailed position information and requirements

Data Collection for AI and Analytics:

  • Extract structured data from unstructured web pages for training datasets and machine learning models
  • Gather financial data, stock information, and market metrics from financial news and reporting sites
  • Collect real estate listings with detailed property information for market analysis and valuation models

Automated Business Intelligence:

  • Extract contact information and company details from business directories and professional networks
  • Monitor regulatory changes and compliance updates from government and industry websites
  • Gather event information, conference details, and industry announcements for business planning and networking