Fireworks

Authentication Type: API Key
Description: Fast AI inference with Fireworks AI models, including audio transcription with speaker diarization using Whisper models.


Authentication

To authenticate, you'll need a Fireworks AI API key. Learn how to create one in the Fireworks AI documentation.


Whisper

Audio transcription with speaker diarization using Whisper models.

Transcribe Audio

Transcribe audio from a URL with speaker diarization. Returns timestamped segments with speaker labels.

Operation Type: Query (Read)

Parameters:

  • audioUrl string (required): URL of the audio file to transcribe (maximum file size 1 GB).
  • language string (nullable): Target language for transcription, as an ISO 639-1 code (e.g., "en", "es"). If not provided, the language is auto-detected.
  • prompt string (nullable): Optional prompt to guide transcription style. For example, "Um, here's, uh, what was recorded." encourages the transcript to include filler words.
  • model string (nullable): Model to use. Options: "whisper-v3-turbo" (default) for fast processing, "whisper-v3" for best quality.
  • minSpeakers number (nullable): Minimum number of speakers to detect for diarization
  • maxSpeakers number (nullable): Maximum number of speakers to detect for diarization
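
The parameters above can be assembled and sanity-checked before the operation is called. The following is a minimal sketch, not part of the integration itself: the function name build_transcribe_params is hypothetical, and the speaker-count checks (at least 1, minimum not above maximum) are assumptions rather than documented rules.

```python
from typing import Any, Dict, Optional

# The two model options listed in the parameter description.
ALLOWED_MODELS = {"whisper-v3-turbo", "whisper-v3"}


def build_transcribe_params(
    audio_url: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    model: Optional[str] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and drop unset optional fields."""
    if not audio_url:
        raise ValueError("audioUrl is required")
    if model is not None and model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")

    # Assumption: speaker counts are positive and min does not exceed max.
    for name, value in (("minSpeakers", min_speakers), ("maxSpeakers", max_speakers)):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("minSpeakers must not exceed maxSpeakers")

    params = {
        "audioUrl": audio_url,
        "language": language,
        "prompt": prompt,
        "model": model,
        "minSpeakers": min_speakers,
        "maxSpeakers": max_speakers,
    }
    # Nullable fields that were not set are left out entirely.
    return {key: value for key, value in params.items() if value is not None}
```

Leaving model unset relies on the documented default ("whisper-v3-turbo") instead of hard-coding it in the caller.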

Returns:

  • segments array of objects: Array of transcribed segments with speaker diarization
    • id number: Segment ID
    • start number: Start timestamp in seconds
    • end number: End timestamp in seconds
    • text string: Transcribed text for this segment
    • speakerId string (nullable): Speaker identifier (e.g., "SPEAKER_00", "SPEAKER_01")
  • text string (nullable): Full transcribed text
  • duration number (nullable): Duration of the audio in seconds
  • language string (nullable): Detected language of the audio
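
To illustrate how the returned fields fit together, here is a small sketch that renders the segments as a speaker-labeled transcript. It assumes the result is available as a plain dictionary with the field names listed above; the helper names and the "UNKNOWN" fallback label for a null speakerId are illustrative choices, not part of the integration.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS (minutes are not wrapped into hours)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def format_transcript(result: Dict[str, Any]) -> str:
    """Hypothetical helper: one line per segment, e.g. '[01:05] SPEAKER_01: ...'."""
    lines: List[str] = []
    for seg in result.get("segments", []):
        # speakerId is nullable, so fall back to a placeholder label.
        speaker = seg.get("speakerId") or "UNKNOWN"
        lines.append(
            f"[{format_timestamp(seg['start'])}] {speaker}: {seg['text'].strip()}"
        )
    return "\n".join(lines)
```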

Example Usage:

{
  "audioUrl": "https://example.com/audio/meeting-recording.mp3",
  "language": "en",
  "model": "whisper-v3",
  "minSpeakers": 2,
  "maxSpeakers": 5
}

Common Use Cases

Meeting Transcription:

  • Transcribe recorded meetings with speaker identification
  • Generate meeting notes with timestamps and speaker labels
  • Create searchable archives of recorded conversations
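
For meeting notes, it is often easier to read one entry per speaker turn than one per segment. The sketch below merges consecutive segments that share a speakerId; the function name merge_speaker_turns is hypothetical, and it assumes segments arrive in chronological order.

```python
from typing import Any, Dict, List


def merge_speaker_turns(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: collapse adjacent same-speaker segments into turns."""
    turns: List[Dict[str, Any]] = []
    for seg in segments:
        speaker = seg.get("speakerId")
        text = seg["text"].strip()
        if turns and turns[-1]["speakerId"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append(
                {
                    "speakerId": speaker,
                    "start": seg["start"],
                    "end": seg["end"],
                    "text": text,
                }
            )
    return turns
```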

Content Creation:

  • Transcribe podcasts and video content for accessibility
  • Generate subtitles with accurate speaker attribution
  • Convert audio interviews into written content

Analysis & Research:

  • Transcribe customer calls for quality analysis
  • Process research interviews with speaker segmentation
  • Analyze multi-speaker audio content for insights
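
A simple starting point for call or interview analysis is total talk time per speaker, computed from the segment timestamps. The sketch below assumes the segment fields described in the Returns section; the function name and the "UNKNOWN" bucket for segments without a speakerId are illustrative.

```python
from collections import defaultdict
from typing import Any, Dict, List


def talk_time_by_speaker(segments: List[Dict[str, Any]]) -> Dict[str, float]:
    """Hypothetical helper: sum segment durations (in seconds) per speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        speaker = seg.get("speakerId") or "UNKNOWN"
        # Guard against malformed segments where end precedes start.
        totals[speaker] += max(0.0, seg["end"] - seg["start"])
    return dict(totals)
```

Dividing each total by the duration field of the result would give each speaker's share of the recording, provided duration is present (it is nullable).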