Fireworks

Authentication Type: API Key
Description: Fast AI inference with Fireworks AI models, including audio transcription with speaker diarization using Whisper models.

Authentication

To authenticate, you'll need a Fireworks AI API key. Learn how to create one in the Fireworks AI documentation.

Whisper

Audio transcription with speaker diarization using Whisper models.

Transcribe Audio

Transcribe audio from a URL with speaker diarization. Returns timestamped segments with speaker labels.

Operation Type: Query (Read)

Parameters:

audioUrl string (required): URL of the audio file to transcribe (maximum file size 1 GB).
language string (nullable): Target language for transcription in ISO-639-1 format (e.g., "en", "es"). If omitted, the language is auto-detected.
prompt string (nullable): Optional prompt to guide transcription style. E.g., "Um, here's, uh, what was recorded." to include filler words.
model string (nullable): Model to use. Options: "whisper-v3-turbo" (default) for fast processing, "whisper-v3" for best quality.
minSpeakers number (nullable): Minimum number of speakers to detect for diarization.
maxSpeakers number (nullable): Maximum number of speakers to detect for diarization.
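
As a rough illustration of how these parameters fit together, the sketch below assembles a parameter object in Python. The helper name build_transcribe_params and its validation rules (known model names only, minSpeakers not greater than maxSpeakers) are assumptions made for this example and are not part of the integration; only the parameter names and model options come from the list above.

```python
from typing import Any, Optional

# Model names taken from the parameter list above.
ALLOWED_MODELS = {"whisper-v3-turbo", "whisper-v3"}


def build_transcribe_params(
    audio_url: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    model: Optional[str] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the Transcribe Audio parameter object."""
    if not audio_url:
        raise ValueError("audioUrl is required")
    if model is not None and model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model}")
    # Assumed sanity check; the service may apply its own rules.
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("minSpeakers must not exceed maxSpeakers")

    params = {
        "audioUrl": audio_url,
        "language": language,
        "prompt": prompt,
        "model": model,
        "minSpeakers": min_speakers,
        "maxSpeakers": max_speakers,
    }
    # Leave out nullable parameters that were not provided.
    return {key: value for key, value in params.items() if value is not None}
```

Omitting unset values rather than sending nulls keeps the defaults described above (auto-detected language, "whisper-v3-turbo") in effect.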

Returns:

segments array of objects: Array of transcribed segments with speaker diarization
- id number: Segment ID
- start number: Start timestamp in seconds
- end number: End timestamp in seconds
- text string: Transcribed text for this segment
- speakerId string (nullable): Speaker identifier (e.g., "SPEAKER_00", "SPEAKER_01")
text string (nullable): Full transcribed text
duration number (nullable): Duration of the audio in seconds
language string (nullable): Detected language of the audio
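
The return shape above can be modeled and printed as a readable transcript. The following is a minimal Python sketch: the Segment type mirrors the documented fields, while format_timestamp, render_transcript, and the "UNKNOWN" fallback label are hypothetical names chosen for this example.

```python
from typing import Optional, TypedDict


class Segment(TypedDict):
    """One entry of the documented `segments` array."""

    id: int
    start: float
    end: float
    text: str
    speakerId: Optional[str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, discarding the fractional part."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Render one line per segment: [start] speaker: text."""
    lines = []
    for seg in segments:
        # speakerId is nullable, so fall back to a placeholder label.
        speaker = seg.get("speakerId") or "UNKNOWN"
        start = format_timestamp(seg["start"])
        lines.append(f"[{start}] {speaker}: {seg['text'].strip()}")
    return "\n".join(lines)
```

Because speakerId is nullable, any consumer of the result should handle segments without a speaker label, as the fallback here does.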

Example Usage:

{
  "audioUrl": "https://example.com/audio/meeting-recording.mp3",
  "language": "en",
  "model": "whisper-v3",
  "minSpeakers": 2,
  "maxSpeakers": 5
}

Common Use Cases

Meeting Transcription:

Transcribe recorded meetings with speaker identification
Generate meeting notes with timestamps and speaker labels
Create searchable archives of recorded conversations
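
For meeting notes, consecutive segments from the same speaker are often easier to read as a single turn. The sketch below shows one possible way to do that in Python; merge_speaker_turns is a hypothetical helper, and it assumes the segments arrive in chronological order, which this page does not state explicitly.

```python
from typing import Any


def merge_speaker_turns(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge adjacent segments that share a speakerId into one turn.

    Assumes `segments` is ordered by start time.
    """
    turns: list[dict[str, Any]] = []
    for seg in segments:
        speaker = seg.get("speakerId")
        text = seg["text"].strip()
        if turns and turns[-1]["speakerId"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            # Speaker changed (or first segment): start a new turn.
            turns.append(
                {
                    "speakerId": speaker,
                    "start": seg["start"],
                    "end": seg["end"],
                    "text": text,
                }
            )
    return turns
```

Each resulting turn keeps the start of its first segment and the end of its last, so timestamps in the notes still point back to the recording.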

Content Creation:

Transcribe podcasts and video content for accessibility
Generate subtitles with accurate speaker attribution
Convert audio interviews into written content
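
Since each segment carries start and end times in seconds, subtitles can be derived from the segments array directly. The following Python sketch converts segments to the SubRip (SRT) format; srt_timestamp and segments_to_srt are hypothetical helpers, and prefixing each cue with the speaker label in brackets is just one possible convention.

```python
from typing import Any


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict[str, Any]]) -> str:
    """Build SRT text with one numbered cue per segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        speaker = seg.get("speakerId")
        if speaker:
            # Assumed convention: show the speaker label before the text.
            text = f"[{speaker}] {text}"
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{time_range}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Long segments may need to be split into shorter cues for comfortable reading; that step is left out of this sketch.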

Analysis & Research:

Transcribe customer calls for quality analysis
Process research interviews with speaker segmentation
Analyze multi-speaker audio content for insights
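
As one example of such analysis, the diarized segments can be aggregated into total speaking time per speaker. This is a minimal Python sketch; talk_time_by_speaker and the "UNKNOWN" bucket for segments without a speakerId are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any


def talk_time_by_speaker(segments: list[dict[str, Any]]) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        # Group segments with no speaker label under a placeholder key.
        speaker = seg.get("speakerId") or "UNKNOWN"
        # Guard against malformed segments where end precedes start.
        totals[speaker] += max(0.0, seg["end"] - seg["start"])
    return dict(totals)
```

Dividing each total by the returned duration value, when it is present, gives each speaker's share of the recording.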