Fireworks
Authentication Type: API Key
Description: Fast AI inference with Fireworks AI models, including audio transcription with speaker diarization using Whisper models.
Authentication
To authenticate, you'll need a Fireworks AI API key. Learn how to create one in the Fireworks AI documentation.
Whisper
Audio transcription with speaker diarization using Whisper models.
Transcribe Audio
Transcribe audio from a URL with speaker diarization. Returns timestamped segments with speaker labels.
Operation Type: Query (Read)
Parameters:
- audioUrl
string(required): URL to the audio file to transcribe (max 1 GB)
- language
string(nullable): Target language for transcription (ISO-639-1 format, e.g., "en", "es"). If not provided, language will be auto-detected.
- prompt
string(nullable): Optional prompt to guide transcription style. E.g., "Um, here's, uh, what was recorded." to include filler words.
- model
string(nullable): Model to use. Options: "whisper-v3-turbo" (default) for fast processing, "whisper-v3" for best quality.
- minSpeakers
number(nullable): Minimum number of speakers to detect for diarization
- maxSpeakers
number(nullable): Maximum number of speakers to detect for diarization
Returns:
- segments
array of objects: Array of transcribed segments with speaker diarization
  - id
  number: Segment ID
  - start
  number: Start timestamp in seconds
  - end
  number: End timestamp in seconds
  - text
  string: Transcribed text for this segment
  - speakerId
  string(nullable): Speaker identifier (e.g., "SPEAKER_00", "SPEAKER_01")
- text
string(nullable): Full transcribed text
- duration
number(nullable): Duration of the audio in seconds
- language
string(nullable): Detected language of the audio
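The return shape above can be turned into a readable transcript by merging consecutive segments from the same speaker. The following is a minimal sketch, not part of the integration itself: the helper names merge_speaker_turns and format_timestamp are hypothetical, the sample response is invented for illustration, and the "UNKNOWN" fallback for a null speakerId is an assumption.

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Render a second offset as MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def merge_speaker_turns(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse adjacent segments that share a speakerId into one turn."""
    turns: list[dict[str, Any]] = []
    for seg in segments:
        # speakerId is nullable, so fall back to a placeholder label (assumption).
        speaker = seg.get("speakerId") or "UNKNOWN"
        text = seg["text"].strip()
        if turns and turns[-1]["speakerId"] == speaker:
            # Same speaker as the previous turn: extend it.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            turns.append(
                {"speakerId": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return turns


# Invented sample data that follows the documented field names.
sample_response = {
    "segments": [
        {"id": 0, "start": 0.0, "end": 2.5, "text": "Good morning, everyone.", "speakerId": "SPEAKER_00"},
        {"id": 1, "start": 2.5, "end": 5.0, "text": "Let's get started.", "speakerId": "SPEAKER_00"},
        {"id": 2, "start": 5.0, "end": 8.2, "text": "Sounds good.", "speakerId": "SPEAKER_01"},
    ],
    "text": "Good morning, everyone. Let's get started. Sounds good.",
    "duration": 8.2,
    "language": "en",
}

for turn in merge_speaker_turns(sample_response["segments"]):
    print(f'[{format_timestamp(turn["start"])}] {turn["speakerId"]}: {turn["text"]}')
```

Merging by turn keeps the per-segment timestamps available for the start and end of each turn while producing output closer to meeting notes.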
Example Usage:
{
"audioUrl": "https://example.com/audio/meeting-recording.mp3",
"language": "en",
"model": "whisper-v3",
"minSpeakers": 2,
"maxSpeakers": 5
}
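A payload like the one above could be assembled and checked before it is sent. This is a sketch under stated assumptions: build_transcribe_request is a hypothetical helper, not part of the integration, and the check that minSpeakers does not exceed maxSpeakers is a sanity rule assumed here rather than one the source documents. The model options come from the parameter list.

```python
from typing import Any, Optional

# Model names as listed in the parameter documentation.
ALLOWED_MODELS = {"whisper-v3-turbo", "whisper-v3"}


def build_transcribe_request(
    audio_url: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    model: Optional[str] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble a Transcribe Audio payload, omitting unset optional fields."""
    if not audio_url:
        raise ValueError("audioUrl is required")
    if model is not None and model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # Assumed sanity check; the source does not state this constraint.
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("minSpeakers must not exceed maxSpeakers")

    payload: dict[str, Any] = {"audioUrl": audio_url}
    optional = {
        "language": language,
        "prompt": prompt,
        "model": model,
        "minSpeakers": min_speakers,
        "maxSpeakers": max_speakers,
    }
    # Leave out anything the caller did not set so defaults apply.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


request = build_transcribe_request(
    "https://example.com/audio/meeting-recording.mp3",
    language="en",
    model="whisper-v3",
    min_speakers=2,
    max_speakers=5,
)
```

Omitting unset fields rather than sending nulls lets language auto-detection and the default model apply as described in the parameter list.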
Common Use Cases
Meeting Transcription:
- Transcribe recorded meetings with speaker identification
- Generate meeting notes with timestamps and speaker labels
- Create searchable archives of recorded conversations
Content Creation:
- Transcribe podcasts and video content for accessibility
- Generate subtitles with accurate speaker attribution
- Convert audio interviews into written content
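For the subtitle use case, the returned segments map fairly directly onto SubRip (SRT) cues. The sketch below assumes the segment fields described under Returns; the helper names srt_timestamp and segments_to_srt are hypothetical, and prefixing each cue with the speaker label is one possible presentation choice, not something the integration prescribes.

```python
from typing import Any


def srt_timestamp(seconds: float) -> str:
    """Convert a second offset to the SRT form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict[str, Any]]) -> str:
    """Build SRT text with one numbered cue per transcribed segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speakerId")  # nullable in the response
        text = seg["text"].strip()
        line = f"{speaker}: {text}" if speaker else text
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{line}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

Calling segments_to_srt on the segments array and writing the result to a .srt file would give a subtitle track with speaker attribution on every cue.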
Analysis & Research:
- Transcribe customer calls for quality analysis
- Process research interviews with speaker segmentation
- Analyze multi-speaker audio content for insights
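As one illustration of the analysis use cases, talk time per speaker can be derived from the segment timestamps alone. This is a small sketch assuming the documented start, end, and speakerId fields; talk_time_by_speaker is a hypothetical helper, and grouping null speaker IDs under "UNKNOWN" is an assumption.

```python
from collections import defaultdict
from typing import Any


def talk_time_by_speaker(segments: list[dict[str, Any]]) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals: defaultdict[str, float] = defaultdict(float)
    for seg in segments:
        # speakerId is nullable, so group unlabeled segments together (assumption).
        speaker = seg.get("speakerId") or "UNKNOWN"
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)
```

Dividing each total by the response's duration field, when it is present, would give each speaker's share of the recording.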