AI Agents
Authentication Type: No Authentication
Description: AI-powered analysis and processing tools including vision analysis, audio transcription with speaker diarization, and text embeddings generation.
Vision Analysis
AI-powered image and visual content analysis.
Analyze Image
Analyze an image using an AI model to extract insights, descriptions, objects, text, or other details. Provide the image URL and specify which AI model to use.
Operation Type: Query (Read)
Parameters:
- prompt (string, required): Prompt used to analyze the image
- imageUrl (string, required): URL of the image to analyze
- model (string, required): LLM model to use for image analysis. Options: "gpt-4o-mini", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "gemini-2.5-flash-preview-05-20"
Returns:
- result (string): The AI-generated analysis of the image
Example Usage:
{
"prompt": "Analyze this product image and describe its features, condition, and potential market value. Include details about materials, design elements, and any visible wear or damage.",
"imageUrl": "https://example.com/images/vintage-watch.jpg",
"model": "gpt-4o-mini"
}
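The parameters above can be checked before a request is sent. The following is a minimal sketch, not part of the tool itself: the helper name `build_analyze_image_input` is hypothetical, the http(s) check on `imageUrl` is an assumption (the reference only says "URL"), and how the payload is actually submitted depends on your integration and is not shown.

```python
import json

# Model names copied from the "model" parameter options above.
SUPPORTED_VISION_MODELS = {
    "gpt-4o-mini",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gemini-2.5-flash-preview-05-20",
}


def build_analyze_image_input(prompt: str, image_url: str, model: str) -> dict:
    """Hypothetical helper: validate and assemble the Analyze Image parameters."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Assumption: only http(s) URLs are accepted; the reference does not say.
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("imageUrl must be an http(s) URL")
    if model not in SUPPORTED_VISION_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return {"prompt": prompt, "imageUrl": image_url, "model": model}


payload = build_analyze_image_input(
    "Describe the watch's condition and any visible wear.",
    "https://example.com/images/vintage-watch.jpg",
    "gpt-4o-mini",
)
print(json.dumps(payload, indent=2))
```

Validating the model name locally surfaces typos before the request leaves your code, which is useful because all three parameters are required.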
Audio Transcription
Transcribe audio files with speaker diarization using OpenAI Whisper.
Transcribe with Speaker Diarization
Transcribe audio from a URL and identify different speakers in the conversation. Returns timestamped segments with speaker labels.
Operation Type: Query (Read)
Parameters:
- audioUrl (string, required): URL to the audio file to transcribe
- language (string, nullable): Language of the audio (ISO-639-1 format, e.g., "en", "es")
- prompt (string, nullable): Optional text to guide the model's style or continue a previous audio segment
Returns:
- segments (array of objects): Array of transcribed segments with speaker diarization
  - id (string): Segment ID
  - start (number): Start timestamp in seconds
  - end (number): End timestamp in seconds
  - text (string): Transcribed text for this segment
  - speaker (string, nullable): Speaker identifier (e.g., "SPEAKER_00", "SPEAKER_01")
- language (string, nullable): Detected language of the audio
- duration (number, nullable): Duration of the audio in seconds
- text (string, nullable): Full transcribed text
Example Usage:
{
"audioUrl": "https://example.com/recordings/meeting-2024-01-15.mp3",
"language": "en",
"prompt": "This is a business meeting discussing quarterly sales targets"
}
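Once a response comes back, the timestamped segments usually need to be turned into readable speaker turns. The sketch below assumes the response shape listed under Returns; the helper names and the sample response are invented for illustration, and the "UNKNOWN" fallback for a null `speaker` is an assumption.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: list[dict] = []
    for seg in segments:
        # speaker is nullable, so fall back to a placeholder label.
        speaker = seg.get("speaker") or "UNKNOWN"
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            turns.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return turns


# Made-up response for illustration only; real output will differ.
sample_response = {
    "segments": [
        {"id": "0", "start": 0.0, "end": 4.2, "speaker": "SPEAKER_00",
         "text": "Let's review the quarterly targets."},
        {"id": "1", "start": 4.2, "end": 7.9, "speaker": "SPEAKER_00",
         "text": "We are at eighty percent so far."},
        {"id": "2", "start": 8.1, "end": 65.0, "speaker": "SPEAKER_01",
         "text": "The pipeline looks healthy."},
    ],
    "language": "en",
    "duration": 65.0,
    "text": "Let's review the quarterly targets. We are at eighty percent so far. "
            "The pipeline looks healthy.",
}

turns = merge_speaker_turns(sample_response["segments"])
for turn in turns:
    span = f"{format_timestamp(turn['start'])}-{format_timestamp(turn['end'])}"
    print(f"[{span}] {turn['speaker']}: {turn['text']}")
```

Merging adjacent segments keeps the transcript close to how meeting minutes are normally written, while the original per-segment timestamps remain available in the response.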
Text Embeddings
Generate vector embeddings from text using OpenAI models for semantic search, RAG, or similarity comparisons.
Generate Embeddings
Generate vector embeddings from text for semantic search, RAG, or similarity comparisons.
When to use: Convert text into numerical vectors (embeddings) that capture semantic meaning.
Perfect for:
- Preparing text for vector search in databases like Turbopuffer
- RAG (Retrieval Augmented Generation) pipelines
- Semantic similarity comparisons
- Clustering similar content
Operation Type: Query (Read)
Parameters:
- text (string, required): Text to generate embeddings for (max 8191 tokens)
- model (string, required): Embedding model to use. Options: "text-embedding-3-small", "text-embedding-3-large"
  - text-embedding-3-small (1536 dimensions): Fast, cost-effective, recommended for most use cases
  - text-embedding-3-large (3072 dimensions): Higher quality, more expensive
Returns:
- embedding (array of numbers): The embedding vector for the input text
- dimensions (number): Number of dimensions in the embedding vector
Example Usage:
{
"text": "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data without being explicitly programmed.",
"model": "text-embedding-3-small"
}
Use with Turbopuffer: Generate an embedding for your text, then upsert the vector to Turbopuffer to enable semantic search.
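To illustrate what a semantic similarity comparison does with the returned `embedding` arrays, here is a minimal sketch of cosine similarity and ranking in plain Python. The function names are hypothetical and the toy 3-dimensional vectors stand in for real 1536- or 3072-dimensional embeddings; a vector database would normally perform this ranking for you, and no Turbopuffer API is shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], candidates: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration; real embeddings come from Generate Embeddings.
query = [1.0, 0.0, 0.0]
documents = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [-1.0, 0.0, 0.0],
}

ranking = rank_by_similarity(query, documents)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Vectors being compared must come from the same model, since text-embedding-3-small and text-embedding-3-large produce different dimension counts and the length check above would reject a mixed pair.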
Common Use Cases
Audio Transcription & Analysis:
- Transcribe meeting recordings with automatic speaker identification for accurate meeting minutes
- Convert podcast episodes and interviews to text with speaker labels for content repurposing
- Process customer service calls to generate timestamped transcripts for quality assurance and training
- Create searchable archives of video content by extracting and transcribing audio tracks
AI & Machine Learning:
- Generate embeddings for RAG (Retrieval Augmented Generation) systems and AI chatbots
- Create semantic search indexes by combining embeddings generation with Turbopuffer storage
- Build content recommendation systems using vector similarity comparisons
- Develop AI assistants that understand context and meaning through embeddings
Product and Inventory Analysis:
- Analyze product images for e-commerce listings to generate detailed descriptions and feature lists
- Assess condition and quality of items from photos for resale marketplaces and inventory management
- Extract text from product labels and packaging for catalog management and compliance tracking
Content Moderation and Classification:
- Automatically classify and moderate user-generated image content for social platforms
- Detect inappropriate or harmful visual content in uploaded images and media files
- Analyze images for brand safety and advertising suitability across different content categories
Document and Data Extraction:
- Extract text and data from scanned documents, receipts, and business forms
- Analyze charts, graphs, and visual data presentations to extract key metrics and insights
- Process screenshots of applications and interfaces for quality assurance and testing documentation
Creative and Marketing Analysis:
- Analyze visual content for brand consistency and design quality assessment
- Generate detailed descriptions of artwork, photography, and creative assets for marketing purposes
- Evaluate visual elements in marketing materials for accessibility and design effectiveness