AI Agents

Authentication Type: No Authentication

Description: AI-powered analysis and processing tools, including vision analysis, audio transcription with speaker diarization, and text embedding generation.

Vision Analysis

AI-powered image and visual content analysis.

Analyze Image

Analyze an image using an AI model to extract insights, descriptions, objects, text, or other details. Provide the image URL and specify which AI model to use.

Operation Type: Query (Read)

Parameters:

prompt string (required): Prompt used to analyze the image
imageUrl string (required): URL of the image to analyze
model string (required): Model to use for image analysis. Options: "gpt-4o-mini", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "gemini-2.5-flash-preview-05-20"

Returns:

result string: The AI-generated analysis of the image

Example Usage:

{
  "prompt": "Analyze this product image and describe its features, condition, and potential market value. Include details about materials, design elements, and any visible wear or damage.",
  "imageUrl": "https://example.com/images/vintage-watch.jpg",
  "model": "gpt-4o-mini"
}
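The request above can be assembled and checked before it is sent. The following is a minimal sketch, not part of the documented API: the helper name `build_analyze_image_request` is hypothetical, the http(s) check on the URL is an assumption about what the tool accepts, and how the payload is actually submitted depends on your integration and is left out.

```python
import json

# Model options copied from the Analyze Image parameter list.
ALLOWED_MODELS = {
    "gpt-4o-mini",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gemini-2.5-flash-preview-05-20",
}


def build_analyze_image_request(prompt: str, image_url: str, model: str) -> dict:
    """Return an Analyze Image payload, raising ValueError on bad input.

    Hypothetical helper for illustration; not provided by the tool itself.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Assumption: the tool expects a publicly reachable http(s) URL.
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("imageUrl must be an http(s) URL")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return {"prompt": prompt, "imageUrl": image_url, "model": model}


payload = build_analyze_image_request(
    "Describe the visible wear on this watch.",
    "https://example.com/images/vintage-watch.jpg",
    "gpt-4o-mini",
)
print(json.dumps(payload, indent=2))
```

Validating the model name locally surfaces a typo before the request is made rather than as a remote error.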

Audio Transcription

Transcribe audio files with speaker diarization using OpenAI Whisper.

Transcribe with Speaker Diarization

Transcribe audio from a URL and identify different speakers in the conversation. Returns timestamped segments with speaker labels.

Operation Type: Query (Read)

Parameters:

audioUrl string (required): URL to the audio file to transcribe
language string (nullable): Language of the audio as an ISO 639-1 code (e.g., "en", "es")
prompt string (nullable): Optional text to guide the model's style or continue a previous audio segment

Returns:

segments array of objects: Array of transcribed segments with speaker diarization
- id string: Segment ID
- start number: Start timestamp in seconds
- end number: End timestamp in seconds
- text string: Transcribed text for this segment
- speaker string (nullable): Speaker identifier (e.g., "SPEAKER_00", "SPEAKER_01")
language string (nullable): Detected language of the audio
duration number (nullable): Duration of the audio in seconds
text string (nullable): Full transcribed text

Example Usage:

{
  "audioUrl": "https://example.com/recordings/meeting-2024-01-15.mp3",
  "language": "en",
  "prompt": "This is a business meeting discussing quarterly sales targets"
}
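Because the response is a flat list of timestamped segments, a common post-processing step is to merge consecutive segments from the same speaker into readable turns. The sketch below assumes a response shaped like the Returns list above; the sample data is made up for illustration, and `merge_speaker_turns`, `format_seconds`, and the "UNKNOWN" placeholder for a null speaker are hypothetical names, not part of the tool.

```python
def format_seconds(seconds: float) -> str:
    """Render a timestamp in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def merge_speaker_turns(segments: list[dict]) -> list[dict]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: list[dict] = []
    for seg in segments:
        # speaker is nullable in the schema; "UNKNOWN" is our own placeholder.
        speaker = seg.get("speaker") or "UNKNOWN"
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            turns.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return turns


# Made-up sample response, for illustration only.
sample_response = {
    "segments": [
        {"id": "0", "start": 0.0, "end": 4.2, "text": "Welcome, everyone.", "speaker": "SPEAKER_00"},
        {"id": "1", "start": 4.2, "end": 9.8, "text": "Let's review the targets.", "speaker": "SPEAKER_00"},
        {"id": "2", "start": 9.8, "end": 65.0, "text": "Sales are up this quarter.", "speaker": "SPEAKER_01"},
    ],
    "language": "en",
    "duration": 65.0,
    "text": "Welcome, everyone. Let's review the targets. Sales are up this quarter.",
}

turns = merge_speaker_turns(sample_response["segments"])
for turn in turns:
    span = f"{format_seconds(turn['start'])}-{format_seconds(turn['end'])}"
    print(f"[{span}] {turn['speaker']}: {turn['text']}")
```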

Text Embeddings

Generate vector embeddings from text using OpenAI models for semantic search, RAG, or similarity comparisons.

Generate Embeddings

Generate vector embeddings from text for semantic search, RAG, or similarity comparisons.

When to use: Use this operation to convert text into numerical vectors (embeddings) that capture semantic meaning.

Perfect for:

Preparing text for vector search in databases like Turbopuffer
RAG (Retrieval Augmented Generation) pipelines
Semantic similarity comparisons
Clustering similar content

Operation Type: Query (Read)

Parameters:

text string (required): Text to generate embeddings for (max 8191 tokens)
model string (required): Embedding model to use. Options: "text-embedding-3-small", "text-embedding-3-large"
- text-embedding-3-small (1536 dimensions) - Fast, cost-effective, recommended for most use cases
- text-embedding-3-large (3072 dimensions) - Higher quality, more expensive

Returns:

embedding array of numbers: The embedding vector for the input text
dimensions number: Number of dimensions in the embedding vector
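Before storing a vector, it can be worth confirming that the response is consistent with the model that was requested. The sketch below assumes the response shape listed above and uses the per-model dimensions from the parameter list; `check_embedding_response` is a hypothetical helper, and the response shown is a placeholder rather than real model output.

```python
# Dimensions per model, as listed in the parameter description.
EXPECTED_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_embedding_response(model: str, response: dict) -> list[float]:
    """Return the embedding if its size matches the model, else raise ValueError."""
    if model not in EXPECTED_DIMENSIONS:
        raise ValueError(f"unsupported model: {model!r}")
    expected = EXPECTED_DIMENSIONS[model]
    embedding = response["embedding"]
    if response["dimensions"] != expected:
        raise ValueError(
            f"{model} should return {expected} dimensions, got {response['dimensions']}"
        )
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} values but dimensions says {expected}"
        )
    return embedding


# Placeholder response: zeros stand in for real embedding values.
fake_response = {"embedding": [0.0] * 1536, "dimensions": 1536}
vector = check_embedding_response("text-embedding-3-small", fake_response)
print(len(vector))
```

A mismatch here usually means vectors from different models are being mixed, which would make similarity scores between them meaningless.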

Example Usage:

{
  "text": "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data without being explicitly programmed.",
  "model": "text-embedding-3-small"
}

Use with Turbopuffer: Generate an embedding for your text, then upsert the vector to Turbopuffer to enable semantic search.
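To illustrate what a semantic search over stored embeddings does conceptually, the sketch below ranks a few vectors against a query using cosine similarity, one common similarity measure. It is an in-memory stand-in, not Turbopuffer's API: the function names are hypothetical, and the three-dimensional vectors are toy values chosen for readability, whereas real embeddings from this tool have 1536 or 3072 dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], documents: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (document id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration; not real embeddings.
documents = {
    "doc-cooking": [0.0, 1.0, 0.0],
    "doc-ml": [0.9, 0.1, 0.0],
    "doc-ai": [0.7, 0.7, 0.0],
}
query_vector = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_vector, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In practice a vector database performs this ranking at scale with an index, so the query and the stored documents must be embedded with the same model.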

Common Use Cases

Audio Transcription and Analysis:

Transcribe meeting recordings with automatic speaker identification for accurate meeting minutes
Convert podcast episodes and interviews to text with speaker labels for content repurposing
Process customer service calls to generate timestamped transcripts for quality assurance and training
Create searchable archives of video content by extracting and transcribing audio tracks

AI & Machine Learning:

Generate embeddings for RAG (Retrieval Augmented Generation) systems and AI chatbots
Create semantic search indexes by combining embedding generation with Turbopuffer storage
Build content recommendation systems using vector similarity comparisons
Develop AI assistants that understand context and meaning through embeddings

Product and Inventory Analysis:

Analyze product images for e-commerce listings to generate detailed descriptions and feature lists
Assess condition and quality of items from photos for resale marketplaces and inventory management
Extract text from product labels and packaging for catalog management and compliance tracking

Content Moderation and Classification:

Automatically classify and moderate user-generated image content for social platforms
Detect inappropriate or harmful visual content in uploaded images and media files
Analyze images for brand safety and advertising suitability across different content categories

Document and Data Extraction:

Extract text and data from scanned documents, receipts, and business forms
Analyze charts, graphs, and visual data presentations to extract key metrics and insights
Process screenshots of applications and interfaces for quality assurance and testing documentation

Creative and Marketing Analysis:

Analyze visual content for brand consistency and design quality assessment
Generate detailed descriptions of artwork, photography, and creative assets for marketing purposes
Evaluate visual elements in marketing materials for accessibility and design effectiveness