AI Agents

Authentication Type: No Authentication

Description: AI-powered analysis and processing tools including vision analysis, audio transcription with speaker diarization, and text embeddings generation.


Vision Analysis

AI-powered image and visual content analysis.

Analyze Image

Analyze an image using an AI model to extract insights, descriptions, objects, text, or other details. Provide the image URL and specify which AI model to use.

Operation Type: Query (Read)

Parameters:

  • prompt string (required): Prompt used to analyze the image
  • imageUrl string (required): URL of the image to analyze
  • model string (required): Model to use for image analysis. Options: "gpt-4o-mini", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "gemini-2.5-flash-preview-05-20"

Returns:

  • result string: The AI-generated analysis of the image

Example Usage:

{
  "prompt": "Analyze this product image and describe its features, condition, and potential market value. Include details about materials, design elements, and any visible wear or damage.",
  "imageUrl": "https://example.com/images/vintage-watch.jpg",
  "model": "gpt-4o-mini"
}
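The parameter rules above can be checked before a request is sent. The following Python sketch assembles the Analyze Image input and rejects models that are not in the documented option list. The helper name `build_analyze_image_input` is hypothetical and not part of this integration, the http(s) check on `imageUrl` is an assumption, and how the input is actually submitted depends on your environment.

```python
import json
from urllib.parse import urlparse

# Model options copied from the parameter list above.
ALLOWED_MODELS = {
    "gpt-4o-mini",
    "gpt-4.1",
    "gpt-4.1-mini",
    "gpt-4.1-nano",
    "gemini-2.5-flash-preview-05-20",
}


def build_analyze_image_input(prompt: str, image_url: str, model: str) -> dict:
    """Hypothetical helper: validate and assemble the Analyze Image input."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    # Assumption: the image must be reachable over an absolute http(s) URL.
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("imageUrl must be an absolute http(s) URL")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return {"prompt": prompt, "imageUrl": image_url, "model": model}


payload = build_analyze_image_input(
    prompt="Describe the visible wear on this watch.",
    image_url="https://example.com/images/vintage-watch.jpg",
    model="gpt-4o-mini",
)
print(json.dumps(payload, indent=2))
```

Validating locally keeps a typo in the model name from costing a round trip.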

Audio Transcription

Transcribe audio files with speaker diarization using OpenAI Whisper.

Transcribe with Speaker Diarization

Transcribe audio from a URL and identify different speakers in the conversation. Returns timestamped segments with speaker labels.

Operation Type: Query (Read)

Parameters:

  • audioUrl string (required): URL to the audio file to transcribe
  • language string (nullable): Language of the audio (ISO-639-1 format, e.g., "en", "es")
  • prompt string (nullable): Optional text to guide the model's style or continue a previous audio segment

Returns:

  • segments array of objects: Array of transcribed segments with speaker diarization
    • id string: Segment ID
    • start number: Start timestamp in seconds
    • end number: End timestamp in seconds
    • text string: Transcribed text for this segment
    • speaker string (nullable): Speaker identifier (e.g., "SPEAKER_00", "SPEAKER_01")
  • language string (nullable): Detected language of the audio
  • duration number (nullable): Duration of the audio in seconds
  • text string (nullable): Full transcribed text

Example Usage:

{
  "audioUrl": "https://example.com/recordings/meeting-2024-01-15.mp3",
  "language": "en",
  "prompt": "This is a business meeting discussing quarterly sales targets"
}
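The `segments` array described under Returns lends itself to simple post-processing. As a minimal sketch, the Python function below merges consecutive segments from the same speaker into one timestamped turn. The function name `format_transcript`, the "UNKNOWN" fallback label, and the sample data are illustrative assumptions, not output from the service.

```python
def format_transcript(segments: list[dict]) -> str:
    """Merge consecutive same-speaker segments into one line per turn."""
    turns: list[dict] = []
    for seg in segments:
        # speaker is nullable in the response, so fall back to a placeholder.
        speaker = seg.get("speaker") or "UNKNOWN"
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = seg["end"]
        else:
            turns.append(
                {"speaker": speaker, "start": seg["start"], "end": seg["end"], "text": text}
            )
    return "\n".join(
        f"[{t['start']:.1f}s-{t['end']:.1f}s] {t['speaker']}: {t['text']}" for t in turns
    )


# Made-up response shaped like the Returns section above.
sample_response = {
    "segments": [
        {"id": "0", "start": 0.0, "end": 2.5, "text": "Let's review the targets.", "speaker": "SPEAKER_00"},
        {"id": "1", "start": 2.5, "end": 4.0, "text": "Sales are up.", "speaker": "SPEAKER_00"},
        {"id": "2", "start": 4.0, "end": 6.2, "text": "Agreed.", "speaker": "SPEAKER_01"},
    ],
    "language": "en",
    "duration": 6.2,
    "text": "Let's review the targets. Sales are up. Agreed.",
}

print(format_transcript(sample_response["segments"]))
```

Merging by turn rather than printing each segment keeps meeting minutes readable when the model splits one utterance into several segments.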

Text Embeddings

Generate vector embeddings from text using OpenAI models for semantic search, RAG, or similarity comparisons.

Generate Embeddings

Generate a vector embedding for a single text input using the selected embedding model.

When to use: Convert text into numerical vectors (embeddings) that capture semantic meaning.

Perfect for:

  • Preparing text for vector search in databases like Turbopuffer
  • RAG (Retrieval Augmented Generation) pipelines
  • Semantic similarity comparisons
  • Clustering similar content

Operation Type: Query (Read)

Parameters:

  • text string (required): Text to generate embeddings for (max 8191 tokens)
  • model string (required): Embedding model to use. Options: "text-embedding-3-small", "text-embedding-3-large"
    • text-embedding-3-small (1536 dimensions) - Fast, cost-effective, recommended for most use cases
    • text-embedding-3-large (3072 dimensions) - Higher quality, more expensive

Returns:

  • embedding array of numbers: The embedding vector for the input text
  • dimensions number: Number of dimensions in the embedding vector

Example Usage:

{
  "text": "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data without being explicitly programmed.",
  "model": "text-embedding-3-small"
}

Use with Turbopuffer: Generate embeddings for your text, then upsert to Turbopuffer with the vector for semantic search capabilities.
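To illustrate what a similarity comparison on the returned vectors looks like, here is a minimal Python sketch. It first checks a result against the documented dimensions for the chosen model, then computes cosine similarity between two vectors. The helper names and the fake results are assumptions for illustration; real vectors come from the Generate Embeddings operation, and a vector database would normally perform the comparison at scale.

```python
import math

# Dimensions per model, as listed in the parameter description above.
EXPECTED_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_embedding(result: dict, model: str) -> list[float]:
    """Return the vector after confirming it matches the documented shape."""
    vector = result["embedding"]
    if result["dimensions"] != len(vector):
        raise ValueError("dimensions field does not match vector length")
    if len(vector) != EXPECTED_DIMENSIONS[model]:
        raise ValueError(f"unexpected vector length for {model}")
    return vector


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Fake results shaped like the Returns section; not real model output.
fake_a = {"embedding": [1.0] + [0.0] * 1535, "dimensions": 1536}
fake_b = {"embedding": [0.0, 1.0] + [0.0] * 1534, "dimensions": 1536}

vec_a = check_embedding(fake_a, "text-embedding-3-small")
vec_b = check_embedding(fake_b, "text-embedding-3-small")
print(cosine_similarity(vec_a, vec_a))  # identical vectors
print(cosine_similarity(vec_a, vec_b))  # orthogonal vectors
```

Only compare vectors produced by the same model, since the two models return different dimensions.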


Common Use Cases

Audio Transcription and Analysis:

  • Transcribe meeting recordings with automatic speaker identification for accurate meeting minutes
  • Convert podcast episodes and interviews to text with speaker labels for content repurposing
  • Process customer service calls to generate timestamped transcripts for quality assurance and training
  • Create searchable archives of video content by extracting and transcribing audio tracks

AI and Machine Learning:

  • Generate embeddings for RAG (Retrieval Augmented Generation) systems and AI chatbots
  • Create semantic search indexes by combining embeddings generation with Turbopuffer storage
  • Build content recommendation systems using vector similarity comparisons
  • Develop AI assistants that understand context and meaning through embeddings

Product and Inventory Analysis:

  • Analyze product images for e-commerce listings to generate detailed descriptions and feature lists
  • Assess condition and quality of items from photos for resale marketplaces and inventory management
  • Extract text from product labels and packaging for catalog management and compliance tracking

Content Moderation and Classification:

  • Automatically classify and moderate user-generated image content for social platforms
  • Detect inappropriate or harmful visual content in uploaded images and media files
  • Analyze images for brand safety and advertising suitability across different content categories

Document and Data Extraction:

  • Extract text and data from scanned documents, receipts, and business forms
  • Analyze charts, graphs, and visual data presentations to extract key metrics and insights
  • Process screenshots of applications and interfaces for quality assurance and testing documentation

Creative and Marketing Analysis:

  • Analyze visual content for brand consistency and design quality assessment
  • Generate detailed descriptions of artwork, photography, and creative assets for marketing purposes
  • Evaluate visual elements in marketing materials for accessibility and design effectiveness