Turbopuffer

Authentication Type: API Key
Description: High-performance vector and full-text search database. Provides tools for upserting documents and three specialized search types: Vector Search for semantic similarity, Full-Text Search for keyword matching, and Lookup for sorted/filtered retrieval.


Upsert Documents

Create or update documents in a namespace with or without embeddings. Turbopuffer supports both automatic embedding generation and custom vector insertion.

Upsert with Auto-Embeddings

Create documents with automatic embedding generation. The tool generates an embedding for each row using text-embedding-3-small (1536 dims) and upserts the rows in one step, so you can build a vector search index without calling the embeddings API yourself.

Operation Type: Mutation (Write)

Parameters:

  • namespace string (required): Namespace to write to. Created automatically if it does not exist
  • rows array of objects (required): Array of documents to upsert. Embeddings will be auto-generated for each
    • id string (required): Document ID (can be numeric string, UUID, or any string)
    • textToEmbed string (required): Text to generate embeddings for. This will be automatically embedded using text-embedding-3-small (1536 dims)
    • attributes array of objects (required): Document attributes as key-value pairs. Each value is a JSON-encoded string, so a string value carries its own escaped quotes (e.g. "\"AI\"")
      • key string (required): Attribute name
      • value string (required): Attribute value (JSON-encoded)
  • schema array of objects (nullable): Optional schema configuration for attributes. Use to enable full-text search or specify types like uuid
    • attribute string (required): Attribute name
    • type string (required): Type: string, int, uint, float, uuid, datetime, bool, or array variants like []string
    • fullTextSearch boolean (nullable): Enable BM25 full-text search on this attribute (string types only)
    • filterable boolean (nullable): Whether attribute can be filtered/sorted. Defaults to true

Returns:

  • rowsAffected number: Total number of rows upserted/created
  • embeddingsGenerated number: Number of embeddings generated

Example Usage:

{
  "namespace": "knowledge_base",
  "rows": [
    {
      "id": "doc_001",
      "textToEmbed": "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
      "attributes": [
        { "key": "title", "value": "\"Introduction to ML\"" },
        { "key": "category", "value": "\"AI\"" },
        { "key": "published_date", "value": "\"2024-01-15\"" }
      ]
    }
  ],
  "schema": [
    { "attribute": "title", "type": "string", "fullTextSearch": true },
    { "attribute": "category", "type": "string", "filterable": true },
    { "attribute": "published_date", "type": "datetime", "filterable": true }
  ]
}
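
Because every attribute value must be a JSON-encoded string, it is easy to forget the inner quotes on string values. The sketch below shows one way to build a row from a plain Python dict. The helper names encode_attributes and build_auto_embed_row are hypothetical and not part of the tool; only the field names (id, textToEmbed, attributes, key, value) come from the parameter list above.

```python
import json


def encode_attributes(attrs: dict) -> list[dict]:
    """Convert a plain dict into the key/value list, JSON-encoding each value."""
    return [{"key": key, "value": json.dumps(value)} for key, value in attrs.items()]


def build_auto_embed_row(doc_id: str, text: str, attrs: dict) -> dict:
    """Assemble one row for Upsert with Auto-Embeddings (hypothetical helper)."""
    return {
        "id": doc_id,
        "textToEmbed": text,
        "attributes": encode_attributes(attrs),
    }


row = build_auto_embed_row(
    "doc_001",
    "Machine learning is a subset of artificial intelligence.",
    {"title": "Introduction to ML", "category": "AI"},
)
print(json.dumps(row, indent=2))
```

Using json.dumps for every value keeps strings, numbers, and arrays consistent without hand-written escaping.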

Upsert Documents

Create or update documents when you already have vectors or don't need them. Supports text-only documents, custom vectors, or bulk ingestion. Set fullTextSearch=true in schema to enable BM25 keyword search on text fields.

Operation Type: Mutation (Write)

Parameters:

  • namespace string (required): Namespace to write to. Created automatically if it does not exist
  • rows array of objects (required): Array of documents to upsert (create or update)
    • id string (required): Document ID (can be numeric string, UUID, or any string)
    • vector array of numbers (nullable): Optional vector embedding. Required if the namespace has a vector index
    • attributes array of objects (required): Document attributes as key-value pairs. Each value is a JSON-encoded string, so a string is written as "\"Electronics\"" and a number as "299.99"
      • key string (required): Attribute name
      • value string (required): Attribute value (JSON-encoded)
  • distanceMetric string (nullable): Distance metric for vector similarity. Required if vectors are provided. Use cosine_distance for most cases. Options: "cosine_distance", "euclidean_squared"
  • schema array of objects (nullable): Optional schema configuration for attributes. Use to enable full-text search or specify types like uuid
    • attribute string (required): Attribute name
    • type string (required): Type: string, int, uint, float, uuid, datetime, bool, or array variants like []string
    • fullTextSearch boolean (nullable): Enable BM25 full-text search on this attribute (string types only)
    • filterable boolean (nullable): Whether attribute can be filtered/sorted. Defaults to true

Returns:

  • rowsAffected number: Total number of rows upserted/created

Example Usage:

{
  "namespace": "products",
  "rows": [
    {
      "id": "prod_123",
      "vector": [0.1, -0.2, 0.5, 0.8],
      "attributes": [
        { "key": "name", "value": "\"Wireless Headphones\"" },
        { "key": "price", "value": "299.99" },
        { "key": "category", "value": "\"Electronics\"" }
      ]
    }
  ],
  "distanceMetric": "cosine_distance",
  "schema": [
    { "attribute": "name", "type": "string", "fullTextSearch": true },
    { "attribute": "price", "type": "float", "filterable": true },
    { "attribute": "category", "type": "string", "filterable": true }
  ]
}
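
The rules above (distanceMetric is required when vectors are provided, and only two metrics are accepted) can be checked locally before sending a request. This is a minimal sketch: build_upsert_payload is a hypothetical helper, and the check that all vectors in one request share a dimension is an assumption about sensible input rather than documented behavior.

```python
ALLOWED_METRICS = {"cosine_distance", "euclidean_squared"}


def build_upsert_payload(namespace, rows, distance_metric=None, schema=None):
    """Assemble an Upsert Documents payload after basic local sanity checks."""
    if distance_metric is not None and distance_metric not in ALLOWED_METRICS:
        raise ValueError(f"unknown distanceMetric: {distance_metric!r}")

    # Collect the distinct vector lengths among rows that carry a vector.
    dims = {len(r["vector"]) for r in rows if r.get("vector") is not None}
    if dims and distance_metric is None:
        raise ValueError("distanceMetric is required when vectors are provided")
    if len(dims) > 1:
        # Assumption: mixed dimensions in one request are a caller mistake.
        raise ValueError(f"vectors have inconsistent dimensions: {sorted(dims)}")

    payload = {"namespace": namespace, "rows": rows}
    if distance_metric is not None:
        payload["distanceMetric"] = distance_metric
    if schema is not None:
        payload["schema"] = schema
    return payload


payload = build_upsert_payload(
    "products",
    [{
        "id": "prod_123",
        "vector": [0.1, -0.2, 0.5, 0.8],
        "attributes": [{"key": "price", "value": "299.99"}],
    }],
    distance_metric="cosine_distance",
)
```

Text-only rows (no vector field) pass through without a distanceMetric, which matches the nullable parameters listed above.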

Search Operations

Turbopuffer provides three distinct search capabilities optimized for different use cases: semantic similarity through vector embeddings, keyword-based text search, and structured data retrieval with sorting.

Vector Search

Find documents semantically similar to a query vector. Perfect for RAG, "find docs about X" queries, and AI agents. Results are ranked by vector distance (cosine distance in most cases); a lower distance means a closer match. The query vector must match the namespace dimensionality (1536 when the vectors come from text-embedding-3-small).

Operation Type: Query (Read)

Parameters:

  • namespace string (required): Namespace containing the vector embeddings
  • topK number (required): Number of similar documents to return (1-1200)
  • vector array of numbers (required): Query vector (embedding) to find similar documents. Must match the dimensionality of vectors in the namespace
  • filters array of objects (nullable): Optional filters to narrow results. All conditions are combined with AND logic
    • attribute string (required): Attribute name to filter on
    • operator string (required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens)
    • value string (required): Filter value as JSON string (will be parsed based on type)
  • includeAttributes array of strings (nullable): Attributes to include in results. Leave null to return only id and distance

Returns:

  • rows array of objects: Similar documents with $dist (distance score) and requested attributes
    • $dist number: Distance score (lower = more similar)
    • Additional attributes based on includeAttributes parameter

Example Usage:

{
  "namespace": "documents",
  "topK": 10,
  "vector": [0.1, -0.2, 0.5, 0.8, -0.3, 0.7, 0.2, -0.1],
  "filters": [
    {
      "attribute": "category",
      "operator": "Eq",
      "value": "\"technical\""
    },
    {
      "attribute": "published_date",
      "operator": "Gte",
      "value": "\"2024-01-01\""
    }
  ],
  "includeAttributes": ["title", "content", "author", "published_date"]
}
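
To make "lower distance = more similar" concrete, the sketch below ranks a few toy vectors locally. It assumes cosine distance is defined as 1 minus cosine similarity, and it only illustrates the ordering; it is not how the service computes or indexes results. The function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query, docs, top_k):
    """Return the top_k closest docs as rows with id and $dist, closest first."""
    rows = [{"id": doc_id, "$dist": cosine_distance(query, vec)}
            for doc_id, vec in docs.items()]
    rows.sort(key=lambda row: row["$dist"])
    return rows[:top_k]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranked = rank_by_distance([1.0, 0.0], docs, top_k=3)
print([row["id"] for row in ranked])
```

The vector pointing in the same direction as the query comes first, and the orthogonal vector comes last.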

Full-Text Search

Traditional keyword-based search using BM25 ranking. Perfect for documentation search, product search, or when you know specific terms to find. Returns documents ranked by relevance score (higher = better match). The field must be configured for full-text search in the schema (fullTextSearch set to true).

Operation Type: Query (Read)

Parameters:

  • namespace string (required): Namespace to search in
  • topK number (required): Number of top-ranked documents to return (1-1200)
  • field string (required): Field name to search in (must be configured for full-text search)
  • query string (required): Search query text. Supports boolean operators and phrases
  • filters array of objects (nullable): Optional filters to narrow results. All conditions are combined with AND logic
    • attribute string (required): Attribute name to filter on
    • operator string (required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens)
    • value string (required): Filter value as JSON string (will be parsed based on type)
  • includeAttributes array of strings (nullable): Attributes to include in results. Leave null to return only id and BM25 score

Returns:

  • rows array of objects: Matched documents with $dist (BM25 score) and requested attributes
    • $dist number: BM25 relevance score (higher = better match)
    • Additional attributes based on includeAttributes parameter

Example Usage:

{
  "namespace": "knowledge_base",
  "topK": 25,
  "field": "content",
  "query": "API authentication OAuth tokens",
  "filters": [
    {
      "attribute": "status",
      "operator": "Eq",
      "value": "\"published\""
    },
    {
      "attribute": "language",
      "operator": "In",
      "value": "[\"en\", \"en-US\"]"
    }
  ],
  "includeAttributes": ["title", "content", "url", "last_updated"]
}
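
Filter values are JSON strings, and the operator must be one of the names listed in the parameters. A small builder can enforce both before a query is sent. This is a sketch: make_filter and build_full_text_query are hypothetical helper names, while the operator names, field names, and the 1-1200 range for topK are taken from the parameter list above.

```python
import json

FILTER_OPERATORS = {
    "Eq", "NotEq", "In", "NotIn", "Contains", "NotContains",
    "ContainsAny", "NotContainsAny", "Lt", "Lte", "Gt", "Gte",
    "AnyLt", "AnyLte", "AnyGt", "AnyGte", "Glob", "NotGlob",
    "IGlob", "NotIGlob", "Regex", "ContainsAllTokens",
}


def make_filter(attribute, operator, value):
    """Build one filter object, JSON-encoding the value."""
    if operator not in FILTER_OPERATORS:
        raise ValueError(f"unsupported filter operator: {operator!r}")
    return {"attribute": attribute, "operator": operator, "value": json.dumps(value)}


def build_full_text_query(namespace, field, query, top_k,
                          filters=None, include_attributes=None):
    """Assemble a Full-Text Search payload (hypothetical helper)."""
    if not 1 <= top_k <= 1200:
        raise ValueError("topK must be between 1 and 1200")
    return {
        "namespace": namespace,
        "topK": top_k,
        "field": field,
        "query": query,
        "filters": filters,                       # nullable
        "includeAttributes": include_attributes,  # nullable
    }


request = build_full_text_query(
    "knowledge_base", "content", "API authentication OAuth tokens", 25,
    filters=[
        make_filter("status", "Eq", "published"),
        make_filter("language", "In", ["en", "en-US"]),
    ],
    include_attributes=["title", "url"],
)
```

The same make_filter helper applies to Vector Search and Lookup, since all three tools share the filter shape.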

Lookup

Retrieve documents sorted by any attribute with optional filtering. Perfect for getting recent items, pagination, or priority queues. Results are simply sorted by the chosen attribute; no ranking algorithm is applied. Combine filters with sorting for queries like "active users sorted by signup date".

Operation Type: Query (Read)

Parameters:

  • namespace string (required): Namespace to query
  • topK number (required): Maximum number of documents to return (1-1200)
  • orderByAttribute string (required): Attribute to sort by (e.g., "timestamp", "priority", "id")
  • orderDirection string (required): Sort direction: "asc" (oldest/lowest first) or "desc" (newest/highest first)
  • filters array of objects (nullable): Filters to select documents. All conditions are combined with AND logic
    • attribute string (required): Attribute name to filter on
    • operator string (required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens)
    • value string (required): Filter value as JSON string (will be parsed based on type)
  • includeAttributes array of strings (nullable): Attributes to include in results. Leave null to return only id

Returns:

  • rows array of objects: Documents sorted by the specified attribute
    • id string: Document identifier
    • Additional attributes based on includeAttributes parameter

Example Usage:

{
  "namespace": "user_activities",
  "topK": 50,
  "orderByAttribute": "timestamp",
  "orderDirection": "desc",
  "filters": [
    {
      "attribute": "user_id",
      "operator": "Eq",
      "value": "\"user_12345\""
    },
    {
      "attribute": "activity_type",
      "operator": "In",
      "value": "[\"login\", \"purchase\", \"view\"]"
    },
    {
      "attribute": "timestamp",
      "operator": "Gte",
      "value": "\"2024-01-01T00:00:00Z\""
    }
  ],
  "includeAttributes": ["timestamp", "activity_type", "metadata", "ip_address"]
}
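
The Lookup semantics (AND-combined filters, a sort on one attribute, then a topK cut) can be emulated on an in-memory list to reason about what a request should return. This is a local illustration only, not the service's implementation. The function local_lookup is hypothetical, it handles just three of the listed operators, and it assumes every selected document carries the sort attribute.

```python
import json

# Only a subset of the documented operators is emulated in this sketch.
_OPS = {
    "Eq": lambda actual, expected: actual == expected,
    "In": lambda actual, expected: actual in expected,
    "Gte": lambda actual, expected: actual >= expected,
}


def local_lookup(docs, order_by, direction, top_k, filters=None):
    """Filter (AND logic), sort by one attribute, and keep at most top_k docs."""
    if not 1 <= top_k <= 1200:
        raise ValueError("topK must be between 1 and 1200")
    if direction not in ("asc", "desc"):
        raise ValueError("orderDirection must be 'asc' or 'desc'")

    def matches(doc):
        # Every filter must pass; filter values arrive as JSON strings.
        return all(
            f["attribute"] in doc
            and _OPS[f["operator"]](doc[f["attribute"]], json.loads(f["value"]))
            for f in (filters or [])
        )

    selected = [doc for doc in docs if matches(doc)]
    selected.sort(key=lambda doc: doc[order_by], reverse=(direction == "desc"))
    return selected[:top_k]


activities = [
    {"id": "1", "user_id": "user_12345", "timestamp": "2024-01-03T00:00:00Z"},
    {"id": "2", "user_id": "user_99999", "timestamp": "2024-01-04T00:00:00Z"},
    {"id": "3", "user_id": "user_12345", "timestamp": "2024-01-05T00:00:00Z"},
]
recent = local_lookup(
    activities, "timestamp", "desc", 50,
    filters=[{"attribute": "user_id", "operator": "Eq", "value": "\"user_12345\""}],
)
print([doc["id"] for doc in recent])
```

ISO 8601 timestamps in a single, uniform format sort correctly as plain strings, which is why the sketch compares them without parsing.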

Common Use Cases

AI & Machine Learning:

  • RAG (Retrieval Augmented Generation) for AI agents and chatbots
  • Semantic search for finding conceptually similar documents
  • Content recommendation based on vector similarity
  • Building AI assistants that understand context and meaning
  • Use with AI Agents embeddings tool: Generate embeddings for text, then upsert to Turbopuffer for semantic search capabilities

Knowledge Management:

  • Documentation search combining semantic and keyword approaches
  • Support ticket search and categorization
  • Research paper discovery and citation networks
  • Enterprise knowledge base with intelligent retrieval

E-commerce & Content:

  • Product recommendation using vector embeddings
  • Content discovery and personalization
  • Search result ranking and filtering
  • User behavior analysis and pattern recognition

Data Analytics:

  • Time-series data retrieval with sorting and filtering
  • Log analysis and monitoring with structured queries
  • Performance metrics tracking over time
  • Business intelligence with flexible data access