Turbopuffer

Authentication Type: API Key
Description: High-performance vector and full-text search database. Provides tools for upserting documents and for three specialized search types: Vector Search for semantic similarity, Full-Text Search for keyword matching, and Lookup for sorted/filtered retrieval.

Authentication

To authenticate, you'll need a Turbopuffer API key. Learn how to get one in the Turbopuffer documentation.

Upsert Documents

Create or update documents in a namespace with or without embeddings. Turbopuffer supports both automatic embedding generation and custom vector insertion.

Upsert with Auto-Embeddings

Create documents with automatic embedding generation. Generates embeddings using text-embedding-3-small (1536 dims) and upserts in one step. Perfect for building vector search indexes without manually calling the embeddings API.

Operation Type: Mutation (Write)

Parameters:

namespace string (required): Namespace to write to. Created automatically if it does not exist
rows array of objects (required): Array of documents to upsert. An embedding is auto-generated for each row
- id string (required): Document ID (can be numeric string, UUID, or any string)
- textToEmbed string (required): Text to generate embeddings for. This will be automatically embedded using text-embedding-3-small (1536 dims)
- attributes array of objects (required): Document attributes as key-value pairs. Values are JSON-encoded strings
  - key string (required): Attribute name
  - value string (required): Attribute value (JSON-encoded)
schema array of objects (nullable): Optional schema configuration for attributes. Use to enable full-text search or specify types like uuid
- attribute string (required): Attribute name
- type string (required): Type: string, int, uint, float, uuid, datetime, bool, or array variants like []string
- fullTextSearch boolean (nullable): Enable BM25 full-text search on this attribute (string types only)
- filterable boolean (nullable): Whether attribute can be filtered/sorted. Defaults to true

Returns:

rowsAffected number: Total number of rows upserted/created
embeddingsGenerated number: Number of embeddings generated

Example Usage:

{
  "namespace": "knowledge_base",
  "rows": [
    {
      "id": "doc_001",
      "textToEmbed": "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
      "attributes": [
        { "key": "title", "value": "\"Introduction to ML\"" },
        { "key": "category", "value": "\"AI\"" },
        { "key": "published_date", "value": "\"2024-01-15\"" }
      ]
    }
  ],
  "schema": [
    { "attribute": "title", "type": "string", "fullTextSearch": true },
    { "attribute": "category", "type": "string", "filterable": true },
    { "attribute": "published_date", "type": "datetime", "filterable": true }
  ]
}
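
Because every attribute value must be a JSON-encoded string, string values end up wrapped in an extra pair of quotes (as in the "category" value above), which is easy to get wrong by hand. The following is a minimal Python sketch of building a row from a plain dict, assuming the request body has the shape shown in the example. The helper names encode_attributes and build_auto_embed_row are illustrative and not part of the tool.

```python
import json


def encode_attributes(attrs: dict) -> list:
    """Convert a plain dict into the key/value list, JSON-encoding each value."""
    return [{"key": key, "value": json.dumps(value)} for key, value in attrs.items()]


def build_auto_embed_row(doc_id, text: str, attrs: dict) -> dict:
    """Build one entry for the `rows` parameter (hypothetical helper)."""
    return {
        "id": str(doc_id),  # ids are always sent as strings
        "textToEmbed": text,
        "attributes": encode_attributes(attrs),
    }


row = build_auto_embed_row(
    "doc_001",
    "Machine learning is a subset of artificial intelligence.",
    {"title": "Introduction to ML", "category": "AI", "views": 42},
)
print(json.dumps(row, indent=2))
```

Strings gain the extra quotes through json.dumps, while numbers such as 42 become the bare string "42", matching the encoding used in the examples on this page.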

Upsert Documents

Create or update documents when you already have vectors or don't need them. Supports text-only documents, custom vectors, or bulk ingestion. Set fullTextSearch=true in schema to enable BM25 keyword search on text fields.

Operation Type: Mutation (Write)

Parameters:

namespace string (required): Namespace to write to. Created automatically if it does not exist
rows array of objects (required): Array of documents to upsert (create or update)
- id string (required): Document ID (can be numeric string, UUID, or any string)
- vector array of numbers (nullable): Optional vector embedding. Required if namespace has vector index
- attributes array of objects (required): Document attributes as key-value pairs. Values are JSON-encoded strings
  - key string (required): Attribute name
  - value string (required): Attribute value (JSON-encoded)
distanceMetric string (nullable): Distance metric for vector similarity. Required if vectors are provided. Use cosine_distance for most cases. Options: "cosine_distance", "euclidean_squared"
schema array of objects (nullable): Optional schema configuration for attributes. Use to enable full-text search or specify types like uuid
- attribute string (required): Attribute name
- type string (required): Type: string, int, uint, float, uuid, datetime, bool, or array variants like []string
- fullTextSearch boolean (nullable): Enable BM25 full-text search on this attribute (string types only)
- filterable boolean (nullable): Whether attribute can be filtered/sorted. Defaults to true

Returns:

rowsAffected number: Total number of rows upserted/created

Example Usage:

{
  "namespace": "products",
  "rows": [
    {
      "id": "prod_123",
      "vector": [0.1, -0.2, 0.5, 0.8],
      "attributes": [
        { "key": "name", "value": "\"Wireless Headphones\"" },
        { "key": "price", "value": "299.99" },
        { "key": "category", "value": "\"Electronics\"" }
      ]
    }
  ],
  "distanceMetric": "cosine_distance",
  "schema": [
    { "attribute": "name", "type": "string", "fullTextSearch": true },
    { "attribute": "price", "type": "float", "filterable": true },
    { "attribute": "category", "type": "string", "filterable": true }
  ]
}
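
The parameter notes above imply two checks worth running before sending a write: distanceMetric must be present whenever any row carries a vector, and vectors should share one dimensionality, since queries must later match the namespace's dimensionality. A hedged Python sketch of such a pre-flight check follows; validate_upsert is a hypothetical client-side helper, and the server may enforce additional rules not covered here.

```python
ALLOWED_METRICS = {"cosine_distance", "euclidean_squared"}


def validate_upsert(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    rows = payload.get("rows", [])
    # Collect the distinct vector lengths among rows that actually carry a vector.
    dims = {len(r["vector"]) for r in rows if r.get("vector") is not None}
    if dims:
        metric = payload.get("distanceMetric")
        if metric is None:
            problems.append("distanceMetric is required when vectors are provided")
        elif metric not in ALLOWED_METRICS:
            problems.append(f"unknown distanceMetric: {metric}")
        if len(dims) > 1:
            problems.append(f"vectors have mixed dimensions: {sorted(dims)}")
    return problems


payload = {
    "namespace": "products",
    "rows": [
        {"id": "prod_123", "vector": [0.1, -0.2, 0.5, 0.8], "attributes": []},
        {"id": "prod_124", "vector": [0.3, 0.1, 0.9], "attributes": []},
    ],
}
for problem in validate_upsert(payload):
    print(problem)
```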

Search Operations

Turbopuffer provides three distinct search capabilities optimized for different use cases: semantic similarity through vector embeddings, keyword-based text search, and structured data retrieval with sorting.

Vector Search

Find documents semantically similar to a query vector. Perfect for RAG, "find docs about X" queries, and AI agents. Compares embeddings using the namespace's distance metric (cosine_distance in most cases); a lower distance means a more similar document. The query vector must match the namespace's dimensionality (1536 if the documents were embedded with text-embedding-3-small).

Operation Type: Query (Read)

Parameters:

namespace string (required): Namespace containing the vector embeddings
topK number (required): Number of similar documents to return (1-1200)
vector array of numbers (required): Query vector (embedding) to find similar documents. Must match the dimensionality of vectors in the namespace
filters array of objects (nullable): Optional filters to narrow results. All conditions are combined with AND logic
- attribute string (required): Attribute name to filter on
- operator string (required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens)
- value string (required): Filter value as JSON string (will be parsed based on type)
includeAttributes array of strings (nullable): Attributes to include in results. Leave null to return only id and distance

Returns:

rows array of objects: Similar documents with $dist (distance score) and requested attributes
- $dist number: Distance score (lower = more similar)
- Additional attributes based on includeAttributes parameter
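
To make the "lower = more similar" convention concrete, here is a small Python sketch of cosine distance using the common definition of 1 minus cosine similarity. That Turbopuffer computes $dist in exactly this way is an assumption; the sketch only illustrates how the score behaves.

```python
import math


def cosine_distance(a, b) -> float:
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
print(cosine_distance(query, [2.0, 0.0]))   # same direction, distance near 0
print(cosine_distance(query, [0.0, 3.0]))   # orthogonal, distance near 1
print(cosine_distance(query, [-1.0, 0.0]))  # opposite direction, distance near 2
```

Note that vector length does not affect the score: only direction matters, which is why the first pair scores as identical despite different magnitudes.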

Example Usage:

{
  "namespace": "documents",
  "topK": 10,
  "vector": [0.1, -0.2, 0.5, 0.8, -0.3, 0.7, 0.2, -0.1],
  "filters": [
    {
      "attribute": "category",
      "operator": "Eq",
      "value": "\"technical\""
    },
    {
      "attribute": "published_date",
      "operator": "Gte",
      "value": "\"2024-01-01\""
    }
  ],
  "includeAttributes": ["title", "content", "author", "published_date"]
}
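
Filter values follow the same JSON-encoding rule as attributes, and topK is limited to 1-1200. A minimal Python sketch of assembling a Vector Search request with both constraints checked client-side; make_filter and build_vector_search are hypothetical helper names, and the operator list is copied from the parameter description above.

```python
import json

OPERATORS = {
    "Eq", "NotEq", "In", "NotIn", "Contains", "NotContains", "ContainsAny",
    "NotContainsAny", "Lt", "Lte", "Gt", "Gte", "AnyLt", "AnyLte", "AnyGt",
    "AnyGte", "Glob", "NotGlob", "IGlob", "NotIGlob", "Regex", "ContainsAllTokens",
}


def make_filter(attribute: str, operator: str, value) -> dict:
    """Build one filter object, JSON-encoding the value as the tool expects."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return {"attribute": attribute, "operator": operator, "value": json.dumps(value)}


def build_vector_search(namespace, vector, top_k, filters=None, include_attributes=None) -> dict:
    """Assemble the request body; all filters are ANDed together by the tool."""
    if not 1 <= top_k <= 1200:
        raise ValueError("topK must be between 1 and 1200")
    return {
        "namespace": namespace,
        "topK": top_k,
        "vector": list(vector),
        "filters": filters,
        "includeAttributes": include_attributes,
    }


request = build_vector_search(
    "documents",
    [0.1, -0.2, 0.5, 0.8],
    10,
    filters=[
        make_filter("category", "Eq", "technical"),
        make_filter("published_date", "Gte", "2024-01-01"),
    ],
    include_attributes=["title", "published_date"],
)
print(json.dumps(request, indent=2))
```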

Full-Text Search

Traditional keyword-based search using BM25 ranking. Perfect for documentation search, product search, or when you know specific terms to find. Returns documents ranked by relevance score (higher = better match). The searched field must have fullTextSearch enabled in the namespace schema.

Operation Type: Query (Read)

Parameters:

namespace string (required): Namespace to search in
topK number (required): Number of top-ranked documents to return (1-1200)
field string (required): Field name to search in (must be configured for full-text search)
query string (required): Search query text. Supports boolean operators and phrases
filters array of objects (nullable): Optional filters to narrow results. All conditions are combined with AND logic
- attribute string (required): Attribute name to filter on
- operator string (required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens)
- value string (required): Filter value as JSON string (will be parsed based on type)
includeAttributes array of strings (nullable): Attributes to include in results. Leave null to return only id and BM25 score

Returns:

rows array of objects: Matched documents with $dist (BM25 score) and requested attributes
- $dist number: BM25 relevance score (higher = better match)
- Additional attributes based on includeAttributes parameter

Example Usage:

{
  "namespace": "knowledge_base",
  "topK": 25,
  "field": "content",
  "query": "API authentication OAuth tokens",
  "filters": [
    {
      "attribute": "status",
      "operator": "Eq",
      "value": "\"published\""
    },
    {
      "attribute": "language",
      "operator": "In",
      "value": "[\"en\", \"en-US\"]"
    }
  ],
  "includeAttributes": ["title", "content", "url", "last_updated"]
}
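
One detail that is easy to miss: both search types report their score in $dist, but the direction differs. For Vector Search a lower value is better, while for Full-Text Search a higher value is better. The Python sketch below orders rows best-first for either case. The best_first helper, the search_type labels, and the sample rows are illustrative assumptions, not real output.

```python
def best_first(rows: list, search_type: str) -> list:
    """Sort result rows so the best match comes first.

    search_type is "vector" (lower $dist is better) or
    "full_text" (higher $dist, a BM25 score, is better).
    """
    if search_type not in ("vector", "full_text"):
        raise ValueError(f"unknown search type: {search_type}")
    return sorted(rows, key=lambda r: r["$dist"], reverse=(search_type == "full_text"))


# Made-up rows purely to show the ordering.
rows = [
    {"id": "a", "$dist": 0.42},
    {"id": "b", "$dist": 3.10},
    {"id": "c", "$dist": 1.75},
]

print([r["id"] for r in best_first(rows, "vector")])     # ['a', 'c', 'b']
print([r["id"] for r in best_first(rows, "full_text")])  # ['b', 'c', 'a']
```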

Lookup

Retrieve documents sorted by any attribute with optional filtering. Perfect for getting recent items, pagination, or priority queues. Results are ordered purely by the chosen attribute; no relevance ranking is applied. Combine filters with sorting for powerful queries like "active users sorted by signup date".

Operation Type: Query (Read)

Parameters:

namespace string (required): Namespace to query
topK number (required): Maximum number of documents to return (1-1200)
orderByAttribute string (required): Attribute to sort by (e.g., "timestamp", "priority", "id")
orderDirection string (required): Sort direction: "asc" (oldest/lowest first) or "desc" (newest/highest first)
filters array of objects (nullable): Filters to select documents. All conditions are combined with AND logic
- attribute string (required): Attribute name to filter on
- operator string (required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens)
- value string (required): Filter value as JSON string (will be parsed based on type)
includeAttributes array of strings (nullable): Attributes to include in results. Leave null to return only id

Returns:

rows array of objects: Documents sorted by the specified attribute
- id string: Document identifier
- Additional attributes based on includeAttributes parameter

Example Usage:

{
  "namespace": "user_activities",
  "topK": 50,
  "orderByAttribute": "timestamp",
  "orderDirection": "desc",
  "filters": [
    {
      "attribute": "user_id",
      "operator": "Eq",
      "value": "\"user_12345\""
    },
    {
      "attribute": "activity_type",
      "operator": "In",
      "value": "[\"login\", \"purchase\", \"view\"]"
    },
    {
      "attribute": "timestamp",
      "operator": "Gte",
      "value": "\"2024-01-01T00:00:00Z\""
    }
  ],
  "includeAttributes": ["timestamp", "activity_type", "metadata", "ip_address"]
}
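
The tool exposes no explicit cursor parameter, so one way to paginate with Lookup is keyset pagination: take the sort attribute's value from the last row of a page and add it as a Gt or Lt filter for the next request. The Python sketch below assumes each returned row is a dict keyed by attribute name and that the sort attribute is listed in includeAttributes. next_page_payload is a hypothetical helper, and rows that share the boundary value would be skipped by the strict comparison.

```python
import copy
import json


def next_page_payload(base_payload: dict, last_page_rows: list):
    """Return the request for the following page, or None when there is no next page.

    Always pass the original base payload so cursor filters do not accumulate.
    """
    if not last_page_rows:
        return None
    attr = base_payload["orderByAttribute"]
    last_value = last_page_rows[-1][attr]
    # Ascending order continues with larger values, descending with smaller ones.
    operator = "Gt" if base_payload["orderDirection"] == "asc" else "Lt"
    payload = copy.deepcopy(base_payload)
    cursor = {"attribute": attr, "operator": operator, "value": json.dumps(last_value)}
    payload["filters"] = (payload.get("filters") or []) + [cursor]
    return payload


base = {
    "namespace": "user_activities",
    "topK": 2,
    "orderByAttribute": "timestamp",
    "orderDirection": "desc",
    "filters": None,
    "includeAttributes": ["timestamp", "activity_type"],
}
# Made-up first page, purely for illustration.
page = [
    {"id": "evt_9", "timestamp": "2024-03-02T00:00:00Z", "activity_type": "login"},
    {"id": "evt_8", "timestamp": "2024-03-01T00:00:00Z", "activity_type": "view"},
]
print(json.dumps(next_page_payload(base, page), indent=2))
```

If the sort attribute can contain duplicates, sorting by a unique attribute such as id avoids dropping rows at page boundaries.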

Common Use Cases

AI & Machine Learning:

RAG (Retrieval Augmented Generation) for AI agents and chatbots
Semantic search for finding conceptually similar documents
Content recommendation based on vector similarity
Building AI assistants that understand context and meaning
Use with AI Agents embeddings tool: Generate embeddings for text, then upsert to Turbopuffer for semantic search capabilities

Knowledge Management:

Documentation search combining semantic and keyword approaches
Support ticket search and categorization
Research paper discovery and citation networks
Enterprise knowledge base with intelligent retrieval

E-commerce & Content:

Product recommendation using vector embeddings
Content discovery and personalization
Search result ranking and filtering
User behavior analysis and pattern recognition

Data Analytics:

Time-series data retrieval with sorting and filtering
Log analysis and monitoring with structured queries
Performance metrics tracking over time
Business intelligence with flexible data access