Turbopuffer
Authentication Type: API Key
Description: High-performance vector and full-text search database. Provides tools for upserting documents and for three specialized search types: Vector Search for semantic similarity, Full-Text Search for keyword matching, and Lookup for sorted/filtered retrieval.
Upsert Documents
Create or update documents in a namespace with or without embeddings. Turbopuffer supports both automatic embedding generation and custom vector insertion.
Upsert with Auto-Embeddings
Create documents with automatic embedding generation. Generates embeddings using text-embedding-3-small (1536 dims) and upserts the documents in one step, so you can build a vector search index without calling the embeddings API yourself.
Operation Type: Mutation (Write)
Parameters:
- namespace (string, required): Namespace to write to. Created automatically if it does not exist.
- rows (array of objects, required): Array of documents to upsert. Embeddings are auto-generated for each.
  - id (string, required): Document ID (can be a numeric string, UUID, or any string).
  - textToEmbed (string, required): Text to generate embeddings for. Automatically embedded using text-embedding-3-small (1536 dims).
  - attributes (array of objects, required): Document attributes as key-value pairs. Values are JSON-encoded strings.
    - key (string, required): Attribute name.
    - value (string, required): Attribute value (JSON-encoded).
- schema (array of objects, nullable): Optional schema configuration for attributes. Use to enable full-text search or specify types like uuid.
  - attribute (string, required): Attribute name.
  - type (string, required): Type: string, int, uint, float, uuid, datetime, bool, or array variants like []string.
  - fullTextSearch (boolean, nullable): Enable BM25 full-text search on this attribute (string types only).
  - filterable (boolean, nullable): Whether the attribute can be filtered/sorted. Defaults to true.
Returns:
- rowsAffected (number): Total number of rows upserted/created.
- embeddingsGenerated (number): Number of embeddings generated.
Example Usage:
{
"namespace": "knowledge_base",
"rows": [
{
"id": "doc_001",
"textToEmbed": "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
"attributes": [
{ "key": "title", "value": "\"Introduction to ML\"" },
{ "key": "category", "value": "\"AI\"" },
{ "key": "published_date", "value": "\"2024-01-15\"" }
]
}
],
"schema": [
{ "attribute": "title", "type": "string", "fullTextSearch": true },
{ "attribute": "category", "type": "string", "filterable": true },
{ "attribute": "published_date", "type": "datetime", "filterable": true }
]
}
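Because attribute values must be JSON-encoded strings, hand-writing the escaped quotes is error-prone. The following is a minimal Python sketch of building the request body shown above from ordinary values. The helper name build_auto_embed_row is hypothetical and not part of the tool; only the field names come from the parameter list.

```python
import json


def build_auto_embed_row(doc_id, text, attributes):
    """Build one `rows` entry (hypothetical helper, not part of the tool).

    Each attribute value is passed through json.dumps so that it becomes
    a JSON-encoded string, as the parameter list requires.
    """
    return {
        "id": str(doc_id),
        "textToEmbed": text,
        "attributes": [
            {"key": key, "value": json.dumps(value)}
            for key, value in attributes.items()
        ],
    }


payload = {
    "namespace": "knowledge_base",
    "rows": [
        build_auto_embed_row(
            "doc_001",
            "Machine learning is a subset of artificial intelligence.",
            {"title": "Introduction to ML", "category": "AI"},
        )
    ],
    "schema": [
        {"attribute": "title", "type": "string", "fullTextSearch": True},
    ],
}

# Print the request body; strings appear with escaped inner quotes.
print(json.dumps(payload, indent=2))
```

Strings come out wrapped in escaped quotes, while numbers and booleans are encoded without them, which matches the examples in this section.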
Upsert Documents
Create or update documents when you already have vectors or don't need them. Supports text-only documents, custom vectors, or bulk ingestion. Set fullTextSearch=true in schema to enable BM25 keyword search on text fields.
Operation Type: Mutation (Write)
Parameters:
- namespace (string, required): Namespace to write to. Created automatically if it does not exist.
- rows (array of objects, required): Array of documents to upsert (create or update).
  - id (string, required): Document ID (can be a numeric string, UUID, or any string).
  - vector (array of numbers, nullable): Optional vector embedding. Required if the namespace has a vector index.
  - attributes (array of objects, required): Document attributes as key-value pairs. Values are JSON-encoded strings.
    - key (string, required): Attribute name.
    - value (string, required): Attribute value (JSON-encoded).
- distanceMetric (string, nullable): Distance metric for vector similarity. Required if vectors are provided. Use cosine_distance for most cases. Options: "cosine_distance", "euclidean_squared".
- schema (array of objects, nullable): Optional schema configuration for attributes. Use to enable full-text search or specify types like uuid.
  - attribute (string, required): Attribute name.
  - type (string, required): Type: string, int, uint, float, uuid, datetime, bool, or array variants like []string.
  - fullTextSearch (boolean, nullable): Enable BM25 full-text search on this attribute (string types only).
  - filterable (boolean, nullable): Whether the attribute can be filtered/sorted. Defaults to true.
Returns:
- rowsAffected (number): Total number of rows upserted/created.
Example Usage:
{
"namespace": "products",
"rows": [
{
"id": "prod_123",
"vector": [0.1, -0.2, 0.5, 0.8],
"attributes": [
{ "key": "name", "value": "\"Wireless Headphones\"" },
{ "key": "price", "value": "299.99" },
{ "key": "category", "value": "\"Electronics\"" }
]
}
],
"distanceMetric": "cosine_distance",
"schema": [
{ "attribute": "name", "type": "string", "fullTextSearch": true },
{ "attribute": "price", "type": "float", "filterable": true },
{ "attribute": "category", "type": "string", "filterable": true }
]
}
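The rule that distanceMetric is required whenever vectors are provided can be checked before sending a request. Below is a minimal Python sketch of such a pre-flight check. The function check_upsert_payload is hypothetical, and the check that all vectors in one request share a dimension is an assumption based on the dimensionality requirement stated for Vector Search, not a documented rule of this tool.

```python
ALLOWED_METRICS = {"cosine_distance", "euclidean_squared"}


def check_upsert_payload(payload):
    """Return a list of problems found in an Upsert Documents payload.

    Hypothetical client-side check; the service remains the authority.
    """
    problems = []
    vectors = [
        row["vector"] for row in payload["rows"]
        if row.get("vector") is not None
    ]
    if vectors:
        # distanceMetric is required as soon as any row carries a vector.
        if payload.get("distanceMetric") not in ALLOWED_METRICS:
            problems.append(
                f"distanceMetric must be one of {sorted(ALLOWED_METRICS)}"
            )
        # Assumption: vectors in one request should share a dimension.
        dims = {len(vector) for vector in vectors}
        if len(dims) > 1:
            problems.append(f"vectors have mixed dimensions: {sorted(dims)}")
    return problems


good = {
    "namespace": "products",
    "rows": [{"id": "prod_123", "vector": [0.1, -0.2, 0.5, 0.8], "attributes": []}],
    "distanceMetric": "cosine_distance",
}
print(check_upsert_payload(good))
```

Text-only payloads (no vector on any row) pass without a distanceMetric, matching the "nullable" marking in the parameter list.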
Search Operations
Turbopuffer provides three distinct search capabilities optimized for different use cases: semantic similarity through vector embeddings, keyword-based text search, and structured data retrieval with sorting.
Vector Search
Find documents semantically similar to a query vector. Well suited to RAG, "find docs about X" queries, and AI agents. Results are ranked by vector distance (cosine distance when the namespace uses cosine_distance): lower distance = more similar. The query vector must match the namespace dimensionality (1536 when the vectors come from text-embedding-3-small).
Operation Type: Query (Read)
Parameters:
- namespace (string, required): Namespace containing the vector embeddings.
- topK (number, required): Number of similar documents to return (1-1200).
- vector (array of numbers, required): Query vector (embedding) to find similar documents. Must match the dimensionality of vectors in the namespace.
- filters (array of objects, nullable): Optional filters to narrow results. All conditions are combined with AND logic.
  - attribute (string, required): Attribute name to filter on.
  - operator (string, required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens).
  - value (string, required): Filter value as JSON string (will be parsed based on type).
- includeAttributes (array of strings, nullable): Attributes to include in results. Leave null to return only id and distance.
Returns:
- rows (array of objects): Similar documents with $dist (distance score) and requested attributes.
  - $dist (number): Distance score (lower = more similar).
  - Additional attributes based on the includeAttributes parameter.
Example Usage:
{
"namespace": "documents",
"topK": 10,
"vector": [0.1, -0.2, 0.5, 0.8, -0.3, 0.7, 0.2, -0.1],
"filters": [
{
"attribute": "category",
"operator": "Eq",
"value": "\"technical\""
},
{
"attribute": "published_date",
"operator": "Gte",
"value": "\"2024-01-01\""
}
],
"includeAttributes": ["title", "content", "author", "published_date"]
}
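To make "lower distance = more similar" concrete, here is a minimal Python sketch that ranks a few in-memory vectors by cosine distance. It assumes the common definition of cosine distance as 1 minus cosine similarity, and it is a brute-force illustration, not a description of how Turbopuffer executes the query. The names cosine_distance and top_k are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, docs, k):
    """Return the k closest documents, smallest $dist first."""
    scored = [
        {"id": doc_id, "$dist": cosine_distance(query, vector)}
        for doc_id, vector in docs.items()
    ]
    return sorted(scored, key=lambda row: row["$dist"])[:k]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
results = top_k([1.0, 0.1], docs, k=2)
for row in results:
    print(row["id"], round(row["$dist"], 3))
```

A query vector of the wrong length raises an error in this sketch, mirroring the requirement that the query vector match the namespace dimensionality.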
Full-Text Search
Traditional keyword-based search using BM25 ranking. Well suited to documentation search, product search, or cases where you know the specific terms to find. Returns documents ranked by relevance score (higher = better match). The field must be configured for full-text search in the schema.
Operation Type: Query (Read)
Parameters:
- namespace (string, required): Namespace to search in.
- topK (number, required): Number of top-ranked documents to return (1-1200).
- field (string, required): Field name to search in (must be configured for full-text search).
- query (string, required): Search query text. Supports boolean operators and phrases.
- filters (array of objects, nullable): Optional filters to narrow results. All conditions are combined with AND logic.
  - attribute (string, required): Attribute name to filter on.
  - operator (string, required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens).
  - value (string, required): Filter value as JSON string (will be parsed based on type).
- includeAttributes (array of strings, nullable): Attributes to include in results. Leave null to return only id and BM25 score.
Returns:
- rows (array of objects): Matched documents with $dist (BM25 score) and requested attributes.
  - $dist (number): BM25 relevance score (higher = better match).
  - Additional attributes based on the includeAttributes parameter.
Example Usage:
{
"namespace": "knowledge_base",
"topK": 25,
"field": "content",
"query": "API authentication OAuth tokens",
"filters": [
{
"attribute": "status",
"operator": "Eq",
"value": "\"published\""
},
{
"attribute": "language",
"operator": "In",
"value": "[\"en\", \"en-US\"]"
}
],
"includeAttributes": ["title", "content", "url", "last_updated"]
}
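For intuition about why a higher BM25 score means a better match, the following is a minimal Python sketch of textbook Okapi BM25 over a few in-memory strings. The whitespace tokenizer and the defaults k1=1.2 and b=0.75 are assumptions chosen for illustration; this document does not state which tokenizer or parameters Turbopuffer uses, so actual scores will differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (id -> text) against the query with Okapi BM25.

    Illustration only: lowercase whitespace tokens, assumed k1 and b.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(
        term for tokens in tokenized.values() for term in set(tokens)
    )

    scores = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a larger inverse document frequency weight.
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores


docs = {
    "a": "oauth tokens for api authentication",
    "b": "shipping and returns policy",
    "c": "api rate limits",
}
scores = bm25_scores("API authentication OAuth tokens", docs)
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```

Document "a" matches all four query terms and ranks first, "c" matches only one, and "b" matches none and scores zero.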
Lookup
Retrieve documents sorted by any attribute with optional filtering. Perfect for getting recent items, pagination, or priority queues. Simple sorting - no ranking algorithm. Combine filters with sorting for powerful queries like "active users sorted by signup date".
Operation Type: Query (Read)
Parameters:
- namespace (string, required): Namespace to query.
- topK (number, required): Maximum number of documents to return (1-1200).
- orderByAttribute (string, required): Attribute to sort by (e.g., "timestamp", "priority", "id").
- orderDirection (string, required): Sort direction: "asc" (oldest/lowest first) or "desc" (newest/highest first).
- filters (array of objects, nullable): Filters to select documents. All conditions are combined with AND logic.
  - attribute (string, required): Attribute name to filter on.
  - operator (string, required): Filter operator (Eq, NotEq, In, NotIn, Contains, NotContains, ContainsAny, NotContainsAny, Lt, Lte, Gt, Gte, AnyLt, AnyLte, AnyGt, AnyGte, Glob, NotGlob, IGlob, NotIGlob, Regex, ContainsAllTokens).
  - value (string, required): Filter value as JSON string (will be parsed based on type).
- includeAttributes (array of strings, nullable): Attributes to include in results. Leave null to return only id.
Returns:
- rows (array of objects): Documents sorted by the specified attribute.
  - id (string): Document identifier.
  - Additional attributes based on the includeAttributes parameter.
Example Usage:
{
"namespace": "user_activities",
"topK": 50,
"orderByAttribute": "timestamp",
"orderDirection": "desc",
"filters": [
{
"attribute": "user_id",
"operator": "Eq",
"value": "\"user_12345\""
},
{
"attribute": "activity_type",
"operator": "In",
"value": "[\"login\", \"purchase\", \"view\"]"
},
{
"attribute": "timestamp",
"operator": "Gte",
"value": "\"2024-01-01T00:00:00Z\""
}
],
"includeAttributes": ["timestamp", "activity_type", "metadata", "ip_address"]
}
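The semantics of the request above (AND-combined filters, then a sort, then a topK cut) can be emulated locally to reason about what a Lookup should return. This is a minimal Python sketch over in-memory dictionaries. The lookup function is hypothetical, only the Eq, In, and Gte operators are emulated, and timestamps are compared as ISO 8601 strings, which is an assumption that holds only when all values share one format.

```python
import json
import operator

# Only a small subset of the documented operators is emulated here.
OPERATORS = {
    "Eq": operator.eq,
    "Gte": operator.ge,
    "In": lambda field_value, allowed: field_value in allowed,
}


def lookup(docs, order_by, direction, top_k, filters=None):
    """Filter with AND logic, sort by one attribute, keep the first top_k."""

    def matches(doc):
        for f in filters or []:
            if f["attribute"] not in doc:
                return False
            # Filter values arrive as JSON strings and are decoded first.
            expected = json.loads(f["value"])
            if not OPERATORS[f["operator"]](doc[f["attribute"]], expected):
                return False
        return True

    selected = [doc for doc in docs if matches(doc)]
    selected.sort(key=lambda doc: doc[order_by], reverse=(direction == "desc"))
    return selected[:top_k]


activities = [
    {"id": "1", "user_id": "user_12345", "activity_type": "login", "timestamp": "2024-02-01T00:00:00Z"},
    {"id": "2", "user_id": "user_12345", "activity_type": "logout", "timestamp": "2024-03-01T00:00:00Z"},
    {"id": "3", "user_id": "user_12345", "activity_type": "purchase", "timestamp": "2024-04-01T00:00:00Z"},
    {"id": "4", "user_id": "user_99999", "activity_type": "login", "timestamp": "2024-05-01T00:00:00Z"},
    {"id": "5", "user_id": "user_12345", "activity_type": "view", "timestamp": "2023-12-31T00:00:00Z"},
]
filters = [
    {"attribute": "user_id", "operator": "Eq", "value": "\"user_12345\""},
    {"attribute": "activity_type", "operator": "In", "value": "[\"login\", \"purchase\", \"view\"]"},
    {"attribute": "timestamp", "operator": "Gte", "value": "\"2024-01-01T00:00:00Z\""},
]
result = lookup(activities, "timestamp", "desc", top_k=50, filters=filters)
print([doc["id"] for doc in result])
```

A document must pass every filter to be selected, so the record from another user, the "logout" record, and the record dated before 2024 are all excluded.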
Common Use Cases
AI & Machine Learning:
- RAG (Retrieval Augmented Generation) for AI agents and chatbots
- Semantic search for finding conceptually similar documents
- Content recommendation based on vector similarity
- Building AI assistants that understand context and meaning
- Pair with the AI Agents embeddings tool: generate embeddings for text, then upsert them to Turbopuffer to enable semantic search
Knowledge Management:
- Documentation search combining semantic and keyword approaches
- Support ticket search and categorization
- Research paper discovery and citation networks
- Enterprise knowledge base with intelligent retrieval
E-commerce & Content:
- Product recommendation using vector embeddings
- Content discovery and personalization
- Search result ranking and filtering
- User behavior analysis and pattern recognition
Data Analytics:
- Time-series data retrieval with sorting and filtering
- Log analysis and monitoring with structured queries
- Performance metrics tracking over time
- Business intelligence with flexible data access