Collections
A collection is a logical grouping of documents that share the same embedding configuration. Each collection maps to a Milvus collection for vector storage.
What is a Collection?
When you create a collection, you define:
- Embedding provider and model: how documents will be embedded (e.g., OpenAI text-embedding-3-small)
- Chunk size and overlap: how documents are split into searchable segments
- Default query settings: top_k, min_score, and search_mode defaults
All documents within a collection use the same embedding model, ensuring consistent vector dimensions for search.
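To make the chunk size and overlap settings concrete, here is a minimal sketch of character-based chunking with overlap. It assumes a plain sliding window over characters; the server's actual splitter is not described here and may behave differently (for example, by respecting sentence boundaries). The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters.

    Illustrative only: assumes a simple sliding window, not the server's real splitter.
    """
    if chunk_overlap < 0 or chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and less than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(sample, chunk_size=512, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

With the default values, each chunk repeats the last 50 characters of the previous one, so text that straddles a boundary still appears intact in at least one chunk.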
Creating a Collection
curl -X POST http://localhost:6100/v1/collections \
-H "Authorization: Bearer $BIGRAG_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"name": "research_papers",
"description": "Academic research papers",
"embedding_provider": "openai",
"embedding_model": "text-embedding-3-small",
"embedding_api_key": "sk-...",
"dimension": 1536,
"chunk_size": 512,
"chunk_overlap": 50
}'
Collection Properties
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | — | 1–128 characters, must match [a-zA-Z][a-zA-Z0-9_]* |
| description | string | no | "" | Human-readable description |
| embedding_provider | string | no | Server default | openai or cohere |
| embedding_model | string | no | Server default | Model name for the provider |
| embedding_api_key | string | no | — | API key for the embedding provider |
| dimension | integer | no | Server default | Embedding vector dimension |
| chunk_size | integer | no | 512 | 64–10,000 characters per chunk |
| chunk_overlap | integer | no | 50 | 0–5,000, must be less than chunk_size |
| default_top_k | integer | no | 10 | Default number of results (1–1,000) |
| default_min_score | float | no | null | Default minimum similarity score |
| default_search_mode | string | no | "semantic" | semantic, keyword, or hybrid |
| reranking_enabled | boolean | no | false | Enable reranking for this collection |
| reranking_model | string | no | "rerank-v3.5" | Cohere reranking model |
| reranking_api_key | string | no | null | Cohere API key |
| metadata | object | no | {} | Arbitrary key-value pairs |
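The constraints in the table can be checked on the client before sending a create request. The sketch below is an assumption about how such a pre-flight check might look; `validate_collection_request` is a hypothetical helper, not part of the API, and the server remains the authority on what it accepts.

```python
import re

# Constraints copied from the Collection Properties table.
NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
SEARCH_MODES = {"semantic", "keyword", "hybrid"}


def _int_in_range(value, low, high) -> bool:
    """True if value is an int (not a bool) within [low, high]."""
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_collection_request(body: dict) -> list[str]:
    """Return a list of problems found in a create-collection body (empty if none)."""
    errors = []

    name = body.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 128 or not NAME_PATTERN.fullmatch(name):
        errors.append("name: must be 1-128 chars matching [a-zA-Z][a-zA-Z0-9_]*")

    # Missing fields fall back to the documented defaults.
    chunk_size = body.get("chunk_size", 512)
    chunk_overlap = body.get("chunk_overlap", 50)
    size_ok = _int_in_range(chunk_size, 64, 10_000)
    if not size_ok:
        errors.append("chunk_size: must be between 64 and 10,000")
    if not _int_in_range(chunk_overlap, 0, 5_000):
        errors.append("chunk_overlap: must be between 0 and 5,000")
    elif size_ok and chunk_overlap >= chunk_size:
        errors.append("chunk_overlap: must be less than chunk_size")

    if not _int_in_range(body.get("default_top_k", 10), 1, 1_000):
        errors.append("default_top_k: must be between 1 and 1,000")
    if body.get("default_search_mode", "semantic") not in SEARCH_MODES:
        errors.append("default_search_mode: must be semantic, keyword, or hybrid")

    return errors


print(validate_collection_request({"name": "2fast", "chunk_size": 64, "chunk_overlap": 64}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.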
Collection Stats
Get a lightweight summary of a collection's document and chunk counts without fetching platform-wide stats:
curl http://localhost:6100/v1/collections/research_papers/stats \
-H "Authorization: Bearer $BIGRAG_API_SECRET"
Returns document_count, total_chunks, total_tokens, total_size_bytes, and per-status counts.
Default Query Settings
Each collection can define default query settings that apply whenever a query does not specify them:
curl -X POST http://localhost:6100/v1/collections \
-H "Authorization: Bearer $BIGRAG_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"name": "docs",
"default_top_k": 20,
"default_min_score": 0.3,
"default_search_mode": "hybrid"
}'
When querying, if top_k, min_score, or search_mode is omitted, the collection's default for that setting is used.
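The fallback from per-query values to collection defaults can be sketched as follows. This is an illustration of the rule described above, not the server's code: `resolve_query_settings` is a hypothetical name, and treating an explicit `None` the same as an omitted key is an assumption.

```python
# Maps each query parameter to the collection field that holds its default.
DEFAULT_FIELDS = {
    "top_k": "default_top_k",
    "min_score": "default_min_score",
    "search_mode": "default_search_mode",
}


def resolve_query_settings(collection: dict, query: dict) -> dict:
    """Return the effective settings: query values win, collection defaults fill gaps."""
    resolved = {}
    for param, default_field in DEFAULT_FIELDS.items():
        value = query.get(param)
        # Compare against None so that legitimate falsy values (e.g. min_score 0.0) are kept.
        resolved[param] = value if value is not None else collection.get(default_field)
    return resolved


collection = {"default_top_k": 20, "default_min_score": 0.3, "default_search_mode": "hybrid"}
print(resolve_query_settings(collection, {"top_k": 5}))
```

Here the query overrides only top_k, so min_score and search_mode come from the collection.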
Reranking
Collections can enable server-side reranking using the Cohere Rerank API:
curl -X POST http://localhost:6100/v1/collections \
-H "Authorization: Bearer $BIGRAG_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"name": "docs",
"reranking_enabled": true,
"reranking_model": "rerank-v3.5",
"reranking_api_key": "your-cohere-key"
}'
| Field | Default | Description |
|---|---|---|
| reranking_enabled | false | Enable reranking for this collection |
| reranking_model | "rerank-v3.5" | Cohere reranking model |
| reranking_api_key | null | Cohere API key (falls back to embedding key) |
You can override reranking per query by passing "rerank": true or "rerank": false.
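The interaction between the collection setting, the per-query override, and the API key fallback can be sketched like this. The helper names `should_rerank` and `rerank_api_key` are hypothetical, and the logic is an assumption based only on the behavior described in this section.

```python
def should_rerank(collection: dict, query: dict) -> bool:
    """Per-query "rerank" wins when present; otherwise use the collection setting."""
    override = query.get("rerank")
    if override is not None:
        return bool(override)
    return bool(collection.get("reranking_enabled", False))


def rerank_api_key(collection: dict):
    """Use the reranking key if set, otherwise fall back to the embedding key."""
    return collection.get("reranking_api_key") or collection.get("embedding_api_key")


collection = {"reranking_enabled": True, "reranking_api_key": None, "embedding_api_key": "embed-key"}
print(should_rerank(collection, {"rerank": False}))  # query disables reranking for this call
print(rerank_api_key(collection))                    # no reranking key, so the embedding key is used
```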
Updating a Collection
You can update a collection's description, metadata, reranking settings, and default query settings:
curl -X PUT http://localhost:6100/v1/collections/research_papers \
-H "Authorization: Bearer $BIGRAG_API_SECRET" \
-H "Content-Type: application/json" \
-d '{
"description": "Updated description",
"metadata": { "team": "engineering" },
"reranking_enabled": true,
"reranking_model": "rerank-v3.5",
"default_top_k": 20,
"default_min_score": 0.3,
"default_search_mode": "hybrid"
}'
| Field | Type | Description |
|---|---|---|
| description | string | Human-readable description |
| metadata | object | Arbitrary key-value pairs |
| reranking_enabled | boolean | Enable/disable reranking |
| reranking_model | string | Cohere reranking model |
| reranking_api_key | string | Cohere API key |
| default_top_k | integer | Default number of results (1–1,000) |
| default_min_score | float | Default minimum similarity score |
| default_search_mode | string | semantic, keyword, or hybrid |
All fields are optional — only provided fields are updated.
Embedding configuration (provider, model, dimension) and chunk settings cannot be changed after creation. To change these, delete the collection and create a new one.
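A client can guard against sending immutable fields in an update. The following is a minimal sketch under the assumption that the updatable and immutable fields are exactly those listed above; `build_update_payload` is a hypothetical helper, and how the server itself responds to such fields is not specified here.

```python
# Fields accepted by the update endpoint, per the table above.
MUTABLE_FIELDS = {
    "description", "metadata",
    "reranking_enabled", "reranking_model", "reranking_api_key",
    "default_top_k", "default_min_score", "default_search_mode",
}
# Fixed at creation time: embedding configuration and chunk settings.
IMMUTABLE_FIELDS = {
    "embedding_provider", "embedding_model", "dimension",
    "chunk_size", "chunk_overlap",
}


def build_update_payload(changes: dict) -> dict:
    """Return a body for the update request, rejecting fields that cannot be updated."""
    immutable = sorted(IMMUTABLE_FIELDS.intersection(changes))
    if immutable:
        raise ValueError(
            f"cannot change after creation: {', '.join(immutable)}; "
            "delete the collection and create a new one instead"
        )
    unknown = sorted(set(changes) - MUTABLE_FIELDS)
    if unknown:
        raise ValueError(f"not an updatable field: {', '.join(unknown)}")
    return dict(changes)  # only the provided fields are sent, so only they are updated


print(build_update_payload({"description": "Updated description", "default_top_k": 20}))
```

Failing early on the client avoids a round trip for a change that requires recreating the collection anyway.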
Deleting a Collection
Deleting a collection removes all its documents and vectors:
curl -X DELETE http://localhost:6100/v1/collections/research_papers \
-H "Authorization: Bearer $BIGRAG_API_SECRET"