bigRAG

Collections

A collection is a logical grouping of documents that share the same embedding configuration. Each collection maps to a Milvus collection for vector storage.

What is a Collection?

When you create a collection, you define:

  • Embedding provider and model: how documents will be embedded (e.g., OpenAI text-embedding-3-small)
  • Chunk size and overlap: how documents are split into searchable segments
  • Default query settings: defaults for top_k, min_score, and search_mode

All documents within a collection use the same embedding model, ensuring consistent vector dimensions for search.
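To make chunk size and overlap concrete, here is a minimal Python sketch of fixed-size character chunking. It is an illustration only: the function name chunk_text is hypothetical, and bigRAG's actual splitter may behave differently (for example, by respecting sentence boundaries).

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be less than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With chunk_size=4 and chunk_overlap=1, each chunk starts three characters after the previous one, so the last character of one chunk is repeated as the first character of the next.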

Creating a Collection

curl -X POST http://localhost:6100/v1/collections \
  -H "Authorization: Bearer $BIGRAG_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "research_papers",
    "description": "Academic research papers",
    "embedding_provider": "openai",
    "embedding_model": "text-embedding-3-small",
    "embedding_api_key": "sk-...",
    "dimension": 1536,
    "chunk_size": 512,
    "chunk_overlap": 50
  }'

Collection Properties

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | yes | | 1-128 chars, must match [a-zA-Z][a-zA-Z0-9_]* |
| description | string | no | "" | Human-readable description |
| embedding_provider | string | no | Server default | openai or cohere |
| embedding_model | string | no | Server default | Model name for the provider |
| embedding_api_key | string | no | | API key for the embedding provider |
| dimension | integer | no | Server default | Embedding vector dimension |
| chunk_size | integer | no | 512 | 64–10,000 characters per chunk |
| chunk_overlap | integer | no | 50 | 0–5,000, must be less than chunk_size |
| default_top_k | integer | no | 10 | Default number of results (1–1,000) |
| default_min_score | float | no | null | Default minimum similarity score |
| default_search_mode | string | no | "semantic" | semantic, keyword, or hybrid |
| reranking_enabled | boolean | no | false | Enable reranking for this collection |
| reranking_model | string | no | "rerank-v3.5" | Cohere reranking model |
| reranking_api_key | string | no | null | Cohere API key |
| metadata | object | no | {} | Arbitrary key-value pairs |
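The constraints listed for these fields can be checked on the client before sending a request. The sketch below is a hypothetical helper (validate_collection is not part of bigRAG) that covers only the limits documented here; the server remains the authority on what it accepts.

```python
from __future__ import annotations

import re

NAME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
SEARCH_MODES = {"semantic", "keyword", "hybrid"}


def validate_collection(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    name = payload.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 128 or not NAME_RE.fullmatch(name):
        errors.append("name: 1-128 chars matching [a-zA-Z][a-zA-Z0-9_]*")
    # Fall back to the documented defaults when a field is omitted.
    chunk_size = payload.get("chunk_size", 512)
    chunk_overlap = payload.get("chunk_overlap", 50)
    if not 64 <= chunk_size <= 10_000:
        errors.append("chunk_size: must be 64-10,000")
    if not 0 <= chunk_overlap <= 5_000 or chunk_overlap >= chunk_size:
        errors.append("chunk_overlap: must be 0-5,000 and less than chunk_size")
    if not 1 <= payload.get("default_top_k", 10) <= 1_000:
        errors.append("default_top_k: must be 1-1,000")
    if payload.get("default_search_mode", "semantic") not in SEARCH_MODES:
        errors.append("default_search_mode: must be semantic, keyword, or hybrid")
    return errors
```

For example, a name such as "research-papers" is reported as invalid because hyphens are not allowed by the name pattern.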

Collection Stats

Get a lightweight summary of a collection's document and chunk counts without fetching platform-wide stats:

curl http://localhost:6100/v1/collections/research_papers/stats \
  -H "Authorization: Bearer $BIGRAG_API_SECRET"

Returns document_count, total_chunks, total_tokens, total_size_bytes, and per-status counts.
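The same call can be made from Python with the standard library. This is a sketch under assumptions: the helper names stats_request and fetch_stats are hypothetical, and the response is assumed to be a JSON object containing the fields named above.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:6100"  # same host and port as the curl examples


def stats_request(collection: str, api_secret: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a collection's stats."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/collections/{collection}/stats",
        headers={"Authorization": f"Bearer {api_secret}"},
        method="GET",
    )


def fetch_stats(collection: str) -> dict:
    """Send the request and decode the body, assuming the server returns JSON."""
    req = stats_request(collection, os.environ["BIGRAG_API_SECRET"])
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling fetch_stats("research_papers") requires a running server and the BIGRAG_API_SECRET environment variable.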

Default Query Settings

Each collection can define default query settings, which apply whenever a query does not specify its own values:

curl -X POST http://localhost:6100/v1/collections \
  -H "Authorization: Bearer $BIGRAG_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "docs",
    "default_top_k": 20,
    "default_min_score": 0.3,
    "default_search_mode": "hybrid"
  }'

When querying, if top_k, min_score, or search_mode is omitted, the collection's default for that setting is used.
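The fallback rule can be modelled in a few lines of Python. This is a client-side illustration of the documented behaviour, not the server's code; effective_query_settings is a hypothetical name, and how the server treats an explicit null in a query is not covered here.

```python
def effective_query_settings(collection: dict, query: dict) -> dict:
    """Per-query values win; omitted ones fall back to the collection's defaults."""
    return {
        "top_k": query.get("top_k", collection.get("default_top_k", 10)),
        "min_score": query.get("min_score", collection.get("default_min_score")),
        "search_mode": query.get(
            "search_mode", collection.get("default_search_mode", "semantic")
        ),
    }
```

For the "docs" collection above, a query that sets only top_k to 5 would run with top_k 5, min_score 0.3, and search_mode "hybrid".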

Reranking

Collections can enable server-side reranking using the Cohere Rerank API:

curl -X POST http://localhost:6100/v1/collections \
  -H "Authorization: Bearer $BIGRAG_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "docs",
    "reranking_enabled": true,
    "reranking_model": "rerank-v3.5",
    "reranking_api_key": "your-cohere-key"
  }'

| Field | Default | Description |
|---|---|---|
| reranking_enabled | false | Enable reranking for this collection |
| reranking_model | "rerank-v3.5" | Cohere reranking model |
| reranking_api_key | null | Cohere API key (falls back to embedding key) |

You can override reranking per query by passing "rerank": true or "rerank": false.
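The interaction between the collection setting, the per-query override, and the API key fallback can be sketched as follows. The function resolve_rerank is hypothetical and only mirrors the rules described in this section; the server's internal logic is not shown in this document.

```python
def resolve_rerank(collection: dict, query: dict) -> dict:
    """Decide whether to rerank and which model and key would be used."""
    # A per-query "rerank" value overrides the collection's reranking_enabled.
    enabled = query.get("rerank", collection.get("reranking_enabled", False))
    # The reranking key falls back to the embedding key when it is not set.
    api_key = collection.get("reranking_api_key") or collection.get("embedding_api_key")
    return {
        "enabled": bool(enabled),
        "model": collection.get("reranking_model", "rerank-v3.5"),
        "api_key": api_key,
    }
```

A query with "rerank": false therefore skips reranking even when the collection has reranking_enabled set to true.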

Updating a Collection

You can update a collection's description, metadata, reranking settings, and default query settings:

curl -X PUT http://localhost:6100/v1/collections/research_papers \
  -H "Authorization: Bearer $BIGRAG_API_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Updated description",
    "metadata": { "team": "engineering" },
    "reranking_enabled": true,
    "reranking_model": "rerank-v3.5",
    "default_top_k": 20,
    "default_min_score": 0.3,
    "default_search_mode": "hybrid"
  }'

| Field | Type | Description |
|---|---|---|
| description | string | Human-readable description |
| metadata | object | Arbitrary key-value pairs |
| reranking_enabled | boolean | Enable/disable reranking |
| reranking_model | string | Cohere reranking model |
| reranking_api_key | string | Cohere API key |
| default_top_k | integer | Default number of results (1–1,000) |
| default_min_score | float | Default minimum similarity score |
| default_search_mode | string | semantic, keyword, or hybrid |

All fields are optional — only provided fields are updated.

Embedding configuration (provider, model, dimension) and chunk settings cannot be changed after creation. To change these, delete the collection and create a new one.
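A client can catch attempts to change immutable settings before calling the API. The sketch below uses a hypothetical helper, build_update_payload. Rejecting fields that are not listed as updatable is an assumption of this example, not documented server behaviour.

```python
UPDATABLE = {
    "description", "metadata",
    "reranking_enabled", "reranking_model", "reranking_api_key",
    "default_top_k", "default_min_score", "default_search_mode",
}
IMMUTABLE = {
    "embedding_provider", "embedding_model", "dimension",
    "chunk_size", "chunk_overlap",
}


def build_update_payload(changes: dict) -> dict:
    """Return a PUT body containing only the provided, updatable fields."""
    blocked = sorted(IMMUTABLE.intersection(changes))
    if blocked:
        raise ValueError(
            "cannot change after creation: " + ", ".join(blocked)
            + " (delete and recreate the collection instead)"
        )
    unknown = sorted(set(changes) - UPDATABLE)
    if unknown:
        raise ValueError("not listed as updatable: " + ", ".join(unknown))
    return dict(changes)
```

Because only provided fields are updated, the returned body can contain a single key, such as {"default_top_k": 20}.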

Deleting a Collection

Deleting a collection removes all its documents and vectors:

curl -X DELETE http://localhost:6100/v1/collections/research_papers \
  -H "Authorization: Bearer $BIGRAG_API_SECRET"
