Search & RAG

Describes how DocDuck performs similarity search and constructs answers.

Steps (/query)

  1. Embed question text
  2. Run hybrid retrieval:
    • Vector search over embeddings (cosine distance)
    • Optional lexical search (tsvector + websearch_to_tsquery) when enabled
    • Blend and rerank chunk candidates based on configured search depth
  3. Concatenate chunk texts (ordered by blended score)
  4. Prompt model to synthesize answer
  5. Return answer + mapped sources (filename + snippet + blended score)
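
A minimal sketch of this flow, assuming injected helpers for embedding, retrieval, and generation (the Chunk shape and the embedQuestion, hybridRetrieve, and generateAnswer names are illustrative, not DocDuck's actual internals):

    // Illustrative /query pipeline; types and helper names are assumptions.
    interface Chunk {
      docId: string;
      filename: string;
      text: string;
      blendedScore: number; // higher = more relevant
    }

    interface QueryDeps {
      embedQuestion: (question: string) => Promise<number[]>;
      hybridRetrieve: (embedding: number[], question: string, topK: number) => Promise<Chunk[]>;
      generateAnswer: (question: string, context: string) => Promise<string>;
    }

    export async function query(question: string, deps: QueryDeps, topK = 8) {
      // 1. Embed the question text
      const embedding = await deps.embedQuestion(question);

      // 2. Hybrid retrieval: vector + optional lexical, blended and reranked
      const chunks = await deps.hybridRetrieve(embedding, question, topK);

      // 3. Concatenate chunk texts, ordered by blended score (best first)
      const ordered = [...chunks].sort((a, b) => b.blendedScore - a.blendedScore);
      const context = ordered.map((c) => c.text).join("\n\n");

      // 4. Prompt the model to synthesize an answer from the retrieved context
      const answer = await deps.generateAnswer(question, context);

      // 5. Return the answer plus mapped sources
      return {
        answer,
        sources: ordered.map((c) => ({
          filename: c.filename,
          snippet: c.text.slice(0, 200),
          score: c.blendedScore,
        })),
      };
    }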

/docsearch Variation

  • Retrieves more chunks (capped), then groups them by doc_id
  • Selects top documents by best (lowest) chunk distance
  • Returns representative snippet from each doc
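
A rough sketch of the grouping step, assuming each retrieved chunk carries its document id and cosine distance (field names are illustrative):

    // Illustrative /docsearch grouping; field names are assumptions.
    interface RetrievedChunk {
      docId: string;
      filename: string;
      text: string;
      distance: number; // cosine distance, lower = closer
    }

    export function topDocuments(chunks: RetrievedChunk[], limit = 5) {
      // Keep the best (lowest-distance) chunk per doc_id
      const bestByDoc = new Map<string, RetrievedChunk>();
      for (const chunk of chunks) {
        const best = bestByDoc.get(chunk.docId);
        if (!best || chunk.distance < best.distance) {
          bestByDoc.set(chunk.docId, chunk);
        }
      }

      // Rank documents by their best chunk distance; the best chunk doubles as the snippet
      return [...bestByDoc.values()]
        .sort((a, b) => a.distance - b.distance)
        .slice(0, limit)
        .map((c) => ({
          docId: c.docId,
          filename: c.filename,
          snippet: c.text.slice(0, 200),
          bestDistance: c.distance,
        }));
    }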

Scoring

  • Vector similarity: cosine distance via pgvector (embedding <=> $query)
  • Lexical similarity: ts_rank_cd over search_lexeme generated column
  • Blended score = (1 - distance/2) * vector_weight + lexical_score * lexical_weight
  • A higher blended score (equivalently, a lower blended distance of 1 - score) means higher relevance
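
The blend can be written as a small function; the weights below are placeholder defaults, not DocDuck's configured values:

    // Blended score as described above; the default weights are illustrative.
    export function blendedScore(
      cosineDistance: number, // pgvector <=> result, range 0..2
      lexicalScore: number,   // ts_rank_cd over search_lexeme, 0 when lexical search is off
      vectorWeight = 0.7,     // assumed default
      lexicalWeight = 0.3,    // assumed default
    ): number {
      const vectorSimilarity = 1 - cosineDistance / 2; // maps distance 0..2 to similarity 1..0
      return vectorSimilarity * vectorWeight + lexicalScore * lexicalWeight;
    }

    // Blended distance used for ordering: lower means more relevant
    export const blendedDistance = (score: number) => 1 - score;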

Filters

  • Optional providerType and providerName filters narrow the search
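
As a sketch, the filters could translate into an extra WHERE clause on the chunk query; the provider_type and provider_name column names are assumptions:

    // Illustrative filter construction; column names are assumptions.
    export function providerFilter(providerType?: string, providerName?: string, startIndex = 1) {
      const clauses: string[] = [];
      const params: string[] = [];
      if (providerType !== undefined) {
        clauses.push(`provider_type = $${startIndex + params.length}`);
        params.push(providerType);
      }
      if (providerName !== undefined) {
        clauses.push(`provider_name = $${startIndex + params.length}`);
        params.push(providerName);
      }
      return { sql: clauses.length ? ` AND ${clauses.join(" AND ")}` : "", params };
    }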

Ranking Quality Levers

Lever        Effect
Chunk Size   Too large: diluted relevance; too small: fragmented context
Overlap      Prevents context boundary loss
Top-K        Higher recall vs prompt cost tradeoff
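
These levers typically surface as ingestion and retrieval settings. CHUNK_OVERLAP is referenced under Failure Modes below; the other names and defaults in this sketch are assumptions:

    // Illustrative settings; only CHUNK_OVERLAP is referenced elsewhere on this page.
    const chunkSize = Number(process.env.CHUNK_SIZE ?? 1000);      // target characters per chunk (assumed)
    const chunkOverlap = Number(process.env.CHUNK_OVERLAP ?? 200); // characters shared between neighboring chunks
    const topK = Number(process.env.TOP_K ?? 8);                   // chunks passed into the prompt (assumed)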

Search Depth Levels

The API exposes a searchDepth knob (1-5) to control retrieval effort:

Depth   Behavior
1       Lexical-only search, minimal orchestration
2       Lexical + vector blend, single retrieval pass
3       Default; blended retrieval with retry heuristics
4       Same as 3, but allows an extra refinement pass
5       Max effort; expanded retries and aggressive reranking

Higher depths cost more tokens but improve recall.
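
One way to picture the knob, mapping each depth to retrieval options (the option shape and the retry/refinement counts are illustrative, not DocDuck's actual values):

    // Illustrative mapping of searchDepth (1-5) to retrieval behavior.
    interface RetrievalOptions {
      useVector: boolean;
      useLexical: boolean;
      maxRetries: number;
      refinementPasses: number;
      aggressiveRerank: boolean;
    }

    export function optionsForDepth(searchDepth: number): RetrievalOptions {
      const depth = Math.min(5, Math.max(1, Math.round(searchDepth)));
      switch (depth) {
        case 1: // lexical-only search, minimal orchestration
          return { useVector: false, useLexical: true, maxRetries: 0, refinementPasses: 0, aggressiveRerank: false };
        case 2: // lexical + vector blend, single retrieval pass
          return { useVector: true, useLexical: true, maxRetries: 0, refinementPasses: 0, aggressiveRerank: false };
        case 3: // default: blended retrieval with retry heuristics
          return { useVector: true, useLexical: true, maxRetries: 1, refinementPasses: 0, aggressiveRerank: false };
        case 4: // depth 3 plus one extra refinement pass
          return { useVector: true, useLexical: true, maxRetries: 1, refinementPasses: 1, aggressiveRerank: false };
        default: // 5: max effort, expanded retries and aggressive reranking
          return { useVector: true, useLexical: true, maxRetries: 2, refinementPasses: 2, aggressiveRerank: true };
      }
    }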

Failure Modes

Mode              Symptom          Mitigation
Sparse Index      Generic answers  Ensure enough documents; adjust chunk size
Overlap Too Low   Missing context  Increase CHUNK_OVERLAP
Overlap Too High  Cost spike       Reduce overlap when stable
