DocDuck
Multi-provider document indexing and retrieval-augmented generation (RAG) for your internal knowledge. Index OneDrive, S3, or local files, ask natural-language questions, and get cited answers.
Key Features
- Multi-provider: OneDrive (business & personal), S3, local filesystem (extensible)
- Smart idempotent indexing (ETag tracking, orphan cleanup, force reindex)
- Pluggable text extraction (DOCX, TXT/MD, PDF*, ODT, RTF)
- Configurable chunking & OpenAI embeddings
- Minimal Query API with /query, /chat (streaming), /docsearch & provider filtering
- PostgreSQL + pgvector for similarity search
- Kubernetes & Docker friendly (CronJob indexer, long-running API)
- Secure admin operations (seeded admin, secret-based token)
- Pragmatic modern .NET 8 codebase (records, DI, logging)
*PDF extraction requires an optional dependency.
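
The idempotent indexing behaviour listed above (ETag tracking, orphan cleanup, force reindex) can be illustrated with a small sketch. This is not DocDuck's implementation, which is a .NET codebase; the names `SyncPlan` and `plan_sync` and the dictionary-based inputs are hypothetical, chosen only to show the decision logic such an indexer might apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Hypothetical result type: what one indexer run should do."""
    to_index: list[str]   # new or changed documents
    to_skip: list[str]    # unchanged documents (stored ETag matches)
    to_delete: list[str]  # orphans: indexed earlier, no longer at the provider


def plan_sync(remote: dict[str, str], indexed: dict[str, str], force: bool = False) -> SyncPlan:
    """Compare provider ETags with stored ETags and decide what to do.

    remote  maps document id -> ETag currently reported by the provider.
    indexed maps document id -> ETag recorded at the last successful index.
    force   reindexes every remote document regardless of its ETag.
    """
    to_index, to_skip = [], []
    for doc_id, etag in sorted(remote.items()):
        if force or indexed.get(doc_id) != etag:
            to_index.append(doc_id)
        else:
            to_skip.append(doc_id)
    # Anything stored but missing from the provider listing is an orphan.
    to_delete = sorted(set(indexed) - set(remote))
    return SyncPlan(to_index, to_skip, to_delete)


# Toy data (assumed for illustration only).
remote = {"a.docx": "v2", "b.md": "v1", "c.txt": "v1"}
indexed = {"a.docx": "v1", "b.md": "v1", "old.pdf": "v9"}

plan = plan_sync(remote, indexed)
forced = plan_sync(remote, indexed, force=True)
```

Running the same plan twice against an unchanged provider yields no documents to index, which is the property that makes scheduled runs cheap.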
Quick Start
If you just want to try it locally:
- Provision PostgreSQL with the pgvector extension
- Set environment variables (OpenAI API key, DB connection string)
- Run the Indexer once
- Query the API
See Quick Start for copy-paste steps.
Documentation Map
| Audience | Start Here |
|---|---|
| Casual evaluator | Quick Start |
| Power user / operator | Installation, Configuration |
| Architect / engineer | Architecture |
| Extending providers | Provider Framework |
| RAG internals | Search & RAG |
| AI Agent embedding | AI Agent Context |
High-Level Architecture
Providers → Indexer Pipeline → PostgreSQL (chunks + metadata) ← Query API (RAG) ← User
- Indexer runs on a schedule (or ad-hoc) and updates embeddings
- Query API performs semantic search + synthesis across stored chunks
See Architecture for detailed diagrams.
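
The semantic search step can be sketched in a few lines. The example below is a minimal in-memory stand-in, not DocDuck's code: in a pgvector-backed deployment the ranking would normally be done in SQL (pgvector provides a cosine distance operator, `<=>`). The function names, the chunk dictionary layout, and the three-dimensional toy embeddings are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2, provider=None):
    """Return the k chunks most similar to query_vec, optionally limited to one provider."""
    candidates = [c for c in chunks if provider is None or c["provider"] == provider]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy chunk store (hypothetical layout; real embeddings have far more dimensions).
chunks = [
    {"doc": "handbook.docx", "provider": "onedrive", "text": "Vacation policy", "embedding": [0.9, 0.1, 0.0]},
    {"doc": "runbook.md", "provider": "local", "text": "Restart the indexer", "embedding": [0.1, 0.9, 0.0]},
    {"doc": "invoice.txt", "provider": "s3", "text": "Billing contact", "embedding": [0.0, 0.2, 0.9]},
]

hits = top_k([1.0, 0.0, 0.0], chunks, k=2)
citations = [h["doc"] for h in hits]              # source documents to cite
s3_only = top_k([1.0, 0.0, 0.0], chunks, k=2, provider="s3")
```

In a RAG flow the text of the returned chunks would be passed to the language model as context, and the `doc` values would back the citations shown with the answer.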
Why DocDuck?
- Production-focused from the start (idempotency, cleanup, metrics-friendly logging)
- Lean: only the moving parts needed for reliable RAG over your documents
- Extensible: add new providers & text extractors with small focused interfaces
- Transparent: clear data model & SQL; easy to audit
Status
Under active development. The API surface and schema are considered stable for the initial OSS release (v1); expect additive enhancements.
License
MIT (see License).