Providers Overview¶
Providers supply documents for indexing. Each provider implementation yields a list of documents with IDs, filenames, ETags, and optional last-modified timestamps.
Supported Providers¶
| Type | Description | Notes |
|---|---|---|
local |
Local filesystem directory tree | Fast iteration, path based IDs |
onedrive |
Microsoft OneDrive / SharePoint (Graph API) | App-only or delegated auth |
s3 |
AWS S3 bucket/prefix | Minimal permissions recommended |
Provider Concepts¶
- Provider Type: Logical category (
onedrive,s3,local). - Provider Name: Instance label (e.g.
finance-drive). Helps segregate multiple drives/buckets. - Document ID: Stable identifier (path, drive item ID, S3 key).
- ETag: Change detector; drives incremental indexing.
Lifecycle¶
- Enumerate documents
- Filter unchanged (ETag match)
- Download new/updated
- Extract → Chunk → Embed → Upsert
- Cleanup orphaned docs (optional)
Index Registration¶
On each run, enabled providers are registered in the database with last sync time. The API exposes active providers at /providers.
Adding Providers¶
High-level steps (see developer guide for full detail):
1. Implement IDocumentProvider
2. Expose enabling env vars (e.g. PROVIDER_MYTYPE_ENABLED)
3. Register with provider catalog
4. Provide metadata (display name, description)
5. Optionally add configuration seeding
See: Provider Framework
Next¶
- OneDrive specifics: OneDrive
- S3 specifics: S3
- Local specifics: Local Files