S3 Provider
Index documents from an AWS S3 bucket/prefix.
Environment Variables
PROVIDER_S3_ENABLED=true
PROVIDER_S3_NAME=handbook
S3_BUCKET=my-bucket
S3_PREFIX=handbook/ # optional, can be empty
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
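As a rough illustration of how these variables might be consumed, here is a minimal Python sketch. `S3ProviderConfig` and `load_s3_config` are hypothetical names, not part of the provider. Treating `PROVIDER_S3_NAME`, `S3_BUCKET`, and `AWS_REGION` as required, and parsing only the literal `true` as enabled, are assumptions. Credentials are deliberately not read here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3ProviderConfig:
    """Hypothetical container for the non-secret S3 provider settings."""
    name: str
    bucket: str
    prefix: str
    region: str


def load_s3_config(env: Mapping[str, str] = os.environ) -> Optional[S3ProviderConfig]:
    """Return the S3 settings, or None when the provider is disabled."""
    # Assumption: only the value "true" (case-insensitive) enables the provider.
    if env.get("PROVIDER_S3_ENABLED", "").strip().lower() != "true":
        return None
    # Assumption: these three settings are mandatory once the provider is enabled.
    required = ("PROVIDER_S3_NAME", "S3_BUCKET", "AWS_REGION")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return S3ProviderConfig(
        name=env["PROVIDER_S3_NAME"],
        bucket=env["S3_BUCKET"],
        prefix=env.get("S3_PREFIX", ""),  # optional, may be empty
        region=env["AWS_REGION"],
    )
```

Calling `load_s3_config()` with no argument reads the process environment; passing a plain dict makes the function easy to exercise in tests.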
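Use an IAM user or role restricted to s3:GetObject and s3:ListBucket for the specific bucket/prefix. s3:ListBucket is a bucket-level action (granted on the bucket ARN, optionally narrowed with an s3:prefix condition), while s3:GetObject is an object-level action (granted on the object ARNs under the prefix).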
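A least-privilege policy along these lines can be sketched as follows. The helper `build_read_only_policy` and the `Sid` values are illustrative, not part of the provider. The bucket and prefix are the example values from the environment block above. Adapt the result to your own account before attaching it.

```python
import json


def build_read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document allowing list + read on one bucket/prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    list_statement = {
        "Sid": "ListPrefix",
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": bucket_arn,  # ListBucket applies to the bucket itself
    }
    if prefix:
        # Only allow listing keys that start with the prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    read_statement = {
        "Sid": "ReadObjects",
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"{bucket_arn}/{prefix}*",  # GetObject applies to object ARNs
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("my-bucket", "handbook/"), indent=2))
```

Splitting the two actions into separate statements keeps each one scoped to the resource type it actually applies to.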
Object Selection
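- Lists keys under the configured prefix and keeps only supported file extensions (the filter is applied internally)
- Uses each object's ETag for change detection. For objects uploaded in a single PUT (unencrypted or SSE-S3), the ETag is an MD5 digest of the content, so it stays stable until the content changes
- Multipart uploads produce an ETag that is not a plain MD5 of the content (it carries a -N part-count suffix), so re-uploading the same bytes with a different part size can change the ETag. Any changed ETag triggers a reindex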
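The selection and change-detection steps above can be sketched roughly as below. This is an illustration, not the provider's actual code: `detect_changes` and `SUPPORTED_EXTENSIONS` are hypothetical, and the extension list is a placeholder because the real filter is internal. The `pages` argument is assumed to have the shape of S3 ListObjectsV2 responses (a `Contents` list of entries with `Key` and `ETag`).

```python
from typing import Dict, Iterable, List, Mapping, Tuple

# Assumption: placeholder list; the provider's real extension filter is internal.
SUPPORTED_EXTENSIONS = (".md", ".txt", ".pdf")


def detect_changes(
    pages: Iterable[Mapping],
    known_etags: Mapping[str, str],
    extensions: Tuple[str, ...] = SUPPORTED_EXTENSIONS,
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Return (keys to reindex, keys no longer in S3, new ETag snapshot)."""
    current: Dict[str, str] = {}
    for page in pages:
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(extensions):
                # S3 returns the ETag wrapped in double quotes.
                current[key] = obj["ETag"].strip('"')
    # New keys and keys whose ETag differs from the stored one need reindexing.
    to_reindex = sorted(k for k, etag in current.items() if known_etags.get(k) != etag)
    deleted = sorted(k for k in known_etags if k not in current)
    return to_reindex, deleted, current
```

With boto3, the pages could come from `boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket="my-bucket", Prefix="handbook/")`. The function compares ETags as opaque strings, so it works the same for single-part and multipart objects.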
Performance Tips
| Setting | Guidance |
|---|---|
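| Pagination | Built-in. For the first run, avoid prefixes that hold millions of keys; start with a narrower prefix and widen it later |
| Chunk Size | Adjust CHUNK_SIZE to balance embedding volume: larger chunks mean fewer embeddings per document, smaller chunks mean more |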
Troubleshooting
| Symptom | Cause | Resolution |
|---|---|---|
| AccessDenied | IAM policy is missing a required action | Grant s3:GetObject on the objects and s3:ListBucket on the bucket |
| Slow listing | Very large bucket | Use a narrower prefix or split the content across several prefixes |
| Empty index | Wrong prefix or no supported file types | Verify with aws s3 ls s3://bucket/prefix/ |
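When the index comes up empty, it can help to see which file extensions actually exist under the prefix. The sketch below is a hypothetical helper, not part of the provider. It takes any iterable of key names (for example, parsed from the `aws s3 ls` output or from a listing call) and counts them by extension.

```python
import posixpath
from collections import Counter
from typing import Iterable


def count_extensions(keys: Iterable[str]) -> Counter:
    """Count object keys by lowercase file extension."""
    counts: Counter = Counter()
    for key in keys:
        if key.endswith("/"):
            continue  # skip zero-byte "folder" placeholder keys
        # S3 keys use "/" separators, so posixpath is the right splitter.
        ext = posixpath.splitext(key)[1].lower()
        counts[ext or "(none)"] += 1
    return counts
```

If the result contains only extensions the provider does not index, or is empty, the prefix or the uploaded file types are the likely culprit.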
Next
- Overview: Providers Overview
- Adding a custom provider: Provider Framework