
S3 Provider

Index documents from an AWS S3 bucket/prefix.

Environment Variables

PROVIDER_S3_ENABLED=true
PROVIDER_S3_NAME=handbook
S3_BUCKET=my-bucket
S3_PREFIX=handbook/          # optional, can be empty
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...

Use an IAM user or role restricted to two actions: s3:ListBucket on the bucket (optionally limited to the prefix with an s3:prefix condition) and s3:GetObject on the objects under the prefix.
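As a rough illustration of such a least-privilege policy, the sketch below builds the policy document in Python. The function name `build_read_only_policy` and the statement IDs are made up for this example; only the two actions and the bucket/prefix scoping come from the guidance above.

```python
import json


def build_read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy document that allows listing and reading one prefix.

    Hypothetical helper for illustration; it is not part of the provider.
    """
    list_statement = {
        "Sid": "ListPrefix",
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        # s3:ListBucket is a bucket-level action, so the resource is the bucket ARN.
        "Resource": f"arn:aws:s3:::{bucket}",
    }
    if prefix:
        # Limit listing to keys that start with the prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}

    get_statement = {
        "Sid": "ReadObjects",
        "Effect": "Allow",
        "Action": "s3:GetObject",
        # s3:GetObject is an object-level action, so the resource is an object ARN pattern.
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, get_statement]}


# Render the policy as JSON, ready to paste into the IAM console or a template.
policy_json = json.dumps(build_read_only_policy("my-bucket", "handbook/"), indent=2)
```

With an empty prefix the condition is omitted and both statements cover the whole bucket.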

Object Selection

  • Lists keys under prefix (filtering by extension internally)
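  • ETag is used for change detection (for non-multipart uploads, the ETag stays the same as long as the object content is unchanged)
  • Multipart uploads produce ETags that depend on how the object was split into parts, so re-uploading identical content can yield a different ETag; any changed ETag triggers a reindex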
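The selection logic above could be sketched roughly as follows. This is not the provider's actual code: the function names, the `indexed_etags` mapping, and the extension list are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical extension list; the provider's real list is internal.
SUPPORTED_EXTENSIONS = (".md", ".txt")


def is_multipart_etag(etag: str) -> bool:
    """Multipart ETags have the form '<hex digest>-<part count>'."""
    return "-" in etag.strip('"')


def select_changed(
    listing: Iterable[Tuple[str, str]],
    indexed_etags: Dict[str, str],
    extensions: Tuple[str, ...] = SUPPORTED_EXTENSIONS,
) -> List[str]:
    """Return keys that are new or whose ETag differs from the stored one."""
    changed = []
    for key, etag in listing:
        # Skip objects whose extension is not supported.
        if not key.lower().endswith(extensions):
            continue
        # A missing or different stored ETag means the object needs (re)indexing.
        if indexed_etags.get(key) != etag:
            changed.append(key)
    return changed
```

Here `listing` is any iterable of (key, ETag) pairs and `indexed_etags` maps each previously indexed key to the ETag seen at that time. `is_multipart_etag` can be used to log which objects may be reindexed after an identical re-upload.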

Performance Tips

Setting    | Guidance
Pagination | Built-in; start with a narrow prefix rather than one that holds millions of keys
Chunk Size | Adjust CHUNK_SIZE to balance embedding volume (larger chunks produce fewer embeddings per document)
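For a sense of what paginated listing involves, here is a minimal sketch using the boto3 `list_objects_v2` paginator. It assumes boto3 is installed and that a client is passed in; the function name `iter_objects` is hypothetical, and the provider's own listing code may differ.

```python
from typing import Iterator, Tuple


def iter_objects(s3_client, bucket: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (key, etag) for every object under the prefix, one page at a time."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no keys.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["ETag"]


# Example usage (requires boto3 and valid credentials in the environment):
#
#   import os
#   import boto3
#
#   client = boto3.client("s3", region_name=os.environ.get("AWS_REGION"))
#   for key, etag in iter_objects(client, os.environ["S3_BUCKET"],
#                                 os.environ.get("S3_PREFIX", "")):
#       print(key, etag)
```

Because the generator only holds one page in memory at a time, the cost of a large prefix is mostly the number of list requests, which is why a narrower prefix speeds things up.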

Troubleshooting

Symptom      | Cause                                   | Resolution
AccessDenied | IAM policy missing action               | Add s3:GetObject & s3:ListBucket
Slow listing | Very large bucket                       | Introduce narrower prefix or partition
Empty index  | Wrong prefix or no supported file types | Verify with aws s3 ls s3://bucket/prefix/
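When diagnosing an empty index, it can help to see which file extensions actually exist under the prefix. The sketch below tallies extensions from a list of keys (for example, keys copied from the `aws s3 ls` output). The function name `count_extensions` is hypothetical and not part of the provider.

```python
import posixpath
from collections import Counter
from typing import Iterable


def count_extensions(keys: Iterable[str]) -> Counter:
    """Count object keys by lowercase file extension."""
    counts: Counter = Counter()
    for key in keys:
        # Keys ending in "/" are folder placeholders, not documents.
        if key.endswith("/"):
            continue
        ext = posixpath.splitext(key)[1].lower()
        counts[ext or "(none)"] += 1
    return counts
```

If the result is empty, the prefix is probably wrong; if it contains only unsupported extensions, the listing works but nothing qualifies for indexing.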
