llama-stack-mirror/llama_stack/providers/remote/files/s3
__init__.py feat(s3 auth): add authorization support for s3 files provider (#3265) 2025-08-29 16:14:00 +02:00
config.py feat: Add S3 Files Provider (#3202) 2025-08-22 10:38:59 -04:00
files.py chore: simplify authorized sqlstore (#3496) 2025-09-19 16:13:56 -07:00
README.md feat: Add S3 Files Provider (#3202) 2025-08-22 10:38:59 -04:00

S3 Files Provider

A remote S3-based implementation of the Llama Stack Files API that provides scalable cloud file storage with metadata persistence.

Features

  • AWS S3 Storage: Store files in AWS S3 buckets for scalable, durable storage
  • Metadata Management: Uses a SQL database for efficient file metadata queries
  • OpenAI API Compatibility: Full compatibility with OpenAI Files API endpoints
  • Flexible Authentication: Support for IAM roles and access keys
  • Custom S3 Endpoints: Support for MinIO and other S3-compatible services

Configuration

Basic Configuration

api: files
provider_type: remote::s3
config:
  bucket_name: my-llama-stack-files
  region: us-east-1
  metadata_store:
    type: sqlite
    db_path: ./s3_files_metadata.db

Advanced Configuration

api: files
provider_type: remote::s3
config:
  bucket_name: my-llama-stack-files
  region: us-east-1
  aws_access_key_id: YOUR_ACCESS_KEY
  aws_secret_access_key: YOUR_SECRET_KEY
  endpoint_url: https://s3.amazonaws.com  # Optional for custom endpoints
  metadata_store:
    type: sqlite
    db_path: ./s3_files_metadata.db

Environment Variables

The configuration supports environment variable substitution:

config:
  bucket_name: "${env.S3_BUCKET_NAME}"
  region: "${env.AWS_REGION:=us-east-1}"
  aws_access_key_id: "${env.AWS_ACCESS_KEY_ID:=}"
  aws_secret_access_key: "${env.AWS_SECRET_ACCESS_KEY:=}"
  endpoint_url: "${env.S3_ENDPOINT_URL:=}"

Note: S3_BUCKET_NAME has no default value since S3 bucket names must be globally unique.
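
The placeholder syntax above can be illustrated with a small resolver. This is a minimal sketch of how `${env.NAME}` and `${env.NAME:=default}` might be expanded, not Llama Stack's actual implementation; the function name `substitute_env` and the choice to raise `KeyError` for a missing variable without a default are assumptions made for illustration.

```python
import os
import re

# Matches ${env.NAME} and ${env.NAME:=default}; the default may be empty.
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def substitute_env(value: str, env=None) -> str:
    """Expand ${env.NAME} / ${env.NAME:=default} placeholders in a string.

    Hypothetical helper: the real resolver may treat edge cases differently.
    """
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            # A ":=" was present, so fall back to the (possibly empty) default.
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _ENV_PATTERN.sub(replace, value)
```

Under these assumptions, `${env.AWS_REGION:=us-east-1}` resolves to `us-east-1` when `AWS_REGION` is unset, while `${env.S3_BUCKET_NAME}` fails loudly because it has no default.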

Authentication

IAM Roles

For production deployments, use IAM roles:

config:
  bucket_name: my-bucket
  region: us-east-1
  # No credentials needed - will use IAM role

Access Keys

For development or specific use cases:

config:
  bucket_name: my-bucket
  region: us-east-1
  aws_access_key_id: AKIAIOSFODNN7EXAMPLE
  aws_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
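
The difference between the two authentication modes can be sketched in terms of the arguments passed to `boto3.client`. The helper below is a hypothetical illustration, not the provider's code: it assumes the provider forwards the config fields to boto3, and it leaves the credential arguments out when no keys are configured so that boto3 falls back to its default credential chain (which includes IAM roles).

```python
from typing import Any, Dict, Optional


def s3_client_kwargs(
    region: str,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    endpoint_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3.client("s3", ...) (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"region_name": region}
    if endpoint_url:
        # Only set for custom endpoints such as MinIO.
        kwargs["endpoint_url"] = endpoint_url
    if aws_access_key_id and aws_secret_access_key:
        # Explicit access keys: typically development only.
        kwargs["aws_access_key_id"] = aws_access_key_id
        kwargs["aws_secret_access_key"] = aws_secret_access_key
    # With no keys in kwargs, boto3 resolves credentials on its own
    # (environment, shared config, or the attached IAM role).
    return kwargs


# Usage (requires boto3 to be installed):
#   import boto3
#   client = boto3.client("s3", **s3_client_kwargs(region="us-east-1"))
```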

S3 Bucket Setup

Required Permissions

The S3 provider requires the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}

Automatic Bucket Creation

By default, the S3 provider expects the bucket to already exist. If you want the provider to automatically create the bucket when it doesn't exist, set auto_create_bucket: true in your configuration:

config:
  bucket_name: my-bucket
  auto_create_bucket: true  # Will create bucket if it doesn't exist
  region: us-east-1

Note: When auto_create_bucket is enabled, the provider also needs the s3:CreateBucket permission:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:CreateBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
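
The two policy documents above differ only in the `s3:CreateBucket` action, so they can be generated from one template. The following is a minimal sketch; the function name `build_policy` is made up for this example and is not part of the provider.

```python
import json


def build_policy(bucket_name: str, auto_create_bucket: bool = False) -> dict:
    """Return an IAM policy document matching the examples in this README."""
    actions = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"]
    if auto_create_bucket:
        # Only needed when the provider is allowed to create the bucket itself.
        actions.append("s3:CreateBucket")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket_name}/*",  # object-level actions
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy("your-bucket-name", auto_create_bucket=True), indent=2))
```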

Bucket Policy (Optional)

For additional security, you can add a bucket policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LlamaStackAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR-ACCOUNT:role/LlamaStackRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    },
    {
      "Sid": "LlamaStackBucketAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR-ACCOUNT:role/LlamaStackRole"
      },
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name"
    }
  ]
}

Features

Metadata Persistence

File metadata is stored in a SQL database for fast queries and OpenAI API compatibility. The metadata includes:

  • File ID
  • Original filename
  • Purpose (assistants, batch, etc.)
  • File size in bytes
  • Created and expiration timestamps
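
To make the metadata fields listed above concrete, here is a sketch of a SQLite table holding them, with a query that filters by purpose. The table name, column names, and helper functions are hypothetical; the provider's real schema may differ.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema covering the metadata fields listed in this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id         TEXT PRIMARY KEY,
    filename   TEXT NOT NULL,
    purpose    TEXT NOT NULL,
    bytes      INTEGER NOT NULL,
    created_at INTEGER NOT NULL,
    expires_at INTEGER NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # lets rows be converted to dicts
    conn.execute(SCHEMA)
    return conn


def add_file(conn, file_id, filename, purpose, size, created_at, expires_at) -> None:
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
        (file_id, filename, purpose, size, created_at, expires_at),
    )
    conn.commit()


def list_files(conn, purpose: Optional[str] = None) -> List[dict]:
    """List file metadata, newest first, optionally filtered by purpose."""
    query, params = "SELECT * FROM files", ()
    if purpose is not None:
        query += " WHERE purpose = ?"
        params = (purpose,)
    query += " ORDER BY created_at DESC"
    return [dict(row) for row in conn.execute(query, params)]
```

Keeping this data in SQL means a listing request can be answered without calling S3 at all; only the file content itself has to be fetched from the bucket.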

TTL and Cleanup

Expiration is not currently configurable. Every file is assigned a fixed, long expiration time (100 years).
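
As a rough illustration of what a fixed 100-year expiration means for the stored timestamps, the sketch below derives an expiration from a creation time expressed in Unix seconds. The constant and function names are assumptions, and the exact arithmetic used by the provider (for example, whether leap days are counted) is not confirmed here.

```python
import time

# 100 years expressed in seconds, ignoring leap days (an assumption).
HUNDRED_YEARS_SECONDS = 100 * 365 * 24 * 60 * 60


def expiration_for(created_at: int) -> int:
    """Return the expiration timestamp for a file created at `created_at`."""
    return created_at + HUNDRED_YEARS_SECONDS


if __name__ == "__main__":
    now = int(time.time())
    print(now, expiration_for(now))
```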

Development and Testing

Using MinIO

For self-hosted S3-compatible storage:

config:
  bucket_name: test-bucket
  region: us-east-1
  endpoint_url: http://localhost:9000
  aws_access_key_id: minioadmin
  aws_secret_access_key: minioadmin

Monitoring and Logging

The provider logs important operations and errors. For production deployments, consider:

  • CloudWatch monitoring for S3 operations
  • Custom metrics for file upload/download rates
  • Error rate monitoring
  • Performance metrics tracking

Error Handling

The provider handles various error scenarios:

  • S3 connectivity issues
  • Bucket access permissions
  • File not found errors
  • Metadata consistency checks
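
The scenarios above can be told apart by the error code that S3 returns. The sketch below maps some well-known S3 error codes to coarse categories; the category names and the function `classify_s3_error` are invented for this example and do not describe the provider's internal handling.

```python
def classify_s3_error(code: str) -> str:
    """Map an S3 error code to a coarse category (hypothetical helper)."""
    if code in {"NoSuchKey", "404"}:
        # HEAD requests carry no error body, so botocore reports the bare status code.
        return "file_not_found"
    if code == "NoSuchBucket":
        return "bucket_missing"
    if code in {"AccessDenied", "403", "InvalidAccessKeyId", "SignatureDoesNotMatch"}:
        return "permission"
    return "unknown"


# Usage with boto3/botocore (not imported here so the sketch stays dependency-free):
#   from botocore.exceptions import ClientError, EndpointConnectionError
#   try:
#       client.get_object(Bucket=bucket, Key=key)
#   except ClientError as e:
#       category = classify_s3_error(e.response["Error"]["Code"])
#   except EndpointConnectionError:
#       category = "connectivity"
```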

Known Limitations

  • Fixed long TTL (100 years) instead of configurable expiration
  • No server-side encryption enabled by default
  • No support for AWS session tokens
  • No S3 key prefix organization support
  • No multipart upload support (all files uploaded as single objects)