feat(distro): fork off a starter-gpu distribution (#3240)

The starter distribution added post-training, which brought in torch
dependencies, which in turn pull in all the NVIDIA CUDA libraries. This
made our starter container very big. We have worked hard to keep the
starter container small so that it serves its purpose as a starter. This
PR tries to bring it back to its previous size by forking off duplicate
"-gpu" providers for post-training. These forked providers are then used
by a new `starter-gpu` distribution, which can pull in all the
dependencies.
Ashwin Bharambe 2025-08-22 15:47:15 -07:00 committed by GitHub
parent 3b9278f254
commit 7519b73fcc
15 changed files with 522 additions and 31 deletions

@@ -156,8 +156,8 @@ providers:
sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ci-tests}/trace_store.db
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
post_training:
-- provider_id: huggingface
-provider_type: inline::huggingface
+- provider_id: huggingface-cpu
+provider_type: inline::huggingface-cpu
config:
checkpoint_format: huggingface
distributed_backend: null