prototype: use pyproject and uv to build distribution

Goals:

* remove the need for a custom tool to install a collection of Python
  packages, i.e. `llama stack build`
* use the power of `uv`, which is designed to manage dependencies
* `llama stack build` can "probably" go away and be replaced with uv

How-to: with the pyproject, you can install an Ollama distribution in a
virtual env like so:

```
uv venv --python 3.10 ollama-distro
source ollama-distro/bin/activate
uv sync --extra ollama
llama stack run llama_stack/templates/ollama/run.yaml
```

Caveats:

* external providers: we could still use a build file, or add the known
  external providers to the pyproject so they become extras? (a sketch follows below)
* growth of the uv.lock file?
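
If known external providers were declared as extras as well, installing one would
hypothetically look the same as installing any distribution; a sketch only, with a
made-up extra name:

```
uv sync --extra my-external-provider  # hypothetical extra declared in pyproject
```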

We create a requirements.txt for convenience, as some users are more
familiar with this format than with reading the pyproject.
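
For reference, the checked-in requirements-ollama.txt is generated with `uv export`
(this is the command recorded at the top of that file):

```
uv export --frozen --no-hashes --no-emit-project --output-file=requirements-ollama.txt --no-annotate --no-default-groups --extra ollama
```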

Signed-off-by: Sébastien Han <seb@redhat.com>
Sébastien Han, 2025-05-27 20:31:57 +02:00
parent 6832e8a658
commit b6ebbe1bc0
13 changed files with 5579 additions and 679 deletions

View file

@@ -5,6 +5,10 @@ inputs:
     description: The Python version to use
     required: false
     default: "3.11"
+  install-ollama:
+    description: Install ollama
+    required: false
+    default: true
 runs:
   using: "composite"
   steps:
@@ -17,11 +21,13 @@ runs:
     - name: Install dependencies
      shell: bash
+      env:
+        INSTALL_OLLAMA: ${{ inputs.install-ollama }}
+      if: ${{ env.INSTALL_OLLAMA == 'true' }}
       run: |
-        uv sync --all-groups
-        uv pip install ollama faiss-cpu
+        uv sync --all-groups --extra ollama
         # always test against the latest version of the client
         # TODO: this is not necessarily a good idea. we need to test against both published and latest
         # to find out backwards compatibility issues.
         uv pip install git+https://github.com/meta-llama/llama-stack-client-python.git@main
+        uv pip install -e .

View file

@@ -33,9 +33,6 @@ jobs:
       - name: Install dependencies
         uses: ./.github/actions/setup-runner
-      - name: Build Llama Stack
-        run: |
-          llama stack build --template ollama --image-type venv
       - name: Install minikube
         if: ${{ matrix.auth-provider == 'kubernetes' }}
View file

@@ -41,16 +41,12 @@ jobs:
       - name: Setup ollama
         uses: ./.github/actions/setup-ollama
-      - name: Build Llama Stack
-        run: |
-          uv run llama stack build --template ollama --image-type venv
       - name: Start Llama Stack server in background
         if: matrix.client-type == 'http'
         env:
           INFERENCE_MODEL: "meta-llama/Llama-3.2-3B-Instruct"
         run: |
-          LLAMA_STACK_LOG_FILE=server.log nohup uv run llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434" &
+          LLAMA_STACK_LOG_FILE=server.log nohup uv run llama stack run ./llama_stack/templates/ollama/run.yaml --env OLLAMA_URL="http://0.0.0.0:11434" &
       - name: Wait for Llama Stack server to be ready
         if: matrix.client-type == 'http'

View file

@@ -36,7 +36,7 @@ jobs:
       - name: Generate Template List
         id: set-matrix
         run: |
-          templates=$(ls llama_stack/templates/*/*build.yaml | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
+          templates=$(ls llama_stack/templates/*/*build.yaml | grep -v "experimental-post-training" | awk -F'/' '{print $(NF-1)}' | jq -R -s -c 'split("\n")[:-1]')
           echo "templates=$templates" >> "$GITHUB_OUTPUT"

   build:
@@ -54,16 +54,42 @@ jobs:
       - name: Install dependencies
         uses: ./.github/actions/setup-runner
+        with:
+          install-ollama: false
-      - name: Print build dependencies
+      - name: Print dependencies in the image
+        if: matrix.image-type == 'venv'
         run: |
-          uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }} --image-name test --print-deps-only
+          uv pip list
-      - name: Run Llama Stack Build
+      - name: Run Llama Stack Build - VENV
+        if: matrix.image-type == 'venv'
         run: |
-          # USE_COPY_NOT_MOUNT is set to true since mounting is not supported by docker buildx, we use COPY instead
-          # LLAMA_STACK_DIR is set to the current directory so we are building from the source
-          USE_COPY_NOT_MOUNT=true LLAMA_STACK_DIR=. uv run llama stack build --template ${{ matrix.template }} --image-type ${{ matrix.image-type }} --image-name test
+          uv sync --no-default-groups --extra ${{ matrix.template }}
+          # TODO
+      - name: Run Llama Stack Build - CONTAINER
+        if: matrix.image-type == 'container'
+        run: |
+          # TODO: use llama_stack/templates/Containerfile when we have a new release!
+          cat << 'EOF' > Containerfile
+          FROM registry.access.redhat.com/ubi9
+          WORKDIR /app
+          ARG TEMPLATE
+          RUN dnf -y update \
+              && dnf install -y python3.11 python3.11-pip python3.11-wheel python3.11-setuptools python3.11-devel gcc make \
+              && ln -s /bin/pip3.11 /bin/pip \
+              && ln -s /bin/python3.11 /bin/python \
+              && dnf clean all
+          RUN mkdir -p /.llama/providers.d /.cache
+          COPY . /app/llama-stack
+          RUN cd llama-stack && pip install --no-cache .[${TEMPLATE}]
+          RUN chmod -R g+rw /app /.llama /.cache
+          ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--config", "/app/llama-stack/templates/${TEMPLATE}/run.yaml"]
+          EOF
+          docker build --build-arg TEMPLATE=${{ matrix.template }} -f Containerfile -t ${{ matrix.template }} .
       - name: Print dependencies in the image
         if: matrix.image-type == 'venv'
View file

@@ -43,7 +43,7 @@ jobs:
       - name: Build HTML
         run: |
           cd docs
-          uv run make html
+          uv run --group docs make html
       - name: Trigger ReadTheDocs build
         if: github.event_name != 'pull_request'

View file

@@ -40,7 +40,7 @@ def available_providers() -> list[ProviderSpec]:
         api=Api.inference,
         provider_type="inline::vllm",
         pip_packages=[
-            "vllm",
+            "vllm; sys_platform == 'linux'",
         ],
         module="llama_stack.providers.inline.inference.vllm",
         config_class="llama_stack.providers.inline.inference.vllm.VLLMConfig",
@@ -49,8 +49,9 @@ def available_providers() -> list[ProviderSpec]:
         api=Api.inference,
         provider_type="inline::sentence-transformers",
         pip_packages=[
-            "torch torchvision --index-url https://download.pytorch.org/whl/cpu",
-            "sentence-transformers --no-deps",
+            "torch",
+            "torchvision",
+            "sentence-transformers",
         ],
         module="llama_stack.providers.inline.inference.sentence_transformers",
         config_class="llama_stack.providers.inline.inference.sentence_transformers.config.SentenceTransformersInferenceConfig",

View file

@@ -0,0 +1,15 @@
# Usage:
# podman build --build-arg TEMPLATE={TEMPLATE_NAME} -f llama_stack/templates/Containerfile -t TEMPLATE_NAME .
FROM registry.access.redhat.com/ubi9
WORKDIR /app
ARG TEMPLATE
RUN dnf -y update \
&& dnf install -y python3.11 python3.11-pip python3.11-wheel python3.11-setuptools python3.11-devel gcc make \
&& ln -s /bin/pip3.11 /bin/pip \
&& ln -s /bin/python3.11 /bin/python \
&& dnf clean all
RUN mkdir -p /.llama/providers.d /.cache
RUN pip install --no-cache llama-stack[${TEMPLATE}]
RUN chmod -R g+rw /app /.llama /.cache
ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--config", "/app/llama-stack/templates/${TEMPLATE}/run.yaml"]
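
As a rough usage sketch only: the build invocation matches the usage comment at the top
of this file; the run invocation is an assumption (port 8321 is the usual Llama Stack
default, and OLLAMA_URL mirrors the variable used in the CI workflow above):

```
podman build --build-arg TEMPLATE=ollama -f llama_stack/templates/Containerfile -t ollama .
# hypothetical run: the ENTRYPOINT starts the server with the template's run.yaml
podman run --rm -p 8321:8321 -e OLLAMA_URL=http://host.containers.internal:11434 ollama
```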

View file

@@ -1,5 +1,5 @@
 [build-system]
-requires = ["setuptools>=61.0"]
+requires = ["setuptools>=80.0"]
 build-backend = "setuptools.build_meta"

 [project]
@@ -53,6 +53,866 @@ ui = [
     "streamlit-option-menu",
 ]
#################
# DISTRIBUTIONS #
#################
bedrock = [
"aiosqlite",
"autoevals",
"boto3",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentencepiece",
"sqlalchemy[asyncio]",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
cerebras = [
"aiosqlite",
"autoevals",
"cerebras_cloud_sdk",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
ci-tests = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"fastapi",
"fire",
"fireworks-ai",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"sqlite-vec",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
dell = [
"aiohttp",
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"huggingface_hub",
"langdetect",
"matplotlib",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
fireworks = [
"aiosqlite",
"asyncpg",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"fireworks-ai",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
groq = [
"aiosqlite",
"autoevals",
"chardet",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"litellm",
"matplotlib",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentencepiece",
"sqlalchemy[asyncio]",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
hf-endpoint = [
"aiohttp",
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"huggingface_hub",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentencepiece",
"sqlalchemy[asyncio]",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
hf-serverless = [
"aiohttp",
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"huggingface_hub",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
llama_api = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"fastapi",
"fire",
"httpx",
"langdetect",
"litellm",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"sqlite-vec",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
meta-reference-gpu = [
"accelerate",
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"fairscale",
"faiss-cpu",
"fastapi",
"fbgemm-gpu-genai==1.1.2",
"fire",
"httpx",
"langdetect",
"lm-format-enforcer",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchao==0.8.0",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
"zmq",
]
nvidia = [
"aiohttp",
"aiosqlite",
"chardet",
"datasets",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"matplotlib",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentencepiece",
"sqlalchemy[asyncio]",
"tqdm",
"transformers",
"uvicorn",
]
ollama = [
"aiohttp",
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"ollama",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"peft",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"tqdm",
"transformers",
"tree_sitter",
"trl",
"uvicorn",
]
open-benchmark = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"fastapi",
"fire",
"httpx",
"langdetect",
"litellm",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentencepiece",
"sqlalchemy[asyncio]",
"sqlite-vec",
"together",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
passthrough = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
postgres-demo = [
"aiosqlite",
"asyncpg",
"chardet",
"chromadb-client",
"fastapi",
"fire",
"httpx",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"uvicorn",
]
remote-vllm = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
sambanova = [
"aiosqlite",
"chardet",
"chromadb-client",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"litellm",
"matplotlib",
"mcp",
"nltk",
"numpy",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"uvicorn",
]
starter = [
"aiohttp",
"aiosqlite",
"asyncpg",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"fastapi",
"fire",
"fireworks-ai",
"httpx",
"langdetect",
"litellm",
"matplotlib",
"mcp",
"nltk",
"numpy",
"ollama",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"sqlite-vec",
"together",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
tgi = [
"aiohttp",
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"huggingface_hub",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
together = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"together",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
vllm-gpu = [
"aiosqlite",
"autoevals",
"chardet",
"chromadb-client",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
"vllm; sys_platform == 'linux'",
]
watsonx = [
"aiosqlite",
"autoevals",
"chardet",
"datasets",
"emoji",
"faiss-cpu",
"fastapi",
"fire",
"httpx",
"ibm_watson_machine_learning",
"langdetect",
"matplotlib",
"mcp",
"nltk",
"numpy",
"openai",
"opentelemetry-exporter-otlp-proto-http",
"opentelemetry-sdk",
"pandas",
"pillow",
"psycopg2-binary",
"pymongo",
"pypdf",
"pythainlp",
"redis",
"requests",
"scikit-learn",
"scipy",
"sentence-transformers",
"sentencepiece",
"sqlalchemy[asyncio]",
"torch",
"torchvision",
"tqdm",
"transformers",
"tree_sitter",
"uvicorn",
]
 [dependency-groups]
 dev = [
     "pytest",
@@ -123,7 +983,7 @@ docs = [
     "linkify",
     "sphinxcontrib.openapi",
 ]
-codegen = ["rich", "pydantic", "jinja2>=3.1.6"]
+codegen = ["rich", "pydantic", "jinja2>=3.1.6", "tomlkit"]

 [project.urls]
 Homepage = "https://github.com/meta-llama/llama-stack"
@@ -145,6 +1005,11 @@ explicit = true
 torch = [{ index = "pytorch-cpu" }]
 torchvision = [{ index = "pytorch-cpu" }]

+[[tool.uv.dependency-metadata]]
+name = "sentence-transformers"
+requires-dist = [
+] # This instructs UV to not install any dependencies for this package (torch is installed by default)
+
 [tool.ruff]
 line-length = 120
 exclude = [

requirements-ollama.txt (new file, 159 lines)

@@ -0,0 +1,159 @@
# This file was autogenerated by uv via the following command:
# uv export --frozen --no-hashes --no-emit-project --output-file=requirements-ollama.txt --no-annotate --no-default-groups --extra ollama
accelerate==1.7.0
aiohappyeyeballs==2.5.0
aiohttp==3.11.13
aiosignal==1.3.2
aiosqlite==0.21.0
annotated-types==0.7.0
anyio==4.8.0
async-timeout==5.0.1 ; python_full_version < '3.11.3'
attrs==25.1.0
autoevals==0.0.122
backoff==2.2.1
braintrust-core==0.0.58
certifi==2025.1.31
chardet==5.2.0
charset-normalizer==3.4.1
chevron==0.14.0
chromadb-client==1.0.12
click==8.1.8
colorama==0.4.6 ; sys_platform == 'win32'
contourpy==1.3.2
cycler==0.12.1
datasets==3.3.2
deprecated==1.2.18
dill==0.3.8
distro==1.9.0
dnspython==2.7.0
ecdsa==0.19.1
emoji==2.14.1
exceptiongroup==1.2.2 ; python_full_version < '3.11'
faiss-cpu==1.11.0
fastapi==0.115.8
filelock==3.17.0
fire==0.7.0
fonttools==4.58.1
frozenlist==1.5.0
fsspec==2024.12.0
googleapis-common-protos==1.67.0
greenlet==3.2.2
grpcio==1.71.0
h11==0.16.0
hf-xet==1.1.2 ; (platform_machine == 'aarch64' and sys_platform != 'darwin') or (platform_machine == 'amd64' and sys_platform != 'darwin') or (platform_machine == 'arm64' and sys_platform != 'darwin') or (platform_machine == 'x86_64' and sys_platform != 'darwin')
httpcore==1.0.9
httpx==0.28.1
httpx-sse==0.4.0
huggingface-hub==0.29.0 ; sys_platform == 'darwin'
huggingface-hub==0.32.3 ; sys_platform != 'darwin'
idna==3.10
importlib-metadata==8.0.0 ; sys_platform != 'darwin'
importlib-metadata==8.5.0 ; sys_platform == 'darwin'
jinja2==3.1.6
jiter==0.8.2
joblib==1.5.1
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kiwisolver==1.4.8
langdetect==1.0.9
levenshtein==0.27.1
llama-stack-client==0.2.9
markdown-it-py==3.0.0
markupsafe==3.0.2
matplotlib==3.10.3
mcp==1.3.0
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
networkx==3.4.2
nltk==3.9.1
numpy==1.26.4
ollama==0.5.1
openai==1.71.0
opentelemetry-api==1.26.0 ; sys_platform != 'darwin'
opentelemetry-api==1.30.0 ; sys_platform == 'darwin'
opentelemetry-exporter-otlp-proto-common==1.26.0 ; sys_platform != 'darwin'
opentelemetry-exporter-otlp-proto-common==1.30.0 ; sys_platform == 'darwin'
opentelemetry-exporter-otlp-proto-grpc==1.26.0 ; sys_platform != 'darwin'
opentelemetry-exporter-otlp-proto-grpc==1.30.0 ; sys_platform == 'darwin'
opentelemetry-exporter-otlp-proto-http==1.26.0 ; sys_platform != 'darwin'
opentelemetry-exporter-otlp-proto-http==1.30.0 ; sys_platform == 'darwin'
opentelemetry-proto==1.26.0 ; sys_platform != 'darwin'
opentelemetry-proto==1.30.0 ; sys_platform == 'darwin'
opentelemetry-sdk==1.26.0 ; sys_platform != 'darwin'
opentelemetry-sdk==1.30.0 ; sys_platform == 'darwin'
opentelemetry-semantic-conventions==0.47b0 ; sys_platform != 'darwin'
opentelemetry-semantic-conventions==0.51b0 ; sys_platform == 'darwin'
orjson==3.10.18
overrides==7.7.0
packaging==24.2
pandas==2.1.4
peft==0.15.2
pillow==11.1.0
posthog==4.2.0
prompt-toolkit==3.0.50
propcache==0.3.0
protobuf==4.25.8 ; sys_platform != 'darwin'
protobuf==5.29.3 ; sys_platform == 'darwin'
psutil==7.0.0
psycopg2-binary==2.9.10
pyaml==25.1.0
pyarrow==19.0.1
pyasn1==0.4.8
pydantic==2.10.6
pydantic-core==2.27.2
pydantic-settings==2.8.1
pygments==2.19.1
pymongo==4.13.0
pyparsing==3.2.3
pypdf==5.3.1
pythainlp==5.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-jose==3.4.0
python-multipart==0.0.20
pytz==2025.1
pyyaml==6.0.2
rapidfuzz==3.12.2
redis==6.2.0
referencing==0.36.2
regex==2024.11.6
requests==2.32.2 ; (python_full_version < '3.11' and sys_platform == 'darwin') or (python_full_version >= '3.11' and sys_platform == 'linux') or (platform_machine != 'aarch64' and sys_platform == 'linux') or (sys_platform != 'darwin' and sys_platform != 'linux')
requests==2.32.3 ; (python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.11' and sys_platform == 'darwin')
rich==13.9.4
rpds-py==0.22.3
rsa==4.9
safetensors==0.5.3
scikit-learn==1.6.1
scipy==1.15.3
sentencepiece==0.2.0
setuptools==80.8.0
six==1.17.0
sniffio==1.3.1
sqlalchemy==2.0.41
sse-starlette==2.2.1
starlette==0.45.3
sympy==1.13.1
tenacity==9.1.2
termcolor==2.5.0
threadpoolctl==3.6.0
tiktoken==0.9.0
tokenizers==0.21.1
torch==2.6.0 ; sys_platform == 'darwin'
torch==2.6.0+cpu ; sys_platform != 'darwin'
tqdm==4.67.1
transformers==4.50.3 ; sys_platform == 'darwin'
transformers==4.52.4 ; sys_platform != 'darwin'
tree-sitter==0.24.0
trl==0.18.1
typing-extensions==4.12.2
tzdata==2025.1
urllib3==2.1.0 ; (python_full_version < '3.11' and platform_machine == 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.11' and sys_platform == 'darwin')
urllib3==2.3.0 ; (python_full_version < '3.11' and sys_platform == 'darwin') or (python_full_version >= '3.11' and sys_platform == 'linux') or (platform_machine != 'aarch64' and sys_platform == 'linux') or (sys_platform != 'darwin' and sys_platform != 'linux')
uvicorn==0.34.0
wcwidth==0.2.13
wrapt==1.17.2
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
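
Users who prefer the requirements format can build an environment straight from this
file; note that the export uses --no-emit-project, so llama-stack itself is not listed
and still needs to be installed separately, for example:

```
uv venv ollama-distro && source ollama-distro/bin/activate
uv pip install -r requirements-ollama.txt
uv pip install llama-stack  # or "uv pip install -e ." from a source checkout
```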

View file

@@ -14,8 +14,6 @@ anyio==4.8.0
     #   llama-stack-client
     #   openai
     #   starlette
-async-timeout==5.0.1 ; python_full_version < '3.11'
-    # via aiohttp
 attrs==25.1.0
     # via
     #   aiohttp
@@ -40,8 +38,6 @@ distro==1.9.0
     #   openai
 ecdsa==0.19.1
     # via python-jose
-exceptiongroup==1.2.2 ; python_full_version < '3.11'
-    # via anyio
 fastapi==0.115.8
     # via llama-stack
 filelock==3.17.0
@@ -58,6 +54,8 @@ h11==0.16.0
     # via
     #   httpcore
     #   llama-stack
+hf-xet==1.1.4 ; (platform_machine == 'aarch64' and sys_platform != 'darwin') or (platform_machine == 'amd64' and sys_platform != 'darwin') or (platform_machine == 'arm64' and sys_platform != 'darwin') or (platform_machine == 'x86_64' and sys_platform != 'darwin')
+    # via huggingface-hub
 httpcore==1.0.9
     # via httpx
 httpx==0.28.1
@@ -65,7 +63,9 @@ httpx==0.28.1
     #   llama-stack
     #   llama-stack-client
     #   openai
-huggingface-hub==0.29.0
+huggingface-hub==0.29.0 ; sys_platform == 'darwin'
+    # via llama-stack
+huggingface-hub==0.33.0 ; sys_platform != 'darwin'
     # via llama-stack
 idna==3.10
     # via
@@ -99,7 +99,7 @@ openai==1.71.0
     # via llama-stack
 packaging==24.2
     # via huggingface-hub
-pandas==2.2.3
+pandas==2.1.1
     # via llama-stack-client
 pillow==11.1.0
     # via llama-stack
@@ -147,7 +147,12 @@ referencing==0.36.2
     #   jsonschema-specifications
 regex==2024.11.6
     # via tiktoken
-requests==2.32.4
+requests==2.32.2 ; (python_full_version < '3.12' and sys_platform == 'darwin') or (python_full_version >= '3.12' and sys_platform == 'linux') or (platform_machine != 'aarch64' and sys_platform == 'linux') or (sys_platform != 'darwin' and sys_platform != 'linux')
+    # via
+    #   huggingface-hub
+    #   llama-stack
+    #   tiktoken
+requests==2.32.4 ; (python_full_version < '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform == 'darwin')
     # via
     #   huggingface-hub
     #   llama-stack
@@ -195,15 +200,15 @@ typing-extensions==4.12.2
     #   fastapi
     #   huggingface-hub
     #   llama-stack-client
+    #   multidict
     #   openai
     #   pydantic
     #   pydantic-core
     #   referencing
-    #   rich
 tzdata==2025.1
     # via pandas
-urllib3==2.3.0
+urllib3==2.1.0 ; (python_full_version < '3.12' and platform_machine == 'aarch64' and sys_platform == 'linux') or (python_full_version >= '3.12' and sys_platform == 'darwin')
+    # via requests
+urllib3==2.3.0 ; (python_full_version < '3.12' and sys_platform == 'darwin') or (python_full_version >= '3.12' and sys_platform == 'linux') or (platform_machine != 'aarch64' and sys_platform == 'linux') or (sys_platform != 'darwin' and sys_platform != 'linux')
     # via requests
 wcwidth==0.2.13
     # via prompt-toolkit

View file

@@ -13,8 +13,14 @@ from collections.abc import Iterable
 from functools import partial
 from pathlib import Path

+import tomlkit
 from rich.progress import Progress, SpinnerColumn, TextColumn

+from llama_stack.distribution.build import (
+    SERVER_DEPENDENCIES,
+    get_provider_dependencies,
+)
+
 REPO_ROOT = Path(__file__).parent.parent
@@ -85,6 +91,24 @@ def check_for_changes(change_tracker: ChangedPathTracker) -> bool:
     return has_changes


+def collect_template_dependencies(template_dir: Path) -> tuple[str | None, list[str]]:
+    try:
+        module_name = f"llama_stack.templates.{template_dir.name}"
+        module = importlib.import_module(module_name)
+
+        if template_func := getattr(module, "get_distribution_template", None):
+            template = template_func()
+            normal_deps, special_deps = get_provider_dependencies(template)
+            # Combine all dependencies in order: normal deps, special deps, server deps
+            all_deps = sorted(set(normal_deps + SERVER_DEPENDENCIES)) + sorted(set(special_deps))
+            return template.name, all_deps
+    except Exception as e:
+        print("Error collecting template dependencies for", template_dir, e)
+        return None, []
+
+    return None, []
+
+
 def pre_import_templates(template_dirs: list[Path]) -> None:
     # Pre-import all template modules to avoid deadlocks.
     for template_dir in template_dirs:
@@ -92,6 +116,53 @@ def pre_import_templates(template_dirs: list[Path]) -> None:
         importlib.import_module(module_name)


+def generate_dependencies_files(change_tracker: ChangedPathTracker):
+    templates_dir = REPO_ROOT / "llama_stack" / "templates"
+    distribution_deps = {}
+
+    for template_dir in find_template_dirs(templates_dir):
+        print("template_dir", template_dir)
+        name, deps = collect_template_dependencies(template_dir)
+        if name:
+            distribution_deps[name] = deps
+        else:
+            print("No template function found for", template_dir)
+
+    # First, remove any distributions that are no longer present
+    pyproject_file = REPO_ROOT / "pyproject.toml"
+    change_tracker.add_paths(pyproject_file)
+
+    # Read and parse the current pyproject.toml content
+    with open(pyproject_file) as fp:
+        pyproject = tomlkit.load(fp)
+
+    # Get current optional dependencies
+    current_deps = pyproject["project"]["optional-dependencies"]
+
+    # Store ui dependencies if they exist
+    ui_deps = current_deps.get("ui")
+
+    # Remove distributions that are no longer present
+    for name in list(current_deps.keys()):
+        if name not in distribution_deps.keys() and name != "ui":
+            del current_deps[name]
+
+    # Now add/update the remaining distributions
+    for name, deps in distribution_deps.items():
+        deps_array = tomlkit.array()
+        for dep in sorted(deps):
+            deps_array.append(dep)
+        current_deps[name] = deps_array.multiline(True)
+
+    # Restore ui dependencies if they existed
+    if ui_deps is not None:
+        current_deps["ui"] = ui_deps
+
+    # Write back to pyproject.toml
+    with open(pyproject_file, "w") as fp:
+        tomlkit.dump(pyproject, fp)
+
+
 def main():
     templates_dir = REPO_ROOT / "llama_stack" / "templates"
     change_tracker = ChangedPathTracker()
@@ -114,6 +185,9 @@ def main():
             list(executor.map(process_func, template_dirs))
             progress.update(task, advance=len(template_dirs))

+    # TODO: generate a Containerfile for each distribution as well?
+    generate_dependencies_files(change_tracker)
+
     if check_for_changes(change_tracker):
         print(
             "Distribution template changes detected. Please commit the changes.",

View file

@@ -16,4 +16,4 @@ if [ $FOUND_PYTHON -ne 0 ]; then
     uv python install "$PYTHON_VERSION"
 fi

-uv run --python "$PYTHON_VERSION" --with-editable . --group unit pytest --asyncio-mode=auto -s -v tests/unit/ $@
+uv run --python "$PYTHON_VERSION" --with-editable . --group dev --group unit pytest --asyncio-mode=auto -s -v tests/unit/ $@

uv.lock (generated, 5044 lines)

File diff suppressed because it is too large.