feat: remove usage of build yaml (#4192)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Failing after 3s
Test Llama Stack Build / build (push) Has been skipped
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Failing after 3s
Test llama stack list-deps / list-deps (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.13) (push) Successful in 19s
Python Package Build Test / build (3.12) (push) Successful in 23s
Test Llama Stack Build / build-single-provider (push) Successful in 33s
Test llama stack list-deps / show-single-provider (push) Successful in 36s
Test llama stack list-deps / list-deps-from-config (push) Successful in 44s
Vector IO Integration Tests / test-matrix (push) Failing after 57s
Test External API and Providers / test-external (venv) (push) Failing after 1m37s
Unit Tests / unit-tests (3.12) (push) Failing after 1m56s
UI Tests / ui-tests (22) (push) Successful in 2m2s
Unit Tests / unit-tests (3.13) (push) Failing after 2m35s
Pre-commit / pre-commit (22) (push) Successful in 3m16s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 3m34s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 3m59s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4m30s
# What does this PR do?

The build.yaml is only used in the following ways:

1. list-deps
2. distribution code-gen

Since `llama stack build` no longer exists, I found myself asking "why do we need two different files for list-deps and run?" Removing the BuildConfig and altering how `llama stack list-deps` uses the DistributionTemplate is the first step in removing the build.yaml entirely. Removing the BuildConfig and build.yaml cuts the number of files users need to maintain in half and lets us focus on the stability of _just_ the run.yaml.

This PR removes the build.yaml, the BuildConfig datatype, and their usage throughout the codebase. Users are now expected to point to run.yaml files when running list-deps, and the codebase now uses these types automatically for things like `get_provider_registry`.

**Additionally, two renames: `StackRunConfig` -> `StackConfig` and `run.yaml` -> `config.yaml`.**

The build.yaml made sense when we were managing the build process for the user and actually _producing_ a run.yaml _from_ the build.yaml, but now that we are simply getting the provider registry and listing the deps, switching to config.yaml greatly simplifies the scope.

## Test Plan

Existing list-deps usage should keep working in the tests.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
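As a rough sketch of the post-change workflow described above (the distribution path is taken from the docs updated in this diff and is illustrative only; the exact CLI arguments users pass may differ):

```bash
# Install a distribution's dependencies straight from its run/config file
# (no separate build.yaml step anymore).
llama stack list-deps llama_stack/distributions/open-benchmark/config.yaml | xargs -L1 uv pip install

# Start the server from the same config file.
llama stack run llama_stack/distributions/open-benchmark/config.yaml
```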
parent 17e6912288
commit 661985e240
103 changed files with 972 additions and 1031 deletions
@@ -96,7 +96,7 @@ We have built-in functionality to run the supported open-benchmarks using llama-
 Spin up llama stack server with 'open-benchmark' template
 ```
-llama stack run llama_stack/distributions/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/config.yaml
 ```
@@ -85,7 +85,7 @@ Llama Stack provides OpenAI-compatible RAG capabilities through:
 ## Configuring Default Embedding Models
-To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your run.yaml like so:
+To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your config.yaml like so:
 ```yaml
 vector_stores:
@@ -22,7 +22,7 @@ export OTEL_SERVICE_NAME="llama-stack-server"
 uv pip install opentelemetry-distro opentelemetry-exporter-otlp
 uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement -
-uv run opentelemetry-instrument llama stack run run.yaml
+uv run opentelemetry-instrument llama stack run config.yaml
 ```
@@ -85,7 +85,7 @@ Features:
 - Context retrieval with token limits
 :::note[Default Configuration]
-By default, llama stack run.yaml defines toolgroups for web search, wolfram alpha and rag, that are provided by tavily-search, wolfram-alpha and rag providers.
+By default, llama stack config.yaml defines toolgroups for web search, wolfram alpha and rag, that are provided by tavily-search, wolfram-alpha and rag providers.
 :::
 ## Model Context Protocol (MCP)
@@ -337,7 +337,7 @@ uv pip install -e .
 7. Configure Llama Stack to use the provider:
 ```yaml
-# ~/.llama/run-byoa.yaml
+# ~/.llama/config.yaml
 version: "2"
 image_name: "llama-stack-api-weather"
 apis:
@@ -356,7 +356,7 @@ server:
 8. Run the server:
 ```bash
-llama stack run ~/.llama/run-byoa.yaml
+llama stack run ~/.llama/config.yaml
 ```
 9. Test the API:
@@ -47,7 +47,7 @@ We have built-in functionality to run the supported open-benckmarks using llama-
 Spin up llama stack server with 'open-benchmark' template
 ```bash
-llama stack run llama_stack/distributions/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/config.yaml
 ```
 #### Run eval CLI
@@ -14,7 +14,7 @@ This guide will walk you through the process of adding a new API provider to Lla
 - Begin by reviewing the [core concepts](../concepts/) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
 - Determine the provider type ([Remote](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/remote) or [Inline](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline)). Remote providers make requests to external services, while inline providers execute implementation locally.
 - Add your provider to the appropriate [Registry](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/registry/). Specify pip dependencies necessary.
-- Update any distribution [Templates](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/) `build.yaml` and `run.yaml` files if they should include your provider by default. Run [./scripts/distro_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/distro_codegen.py) if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
+- Update any distribution [Templates](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/) `config.yaml` files if they should include your provider by default. Run [./scripts/distro_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/distro_codegen.py) if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
 Here are some example PRs to help you get started:
@@ -133,7 +133,7 @@ For more information about the operator, see the [llama-stack-k8s-operator repos
 ### Step 4: Deploy Llama Stack Server using Operator
 Create a `LlamaStackDistribution` custom resource to deploy the Llama Stack server. The operator will automatically create the necessary Deployment, Service, and other resources.
-You can optionally override the default `run.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).
+You can optionally override the default `config.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).
 ```yaml
 cat <<EOF | kubectl apply -f -
@@ -155,7 +155,7 @@ spec:
 value: "4096"
 - name: VLLM_API_TOKEN
 value: "fake"
-# Optional: override run.yaml from a ConfigMap using userConfig
+# Optional: override config.yaml from a ConfigMap using userConfig
 userConfig:
 configMap:
 name: llama-stack-config
@@ -172,7 +172,7 @@ EOF
 - `server.distribution.image`: (Optional) Custom container image for non-supported distributions. Use this field when deploying a distribution that is not in the supported list. If specified, this takes precedence over `name`.
 - `server.containerSpec.port`: Port on which the Llama Stack server listens (default: 8321)
 - `server.containerSpec.env`: Environment variables to configure providers:
-- `server.userConfig`: (Optional) Override the default `run.yaml` using a ConfigMap. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
+- `server.userConfig`: (Optional) Override the default `config.yaml` using a ConfigMap. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
 - `server.storage.size`: Size of the persistent volume for model and data storage
 - `server.storage.mountPath`: Where to mount the storage in the container
@@ -12,7 +12,7 @@ This guide walks you through inspecting existing distributions, customising thei
 All first-party distributions live under `llama_stack/distributions/`. Each directory contains:
 - `build.yaml` – the distribution specification (providers, additional dependencies, optional external provider directories).
-- `run.yaml` – sample run configuration (when provided).
+- `config.yaml` – sample run configuration (when provided).
 - Documentation fragments that power this site.
 Browse that folder to understand available providers and copy a distribution to use as a starting point. When creating a new stack, duplicate an existing directory, rename it, and adjust the `build.yaml` file to match your requirements.
@@ -35,7 +35,7 @@ docker build . \
 Handy build arguments:
 - `DISTRO_NAME` – distribution directory name (defaults to `starter`).
-- `RUN_CONFIG_PATH` – absolute path inside the build context for a run config that should be baked into the image (e.g. `/workspace/run.yaml`).
+- `RUN_CONFIG_PATH` – absolute path inside the build context for a run config that should be baked into the image (e.g. `/workspace/config.yaml`).
 - `INSTALL_MODE=editable` – install the repository copied into `/workspace` with `uv pip install -e`. Pair it with `--build-arg LLAMA_STACK_DIR=/workspace`.
 - `LLAMA_STACK_CLIENT_DIR` – optional editable install of the Python client.
 - `PYPI_VERSION` / `TEST_PYPI_VERSION` – pin specific releases when not using editable installs.
@@ -50,7 +50,7 @@ External providers live outside the main repository but can be bundled by pointi
 1. Copy providers into the build context, for example `cp -R path/to/providers providers.d`.
 2. Update `build.yaml` with the directory and provider entries.
-3. Adjust run configs to use the in-container path (usually `/.llama/providers.d`). Pass `--build-arg RUN_CONFIG_PATH=/workspace/run.yaml` if you want to bake the config.
+3. Adjust run configs to use the in-container path (usually `/.llama/providers.d`). Pass `--build-arg RUN_CONFIG_PATH=/workspace/config.yaml` if you want to bake the config.
 Example `build.yaml` excerpt for a custom Ollama provider:
@@ -142,7 +142,7 @@ If you prepared a custom run config, mount it into the container and reference i
 ```bash
 docker run \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
--v $(pwd)/run.yaml:/app/run.yaml \
+-v $(pwd)/config.yaml:/app/config.yaml \
 llama-stack:starter \
-/app/run.yaml
+/app/config.yaml
 ```
@@ -9,7 +9,7 @@ sidebar_position: 6
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
 ```{note}
-The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](customizing_run_yaml.md).
+The default `config.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your config.yaml Configuration](customizing_run_yaml.md).
 ```
 ```{dropdown} 👋 Click here for a Sample Configuration File
@@ -195,7 +195,7 @@ You can override environment variables at runtime by setting them in your shell
 # Set environment variables in your shell
 export API_KEY=sk-123
 export BASE_URL=https://custom-api.com
-llama stack run --config run.yaml
+llama stack run --config config.yaml
 ```
 #### Type Safety
@@ -1,16 +1,16 @@
 ---
-title: Customizing run.yaml
-description: Customizing run.yaml files for Llama Stack templates
-sidebar_label: Customizing run.yaml
+title: Customizing config.yaml
+description: Customizing config.yaml files for Llama Stack templates
+sidebar_label: Customizing config.yaml
 sidebar_position: 4
 ---
-# Customizing run.yaml Files
+# Customizing config.yaml Files
-The `run.yaml` files generated by Llama Stack templates are **starting points** designed to be customized for your specific needs. They are not meant to be used as-is in production environments.
+The `config.yaml` files generated by Llama Stack templates are **starting points** designed to be customized for your specific needs. They are not meant to be used as-is in production environments.
 ## Key Points
-- **Templates are starting points**: Generated `run.yaml` files contain defaults for development/testing
+- **Templates are starting points**: Generated `config.yaml` files contain defaults for development/testing
 - **Customization expected**: Update URLs, credentials, models, and settings for your environment
 - **Version control separately**: Keep customized configs in your own repository
 - **Environment-specific**: Create different configurations for dev, staging, production
@@ -29,7 +29,7 @@ You can customize:
 ## Best Practices
 - Use environment variables for secrets and environment-specific values
-- Create separate `run.yaml` files for different environments (dev, staging, prod)
+- Create separate `config.yaml` files for different environments (dev, staging, prod)
 - Document your changes with comments
 - Test configurations before deployment
 - Keep your customized configs in version control
@@ -38,8 +38,8 @@ Example structure:
 ```
 your-project/
 ├── configs/
-│ ├── dev-run.yaml
-│ ├── prod-run.yaml
+│ ├── dev-config.yaml
+│ ├── prod-config.yaml
 └── README.md
 ```
@@ -33,7 +33,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
 response = client.models.list()
 ```
-If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](./building_distro), you can also use the config.yaml configuration file directly:
 ```python
 client = LlamaStackAsLibraryClient(config_path)
@@ -15,7 +15,7 @@ This section provides an overview of the distributions available in Llama Stack.
 - **[Available Distributions](./list_of_distributions.mdx)** - Complete list and comparison of all distributions
 - **[Building Custom Distributions](./building_distro.mdx)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing_run_yaml.mdx)** - Customize run.yaml for your needs
+- **[Customizing Configuration](./customizing_run_yaml.mdx)** - Customize config.yaml for your needs
 - **[Starting Llama Stack Server](./starting_llama_stack_server.mdx)** - How to run distributions
 - **[Importing as Library](./importing_as_library.mdx)** - Use distributions in your code
 - **[Configuration Reference](./configuration.mdx)** - Configuration file format details
@@ -67,11 +67,11 @@ LLAMA_STACK_PORT=5001
 docker run \
 -it \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
--v ./run.yaml:/root/my-run.yaml \
+-v ./config.yaml:/root/my-config.yaml \
 -e WATSONX_API_KEY=$WATSONX_API_KEY \
 -e WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
 -e WATSONX_BASE_URL=$WATSONX_BASE_URL \
 llamastack/distribution-watsonx \
---config /root/my-run.yaml \
+--config /root/my-config.yaml \
 --port $LLAMA_STACK_PORT
 ```
@@ -29,7 +29,7 @@ The only difference vs. the `tgi` distribution is that it runs the Dell-TGI serv
 ```
 $ cd distributions/dell-tgi/
 $ ls
-compose.yaml README.md run.yaml
+compose.yaml README.md config.yaml
 $ docker compose up
 ```
@@ -65,10 +65,10 @@ registry.dell.huggingface.co/enterprise-dell-inference-meta-llama-meta-llama-3.1
 #### Start Llama Stack server pointing to TGI server
 ```
-docker run --pull always --network host -it -p 8321:8321 -v ./run.yaml:/root/my-run.yaml --gpus=all llamastack/distribution-tgi --yaml_config /root/my-run.yaml
+docker run --pull always --network host -it -p 8321:8321 -v ./config.yaml:/root/my-config.yaml --gpus=all llamastack/distribution-tgi --yaml_config /root/my-config.yaml
 ```
-Make sure in you `run.yaml` file, you inference provider is pointing to the correct TGI server endpoint. E.g.
+Make sure in you `config.yaml` file, you inference provider is pointing to the correct TGI server endpoint. E.g.
 ```
 inference:
 - provider_id: tgi0
@@ -152,14 +152,14 @@ docker run \
 --pull always \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v $HOME/.llama:/root/.llama \
--v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
+-v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-config.yaml \
 -e INFERENCE_MODEL=$INFERENCE_MODEL \
 -e DEH_URL=$DEH_URL \
 -e SAFETY_MODEL=$SAFETY_MODEL \
 -e DEH_SAFETY_URL=$DEH_SAFETY_URL \
 -e CHROMA_URL=$CHROMA_URL \
 llamastack/distribution-dell \
---config /root/my-run.yaml \
+--config /root/my-config.yaml \
 --port $LLAMA_STACK_PORT
 ```
@@ -84,8 +84,8 @@ docker run \
 You can also run the Docker container with a custom run configuration file by mounting it into the container:
 ```bash
-# Set the path to your custom run.yaml file
-CUSTOM_RUN_CONFIG=/path/to/your/custom-run.yaml
+# Set the path to your custom config.yaml file
+CUSTOM_RUN_CONFIG=/path/to/your/custom-config.yaml
 LLAMA_STACK_PORT=8321
 docker run \
@@ -94,8 +94,8 @@ docker run \
 --gpu all \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
--v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
--e RUN_CONFIG_PATH=/app/custom-run.yaml \
+-v $CUSTOM_RUN_CONFIG:/app/custom-config.yaml \
+-e RUN_CONFIG_PATH=/app/custom-config.yaml \
 llamastack/distribution-meta-reference-gpu \
 --port $LLAMA_STACK_PORT
 ```
@@ -103,7 +103,7 @@ docker run \
 **Note**: The run configuration must be mounted into the container before it can be used. The `-v` flag mounts your local file into the container, and the `RUN_CONFIG_PATH` environment variable tells the entrypoint script which configuration to use.
 Available run configurations for this distribution:
-- `run.yaml`
+- `config.yaml`
 - `run-with-safety.yaml`
 ### Via venv
@@ -113,7 +113,7 @@ Make sure you have the Llama Stack CLI available.
 ```bash
 llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
 INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
-llama stack run distributions/meta-reference-gpu/run.yaml \
+llama stack run distributions/meta-reference-gpu/config.yaml \
 --port 8321
 ```
@@ -138,8 +138,8 @@ docker run \
 You can also run the Docker container with a custom run configuration file by mounting it into the container:
 ```bash
-# Set the path to your custom run.yaml file
-CUSTOM_RUN_CONFIG=/path/to/your/custom-run.yaml
+# Set the path to your custom config.yaml file
+CUSTOM_RUN_CONFIG=/path/to/your/custom-config.yaml
 LLAMA_STACK_PORT=8321
 docker run \
@@ -147,8 +147,8 @@ docker run \
 --pull always \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
--v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
--e RUN_CONFIG_PATH=/app/custom-run.yaml \
+-v $CUSTOM_RUN_CONFIG:/app/custom-config.yaml \
+-e RUN_CONFIG_PATH=/app/custom-config.yaml \
 -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
 llamastack/distribution-nvidia \
 --port $LLAMA_STACK_PORT
@@ -157,7 +157,7 @@ docker run \
 **Note**: The run configuration must be mounted into the container before it can be used. The `-v` flag mounts your local file into the container, and the `RUN_CONFIG_PATH` environment variable tells the entrypoint script which configuration to use.
 Available run configurations for this distribution:
-- `run.yaml`
+- `config.yaml`
 - `run-with-safety.yaml`
 ### Via venv
@@ -169,7 +169,7 @@ INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
 llama stack list-deps nvidia | xargs -L1 uv pip install
 NVIDIA_API_KEY=$NVIDIA_API_KEY \
 INFERENCE_MODEL=$INFERENCE_MODEL \
-llama stack run ./run.yaml \
+llama stack run ./config.yaml \
 --port 8321
 ```
@@ -98,7 +98,7 @@ Note to start the container with Podman, you can do the same but replace `docker`
 `podman`. If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in the `OLLAMA_URL`
 with `host.containers.internal`.
-The configuration YAML for the Ollama distribution is available at `distributions/ollama/run.yaml`.
+The configuration YAML for the Ollama distribution is available at `distributions/ollama/config.yaml`.
 :::tip
 Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host's network directly so it can connect to Ollama running on `localhost:11434`.
@@ -222,22 +222,21 @@ def get_provider_spec() -> ProviderSpec:
 [ramalama-stack](https://github.com/containers/ramalama-stack) is a recognized external provider that supports installation via module.
-To install Llama Stack with this external provider a user can provider the following build.yaml:
+To install Llama Stack with this external provider a user can provider the following config.yaml:
 ```yaml
-version: 2
-distribution_spec:
-  description: Use (an external) Ramalama server for running LLM inference
-  container_image: null
-  providers:
-    inference:
-    - provider_type: remote::ramalama
-      module: ramalama_stack==0.3.0a0
-image_type: venv
-image_name: null
-additional_pip_packages:
-- aiosqlite
-- sqlalchemy[asyncio]
+image_name: ramalama
+apis:
+- inference
+providers:
+  inference:
+  - provider_id: ramalama
+    provider_type: remote::ramalama
+    module: ramalama_stack==0.3.0a0
+    config: {}
+server:
+  port: 8321
 ```
 No other steps are required beyond installing dependencies with `llama stack list-deps <distro> | xargs -L1 uv pip install` and then running `llama stack run`. The CLI will use `module` to install the provider dependencies, retrieve the spec, etc.
@@ -51,7 +51,7 @@ results = await client.vector_stores.search(
 > **Note**: For detailed configuration examples and options, see [Configuration Examples](../openai_file_operations_support.md#configuration-examples) in the full documentation.
-**Basic Setup**: Configure vector_io and files providers in your run.yaml
+**Basic Setup**: Configure vector_io and files providers in your config.yaml
 ## Common Use Cases
@@ -123,7 +123,7 @@ Connectors are MCP servers maintained and managed by the Responses API provider.
 **Open Questions:**
 - Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
-- Should there be a mechanism for administrators to add custom connectors via `run.yaml` or an API?
+- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?
 ---
@@ -210,7 +210,7 @@ Metadata allows you to attach additional information to a response for your own
 **Status:** Feature Request
-When calling the OpenAI Responses API, model outputs go through safety models configured by OpenAI administrators. Perhaps Llama Stack should provide a mechanism to configure safety models (or non-model logic) for all Responses requests, either through `run.yaml` or an administrative API.
+When calling the OpenAI Responses API, model outputs go through safety models configured by OpenAI administrators. Perhaps Llama Stack should provide a mechanism to configure safety models (or non-model logic) for all Responses requests, either through `config.yaml` or an administrative API.
 ---
@@ -355,7 +355,7 @@ The purpose of scoring function is to calculate the score for each example based
 Firstly, you can see if the existing [llama stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) can fulfill your need. If not, you need to write a new scoring function based on what benchmark author / other open source repo describe.
 ### Add new benchmark into template
-Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/open-benchmark/run.yaml)
+Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/open-benchmark/config.yaml)
 Secondly, you need to add the new benchmark you just created under the `benchmarks` resource in the same template. To add the new benchmark, you need to have
 - `benchmark_id`: identifier of the benchmark
@@ -366,7 +366,7 @@ Secondly, you need to add the new benchmark you just created under the `benchmar
 Spin up llama stack server with 'open-benchmark' templates
 ```bash
-llama stack run llama_stack/distributions/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/config.yaml
 ```
 Run eval benchmark CLI with your new benchmark id