Mirror of https://github.com/meta-llama/llama-stack.git

Merge 0f4790f531 into 4237eb4aaa
Commit c06f681a02
102 changed files with 971 additions and 1030 deletions
@@ -96,7 +96,7 @@ We have built-in functionality to run the supported open-benchmarks using llama-

 Spin up llama stack server with 'open-benchmark' template
 ```
-llama stack run llama_stack/distributions/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/config.yaml
 ```
@@ -85,7 +85,7 @@ Llama Stack provides OpenAI-compatible RAG capabilities through:

 ## Configuring Default Embedding Models

-To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your run.yaml like so:
+To enable automatic vector store creation without specifying embedding models, configure a default embedding model in your config.yaml like so:

 ```yaml
 vector_stores:
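For context beyond this truncated hunk, a default embedding model block under the `vector_stores` key might look roughly like the sketch below. The field names are illustrative assumptions, not taken from this diff; the RAG docs touched by this commit carry the authoritative schema.

```yaml
# Hypothetical sketch of a default embedding model in config.yaml.
# Key names below (default_provider_id, default_embedding_model) are
# assumptions for illustration only.
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: all-MiniLM-L6-v2
```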
@@ -85,7 +85,7 @@ Features:
 - Context retrieval with token limits

 :::note[Default Configuration]
-By default, llama stack run.yaml defines toolgroups for web search, wolfram alpha and rag, that are provided by tavily-search, wolfram-alpha and rag providers.
+By default, llama stack config.yaml defines toolgroups for web search, wolfram alpha and rag, that are provided by tavily-search, wolfram-alpha and rag providers.
 :::

 ## Model Context Protocol (MCP)
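The default toolgroup wiring that note describes typically appears in config.yaml along these lines; this is a sketch based on the starter distribution's conventions, and the exact IDs may differ in your distribution.

```yaml
# Sketch of the default toolgroup registrations referenced above.
# Toolgroup and provider IDs follow common starter-distribution naming
# and are shown for illustration only.
tool_groups:
- toolgroup_id: builtin::websearch
  provider_id: tavily-search
- toolgroup_id: builtin::wolfram_alpha
  provider_id: wolfram-alpha
- toolgroup_id: builtin::rag
  provider_id: rag-runtime
```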
@@ -337,7 +337,7 @@ uv pip install -e .
 7. Configure Llama Stack to use the provider:

 ```yaml
-# ~/.llama/run-byoa.yaml
+# ~/.llama/config.yaml
 version: "2"
 image_name: "llama-stack-api-weather"
 apis:
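The hunk cuts off after `apis:`. A full run config for this tutorial would plausibly continue in the same shape as the ramalama config.yaml shown later in this diff; the API name and provider identifiers below are placeholders, not verified values from the tutorial.

```yaml
# Hypothetical continuation of ~/.llama/config.yaml for the
# bring-your-own-API weather example. Provider IDs/types are placeholders.
version: "2"
image_name: "llama-stack-api-weather"
apis:
- weather
providers:
  weather:
  - provider_id: my-weather
    provider_type: remote::my-weather
    config: {}
server:
  port: 8321
```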
@@ -356,7 +356,7 @@ server:
 8. Run the server:

 ```bash
-llama stack run ~/.llama/run-byoa.yaml
+llama stack run ~/.llama/config.yaml
 ```

 9. Test the API:
@@ -47,7 +47,7 @@ We have built-in functionality to run the supported open-benckmarks using llama-
 Spin up llama stack server with 'open-benchmark' template
 ```bash
-llama stack run llama_stack/distributions/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/config.yaml
 ```

 #### Run eval CLI
@@ -14,7 +14,7 @@ This guide will walk you through the process of adding a new API provider to Lla
 - Begin by reviewing the [core concepts](../concepts/) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
 - Determine the provider type ([Remote](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/remote) or [Inline](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline)). Remote providers make requests to external services, while inline providers execute implementation locally.
 - Add your provider to the appropriate [Registry](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/registry/). Specify pip dependencies necessary.
-- Update any distribution [Templates](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/) `build.yaml` and `run.yaml` files if they should include your provider by default. Run [./scripts/distro_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/distro_codegen.py) if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
+- Update any distribution [Templates](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/) `config.yaml` files if they should include your provider by default. Run [./scripts/distro_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/distro_codegen.py) if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.

 Here are some example PRs to help you get started:
@@ -133,7 +133,7 @@ For more information about the operator, see the [llama-stack-k8s-operator repos
 ### Step 4: Deploy Llama Stack Server using Operator

 Create a `LlamaStackDistribution` custom resource to deploy the Llama Stack server. The operator will automatically create the necessary Deployment, Service, and other resources.
-You can optionally override the default `run.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).
+You can optionally override the default `config.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).

 ```yaml
 cat <<EOF | kubectl apply -f -
@@ -155,7 +155,7 @@ spec:
 value: "4096"
 - name: VLLM_API_TOKEN
 value: "fake"
-# Optional: override run.yaml from a ConfigMap using userConfig
+# Optional: override config.yaml from a ConfigMap using userConfig
 userConfig:
 configMap:
 name: llama-stack-config
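For the `userConfig` override shown above, the referenced ConfigMap would typically carry the config file under a data key, roughly like the generic Kubernetes sketch below. The data key name and the embedded provider entries are assumptions; the operator's userConfig spec defines what it actually expects.

```yaml
# Sketch of a ConfigMap that the userConfig/configMap reference above could
# point to. The "config.yaml" data key and the vLLM provider entry are
# illustrative assumptions, not quoted from this diff.
apiVersion: v1
kind: ConfigMap
metadata:
  name: llama-stack-config
data:
  config.yaml: |
    version: "2"
    apis:
    - inference
    providers:
      inference:
      - provider_id: vllm
        provider_type: remote::vllm
        config:
          url: ${env.VLLM_URL}
    server:
      port: 8321
```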
@@ -172,7 +172,7 @@ EOF
 - `server.distribution.image`: (Optional) Custom container image for non-supported distributions. Use this field when deploying a distribution that is not in the supported list. If specified, this takes precedence over `name`.
 - `server.containerSpec.port`: Port on which the Llama Stack server listens (default: 8321)
 - `server.containerSpec.env`: Environment variables to configure providers:
-- `server.userConfig`: (Optional) Override the default `run.yaml` using a ConfigMap. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
+- `server.userConfig`: (Optional) Override the default `config.yaml` using a ConfigMap. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
 - `server.storage.size`: Size of the persistent volume for model and data storage
 - `server.storage.mountPath`: Where to mount the storage in the container
@@ -12,7 +12,7 @@ This guide walks you through inspecting existing distributions, customising thei
 All first-party distributions live under `llama_stack/distributions/`. Each directory contains:

 - `build.yaml` – the distribution specification (providers, additional dependencies, optional external provider directories).
-- `run.yaml` – sample run configuration (when provided).
+- `config.yaml` – sample run configuration (when provided).
 - Documentation fragments that power this site.

 Browse that folder to understand available providers and copy a distribution to use as a starting point. When creating a new stack, duplicate an existing directory, rename it, and adjust the `build.yaml` file to match your requirements.
@@ -35,7 +35,7 @@ docker build . \
 Handy build arguments:

 - `DISTRO_NAME` – distribution directory name (defaults to `starter`).
-- `RUN_CONFIG_PATH` – absolute path inside the build context for a run config that should be baked into the image (e.g. `/workspace/run.yaml`).
+- `RUN_CONFIG_PATH` – absolute path inside the build context for a run config that should be baked into the image (e.g. `/workspace/config.yaml`).
 - `INSTALL_MODE=editable` – install the repository copied into `/workspace` with `uv pip install -e`. Pair it with `--build-arg LLAMA_STACK_DIR=/workspace`.
 - `LLAMA_STACK_CLIENT_DIR` – optional editable install of the Python client.
 - `PYPI_VERSION` / `TEST_PYPI_VERSION` – pin specific releases when not using editable installs.
@@ -50,7 +50,7 @@ External providers live outside the main repository but can be bundled by pointi

 1. Copy providers into the build context, for example `cp -R path/to/providers providers.d`.
 2. Update `build.yaml` with the directory and provider entries.
-3. Adjust run configs to use the in-container path (usually `/.llama/providers.d`). Pass `--build-arg RUN_CONFIG_PATH=/workspace/run.yaml` if you want to bake the config.
+3. Adjust run configs to use the in-container path (usually `/.llama/providers.d`). Pass `--build-arg RUN_CONFIG_PATH=/workspace/config.yaml` if you want to bake the config.

 Example `build.yaml` excerpt for a custom Ollama provider:
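The excerpt itself falls outside this hunk. Based on the build.yaml shape shown in the ramalama hunk further down in this diff, it would look roughly like the sketch below; the `external_providers_dir` key and its placement are assumptions, as is the `remote::custom_ollama` provider type.

```yaml
# Hypothetical build.yaml excerpt for a custom Ollama provider bundled from
# providers.d. Mirrors the build.yaml shape shown later in this diff; the
# external_providers_dir key and provider type are illustrative assumptions.
version: 2
distribution_spec:
  description: Starter distribution plus a custom Ollama provider
  providers:
    inference:
    - provider_type: remote::custom_ollama
external_providers_dir: /.llama/providers.d
image_type: venv
image_name: custom-ollama-stack
```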
@@ -142,7 +142,7 @@ If you prepared a custom run config, mount it into the container and reference i
 ```bash
 docker run \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
--v $(pwd)/run.yaml:/app/run.yaml \
+-v $(pwd)/config.yaml:/app/config.yaml \
 llama-stack:starter \
-/app/run.yaml
+/app/config.yaml
 ```
@@ -9,7 +9,7 @@ sidebar_position: 6
 The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:

 ```{note}
-The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](customizing_run_yaml.md).
+The default `config.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your config.yaml Configuration](customizing_run_yaml.md).
 ```

 ```{dropdown} 👋 Click here for a Sample Configuration File
@@ -195,7 +195,7 @@ You can override environment variables at runtime by setting them in your shell
 # Set environment variables in your shell
 export API_KEY=sk-123
 export BASE_URL=https://custom-api.com
-llama stack run --config run.yaml
+llama stack run --config config.yaml
 ```

 #### Type Safety
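The exported variables are picked up wherever config.yaml references them; Llama Stack configs use `${env.VAR}`-style substitution, so the shell example above pairs with entries along these lines. The provider entry and the default-value syntax are illustrative and may vary by release.

```yaml
# Sketch of config.yaml entries that consume the exported API_KEY / BASE_URL.
# The provider_id/provider_type and the := default syntax are illustrative.
providers:
  inference:
  - provider_id: my-remote-endpoint
    provider_type: remote::openai
    config:
      api_key: ${env.API_KEY}
      base_url: ${env.BASE_URL:=https://api.openai.com/v1}
```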
@@ -1,16 +1,16 @@
 ---
-title: Customizing run.yaml
-description: Customizing run.yaml files for Llama Stack templates
-sidebar_label: Customizing run.yaml
+title: Customizing config.yaml
+description: Customizing config.yaml files for Llama Stack templates
+sidebar_label: Customizing config.yaml
 sidebar_position: 4
 ---
-# Customizing run.yaml Files
+# Customizing config.yaml Files

-The `run.yaml` files generated by Llama Stack templates are **starting points** designed to be customized for your specific needs. They are not meant to be used as-is in production environments.
+The `config.yaml` files generated by Llama Stack templates are **starting points** designed to be customized for your specific needs. They are not meant to be used as-is in production environments.

 ## Key Points

-- **Templates are starting points**: Generated `run.yaml` files contain defaults for development/testing
+- **Templates are starting points**: Generated `config.yaml` files contain defaults for development/testing
 - **Customization expected**: Update URLs, credentials, models, and settings for your environment
 - **Version control separately**: Keep customized configs in your own repository
 - **Environment-specific**: Create different configurations for dev, staging, production
@@ -29,7 +29,7 @@ You can customize:
 ## Best Practices

 - Use environment variables for secrets and environment-specific values
-- Create separate `run.yaml` files for different environments (dev, staging, prod)
+- Create separate `config.yaml` files for different environments (dev, staging, prod)
 - Document your changes with comments
 - Test configurations before deployment
 - Keep your customized configs in version control
@@ -38,8 +38,8 @@ Example structure:
 ```
 your-project/
 ├── configs/
-│   ├── dev-run.yaml
-│   ├── prod-run.yaml
+│   ├── dev-config.yaml
+│   ├── prod-config.yaml
 └── README.md
 ```
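As a purely illustrative example of how dev-config.yaml and prod-config.yaml might diverge, usually only environment-specific values change while the overall structure stays identical; everything below is hypothetical.

```yaml
# Hypothetical excerpt showing the kind of values that typically differ
# between dev-config.yaml and prod-config.yaml (endpoints, secrets via env).
providers:
  inference:
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1              # dev: local endpoint
      # url: https://vllm.internal.example.com/v1 # prod: managed endpoint
      api_token: ${env.VLLM_API_TOKEN}
```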
@@ -33,7 +33,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
 response = client.models.list()
 ```

-If you've created a [custom distribution](./building_distro), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](./building_distro), you can also use the config.yaml configuration file directly:

 ```python
 client = LlamaStackAsLibraryClient(config_path)
@@ -15,7 +15,7 @@ This section provides an overview of the distributions available in Llama Stack.

 - **[Available Distributions](./list_of_distributions.mdx)** - Complete list and comparison of all distributions
 - **[Building Custom Distributions](./building_distro.mdx)** - Create your own distribution from scratch
-- **[Customizing Configuration](./customizing_run_yaml.mdx)** - Customize run.yaml for your needs
+- **[Customizing Configuration](./customizing_run_yaml.mdx)** - Customize config.yaml for your needs
 - **[Starting Llama Stack Server](./starting_llama_stack_server.mdx)** - How to run distributions
 - **[Importing as Library](./importing_as_library.mdx)** - Use distributions in your code
 - **[Configuration Reference](./configuration.mdx)** - Configuration file format details
@@ -67,11 +67,11 @@ LLAMA_STACK_PORT=5001
 docker run \
 -it \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
--v ./run.yaml:/root/my-run.yaml \
+-v ./config.yaml:/root/my-config.yaml \
 -e WATSONX_API_KEY=$WATSONX_API_KEY \
 -e WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
 -e WATSONX_BASE_URL=$WATSONX_BASE_URL \
 llamastack/distribution-watsonx \
---config /root/my-run.yaml \
+--config /root/my-config.yaml \
 --port $LLAMA_STACK_PORT
 ```
@@ -29,7 +29,7 @@ The only difference vs. the `tgi` distribution is that it runs the Dell-TGI serv
 ```
 $ cd distributions/dell-tgi/
 $ ls
-compose.yaml README.md run.yaml
+compose.yaml README.md config.yaml
 $ docker compose up
 ```
@@ -65,10 +65,10 @@ registry.dell.huggingface.co/enterprise-dell-inference-meta-llama-meta-llama-3.1
 #### Start Llama Stack server pointing to TGI server

 ```
-docker run --pull always --network host -it -p 8321:8321 -v ./run.yaml:/root/my-run.yaml --gpus=all llamastack/distribution-tgi --yaml_config /root/my-run.yaml
+docker run --pull always --network host -it -p 8321:8321 -v ./config.yaml:/root/my-config.yaml --gpus=all llamastack/distribution-tgi --yaml_config /root/my-config.yaml
 ```

-Make sure in you `run.yaml` file, you inference provider is pointing to the correct TGI server endpoint. E.g.
+Make sure in you `config.yaml` file, you inference provider is pointing to the correct TGI server endpoint. E.g.
 ```
 inference:
 - provider_id: tgi0
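The `inference:` snippet in that hunk is cut off. A complete provider entry pointing at a TGI endpoint generally looks like the sketch below; the URL is a placeholder for your actual Dell TGI server address.

```yaml
# Sketch of a TGI inference provider entry in config.yaml.
# The endpoint URL is a placeholder; point it at your running TGI server.
inference:
- provider_id: tgi0
  provider_type: remote::tgi
  config:
    url: http://127.0.0.1:5009
```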
@@ -152,14 +152,14 @@ docker run \
 --pull always \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v $HOME/.llama:/root/.llama \
--v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
+-v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-config.yaml \
 -e INFERENCE_MODEL=$INFERENCE_MODEL \
 -e DEH_URL=$DEH_URL \
 -e SAFETY_MODEL=$SAFETY_MODEL \
 -e DEH_SAFETY_URL=$DEH_SAFETY_URL \
 -e CHROMA_URL=$CHROMA_URL \
 llamastack/distribution-dell \
---config /root/my-run.yaml \
+--config /root/my-config.yaml \
 --port $LLAMA_STACK_PORT
 ```
@@ -84,8 +84,8 @@ docker run \
 You can also run the Docker container with a custom run configuration file by mounting it into the container:

 ```bash
-# Set the path to your custom run.yaml file
-CUSTOM_RUN_CONFIG=/path/to/your/custom-run.yaml
+# Set the path to your custom config.yaml file
+CUSTOM_RUN_CONFIG=/path/to/your/custom-config.yaml
 LLAMA_STACK_PORT=8321

 docker run \
@@ -94,8 +94,8 @@ docker run \
 --gpu all \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
--v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
--e RUN_CONFIG_PATH=/app/custom-run.yaml \
+-v $CUSTOM_RUN_CONFIG:/app/custom-config.yaml \
+-e RUN_CONFIG_PATH=/app/custom-config.yaml \
 llamastack/distribution-meta-reference-gpu \
 --port $LLAMA_STACK_PORT
 ```
@@ -103,7 +103,7 @@ docker run \
 **Note**: The run configuration must be mounted into the container before it can be used. The `-v` flag mounts your local file into the container, and the `RUN_CONFIG_PATH` environment variable tells the entrypoint script which configuration to use.

 Available run configurations for this distribution:
-- `run.yaml`
+- `config.yaml`
 - `run-with-safety.yaml`

 ### Via venv
@@ -113,7 +113,7 @@ Make sure you have the Llama Stack CLI available.
 ```bash
 llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
 INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
-llama stack run distributions/meta-reference-gpu/run.yaml \
+llama stack run distributions/meta-reference-gpu/config.yaml \
 --port 8321
 ```
@@ -138,8 +138,8 @@ docker run \
 You can also run the Docker container with a custom run configuration file by mounting it into the container:

 ```bash
-# Set the path to your custom run.yaml file
-CUSTOM_RUN_CONFIG=/path/to/your/custom-run.yaml
+# Set the path to your custom config.yaml file
+CUSTOM_RUN_CONFIG=/path/to/your/custom-config.yaml
 LLAMA_STACK_PORT=8321

 docker run \
@@ -147,8 +147,8 @@ docker run \
 --pull always \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
--v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml \
--e RUN_CONFIG_PATH=/app/custom-run.yaml \
+-v $CUSTOM_RUN_CONFIG:/app/custom-config.yaml \
+-e RUN_CONFIG_PATH=/app/custom-config.yaml \
 -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
 llamastack/distribution-nvidia \
 --port $LLAMA_STACK_PORT
@@ -157,7 +157,7 @@ docker run \
 **Note**: The run configuration must be mounted into the container before it can be used. The `-v` flag mounts your local file into the container, and the `RUN_CONFIG_PATH` environment variable tells the entrypoint script which configuration to use.

 Available run configurations for this distribution:
-- `run.yaml`
+- `config.yaml`
 - `run-with-safety.yaml`

 ### Via venv
@@ -169,7 +169,7 @@ INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
 llama stack list-deps nvidia | xargs -L1 uv pip install
 NVIDIA_API_KEY=$NVIDIA_API_KEY \
 INFERENCE_MODEL=$INFERENCE_MODEL \
-llama stack run ./run.yaml \
+llama stack run ./config.yaml \
 --port 8321
 ```
@@ -98,7 +98,7 @@ Note to start the container with Podman, you can do the same but replace `docker` with
 `podman`. If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in the `OLLAMA_URL`
 with `host.containers.internal`.

-The configuration YAML for the Ollama distribution is available at `distributions/ollama/run.yaml`.
+The configuration YAML for the Ollama distribution is available at `distributions/ollama/config.yaml`.

 :::tip
 Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host's network directly so it can connect to Ollama running on `localhost:11434`.
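That distribution config wires inference to the local Ollama server; the relevant excerpt typically looks like the sketch below. The URL default shown is the conventional Ollama port, and the `:=` default syntax is illustrative rather than quoted from this diff.

```yaml
# Illustrative excerpt of the Ollama distribution's inference wiring in
# config.yaml; URL and default-value syntax are conventional values.
providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}
```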
@@ -222,22 +222,21 @@ def get_provider_spec() -> ProviderSpec:

 [ramalama-stack](https://github.com/containers/ramalama-stack) is a recognized external provider that supports installation via module.

-To install Llama Stack with this external provider a user can provider the following build.yaml:
+To install Llama Stack with this external provider a user can provider the following config.yaml:

 ```yaml
-version: 2
-distribution_spec:
-  description: Use (an external) Ramalama server for running LLM inference
-  container_image: null
-  providers:
-    inference:
-    - provider_type: remote::ramalama
-      module: ramalama_stack==0.3.0a0
-image_type: venv
-image_name: null
-additional_pip_packages:
-- aiosqlite
-- sqlalchemy[asyncio]
+image_name: ramalama
+apis:
+- inference
+providers:
+  inference:
+  - provider_id: ramalama
+    provider_type: remote::ramalama
+    module: ramalama_stack==0.3.0a0
+    config: {}
+server:
+  port: 8321
 ```

 No other steps are required beyond installing dependencies with `llama stack list-deps <distro> | xargs -L1 uv pip install` and then running `llama stack run`. The CLI will use `module` to install the provider dependencies, retrieve the spec, etc.
@@ -51,7 +51,7 @@ results = await client.vector_stores.search(

 > **Note**: For detailed configuration examples and options, see [Configuration Examples](../openai_file_operations_support.md#configuration-examples) in the full documentation.

-**Basic Setup**: Configure vector_io and files providers in your run.yaml
+**Basic Setup**: Configure vector_io and files providers in your config.yaml

 ## Common Use Cases
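A minimal vector_io plus files pairing in config.yaml might look roughly like this sketch. The provider choices, storage paths, and nested config field names are assumptions; the linked configuration examples are the authoritative reference.

```yaml
# Hypothetical minimal setup for file operations with vector stores:
# one inline vector_io provider plus a local files provider.
# Nested kvstore/storage field names are assumptions for illustration.
providers:
  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      kvstore:
        type: sqlite
        db_path: ${env.SQLITE_STORE_DIR:=~/.llama}/faiss_store.db
  files:
  - provider_id: localfs
    provider_type: inline::localfs
    config:
      storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/files}
```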
@@ -123,7 +123,7 @@ Connectors are MCP servers maintained and managed by the Responses API provider.

 **Open Questions:**
 - Should Llama Stack include built-in support for some, all, or none of OpenAI's connectors?
-- Should there be a mechanism for administrators to add custom connectors via `run.yaml` or an API?
+- Should there be a mechanism for administrators to add custom connectors via `config.yaml` or an API?

 ---
@@ -210,7 +210,7 @@ Metadata allows you to attach additional information to a response for your own

 **Status:** Feature Request

-When calling the OpenAI Responses API, model outputs go through safety models configured by OpenAI administrators. Perhaps Llama Stack should provide a mechanism to configure safety models (or non-model logic) for all Responses requests, either through `run.yaml` or an administrative API.
+When calling the OpenAI Responses API, model outputs go through safety models configured by OpenAI administrators. Perhaps Llama Stack should provide a mechanism to configure safety models (or non-model logic) for all Responses requests, either through `config.yaml` or an administrative API.

 ---
@@ -355,7 +355,7 @@ The purpose of scoring function is to calculate the score for each example based
 Firstly, you can see if the existing [llama stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) can fulfill your need. If not, you need to write a new scoring function based on what benchmark author / other open source repo describe.

 ### Add new benchmark into template
-Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/open-benchmark/run.yaml)
+Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/open-benchmark/config.yaml)

 Secondly, you need to add the new benchmark you just created under the `benchmarks` resource in the same template. To add the new benchmark, you need to have
 - `benchmark_id`: identifier of the benchmark
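Concretely, the `datasets` and `benchmarks` entries described here usually take a shape like the following sketch. The IDs, the Hugging Face URI, and the scoring function are placeholders, not values from the open-benchmark file.

```yaml
# Hypothetical datasets/benchmarks entries for registering a new benchmark.
# dataset_id, the huggingface URI, and the scoring function are placeholders.
datasets:
- dataset_id: my_new_benchmark_dataset
  provider_id: huggingface
  purpose: eval/messages-answer
  source:
    type: uri
    uri: huggingface://datasets/my-org/my-benchmark?split=test
benchmarks:
- benchmark_id: meta-reference-my-new-benchmark
  dataset_id: my_new_benchmark_dataset
  scoring_functions:
  - basic::regex_parser_multiple_choice_answer
```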
@@ -366,7 +366,7 @@ Secondly, you need to add the new benchmark you just created under the `benchmar

 Spin up llama stack server with 'open-benchmark' templates
 ```bash
-llama stack run llama_stack/distributions/open-benchmark/run.yaml
+llama stack run llama_stack/distributions/open-benchmark/config.yaml
 ```

 Run eval benchmark CLI with your new benchmark id