Merge branch 'main' into feat/litellm_sambanova_usage

Jorge Piedrahita Ortiz 2025-03-18 12:06:58 -05:00 committed by GitHub
commit 5bd1bd30e2
GPG key ID: B5690EEEBB952194
76 changed files with 3534 additions and 2843 deletions


@@ -6,7 +6,7 @@ This guide will walk you through the process of adding a new API provider to Lla
- Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
- Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute their implementation locally.
- Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/`. Specify any pip dependencies it requires.
- Update any distribution {repopath}`Templates::llama_stack/templates/` build.yaml and run.yaml files if they should include your provider by default. Run {repopath}`llama_stack/scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
- Update any distribution {repopath}`Templates::llama_stack/templates/` build.yaml and run.yaml files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
Here are some example PRs to help you get started:


@@ -185,8 +185,12 @@ llama stack build --config llama_stack/templates/ollama/build.yaml
:::
:::{tab-item} Building Container
> [!TIP]
> Podman is supported as an alternative to Docker. Set `CONTAINER_BINARY` to `podman` in your environment to use Podman.
```{admonition} Podman Alternative
:class: tip
Podman is supported as an alternative to Docker. Set `CONTAINER_BINARY` to `podman` in your environment to use Podman.
```
To build a container image, you may start off from a template and use the `--image-type container` flag to specify `container` as the build image type.


@@ -6,13 +6,13 @@ The `llamastack/distribution-nvidia` distribution consists of the following prov
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| datasetio | `inline::localfs` |
| eval | `inline::meta-reference` |
| inference | `remote::nvidia` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| safety | `remote::nvidia` |
| scoring | `inline::basic` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::code-interpreter`, `inline::rag-runtime`, `remote::model-context-protocol` |
| tool_runtime | `inline::rag-runtime` |
| vector_io | `inline::faiss` |
@@ -20,8 +20,10 @@ The `llamastack/distribution-nvidia` distribution consists of the following prov
The following environment variables can be configured:
- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)
- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)
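
As a rough illustration, the variables listed above could be resolved in Python along the following lines. The `NvidiaDistroSettings` dataclass and `load_settings` helper are hypothetical names introduced here for the sketch and are not part of Llama Stack; only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class NvidiaDistroSettings:
    """Hypothetical container for the documented environment variables."""

    port: int
    api_key: str
    guardrails_service_url: str
    inference_model: str
    safety_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> NvidiaDistroSettings:
    """Read each variable, falling back to the documented default."""
    return NvidiaDistroSettings(
        port=int(env.get("LLAMASTACK_PORT", "5001")),
        api_key=env.get("NVIDIA_API_KEY", ""),
        guardrails_service_url=env.get("GUARDRAILS_SERVICE_URL", "http://0.0.0.0:7331"),
        inference_model=env.get("INFERENCE_MODEL", "Llama3.1-8B-Instruct"),
        safety_model=env.get("SAFETY_MODEL", "meta/llama-3.1-8b-instruct"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the fallback behaviour easy to check without touching the real environment.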
### Models


@@ -92,6 +92,8 @@ Interactive pages for users to play with and explore Llama Stack API capabilitie
## Starting the Llama Stack Playground
### Llama CLI
To start the Llama Stack Playground, run the following commands:
1. Start up the Llama Stack API server
@@ -107,3 +109,28 @@ cd llama_stack/distribution/ui
pip install -r requirements.txt
streamlit run app.py
```
### Docker
The Playground can also be started in a Docker container:
```sh
export LLAMA_STACK_URL=http://localhost:11434
docker run \
-p 8501:8501 \
-e LLAMA_STACK_ENDPOINT=$LLAMA_STACK_URL \
quay.io/jland/llama-stack-playground
```
## Configurable Environment Variables
## Environment Variables
| Environment Variable | Description | Default Value |
|----------------------------|------------------------------------|---------------------------|
| LLAMA_STACK_ENDPOINT | The endpoint for the Llama Stack | http://localhost:8321 |
| FIREWORKS_API_KEY | API key for Fireworks provider | (empty string) |
| TOGETHER_API_KEY | API key for Together provider | (empty string) |
| SAMBANOVA_API_KEY | API key for SambaNova provider | (empty string) |
| OPENAI_API_KEY | API key for OpenAI provider | (empty string) |


@@ -114,23 +114,17 @@ pprint(response)
simpleqa_dataset_id = "huggingface::simpleqa"
_ = client.datasets.register(
purpose="eval/messages-answer",
source={
"type": "uri",
"uri": "huggingface://datasets/llamastack/simpleqa?split=train",
},
dataset_id=simpleqa_dataset_id,
provider_id="huggingface",
url={"uri": "https://huggingface.co/datasets/llamastack/simpleqa"},
metadata={
"path": "llamastack/simpleqa",
"split": "train",
},
dataset_schema={
"input_query": {"type": "string"},
"expected_answer": {"type": "string"},
"chat_completion_input": {"type": "chat_completion_input"},
},
)
eval_rows = client.datasetio.get_rows_paginated(
eval_rows = client.datasets.iterrows(
dataset_id=simpleqa_dataset_id,
rows_in_page=5,
limit=5,
)
```
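
The hunks in this file replace `get_rows_paginated(..., rows_in_page=5)` with `iterrows(..., limit=5)` and read the result from `.data` instead of `.rows`. For code that has to tolerate both response shapes during a migration, a small adapter is one option. The sketch below is only an assumption about how that might look: `extract_rows` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins rather than real client responses.

```python
from types import SimpleNamespace
from typing import Any, List


def extract_rows(response: Any) -> List[dict]:
    """Return the rows from either response shape.

    Prefers the newer `.data` attribute and falls back to the older `.rows`.
    """
    if hasattr(response, "data"):
        return list(response.data)
    if hasattr(response, "rows"):
        return list(response.rows)
    raise AttributeError("response has neither 'data' nor 'rows'")


# Stand-ins for the two response shapes; these are not real client objects.
new_style = SimpleNamespace(data=[{"input_query": "q1"}])
old_style = SimpleNamespace(rows=[{"input_query": "q2"}])

if __name__ == "__main__":
    print(extract_rows(new_style))
    print(extract_rows(old_style))
```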
@@ -143,7 +137,7 @@ client.benchmarks.register(
response = client.eval.evaluate_rows(
benchmark_id="meta-reference::simpleqa",
input_rows=eval_rows.rows,
input_rows=eval_rows.data,
scoring_functions=["llm-as-judge::405b-simpleqa"],
benchmark_config={
"eval_candidate": {
@@ -191,7 +185,7 @@ agent_config = {
response = client.eval.evaluate_rows(
benchmark_id="meta-reference::simpleqa",
input_rows=eval_rows.rows,
input_rows=eval_rows.data,
scoring_functions=["llm-as-judge::405b-simpleqa"],
benchmark_config={
"eval_candidate": {


@@ -6,17 +6,32 @@ The `llama-stack-client` CLI allows you to query information about the distribut
### `llama-stack-client`
```bash
llama-stack-client -h
llama-stack-client
Usage: llama-stack-client [OPTIONS] COMMAND [ARGS]...
usage: llama-stack-client [-h] {models,memory_banks,shields} ...
Welcome to the LlamaStackClient CLI
Welcome to the LlamaStackClient CLI
Options:
--version Show the version and exit.
--endpoint TEXT Llama Stack distribution endpoint
--api-key TEXT Llama Stack distribution API key
--config TEXT Path to config file
--help Show this message and exit.
options:
-h, --help show this help message and exit
subcommands:
{models,memory_banks,shields}
Commands:
configure Configure Llama Stack Client CLI.
datasets Manage datasets.
eval Run evaluation tasks.
eval_tasks Manage evaluation tasks.
inference Inference (chat).
inspect Inspect server configuration.
models Manage GenAI models.
post_training Post-training.
providers Manage API providers.
scoring_functions Manage scoring functions.
shields Manage safety shield services.
toolgroups Manage available tool groups.
vector_dbs Manage vector databases.
```
### `llama-stack-client configure`
@@ -127,11 +142,11 @@ llama-stack-client vector_dbs list
llama-stack-client vector_dbs register <vector-db-id> [--provider-id <provider-id>] [--provider-vector-db-id <provider-vector-db-id>] [--embedding-model <embedding-model>] [--embedding-dimension <embedding-dimension>]
```
Options:
- `--provider-id`: Optional. Provider ID for the vector db
- `--provider-vector-db-id`: Optional. Provider's vector db ID
- `--embedding-model`: Optional. Embedding model to use. Default: "all-MiniLM-L6-v2"
- `--embedding-dimension`: Optional. Dimension of embeddings. Default: 384
Optional arguments:
- `--provider-id`: Provider ID for the vector db
- `--provider-vector-db-id`: Provider's vector db ID
- `--embedding-model`: Embedding model to use. Default: "all-MiniLM-L6-v2"
- `--embedding-dimension`: Dimension of embeddings. Default: 384
### `llama-stack-client vector_dbs unregister`
```bash
@@ -157,11 +172,13 @@ llama-stack-client shields list
llama-stack-client shields register --shield-id <shield-id> [--provider-id <provider-id>] [--provider-shield-id <provider-shield-id>] [--params <params>]
```
Options:
- `--shield-id`: Required. ID of the shield
- `--provider-id`: Optional. Provider ID for the shield
- `--provider-shield-id`: Optional. Provider's shield ID
- `--params`: Optional. JSON configuration parameters for the shield
Required arguments:
- `--shield-id`: ID of the shield
Optional arguments:
- `--provider-id`: Provider ID for the shield
- `--provider-shield-id`: Provider's shield ID
- `--params`: JSON configuration parameters for the shield
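
When the command is driven from a script, the split between required and optional arguments maps naturally onto a function signature. The following is a minimal sketch under that assumption: `shield_register_argv` is a hypothetical helper, the shield and provider IDs are placeholders, and the `threshold` key is an invented example of a `--params` payload. Only the flag names are taken from the usage line above.

```python
import json
import shlex
from typing import Any, List, Mapping, Optional


def shield_register_argv(
    shield_id: str,
    provider_id: Optional[str] = None,
    provider_shield_id: Optional[str] = None,
    params: Optional[Mapping[str, Any]] = None,
) -> List[str]:
    """Build the argument vector for `llama-stack-client shields register`."""
    if not shield_id:
        raise ValueError("--shield-id is required")
    argv = ["llama-stack-client", "shields", "register", "--shield-id", shield_id]
    # Optional flags are appended only when a value was supplied.
    if provider_id is not None:
        argv += ["--provider-id", provider_id]
    if provider_shield_id is not None:
        argv += ["--provider-shield-id", provider_shield_id]
    if params is not None:
        # --params takes JSON, so serialize the mapping to a single argument.
        argv += ["--params", json.dumps(params)]
    return argv


# Placeholder values; "threshold" is a made-up parameter for illustration.
argv = shield_register_argv(
    "my-shield", provider_id="llama-guard", params={"threshold": 0.5}
)

if __name__ == "__main__":
    # shlex.join quotes the JSON payload so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Building a list rather than a single string means the result can be passed to `subprocess.run` directly, without relying on shell quoting.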
## Eval Task Management
@@ -175,13 +192,15 @@ llama-stack-client benchmarks list
llama-stack-client benchmarks register --eval-task-id <eval-task-id> --dataset-id <dataset-id> --scoring-functions <function1> [<function2> ...] [--provider-id <provider-id>] [--provider-eval-task-id <provider-eval-task-id>] [--metadata <metadata>]
```
Options:
- `--eval-task-id`: Required. ID of the eval task
- `--dataset-id`: Required. ID of the dataset to evaluate
- `--scoring-functions`: Required. One or more scoring functions to use for evaluation
- `--provider-id`: Optional. Provider ID for the eval task
- `--provider-eval-task-id`: Optional. Provider's eval task ID
- `--metadata`: Optional. Metadata for the eval task in JSON format
Required arguments:
- `--eval-task-id`: ID of the eval task
- `--dataset-id`: ID of the dataset to evaluate
- `--scoring-functions`: One or more scoring functions to use for evaluation
Optional arguments:
- `--provider-id`: Provider ID for the eval task
- `--provider-eval-task-id`: Provider's eval task ID
- `--metadata`: Metadata for the eval task in JSON format
## Eval Execution
### `llama-stack-client eval run-benchmark`
@@ -189,11 +208,13 @@ Options:
llama-stack-client eval run-benchmark <eval-task-id1> [<eval-task-id2> ...] --eval-task-config <config-file> --output-dir <output-dir> [--num-examples <num>] [--visualize]
```
Options:
- `--eval-task-config`: Required. Path to the eval task config file in JSON format
- `--output-dir`: Required. Path to the directory where evaluation results will be saved
- `--num-examples`: Optional. Number of examples to evaluate (useful for debugging)
- `--visualize`: Optional flag. If set, visualizes evaluation results after completion
Required arguments:
- `--eval-task-config`: Path to the eval task config file in JSON format
- `--output-dir`: Path to the directory where evaluation results will be saved
Optional arguments:
- `--num-examples`: Number of examples to evaluate (useful for debugging)
- `--visualize`: If set, visualizes evaluation results after completion
Example benchmark_config.json:
```json
@@ -214,11 +235,13 @@ Example benchmark_config.json:
llama-stack-client eval run-scoring <eval-task-id> --eval-task-config <config-file> --output-dir <output-dir> [--num-examples <num>] [--visualize]
```
Options:
- `--eval-task-config`: Required. Path to the eval task config file in JSON format
- `--output-dir`: Required. Path to the directory where scoring results will be saved
- `--num-examples`: Optional. Number of examples to evaluate (useful for debugging)
- `--visualize`: Optional flag. If set, visualizes scoring results after completion
Required arguments:
- `--eval-task-config`: Path to the eval task config file in JSON format
- `--output-dir`: Path to the directory where scoring results will be saved
Optional arguments:
- `--num-examples`: Number of examples to evaluate (useful for debugging)
- `--visualize`: If set, visualizes scoring results after completion
## Tool Group Management
@@ -230,11 +253,11 @@ llama-stack-client toolgroups list
+---------------------------+------------------+------+---------------+
| identifier | provider_id | args | mcp_endpoint |
+===========================+==================+======+===============+
| builtin::code_interpreter | code-interpreter | None | None |
| builtin::code_interpreter | code-interpreter | None | None |
+---------------------------+------------------+------+---------------+
| builtin::rag | rag-runtime | None | None |
| builtin::rag | rag-runtime | None | None |
+---------------------------+------------------+------+---------------+
| builtin::websearch | tavily-search | None | None |
| builtin::websearch | tavily-search | None | None |
+---------------------------+------------------+------+---------------+
```
@@ -250,11 +273,11 @@ Shows detailed information about a specific toolgroup. If the toolgroup is not f
llama-stack-client toolgroups register <toolgroup_id> [--provider-id <provider-id>] [--provider-toolgroup-id <provider-toolgroup-id>] [--mcp-config <mcp-config>] [--args <args>]
```
Options:
- `--provider-id`: Optional. Provider ID for the toolgroup
- `--provider-toolgroup-id`: Optional. Provider's toolgroup ID
- `--mcp-config`: Optional. JSON configuration for the MCP endpoint
- `--args`: Optional. JSON arguments for the toolgroup
Optional arguments:
- `--provider-id`: Provider ID for the toolgroup
- `--provider-toolgroup-id`: Provider's toolgroup ID
- `--mcp-config`: JSON configuration for the MCP endpoint
- `--args`: JSON arguments for the toolgroup
### `llama-stack-client toolgroups unregister`
```bash