mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-31 11:33:55 +00:00
Merge-related changes.
This commit is contained in:
commit
a714bbac9d
95 changed files with 11044 additions and 4639 deletions
|
|
@ -1267,7 +1267,6 @@
|
|||
}
|
||||
],
|
||||
"source": [
|
||||
"# NBVAL_SKIP\n",
|
||||
"from pydantic import BaseModel\n",
|
||||
"\n",
|
||||
"\n",
|
||||
|
|
@ -1279,7 +1278,7 @@
|
|||
"\n",
|
||||
"user_input = \"Michael Jordan was born in 1963. He played basketball for the Chicago Bulls. He retired in 2003. Extract this information into JSON for me. \"\n",
|
||||
"response = client.inference.completion(\n",
|
||||
" model_id=model_id,\n",
|
||||
" model_id=\"meta-llama/Llama-3.1-8B-Instruct\",\n",
|
||||
" content=user_input,\n",
|
||||
" stream=False,\n",
|
||||
" sampling_params={\n",
|
||||
|
|
@ -1640,7 +1639,7 @@
|
|||
"agent = Agent(\n",
|
||||
" client, \n",
|
||||
" model=model_id,\n",
|
||||
" instructions=\"You are a helpful assistant\",\n",
|
||||
" instructions=\"You are a helpful assistant. Use websearch tool to help answer questions.\",\n",
|
||||
" tools=[\"builtin::websearch\"],\n",
|
||||
")\n",
|
||||
"user_prompts = [\n",
|
||||
|
|
|
|||
1427
docs/notebooks/Llama_Stack_RAG_Lifecycle.ipynb
Normal file
1427
docs/notebooks/Llama_Stack_RAG_Lifecycle.ipynb
Normal file
File diff suppressed because it is too large
Load diff
|
|
@ -23,9 +23,12 @@ In this example, we will show you how to:
|
|||
|
||||
##### Building a Search Agent
|
||||
```python
|
||||
from llama_stack_client import LlamaStackClient
|
||||
from llama_stack_client.lib.agents.agent import Agent
|
||||
from llama_stack_client.lib.agents.event_logger import EventLogger
|
||||
|
||||
client = LlamaStackClient(base_url=f"http://{HOST}:{PORT}")
|
||||
|
||||
agent = Agent(
|
||||
client,
|
||||
model="meta-llama/Llama-3.3-70B-Instruct",
|
||||
|
|
@ -33,7 +36,7 @@ agent = Agent(
|
|||
tools=["builtin::websearch"],
|
||||
)
|
||||
user_prompts = [
|
||||
"Which teams played in the NBA western conference finals of 2024. Search the web for the answer.",
|
||||
"Which teams played in the NBA Western Conference Finals of 2024. Search the web for the answer.",
|
||||
"In which episode and season of South Park does Bill Cosby (BSM-471) first appear? Give me the number and title. Search the web for the answer.",
|
||||
"What is the British-American kickboxer Andrew Tate's kickboxing name? Search the web for the answer.",
|
||||
]
|
||||
|
|
|
|||
|
|
@ -127,15 +127,11 @@ MCP tools require:
|
|||
|
||||
## Adding Custom Tools
|
||||
|
||||
When you want to use tools other than the built-in tools, you can implement a python function and decorate it with `@client_tool`.
|
||||
When you want to use tools other than the built-in tools, you just need to implement a python function with a docstring. The content of the docstring will be used to describe the tool and the parameters and passed
|
||||
along to the generative model.
|
||||
|
||||
To define a custom tool, you need to use the `@client_tool` decorator.
|
||||
```python
|
||||
from llama_stack_client.lib.agents.client_tool import client_tool
|
||||
|
||||
|
||||
# Example tool definition
|
||||
@client_tool
|
||||
def my_tool(input: int) -> int:
|
||||
"""
|
||||
Runs my awesome tool.
|
||||
|
|
|
|||
|
|
@ -24,6 +24,56 @@ The Evaluation APIs are associated with a set of Resources as shown in the follo
|
|||
- Associated with `Benchmark` resource.
|
||||
|
||||
|
||||
## Open-benchmark Eval
|
||||
|
||||
### List of open-benchmarks Llama Stack support
|
||||
|
||||
Llama stack pre-registers several popular open-benchmarks to easily evaluate model perfomance via CLI.
|
||||
|
||||
The list of open-benchmarks we currently support:
|
||||
- [MMLU-COT](https://arxiv.org/abs/2009.03300) (Measuring Massive Multitask Language Understanding): Benchmark designed to comprehensively evaluate the breadth and depth of a model's academic and professional understanding
|
||||
- [GPQA-COT](https://arxiv.org/abs/2311.12022) (A Graduate-Level Google-Proof Q&A Benchmark): A challenging benchmark of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry.
|
||||
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to access models to answer short, fact-seeking questions.
|
||||
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)]: Benchmark designed to evaluate multimodal models.
|
||||
|
||||
|
||||
You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
|
||||
|
||||
### Run evaluation on open-benchmarks via CLI
|
||||
|
||||
We have built-in functionality to run the supported open-benckmarks using llama-stack-client CLI
|
||||
|
||||
#### Spin up Llama Stack server
|
||||
|
||||
Spin up llama stack server with 'open-benchmark' template
|
||||
```
|
||||
llama stack run llama_stack/templates/open-benchmark/run.yaml
|
||||
|
||||
```
|
||||
|
||||
#### Run eval CLI
|
||||
There are 3 necessary inputs to run a benchmark eval
|
||||
- `list of benchmark_ids`: The list of benchmark ids to run evaluation on
|
||||
- `model-id`: The model id to evaluate on
|
||||
- `utput_dir`: Path to store the evaluate results
|
||||
```
|
||||
llama-stack-client eval run-benchmark <benchmark_id_1> <benchmark_id_2> ... \
|
||||
--model_id <model id to evaluate on> \
|
||||
--output_dir <directory to store the evaluate results> \
|
||||
```
|
||||
|
||||
You can run
|
||||
```
|
||||
llama-stack-client eval run-benchmark help
|
||||
```
|
||||
to see the description of all the flags that eval run-benchmark has
|
||||
|
||||
|
||||
In the output log, you can find the file path that has your evaluation results. Open that file and you can see you aggrgate
|
||||
evaluation results over there.
|
||||
|
||||
|
||||
|
||||
## What's Next?
|
||||
|
||||
- Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
|
||||
|
|
|
|||
|
|
@ -51,25 +51,25 @@ The main points to consider are:
|
|||
|
||||
```
|
||||
llama stack build -h
|
||||
|
||||
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates]
|
||||
[--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only]
|
||||
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates] [--image-type {conda,container,venv}] [--image-name IMAGE_NAME] [--print-deps-only] [--run]
|
||||
|
||||
Build a Llama stack container
|
||||
|
||||
options:
|
||||
-h, --help show this help message and exit
|
||||
--config CONFIG Path to a config file to use for the build. You can find example configs in llama_stack/distributions/**/build.yaml.
|
||||
If this argument is not provided, you will be prompted to enter information interactively
|
||||
--template TEMPLATE Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates
|
||||
--list-templates Show the available templates for building a Llama Stack distribution
|
||||
--config CONFIG Path to a config file to use for the build. You can find example configs in llama_stack/distributions/**/build.yaml. If this argument is not provided, you will
|
||||
be prompted to enter information interactively (default: None)
|
||||
--template TEMPLATE Name of the example template config to use for build. You may use `llama stack build --list-templates` to check out the available templates (default: None)
|
||||
--list-templates Show the available templates for building a Llama Stack distribution (default: False)
|
||||
--image-type {conda,container,venv}
|
||||
Image Type to use for the build. This can be either conda or container or venv. If not specified, will use the image type from the template config.
|
||||
Image Type to use for the build. This can be either conda or container or venv. If not specified, will use the image type from the template config. (default:
|
||||
conda)
|
||||
--image-name IMAGE_NAME
|
||||
[for image-type=conda] Name of the conda environment to use for the build. If
|
||||
not specified, currently active Conda environment will be used. If no Conda
|
||||
environment is active, you must specify a name.
|
||||
--print-deps-only Print the dependencies for the stack only, without building the stack
|
||||
[for image-type=conda|venv] Name of the conda or virtual environment to use for the build. If not specified, currently active Conda environment will be used if
|
||||
found. (default: None)
|
||||
--print-deps-only Print the dependencies for the stack only, without building the stack (default: False)
|
||||
--run Run the stack after building using the same image type, name, and other applicable arguments (default: False)
|
||||
|
||||
```
|
||||
|
||||
After this step is complete, a file named `<name>-build.yaml` and template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.
|
||||
|
|
@ -212,8 +212,8 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
|
|||
|
||||
```
|
||||
llama stack run -h
|
||||
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE]
|
||||
[--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
|
||||
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE]
|
||||
[--image-type {conda,container,venv}]
|
||||
config
|
||||
|
||||
Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
|
||||
|
|
@ -223,17 +223,17 @@ positional arguments:
|
|||
|
||||
options:
|
||||
-h, --help show this help message and exit
|
||||
--port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321
|
||||
--port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
|
||||
--image-name IMAGE_NAME
|
||||
Name of the image to run. Defaults to the current conda environment
|
||||
--disable-ipv6 Disable IPv6 support
|
||||
--env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times.
|
||||
Name of the image to run. Defaults to the current conda environment (default: None)
|
||||
--disable-ipv6 Disable IPv6 support (default: False)
|
||||
--env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: [])
|
||||
--tls-keyfile TLS_KEYFILE
|
||||
Path to TLS key file for HTTPS
|
||||
Path to TLS key file for HTTPS (default: None)
|
||||
--tls-certfile TLS_CERTFILE
|
||||
Path to TLS certificate file for HTTPS
|
||||
Path to TLS certificate file for HTTPS (default: None)
|
||||
--image-type {conda,container,venv}
|
||||
Image Type used during the build. This can be either conda or container or venv.
|
||||
Image Type used during the build. This can be either conda or container or venv. (default: conda)
|
||||
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -17,26 +17,4 @@ $ llama-stack-client configure --endpoint https://llamastack-preview.fireworks.a
|
|||
$ llama-stack-client models list
|
||||
```
|
||||
|
||||
You will see outputs:
|
||||
```
|
||||
$ llama-stack-client models list
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| identifier | llama_model | provider_id | metadata |
|
||||
+==============================+==============================+===============+============+
|
||||
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| Llama3.1-70B-Instruct | Llama3.1-70B-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| Llama3.1-405B-Instruct | Llama3.1-405B-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| Llama3.2-1B-Instruct | Llama3.2-1B-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| Llama3.2-3B-Instruct | Llama3.2-3B-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| Llama3.2-11B-Vision-Instruct | Llama3.2-11B-Vision-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
| Llama3.2-90B-Vision-Instruct | Llama3.2-90B-Vision-Instruct | fireworks0 | {} |
|
||||
+------------------------------+------------------------------+---------------+------------+
|
||||
```
|
||||
|
||||
Checkout the [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python/blob/main/docs/cli_reference.md) repo for more details on how to use the `llama-stack-client` CLI. Checkout [llama-stack-app](https://github.com/meta-llama/llama-stack-apps/tree/main) for examples applications built on top of Llama Stack.
|
||||
|
|
|
|||
|
|
@ -24,7 +24,7 @@ The `llamastack/distribution-ollama` distribution consists of the following prov
|
|||
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
|
||||
| telemetry | `inline::meta-reference` |
|
||||
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::code-interpreter`, `inline::rag-runtime`, `remote::model-context-protocol`, `remote::wolfram-alpha` |
|
||||
| vector_io | `inline::sqlite-vec`, `remote::chromadb`, `remote::pgvector` |
|
||||
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
|
||||
|
||||
|
||||
You should use this distribution if you have a regular desktop machine without very powerful GPUs. Of course, if you have powerful GPUs, you can still continue using this distribution since Ollama supports GPU acceleration.
|
||||
|
|
@ -131,7 +131,7 @@ llama stack run ./run-with-safety.yaml \
|
|||
### (Optional) Update Model Serving Configuration
|
||||
|
||||
```{note}
|
||||
Please check the [model_entries](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L45) for the supported Ollama models.
|
||||
Please check the [model_entries](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/models.py) for the supported Ollama models.
|
||||
```
|
||||
|
||||
To serve a new model with `ollama`
|
||||
|
|
|
|||
|
|
@ -275,18 +275,25 @@ response = client.scoring.score(
|
|||
The following examples give the quick steps to start running evaluations using the llama-stack-client CLI.
|
||||
|
||||
#### Benchmark Evaluation CLI
|
||||
Usage: There are 2 inputs necessary for running a benchmark eval
|
||||
- `eval-task-id`: the identifier associated with the eval task. Each `Benchmark` is parametrized by
|
||||
- `dataset_id`: the identifier associated with the dataset.
|
||||
- `List[scoring_function_id]`: list of scoring function identifiers.
|
||||
- `eval-task-config`: specifies the configuration of the model / agent to evaluate on.
|
||||
There are 3 necessary input for running a benchmark eval
|
||||
- `list of benchmark_ids`: The list of benchmark ids to run evaluation on
|
||||
- `model-id`: The model id to evaluate on
|
||||
- `utput_dir`: Path to store the evaluate results
|
||||
```
|
||||
llama-stack-client eval run-benchmark <benchmark_id_1> <benchmark_id_2> ... \
|
||||
--model_id <model id to evaluate on> \
|
||||
--output_dir <directory to store the evaluate results> \
|
||||
```
|
||||
|
||||
You can run
|
||||
```
|
||||
llama-stack-client eval run-benchmark help
|
||||
```
|
||||
to see the description of all the flags to run benckmark eval
|
||||
|
||||
|
||||
```
|
||||
llama-stack-client eval run_benchmark <eval-task-id> \
|
||||
--eval-task-config ~/benchmark_config.json \
|
||||
--visualize
|
||||
```
|
||||
In the output log, you can find the path to the file that has your evaluation results. Open that file and you can see you aggrgate
|
||||
evaluation results over there.
|
||||
|
||||
|
||||
#### Application Evaluation CLI
|
||||
|
|
@ -338,3 +345,52 @@ The `BenchmarkConfig` are user specified config to define:
|
|||
}
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
## Open-benchmark Contributing Guide
|
||||
|
||||
### Create the new dataset for your new benchmark
|
||||
An eval open-benchmark essentially contains 2 parts:
|
||||
- `raw data`: The raw dataset associated with the benchmark. You typically need to search the original paper that introduces the benchmark and find the canonical dataset (usually hosted on huggingface)
|
||||
- `prompt template`: How to ask the candidate model to generate the answer (prompt template plays a critical role to the evaluation results). Tyically, you can find the reference prompt template associated with the benchmark in benchmarks author's repo ([exmaple](https://github.com/idavidrein/gpqa/blob/main/prompts/chain_of_thought.txt)) or some other popular open source repos ([example](https://github.com/openai/simple-evals/blob/0a6e8f62e52bc5ae915f752466be3af596caf392/common.py#L14))
|
||||
|
||||
To create new open-benmark in llama stack, you need to combine the prompt template and the raw data into the `chat_completion_input` column in the evaluation dataset.
|
||||
|
||||
Llama stack enforeces the evaluate dataset schema to contain at least 3 columns:
|
||||
- `chat_completion_input`: The actual input to the model to run the generation for eval
|
||||
- `input_query`: The raw input from the raw dataset without the prompt template
|
||||
- `expected_answer`: The ground truth for scoring functions to calcalate the score from.
|
||||
|
||||
|
||||
You need to write a script [example convert script](https://gist.github.com/yanxi0830/118e9c560227d27132a7fd10e2c92840) to convert the benchmark raw dataset to llama stack format eval dataset and update the dataset to huggingface [example benchmark dataset](https://huggingface.co/datasets/llamastack/mmmu)
|
||||
|
||||
|
||||
### Find scoring function for your new benchmark
|
||||
The purpose of scoring function is to calculate the score for each example based on candidate model generation result and expected_answer. It also aggregates the scores from all the examples and generate the final evaluate results.
|
||||
|
||||
|
||||
Firstly, you can see if the existing [llama stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) can fulfill your need. If not, you need to write a new scoring function based on what benchmark author / other open source repo describe.
|
||||
|
||||
### Add new benchmark into template
|
||||
Firstly, you need to add the evaluation dataset associated with your benchmark under `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/open-benchmark/run.yaml)
|
||||
|
||||
Secondly, you need to add the new benchmark you just created under the `benchmarks` resource in the same template. To add the new benchmark, you need to have
|
||||
- `benchmark_id`: identifier of the benchmark
|
||||
- `dataset_id`: identifier of the dataset associated with your benchmark
|
||||
- `scoring_functions`: scoring function to calculate the score based on generation results and expected_answer
|
||||
|
||||
|
||||
### Test the new benchmark
|
||||
|
||||
Spin up llama stack server with 'open-benchmark' templates
|
||||
```
|
||||
llama stack run llama_stack/templates/open-benchmark/run.yaml
|
||||
|
||||
```
|
||||
|
||||
Run eval benchmark CLI with your new benchmark id
|
||||
```
|
||||
llama-stack-client eval run-benchmark <new_benchmark_id> \
|
||||
--model_id <model id to evaluate on> \
|
||||
--output_dir <directory to store the evaluate results> \
|
||||
```
|
||||
|
|
|
|||
|
|
@ -40,7 +40,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
|
|||
ollama run llama3.2:3b-instruct-fp16 --keepalive -1m
|
||||
```
|
||||
**Note**:
|
||||
- The supported models for llama stack for now is listed in [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L43)
|
||||
- The supported models for llama stack for now is listed in [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/models.py)
|
||||
- `keepalive -1m` is used so that ollama continues to keep the model in memory indefinitely. Otherwise, ollama frees up memory and you would have to run `ollama run` again.
|
||||
|
||||
---
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue