mirror of https://github.com/meta-llama/llama-stack.git
synced 2025-10-04 04:04:14 +00:00

docs: concepts and building_applications migration (#3534)

# What does this PR do?
- Migrates the remaining documentation sections to the new documentation format

## Test Plan
- Partial migration

This commit is contained in:
parent 05ff4c4420
commit c71ce8df61

82 changed files with 2535 additions and 1237 deletions
377 docs/docs/references/evals_reference/index.mdx (new file)

@@ -0,0 +1,377 @@
# Evaluations

The Llama Stack Evaluation flow allows you to run evaluations on your GenAI application datasets or pre-registered benchmarks.

Llama Stack provides a set of APIs for running evaluations of LLM applications:

- `/datasetio` + `/datasets` API
- `/scoring` + `/scoring_functions` API
- `/eval` + `/benchmarks` API

This guide covers these APIs and the developer workflow for running evaluations with Llama Stack across different use cases. Check out our Colab notebook with working evaluation examples [here](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing).
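All of the examples in this guide assume a `client` connected to a running Llama Stack distribution. A minimal setup sketch, assuming a server is already running locally on port 8321 (adjust the base URL for your deployment):

```python
from llama_stack_client import LlamaStackClient

# Connect to a running Llama Stack distribution (assumed to be listening on localhost:8321).
client = LlamaStackClient(base_url="http://localhost:8321")
```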
## Evaluation Concepts

The Evaluation APIs are associated with a set of Resources, as shown in the following diagram. Please visit the Resources section in our [Core Concepts](../concepts/) guide for a high-level overview.

![Eval Concepts](resources/eval-concept.png)

- **DatasetIO**: defines the interface for datasets and data loaders.
  - Associated with the `Dataset` resource.
- **Scoring**: evaluates the outputs of the system.
  - Associated with the `ScoringFunction` resource. We provide a suite of out-of-the-box scoring functions and also the ability for you to add custom evaluators. These scoring functions are the core part of defining an evaluation task that outputs evaluation metrics.
- **Eval**: generates outputs (via Inference or Agents) and performs scoring.
  - Associated with the `Benchmark` resource.

You can inspect which of these resources are registered on a running distribution, as sketched below.
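A quick orientation sketch, assuming the `client` configured above; attribute names follow the standard resource `identifier` field:

```python
# Datasets, scoring functions, and benchmarks are all first-class resources.
for dataset in client.datasets.list():
    print("dataset:", dataset.identifier)

for scoring_fn in client.scoring_functions.list():
    print("scoring function:", scoring_fn.identifier)

for benchmark in client.benchmarks.list():
    print("benchmark:", benchmark.identifier)
```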
## Evaluation Examples Walkthrough

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb)

It is best to open this notebook in Colab to follow along with the examples.

### 1. Open Benchmark Model Evaluation

This first example walks you through evaluating a model candidate served by Llama Stack on open benchmarks. We will use the following benchmarks:

- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): a benchmark designed to evaluate multimodal models.
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): a benchmark designed to assess a model's ability to answer short, fact-seeking questions.
#### 1.1 Running MMMU

- We will use a pre-processed MMMU dataset from [llamastack/mmmu](https://huggingface.co/datasets/llamastack/mmmu). The preprocessing code is shown in this [GitHub Gist](https://gist.github.com/yanxi0830/118e9c560227d27132a7fd10e2c92840). The dataset is obtained by transforming the original [MMMU/MMMU](https://huggingface.co/datasets/MMMU/MMMU) dataset into the format expected by the `inference/chat-completion` API.

```python
import datasets

# Load one MMMU subset and keep only the columns required by the eval dataset schema.
subset = "Agriculture"
split = "dev"
ds = datasets.load_dataset(path="llamastack/mmmu", name=subset, split=split)
ds = ds.select_columns(["chat_completion_input", "input_query", "expected_answer"])
eval_rows = ds.to_pandas().to_dict(orient="records")
```
- Next, to run an evaluation on a model candidate, we will need to:
  - Define a system prompt
  - Define an `EvalCandidate`
  - Run evaluation on the dataset
```python
from rich.pretty import pprint
from tqdm import tqdm

SYSTEM_PROMPT_TEMPLATE = """
You are an expert in {subject} whose job is to answer questions from the user using images.

First, reason about the correct answer.

Then write the answer in the following format where X is exactly one of A,B,C,D:

Answer: X

Make sure X is one of A,B,C,D.

If you are uncertain of the correct answer, guess the most likely one.
"""

system_message = {
    "role": "system",
    "content": SYSTEM_PROMPT_TEMPLATE.format(subject=subset),
}

# Register the evaluation benchmark task with the dataset and scoring function.
# This assumes a dataset with id f"mmmu-{subset}-{split}" has already been
# registered via `client.datasets.register` (see the notebook for the full setup).
client.benchmarks.register(
    benchmark_id="meta-reference::mmmu",
    dataset_id=f"mmmu-{subset}-{split}",
    scoring_functions=["basic::regex_parser_multiple_choice_answer"],
)

response = client.eval.evaluate_rows(
    benchmark_id="meta-reference::mmmu",
    input_rows=eval_rows,
    scoring_functions=["basic::regex_parser_multiple_choice_answer"],
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
            "sampling_params": {
                "strategy": {
                    "type": "top_p",
                    "temperature": 1.0,
                    "top_p": 0.95,
                },
                "max_tokens": 4096,
                "repeat_penalty": 1.0,
            },
            "system_message": system_message,
        },
    },
)
pprint(response)
```
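The evaluate response contains both the per-row generations and the scoring results. A small sketch for pulling out just the aggregate metrics, assuming the `scores` mapping returned by `/eval/evaluate_rows`:

```python
# Each scoring function id maps to a scoring result with per-row scores and
# aggregated metrics.
for scoring_fn_id, scoring_result in response.scores.items():
    print(scoring_fn_id, "->", scoring_result.aggregated_results)
```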
#### 1.2 Running SimpleQA

- We will use a pre-processed SimpleQA dataset from [llamastack/evals](https://huggingface.co/datasets/llamastack/evals/viewer/evals__simpleqa), which is obtained by transforming the input query into the format accepted by the `inference/chat-completion` API.
- Since we will be using this same dataset in our next example for Agentic evaluation, we will register it using the `/datasets` API and interact with it through the `/datasetio` API.

```python
simpleqa_dataset_id = "huggingface::simpleqa"

_ = client.datasets.register(
    purpose="eval/messages-answer",
    source={
        "type": "uri",
        "uri": "huggingface://datasets/llamastack/simpleqa?split=train",
    },
    dataset_id=simpleqa_dataset_id,
)

eval_rows = client.datasets.iterrows(
    dataset_id=simpleqa_dataset_id,
    limit=5,
)
```
```python
client.benchmarks.register(
    benchmark_id="meta-reference::simpleqa",
    dataset_id=simpleqa_dataset_id,
    scoring_functions=["llm-as-judge::405b-simpleqa"],
)

response = client.eval.evaluate_rows(
    benchmark_id="meta-reference::simpleqa",
    input_rows=eval_rows.data,
    scoring_functions=["llm-as-judge::405b-simpleqa"],
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "meta-llama/Llama-3.2-90B-Vision-Instruct",
            "sampling_params": {
                "strategy": {
                    "type": "greedy",
                },
                "max_tokens": 4096,
                "repeat_penalty": 1.0,
            },
        },
    },
)
pprint(response)
```
### 2. Agentic Evaluation

- In this example, we will demonstrate how to evaluate an agent candidate served by Llama Stack via the `/agents` API.
- We will continue to use the SimpleQA dataset from the previous example.
- Instead of running the evaluation on a bare model, we will run it on a search agent with access to a web search tool. We will define our agent evaluation candidate through `AgentConfig`.

```python
agent_config = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "instructions": "You are a helpful assistant that has access to a tool to search the web. ",
    "sampling_params": {
        "strategy": {
            "type": "top_p",
            "temperature": 0.5,
            "top_p": 0.9,
        }
    },
    "toolgroups": [
        "builtin::websearch",
    ],
    "tool_choice": "auto",
    "tool_prompt_format": "json",
    "input_shields": [],
    "output_shields": [],
    "enable_session_persistence": False,
}

response = client.eval.evaluate_rows(
    benchmark_id="meta-reference::simpleqa",
    input_rows=eval_rows.data,
    scoring_functions=["llm-as-judge::405b-simpleqa"],
    benchmark_config={
        "eval_candidate": {
            "type": "agent",
            "config": agent_config,
        },
    },
)
pprint(response)
```
### 3. Agentic Application Dataset Scoring

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb)

Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.

In this example, we will work with an example RAG dataset built previously, label it with annotations, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../building_applications/playground) for an interactive interface to upload datasets and run scoring.
```python
judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

JUDGE_PROMPT = """
Given a QUESTION and GENERATED_RESPONSE and EXPECTED_RESPONSE.

Compare the factual content of the GENERATED_RESPONSE with the EXPECTED_RESPONSE. Ignore any differences in style, grammar, or punctuation.
The GENERATED_RESPONSE may either be a subset or superset of the EXPECTED_RESPONSE, or it may conflict with it. Determine which case applies. Answer the question by selecting one of the following options:
(A) The GENERATED_RESPONSE is a subset of the EXPECTED_RESPONSE and is fully consistent with it.
(B) The GENERATED_RESPONSE is a superset of the EXPECTED_RESPONSE and is fully consistent with it.
(C) The GENERATED_RESPONSE contains all the same details as the EXPECTED_RESPONSE.
(D) There is a disagreement between the GENERATED_RESPONSE and the EXPECTED_RESPONSE.
(E) The answers differ, but these differences don't matter from the perspective of factuality.

Give your answer in the format "Answer: One of ABCDE, Explanation: ".

Your actual task:

QUESTION: {input_query}
GENERATED_RESPONSE: {generated_answer}
EXPECTED_RESPONSE: {expected_answer}
"""

input_query = (
    "What are the top 5 topics that were explained? Only list succinct bullet points."
)
generated_answer = """
Here are the top 5 topics that were explained in the documentation for Torchtune:

* What is LoRA and how does it work?
* Fine-tuning with LoRA: memory savings and parameter-efficient finetuning
* Running a LoRA finetune with Torchtune: overview and recipe
* Experimenting with different LoRA configurations: rank, alpha, and attention modules
* LoRA finetuning
"""
expected_answer = """LoRA"""

dataset_rows = [
    {
        "input_query": input_query,
        "generated_answer": generated_answer,
        "expected_answer": expected_answer,
    },
]

scoring_params = {
    "llm-as-judge::base": {
        "judge_model": judge_model_id,
        "prompt_template": JUDGE_PROMPT,
        "type": "llm_as_judge",
        "judge_score_regexes": ["Answer: (A|B|C|D|E)"],
    },
    "basic::subset_of": None,
    "braintrust::factuality": None,
}

response = client.scoring.score(
    input_rows=dataset_rows, scoring_functions=scoring_params
)
```
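To inspect the outcome of each scoring function, a small sketch (assuming the score response exposes a `results` mapping keyed by scoring function id, mirroring the eval response shape used above):

```python
for scoring_fn_id, scoring_result in response.results.items():
    print(scoring_fn_id)
    print("  aggregated:", scoring_result.aggregated_results)
    print("  per-row:", scoring_result.score_rows)
```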
## Running Evaluations via CLI

The following examples give the quick steps to start running evaluations using the `llama-stack-client` CLI.

### Benchmark Evaluation CLI

There are three required inputs for running a benchmark eval:
- `list of benchmark_ids`: The list of benchmark ids to run the evaluation on
- `model-id`: The model id to evaluate on
- `output_dir`: Path to store the evaluation results

```bash
llama-stack-client eval run-benchmark <benchmark_id_1> <benchmark_id_2> ... \
  --model-id <model id to evaluate on> \
  --output-dir <directory to store the evaluation results>
```

You can run
```bash
llama-stack-client eval run-benchmark --help
```
to see the description of all the flags for running a benchmark eval.

In the output log, you can find the path to the file that contains your evaluation results. Open that file to see your aggregate evaluation results.
### Application Evaluation CLI

Usage: For running application evals, you will already have available datasets in hand from your application. You will need to specify:
- `scoring-fn-id`: List of ScoringFunction identifiers you wish to run on your application dataset.
- `Dataset` used for evaluation:
  - (1) `--dataset-path`: path to a local dataset file to run the evaluation on
  - (2) `--dataset-id`: pre-registered dataset in Llama Stack
- (Optional) `--scoring-params-config`: optionally parameterize scoring functions with custom params (e.g. `judge_prompt`, `judge_model`, `parsing_regexes`).

```bash
llama-stack-client eval run-scoring <scoring_fn_id_1> <scoring_fn_id_2> ... <scoring_fn_id_n> \
  --dataset-path <path-to-local-dataset> \
  --output-dir ./
```
### Defining BenchmarkConfig

The `BenchmarkConfig` is a user-specified config that defines:
1. The `EvalCandidate` to run generation on:
   - `ModelCandidate`: The model will be used for generation through the Llama Stack `/inference` API.
   - `AgentCandidate`: The agentic system specified by `AgentConfig` will be used for generation through the Llama Stack `/agents` API.
2. Optionally, scoring function params to allow customization of scoring function behaviour. This is useful to parameterize generic scoring functions such as LLMAsJudge with a custom `judge_model` / `judge_prompt`.

**Example BenchmarkConfig**

```json
{
    "eval_candidate": {
        "type": "model",
        "model": "Llama3.1-405B-Instruct",
        "sampling_params": {
            "strategy": {
                "type": "greedy"
            },
            "max_tokens": 0,
            "repetition_penalty": 1.0
        }
    },
    "scoring_params": {
        "llm-as-judge::llm_as_judge_base": {
            "type": "llm_as_judge",
            "judge_model": "meta-llama/Llama-3.1-8B-Instruct",
            "prompt_template": "Your job is to look at a question, a gold target ........",
            "judge_score_regexes": [
                "(A|B|C)"
            ]
        }
    }
}
```
## Open-benchmark Contributing Guide

### Create the new dataset for your new benchmark

An eval open-benchmark essentially contains two parts:
- `raw data`: The raw dataset associated with the benchmark. You typically need to search the original paper that introduces the benchmark and find the canonical dataset (usually hosted on Hugging Face).
- `prompt template`: How to ask the candidate model to generate the answer (the prompt template plays a critical role in the evaluation results). Typically, you can find the reference prompt template associated with the benchmark in the benchmark authors' repo ([example](https://github.com/idavidrein/gpqa/blob/main/prompts/chain_of_thought.txt)) or in other popular open source repos ([example](https://github.com/openai/simple-evals/blob/0a6e8f62e52bc5ae915f752466be3af596caf392/common.py#L14)).

To create a new open-benchmark in Llama Stack, you need to combine the prompt template and the raw data into the `chat_completion_input` column in the evaluation dataset.

Llama Stack enforces that the evaluation dataset schema contains at least three columns:
- `chat_completion_input`: The actual input to the model to run the generation for eval
- `input_query`: The raw input from the raw dataset without the prompt template
- `expected_answer`: The ground truth for scoring functions to calculate the score from

You need to write a conversion script ([example convert script](https://gist.github.com/yanxi0830/118e9c560227d27132a7fd10e2c92840)) to convert the benchmark raw dataset into the Llama Stack eval dataset format and upload the dataset to Hugging Face ([example benchmark dataset](https://huggingface.co/datasets/llamastack/mmmu)). A minimal sketch of such a conversion is shown below.
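This sketch is hypothetical: it assumes a raw dataset with `question` and `answer` columns and a simple prompt template. Replace both with your benchmark's actual fields and its reference prompt template, and follow the example convert script above for the exact `chat_completion_input` serialization.

```python
import datasets

# Hypothetical prompt template -- use the benchmark's reference template instead.
PROMPT_TEMPLATE = "Answer the following question concisely.\n\nQuestion: {question}\nAnswer:"


def to_eval_row(row: dict) -> dict:
    # Combine the prompt template with the raw data into the three required columns.
    return {
        "chat_completion_input": [
            {"role": "user", "content": PROMPT_TEMPLATE.format(question=row["question"])}
        ],
        "input_query": row["question"],
        "expected_answer": row["answer"],
    }


raw = datasets.load_dataset("my-org/my-raw-benchmark", split="test")  # hypothetical repo
eval_ds = raw.map(to_eval_row, remove_columns=raw.column_names)
eval_ds.push_to_hub("my-org/my-benchmark-llamastack")  # requires a Hugging Face login
```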
### Find a scoring function for your new benchmark

The purpose of a scoring function is to calculate the score for each example based on the candidate model's generation result and the `expected_answer`. It also aggregates the scores across all examples and generates the final evaluation results.

First, check whether the existing [Llama Stack scoring functions](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/scoring) can fulfill your need. If not, you need to write a new scoring function based on what the benchmark authors / other open source repos describe.
### Add the new benchmark into the template

First, you need to add the evaluation dataset associated with your benchmark under the `datasets` resource in the [open-benchmark](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/open-benchmark/run.yaml) template.

Second, you need to add the new benchmark you just created under the `benchmarks` resource in the same template. To add the new benchmark, you need to provide:
- `benchmark_id`: identifier of the benchmark
- `dataset_id`: identifier of the dataset associated with your benchmark
- `scoring_functions`: scoring functions to calculate the score based on the generation results and the `expected_answer`

Alternatively, the same registration can be done programmatically against a running server, as sketched below.
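A sketch of the equivalent programmatic registration via the client APIs; the dataset id, URI, and benchmark id below are placeholders for your own benchmark:

```python
# Register the eval dataset backing the benchmark (placeholder id and URI).
client.datasets.register(
    dataset_id="my-new-benchmark-dataset",
    purpose="eval/messages-answer",
    source={
        "type": "uri",
        "uri": "huggingface://datasets/my-org/my-benchmark-llamastack?split=test",
    },
)

# Register the benchmark itself, tying the dataset to its scoring function(s).
client.benchmarks.register(
    benchmark_id="meta-reference::my-new-benchmark",
    dataset_id="my-new-benchmark-dataset",
    scoring_functions=["basic::regex_parser_multiple_choice_answer"],
)
```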
### Test the new benchmark

Spin up the Llama Stack server with the `open-benchmark` template:
```bash
llama stack run llama_stack/distributions/open-benchmark/run.yaml
```

Run the eval benchmark CLI with your new benchmark id:
```bash
llama-stack-client eval run-benchmark <new_benchmark_id> \
  --model-id <model id to evaluate on> \
  --output-dir <directory to store the evaluation results>
```
BIN docs/docs/references/evals_reference/resources/eval-concept.png (new file, 68 KiB, binary file not shown)
BIN docs/docs/references/evals_reference/resources/eval-flow.png (new file, 249 KiB, binary file not shown)
12 docs/docs/references/index.mdx (new file)

@@ -0,0 +1,12 @@

---
title: References
description: Reference documentation for Llama Stack
sidebar_label: Overview
sidebar_position: 1
---

# References

- [Python SDK Reference](python_sdk_reference/index)
- [Llama CLI](llama_cli_reference/index) for building and running your Llama Stack server
- [Llama Stack Client CLI](llama_stack_client_cli_reference) for interacting with your Llama Stack server
165 docs/docs/references/llama_cli_reference/download_models.md (new file)

@@ -0,0 +1,165 @@
# Downloading Models
|
||||
|
||||
The `llama` CLI tool helps you set up and use the Llama Stack. It should be available on your path after installing the `llama-stack` package.
|
||||
|
||||
## Installation
|
||||
|
||||
You have two ways to install Llama Stack:
|
||||
|
||||
1. **Install as a package**:
|
||||
You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
|
||||
```bash
|
||||
pip install llama-stack
|
||||
```
|
||||
|
||||
2. **Install from source**:
|
||||
If you prefer to install from the source code, follow these steps:
|
||||
```bash
|
||||
mkdir -p ~/local
|
||||
cd ~/local
|
||||
git clone git@github.com:meta-llama/llama-stack.git
|
||||
|
||||
uv venv myenv --python 3.12
|
||||
source myenv/bin/activate # On Windows: myenv\Scripts\activate
|
||||
|
||||
cd llama-stack
|
||||
pip install -e .
```
## Downloading models via CLI
|
||||
|
||||
You first need to have models downloaded locally.
|
||||
|
||||
To download any model you need the **Model Descriptor**.
|
||||
This can be obtained by running the command
|
||||
```
|
||||
llama model list
|
||||
```
|
||||
|
||||
You should see a table like this:
|
||||
|
||||
```
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-70B | meta-llama/Llama-3.1-70B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B:bf16-mp8 | meta-llama/Llama-3.1-405B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B | meta-llama/Llama-3.1-405B-FP8 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B:bf16-mp16 | meta-llama/Llama-3.1-405B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B-Instruct:bf16-mp8 | meta-llama/Llama-3.1-405B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B-Instruct | meta-llama/Llama-3.1-405B-Instruct-FP8 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B-Instruct:bf16-mp16 | meta-llama/Llama-3.1-405B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-1B | meta-llama/Llama-3.2-1B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-3B | meta-llama/Llama-3.2-3B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-11B-Vision | meta-llama/Llama-3.2-11B-Vision | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-90B-Vision | meta-llama/Llama-3.2-90B-Vision | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-1B-Instruct | meta-llama/Llama-3.2-1B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-3B-Instruct | meta-llama/Llama-3.2-3B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-11B-Vision-Instruct | meta-llama/Llama-3.2-11B-Vision-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-90B-Vision-Instruct | meta-llama/Llama-3.2-90B-Vision-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-11B-Vision | meta-llama/Llama-Guard-3-11B-Vision | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-1B:int4-mp1 | meta-llama/Llama-Guard-3-1B-INT4 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-1B | meta-llama/Llama-Guard-3-1B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-8B | meta-llama/Llama-Guard-3-8B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-8B:int8-mp1 | meta-llama/Llama-Guard-3-8B-INT8 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Prompt-Guard-86M | meta-llama/Prompt-Guard-86M | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-2-8B | meta-llama/Llama-Guard-2-8B | 4K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
```
|
||||
|
||||
To download models, you can use the llama download command.
|
||||
|
||||
#### Downloading from [Meta](https://llama.meta.com/llama-downloads/)
|
||||
|
||||
Here is an example download command to get the 3B-Instruct/11B-Vision-Instruct model. You will need META_URL which can be obtained from [here](https://llama.meta.com/docs/getting_the_models/meta/). Note: You need to quote the META_URL
|
||||
|
||||
Download the required checkpoints using the following commands:
|
||||
```bash
|
||||
# download the 8B model, this can be run on a single GPU
|
||||
llama download --source meta --model-id Llama3.2-3B-Instruct --meta-url 'META_URL'
|
||||
|
||||
# you can also get the 70B model, this will require 8 GPUs however
|
||||
llama download --source meta --model-id Llama3.2-11B-Vision-Instruct --meta-url 'META_URL'
|
||||
|
||||
# llama-agents have safety enabled by default. For this, you will need
|
||||
# safety models -- Llama-Guard and Prompt-Guard
|
||||
llama download --source meta --model-id Prompt-Guard-86M --meta-url 'META_URL'
|
||||
llama download --source meta --model-id Llama-Guard-3-1B --meta-url 'META_URL'
|
||||
```
|
||||
|
||||
#### Downloading from [Hugging Face](https://huggingface.co/meta-llama)
|
||||
|
||||
Essentially, the same commands above work, just replace `--source meta` with `--source huggingface`.
|
||||
|
||||
```bash
|
||||
llama download --source huggingface --model-id Llama3.1-8B-Instruct --hf-token <HF_TOKEN>
|
||||
|
||||
llama download --source huggingface --model-id Llama3.1-70B-Instruct --hf-token <HF_TOKEN>
|
||||
|
||||
llama download --source huggingface --model-id Llama-Guard-3-1B --ignore-patterns *original*
|
||||
llama download --source huggingface --model-id Prompt-Guard-86M --ignore-patterns *original*
|
||||
```
|
||||
|
||||
**Important:** Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command to validate your access. You can find your token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
|
||||
|
||||
```{tip}
|
||||
Default for `llama download` is to run with `--ignore-patterns *.safetensors` since we use the `.pth` files in the `original` folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with `--ignore-patterns original` so that safetensors are downloaded and `.pth` files are ignored.
|
||||
```
|
||||
|
||||
## List the downloaded models
|
||||
|
||||
To list the downloaded models, use the following command:
|
||||
```
|
||||
llama model list --downloaded
|
||||
```
|
||||
|
||||
You should see a table like this:
|
||||
```
|
||||
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
|
||||
┃ Model ┃ Size ┃ Modified Time ┃
|
||||
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │
|
||||
└─────────────────────────────────────────┴──────────┴─────────────────────┘
|
||||
```
|
276 docs/docs/references/llama_cli_reference/index.md (new file)

@@ -0,0 +1,276 @@
# llama (server-side) CLI Reference
|
||||
|
||||
The `llama` CLI tool helps you set up and use the Llama Stack. The CLI is available on your path after installing the `llama-stack` package.
|
||||
|
||||
## Installation
|
||||
|
||||
You have two ways to install Llama Stack:
|
||||
|
||||
1. **Install as a package**:
|
||||
You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
|
||||
```bash
|
||||
pip install llama-stack
|
||||
```
|
||||
|
||||
2. **Install from source**:
|
||||
If you prefer to install from the source code, follow these steps:
|
||||
```bash
|
||||
mkdir -p ~/local
|
||||
cd ~/local
|
||||
git clone git@github.com:meta-llama/llama-stack.git
|
||||
|
||||
uv venv myenv --python 3.12
|
||||
source myenv/bin/activate # On Windows: myenv\Scripts\activate
|
||||
|
||||
cd llama-stack
|
||||
pip install -e .
```

## `llama` subcommands
|
||||
1. `download`: Supports downloading models from Meta or Hugging Face. [Downloading models](#downloading-models)
|
||||
2. `model`: Lists available models and their properties. [Understanding models](#understand-the-models)
|
||||
3. `stack`: Allows you to build a stack using the `llama stack` distribution and run a Llama Stack server. You can read more about how to build a Llama Stack distribution in the [Build your own Distribution](../../distributions/building_distro) documentation.
|
||||
|
||||
### Sample Usage
|
||||
|
||||
```
|
||||
llama --help
|
||||
```
|
||||
|
||||
```
|
||||
usage: llama [-h] {download,model,stack} ...
|
||||
|
||||
Welcome to the Llama CLI
|
||||
|
||||
options:
|
||||
-h, --help show this help message and exit
|
||||
|
||||
subcommands:
|
||||
{download,model,stack}
|
||||
```
|
||||
|
||||
## Downloading models
|
||||
|
||||
You first need to have models downloaded locally.
|
||||
|
||||
To download any model you need the **Model Descriptor**.
|
||||
This can be obtained by running the command
|
||||
```
|
||||
llama model list
|
||||
```
|
||||
|
||||
You should see a table like this:
|
||||
|
||||
```
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Model Descriptor(ID) | Hugging Face Repo | Context Length |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-70B | meta-llama/Llama-3.1-70B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B:bf16-mp8 | meta-llama/Llama-3.1-405B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B | meta-llama/Llama-3.1-405B-FP8 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B:bf16-mp16 | meta-llama/Llama-3.1-405B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B-Instruct:bf16-mp8 | meta-llama/Llama-3.1-405B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B-Instruct | meta-llama/Llama-3.1-405B-Instruct-FP8 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.1-405B-Instruct:bf16-mp16 | meta-llama/Llama-3.1-405B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-1B | meta-llama/Llama-3.2-1B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-3B | meta-llama/Llama-3.2-3B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-11B-Vision | meta-llama/Llama-3.2-11B-Vision | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-90B-Vision | meta-llama/Llama-3.2-90B-Vision | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-1B-Instruct | meta-llama/Llama-3.2-1B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-3B-Instruct | meta-llama/Llama-3.2-3B-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-11B-Vision-Instruct | meta-llama/Llama-3.2-11B-Vision-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama3.2-90B-Vision-Instruct | meta-llama/Llama-3.2-90B-Vision-Instruct | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-11B-Vision | meta-llama/Llama-Guard-3-11B-Vision | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-1B:int4-mp1 | meta-llama/Llama-Guard-3-1B-INT4 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-1B | meta-llama/Llama-Guard-3-1B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-8B | meta-llama/Llama-Guard-3-8B | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-3-8B:int8-mp1 | meta-llama/Llama-Guard-3-8B-INT8 | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Prompt-Guard-86M | meta-llama/Prompt-Guard-86M | 128K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
| Llama-Guard-2-8B | meta-llama/Llama-Guard-2-8B | 4K |
|
||||
+----------------------------------+------------------------------------------+----------------+
|
||||
```
|
||||
|
||||
To download models, you can use the `llama download` command.
|
||||
|
||||
### Downloading from [Meta](https://llama.meta.com/llama-downloads/)
|
||||
|
||||
Here is an example download command to get the 3B-Instruct/11B-Vision-Instruct model. You will need META_URL which can be obtained from [here](https://llama.meta.com/docs/getting_the_models/meta/)
|
||||
|
||||
Download the required checkpoints using the following commands:
|
||||
```bash
|
||||
# download the 8B model, this can be run on a single GPU
|
||||
llama download --source meta --model-id Llama3.2-3B-Instruct --meta-url META_URL
|
||||
|
||||
# you can also get the 70B model, this will require 8 GPUs however
|
||||
llama download --source meta --model-id Llama3.2-11B-Vision-Instruct --meta-url META_URL
|
||||
|
||||
# llama-agents have safety enabled by default. For this, you will need
|
||||
# safety models -- Llama-Guard and Prompt-Guard
|
||||
llama download --source meta --model-id Prompt-Guard-86M --meta-url META_URL
|
||||
llama download --source meta --model-id Llama-Guard-3-1B --meta-url META_URL
|
||||
```
|
||||
|
||||
### Downloading from [Hugging Face](https://huggingface.co/meta-llama)
|
||||
|
||||
Essentially, the same commands above work, just replace `--source meta` with `--source huggingface`.
|
||||
|
||||
```bash
|
||||
llama download --source huggingface --model-id Llama3.1-8B-Instruct --hf-token <HF_TOKEN>
|
||||
|
||||
llama download --source huggingface --model-id Llama3.1-70B-Instruct --hf-token <HF_TOKEN>
|
||||
|
||||
llama download --source huggingface --model-id Llama-Guard-3-1B --ignore-patterns *original*
|
||||
llama download --source huggingface --model-id Prompt-Guard-86M --ignore-patterns *original*
|
||||
```
|
||||
|
||||
**Important:** Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command to validate your access. You can find your token at [https://huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
|
||||
|
||||
```{tip}
|
||||
Default for `llama download` is to run with `--ignore-patterns *.safetensors` since we use the `.pth` files in the `original` folder. For Llama Guard and Prompt Guard, however, we need safetensors. Hence, please run with `--ignore-patterns original` so that safetensors are downloaded and `.pth` files are ignored.
|
||||
```
|
||||
|
||||
## List the downloaded models
|
||||
|
||||
To list the downloaded models, use the following command:
|
||||
```
|
||||
llama model list --downloaded
|
||||
```
|
||||
|
||||
You should see a table like this:
|
||||
```
|
||||
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
|
||||
┃ Model ┃ Size ┃ Modified Time ┃
|
||||
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ Llama3.2-1B-Instruct:int4-qlora-eo8 │ 1.53 GB │ 2025-02-26 11:22:28 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-1B │ 2.31 GB │ 2025-02-18 21:48:52 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Prompt-Guard-86M │ 0.02 GB │ 2025-02-26 11:29:28 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-3B-Instruct:int4-spinquant-eo8 │ 3.69 GB │ 2025-02-26 11:37:41 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-3B │ 5.99 GB │ 2025-02-18 21:51:26 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.1-8B │ 14.97 GB │ 2025-02-16 10:36:37 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama3.2-1B-Instruct:int4-spinquant-eo8 │ 1.51 GB │ 2025-02-26 11:35:02 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama-Guard-3-1B │ 2.80 GB │ 2025-02-26 11:20:46 │
|
||||
├─────────────────────────────────────────┼──────────┼─────────────────────┤
|
||||
│ Llama-Guard-3-1B:int4 │ 0.43 GB │ 2025-02-26 11:33:33 │
|
||||
└─────────────────────────────────────────┴──────────┴─────────────────────┘
|
||||
```
|
||||
|
||||
|
||||
## Understand the models
|
||||
The `llama model` command helps you explore the model’s interface.
|
||||
|
||||
1. `download`: Download the model from different sources. (meta, huggingface)
|
||||
2. `list`: Lists all the models available for download with hardware requirements for deploying the models.
|
||||
3. `prompt-format`: Show llama model message formats.
|
||||
4. `describe`: Describes all the properties of the model.
|
||||
|
||||
### Sample Usage
|
||||
|
||||
`llama model <subcommand> <options>`
|
||||
|
||||
```
|
||||
llama model --help
|
||||
```
|
||||
```
|
||||
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
|
||||
|
||||
Work with llama models
|
||||
|
||||
options:
|
||||
-h, --help show this help message and exit
|
||||
|
||||
model_subcommands:
|
||||
{download,list,prompt-format,describe,verify-download,remove}
|
||||
```
|
||||
|
||||
### Describe
|
||||
|
||||
You can use the describe command to know more about a model:
|
||||
```
|
||||
llama model describe -m Llama3.2-3B-Instruct
|
||||
```
|
||||
```
|
||||
+-----------------------------+----------------------------------+
|
||||
| Model | Llama3.2-3B-Instruct |
|
||||
+-----------------------------+----------------------------------+
|
||||
| Hugging Face ID | meta-llama/Llama-3.2-3B-Instruct |
|
||||
+-----------------------------+----------------------------------+
|
||||
| Description | Llama 3.2 3b instruct model |
|
||||
+-----------------------------+----------------------------------+
|
||||
| Context Length | 128K tokens |
|
||||
+-----------------------------+----------------------------------+
|
||||
| Weights format | bf16 |
|
||||
+-----------------------------+----------------------------------+
|
||||
| Model params.json | { |
|
||||
| | "dim": 3072, |
|
||||
| | "n_layers": 28, |
|
||||
| | "n_heads": 24, |
|
||||
| | "n_kv_heads": 8, |
|
||||
| | "vocab_size": 128256, |
|
||||
| | "ffn_dim_multiplier": 1.0, |
|
||||
| | "multiple_of": 256, |
|
||||
| | "norm_eps": 1e-05, |
|
||||
| | "rope_theta": 500000.0, |
|
||||
| | "use_scaled_rope": true |
|
||||
| | } |
|
||||
+-----------------------------+----------------------------------+
|
||||
| Recommended sampling params | { |
|
||||
| | "temperature": 1.0, |
|
||||
| | "top_p": 0.9, |
|
||||
| | "top_k": 0 |
|
||||
| | } |
|
||||
+-----------------------------+----------------------------------+
|
||||
```
|
||||
|
||||
### Prompt Format
|
||||
You can even run `llama model prompt-format` to see all of the templates and their tokens:
|
||||
|
||||
```
|
||||
llama model prompt-format -m Llama3.2-3B-Instruct
|
||||
```
|
||||

|
||||
|
||||
|
||||
You will be shown a Markdown formatted description of the model interface and how prompts / messages are formatted for various scenarios.
|
||||
|
||||
**NOTE**: Outputs in terminal are color printed to show special tokens.
|
||||
|
||||
### Remove model
|
||||
You can run `llama model remove` to remove an unnecessary model:
|
||||
|
||||
```
|
||||
llama model remove -m Llama-Guard-3-8B-int8
|
||||
```
|
589 docs/docs/references/llama_stack_client_cli_reference.md (new file)

@@ -0,0 +1,589 @@
# llama (client-side) CLI Reference
|
||||
|
||||
The `llama-stack-client` CLI allows you to query information about the distribution.
|
||||
|
||||
## Basic Commands
|
||||
|
||||
### `llama-stack-client`
|
||||
```bash
|
||||
llama-stack-client
|
||||
Usage: llama-stack-client [OPTIONS] COMMAND [ARGS]...
|
||||
|
||||
Welcome to the llama-stack-client CLI - a command-line interface for
|
||||
interacting with Llama Stack
|
||||
|
||||
Options:
|
||||
--version Show the version and exit.
|
||||
--endpoint TEXT Llama Stack distribution endpoint
|
||||
--api-key TEXT Llama Stack distribution API key
|
||||
--config TEXT Path to config file
|
||||
--help Show this message and exit.
|
||||
|
||||
Commands:
|
||||
configure Configure Llama Stack Client CLI.
|
||||
datasets Manage datasets.
|
||||
eval Run evaluation tasks.
|
||||
eval_tasks Manage evaluation tasks.
|
||||
inference Inference (chat).
|
||||
inspect Inspect server configuration.
|
||||
models Manage GenAI models.
|
||||
post_training Post-training.
|
||||
providers Manage API providers.
|
||||
scoring_functions Manage scoring functions.
|
||||
shields Manage safety shield services.
|
||||
toolgroups Manage available tool groups.
|
||||
vector_dbs Manage vector databases.
|
||||
```
|
||||
|
||||
### `llama-stack-client configure`
|
||||
Configure Llama Stack Client CLI.
|
||||
```bash
|
||||
llama-stack-client configure
|
||||
> Enter the host name of the Llama Stack distribution server: localhost
|
||||
> Enter the port number of the Llama Stack distribution server: 8321
|
||||
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
|
||||
```
|
||||
|
||||
Optional arguments:
|
||||
- `--endpoint`: Llama Stack distribution endpoint
|
||||
- `--api-key`: Llama Stack distribution API key
|
||||
|
||||
|
||||
|
||||
### `llama-stack-client inspect version`
|
||||
Inspect server configuration.
|
||||
```bash
|
||||
llama-stack-client inspect version
|
||||
```
|
||||
```bash
|
||||
VersionInfo(version='0.2.14')
|
||||
```
|
||||
|
||||
|
||||
### `llama-stack-client providers list`
|
||||
Show available providers on distribution endpoint
|
||||
```bash
|
||||
llama-stack-client providers list
|
||||
```
|
||||
```
|
||||
+-----------+----------------+-----------------+
|
||||
| API | Provider ID | Provider Type |
|
||||
+===========+================+=================+
|
||||
| scoring | meta0 | meta-reference |
|
||||
+-----------+----------------+-----------------+
|
||||
| datasetio | meta0 | meta-reference |
|
||||
+-----------+----------------+-----------------+
|
||||
| inference | tgi0 | remote::tgi |
|
||||
+-----------+----------------+-----------------+
|
||||
| memory | meta-reference | meta-reference |
|
||||
+-----------+----------------+-----------------+
|
||||
| agents | meta-reference | meta-reference |
|
||||
+-----------+----------------+-----------------+
|
||||
| telemetry | meta-reference | meta-reference |
|
||||
+-----------+----------------+-----------------+
|
||||
| safety | meta-reference | meta-reference |
|
||||
+-----------+----------------+-----------------+
|
||||
```
|
||||
|
||||
### `llama-stack-client providers inspect`
|
||||
Show specific provider configuration on distribution endpoint
|
||||
```bash
|
||||
llama-stack-client providers inspect <provider_id>
|
||||
```
|
||||
|
||||
|
||||
## Inference
|
||||
Inference (chat).
|
||||
|
||||
|
||||
### `llama-stack-client inference chat-completion`
|
||||
Run a chat completion against the distribution endpoint
|
||||
```bash
|
||||
llama-stack-client inference chat-completion --message <message> [--stream] [--session] [--model-id]
|
||||
```
|
||||
```bash
|
||||
OpenAIChatCompletion(
|
||||
id='chatcmpl-aacd11f3-8899-4ec5-ac5b-e655132f6891',
|
||||
choices=[
|
||||
OpenAIChatCompletionChoice(
|
||||
finish_reason='stop',
|
||||
index=0,
|
||||
message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
|
||||
role='assistant',
|
||||
content='The captain of the whaleship Pequod in Nathaniel Hawthorne\'s novel "Moby-Dick" is Captain
|
||||
Ahab. He\'s a vengeful and obsessive old sailor who\'s determined to hunt down and kill the white sperm whale
|
||||
Moby-Dick, whom he\'s lost his leg to in a previous encounter.',
|
||||
name=None,
|
||||
tool_calls=None,
|
||||
refusal=None,
|
||||
annotations=None,
|
||||
audio=None,
|
||||
function_call=None
|
||||
),
|
||||
logprobs=None
|
||||
)
|
||||
],
|
||||
created=1752578797,
|
||||
model='llama3.2:3b-instruct-fp16',
|
||||
object='chat.completion',
|
||||
service_tier=None,
|
||||
system_fingerprint='fp_ollama',
|
||||
usage={
|
||||
'completion_tokens': 67,
|
||||
'prompt_tokens': 33,
|
||||
'total_tokens': 100,
|
||||
'completion_tokens_details': None,
|
||||
'prompt_tokens_details': None
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
Required arguments:
|
||||
**Note:** At least one of these parameters is required for chat completion
|
||||
- `--message`: Message
|
||||
- `--session`: Start a Chat Session
|
||||
|
||||
Optional arguments:
|
||||
- `--stream`: Stream
|
||||
- `--model-id`: Model ID
|
||||
|
||||
## Model Management
|
||||
Manage GenAI models.
|
||||
|
||||
|
||||
### `llama-stack-client models list`
|
||||
Show available llama models at distribution endpoint
|
||||
```bash
|
||||
llama-stack-client models list
|
||||
```
|
||||
```
|
||||
Available Models
|
||||
|
||||
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓
|
||||
┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃
|
||||
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩
|
||||
│ llm │ meta-llama/Llama-3.2-3B-Instruct │ llama3.2:3b-instruct-fp16 │ │ ollama │
|
||||
└──────────────┴──────────────────────────────────────┴──────────────────────────────┴───────────┴─────────────┘
|
||||
|
||||
Total models: 1
|
||||
```
|
||||
|
||||
### `llama-stack-client models get`
|
||||
Show details of a specific model at the distribution endpoint
|
||||
```bash
|
||||
llama-stack-client models get Llama3.1-8B-Instruct
|
||||
```
|
||||
|
||||
```
|
||||
+----------------------+----------------------+----------------------------------------------------------+---------------+
|
||||
| identifier | llama_model | metadata | provider_id |
|
||||
+======================+======================+==========================================================+===============+
|
||||
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | {'huggingface_repo': 'meta-llama/Llama-3.1-8B-Instruct'} | tgi0 |
|
||||
+----------------------+----------------------+----------------------------------------------------------+---------------+
|
||||
```
|
||||
|
||||
|
||||
```bash
|
||||
llama-stack-client models get Random-Model
|
||||
|
||||
Model RandomModel is not found at distribution endpoint host:port. Please ensure endpoint is serving specified model.
|
||||
```
|
||||
|
||||
### `llama-stack-client models register`
|
||||
Register a new model at distribution endpoint
|
||||
```bash
|
||||
llama-stack-client models register <model_id> [--provider-id <provider_id>] [--provider-model-id <provider_model_id>] [--metadata <metadata>] [--model-type <model_type>]
|
||||
```
|
||||
|
||||
Required arguments:
|
||||
- `MODEL_ID`: Model ID
|
||||
- `--provider-id`: Provider ID for the model
|
||||
|
||||
Optional arguments:
|
||||
- `--provider-model-id`: Provider's model ID
|
||||
- `--metadata`: JSON metadata for the model
|
||||
- `--model-type`: Model type: `llm`, `embedding`
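
For example, a hypothetical registration of an Ollama-served model (the model id, provider, and provider model id below are placeholders for your deployment):
```bash
llama-stack-client models register my-local-llama \
  --provider-id ollama \
  --provider-model-id llama3.2:3b-instruct-fp16 \
  --model-type llm
```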
|
||||
|
||||
|
||||
### `llama-stack-client models unregister`
|
||||
Unregister a model from distribution endpoint
|
||||
```bash
|
||||
llama-stack-client models unregister <model_id>
|
||||
```
|
||||
|
||||
## Vector DB Management
|
||||
Manage vector databases.
|
||||
|
||||
|
||||
### `llama-stack-client vector_dbs list`
|
||||
Show available vector dbs on distribution endpoint
|
||||
```bash
|
||||
llama-stack-client vector_dbs list
|
||||
```
|
||||
```
|
||||
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||||
┃ identifier ┃ provider_id ┃ provider_resource_id ┃ vector_db_type ┃ params ┃
|
||||
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ my_demo_vector_db │ faiss │ my_demo_vector_db │ │ embedding_dimension: 384 │
|
||||
│ │ │ │ │ embedding_model: all-MiniLM-L6-v2 │
|
||||
│ │ │ │ │ type: vector_db │
|
||||
│ │ │ │ │ │
|
||||
└──────────────────────────┴─────────────┴──────────────────────────┴────────────────┴───────────────────────────────────┘
|
||||
```
|
||||
|
||||
### `llama-stack-client vector_dbs register`
|
||||
Create a new vector db
|
||||
```bash
|
||||
llama-stack-client vector_dbs register <vector-db-id> [--provider-id <provider-id>] [--provider-vector-db-id <provider-vector-db-id>] [--embedding-model <embedding-model>] [--embedding-dimension <embedding-dimension>]
|
||||
```
|
||||
|
||||
|
||||
Required arguments:
|
||||
- `VECTOR_DB_ID`: Vector DB ID
|
||||
|
||||
Optional arguments:
|
||||
- `--provider-id`: Provider ID for the vector db
|
||||
- `--provider-vector-db-id`: Provider's vector db ID
|
||||
- `--embedding-model`: Embedding model to use. Default: `all-MiniLM-L6-v2`
|
||||
- `--embedding-dimension`: Dimension of embeddings. Default: 384
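
For example, a hypothetical vector DB backed by the `faiss` provider with the default embedding model:
```bash
llama-stack-client vector_dbs register my_demo_vector_db \
  --provider-id faiss \
  --embedding-model all-MiniLM-L6-v2 \
  --embedding-dimension 384
```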
|
||||
|
||||
### `llama-stack-client vector_dbs unregister`
|
||||
Delete a vector db
|
||||
```bash
|
||||
llama-stack-client vector_dbs unregister <vector-db-id>
|
||||
```
|
||||
|
||||
|
||||
Required arguments:
|
||||
- `VECTOR_DB_ID`: Vector DB ID
|
||||
|
||||
|
||||
## Shield Management
|
||||
Manage safety shield services.
|
||||
### `llama-stack-client shields list`
|
||||
Show available safety shields on distribution endpoint
|
||||
```bash
|
||||
llama-stack-client shields list
|
||||
```
|
||||
|
||||
```
|
||||
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||||
┃ identifier ┃ provider_alias ┃ params ┃ provider_id ┃
|
||||
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ ollama │ ollama/llama-guard3:1b │ │ llama-guard │
|
||||
└──────────────────────────────────┴───────────────────────────────────────────────────────────────────────┴───────────────────────┴────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### `llama-stack-client shields register`
|
||||
Register a new safety shield
|
||||
```bash
|
||||
llama-stack-client shields register --shield-id <shield-id> [--provider-id <provider-id>] [--provider-shield-id <provider-shield-id>] [--params <params>]
|
||||
```
|
||||
|
||||
Required arguments:
|
||||
- `--shield-id`: ID of the shield
|
||||
|
||||
Optional arguments:
|
||||
- `--provider-id`: Provider ID for the shield
|
||||
- `--provider-shield-id`: Provider's shield ID
|
||||
- `--params`: JSON configuration parameters for the shield
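
For example, registering a Llama Guard based shield (the shield id and provider shield id are illustrative):
```bash
llama-stack-client shields register --shield-id my-llama-guard \
  --provider-id llama-guard \
  --provider-shield-id ollama/llama-guard3:1b
```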
|
||||
|
||||
|
||||
## Eval execution
|
||||
Run evaluation tasks.
|
||||
|
||||
|
||||
### `llama-stack-client eval run-benchmark`
|
||||
Run an evaluation benchmark task
|
||||
```bash
|
||||
llama-stack-client eval run-benchmark <eval-task-id1> [<eval-task-id2> ...] --eval-task-config <config-file> --output-dir <output-dir> --model-id <model-id> [--num-examples <num>] [--visualize] [--repeat-penalty <repeat-penalty>] [--top-p <top-p>] [--max-tokens <max-tokens>]
```

Required arguments:

- `--eval-task-config`: Path to the eval task config file in JSON format
- `--output-dir`: Path to the directory where evaluation results will be saved
- `--model-id`: Model ID to run the benchmark eval on

Optional arguments:

- `--num-examples`: Number of examples to evaluate (useful for debugging)
- `--visualize`: If set, visualizes evaluation results after completion
- `--repeat-penalty`: Repeat penalty in the sampling params used for generation
- `--top-p`: Top-p in the sampling params used for generation
- `--max-tokens`: Max tokens in the sampling params used for generation
- `--temperature`: Temperature in the sampling params used for generation

Example benchmark_config.json:

```json
{
  "type": "benchmark",
  "eval_candidate": {
    "type": "model",
    "model": "Llama3.1-405B-Instruct",
    "sampling_params": {
      "strategy": "greedy"
    }
  }
}
```
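
A hypothetical run against a registered MMMU benchmark (the benchmark ID, model ID, and paths are illustrative):

```bash
# Evaluate 10 examples and visualize the results; all IDs and paths are examples.
llama-stack-client eval run-benchmark meta-reference-mmmu \
  --eval-task-config ./benchmark_config.json \
  --output-dir ./eval_results \
  --model-id Llama3.1-405B-Instruct \
  --num-examples 10 \
  --visualize
```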
### `llama-stack-client eval run-scoring`

Run scoring from application datasets

```bash
llama-stack-client eval run-scoring <eval-task-id> --output-dir <output-dir> [--num-examples <num>] [--visualize]
```

Required arguments:

- `--output-dir`: Path to the directory where scoring results will be saved

Optional arguments:

- `--num-examples`: Number of examples to evaluate (useful for debugging)
- `--visualize`: If set, visualizes scoring results after completion
- `--scoring-params-config`: Path to the scoring params config file in JSON format
- `--dataset-id`: Pre-registered dataset_id to score (from llama-stack-client datasets list)
- `--dataset-path`: Path to the dataset file to score
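
A hypothetical scoring run over a local dataset file (the task ID and paths are illustrative):

```bash
# Score a local file of application responses; values are example placeholders.
llama-stack-client eval run-scoring my-scoring-task \
  --dataset-path ./app_responses.csv \
  --output-dir ./scoring_results \
  --num-examples 50
```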
## Eval Tasks

Manage evaluation tasks.

### `llama-stack-client eval_tasks list`

Show available eval tasks on distribution endpoint

```bash
llama-stack-client eval_tasks list
```

### `llama-stack-client eval_tasks register`

Register a new eval task

```bash
llama-stack-client eval_tasks register --eval-task-id <eval-task-id> --dataset-id <dataset-id> --scoring-functions <scoring-functions> [--provider-id <provider-id>] [--provider-eval-task-id <provider-eval-task-id>] [--metadata <metadata>]
```

Required arguments:

- `--eval-task-id`: ID of the eval task
- `--dataset-id`: ID of the dataset to evaluate
- `--scoring-functions`: Scoring functions to use for evaluation

Optional arguments:

- `--provider-id`: Provider ID for the eval task
- `--provider-eval-task-id`: Provider's eval task ID
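
A hypothetical registration tying a dataset to a scoring function (the IDs are illustrative; `basic::equality` is one of the built-in scoring functions listed further below):

```bash
llama-stack-client eval_tasks register \
  --eval-task-id my-eval-task \
  --dataset-id my-dataset \
  --scoring-functions basic::equality
```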
## Tool Group Management

Manage available tool groups.

### `llama-stack-client toolgroups list`

Show available llama toolgroups at distribution endpoint

```bash
llama-stack-client toolgroups list
```

```
+---------------------------+------------------+------+---------------+
| identifier | provider_id | args | mcp_endpoint |
+===========================+==================+======+===============+
| builtin::rag | rag-runtime | None | None |
+---------------------------+------------------+------+---------------+
| builtin::websearch | tavily-search | None | None |
+---------------------------+------------------+------+---------------+
```

### `llama-stack-client toolgroups get`

Get available llama toolgroups by id

```bash
llama-stack-client toolgroups get <toolgroup_id>
```

Shows detailed information about a specific toolgroup. If the toolgroup is not found, displays an error message.

Required arguments:

- `TOOLGROUP_ID`: ID of the tool group

### `llama-stack-client toolgroups register`

Register a new toolgroup at distribution endpoint

```bash
llama-stack-client toolgroups register <toolgroup_id> [--provider-id <provider-id>] [--provider-toolgroup-id <provider-toolgroup-id>] [--mcp-config <mcp-config>] [--args <args>]
```

Required arguments:

- `TOOLGROUP_ID`: ID of the tool group

Optional arguments:

- `--provider-id`: Provider ID for the toolgroup
- `--provider-toolgroup-id`: Provider's toolgroup ID
- `--mcp-config`: JSON configuration for the MCP endpoint
- `--args`: JSON arguments for the toolgroup
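
For example, a hypothetical registration of an MCP-backed toolgroup (the group ID, provider ID, and endpoint URL are illustrative, and the exact `--mcp-config` JSON schema depends on your distribution):

```bash
# Register a remote MCP toolgroup; the endpoint JSON is an assumed shape.
llama-stack-client toolgroups register mcp::my-tools \
  --provider-id model-context-protocol \
  --mcp-config '{"uri": "http://localhost:8000/sse"}'
```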
### `llama-stack-client toolgroups unregister`

Unregister a toolgroup from distribution endpoint

```bash
llama-stack-client toolgroups unregister <toolgroup_id>
```

Required arguments:

- `TOOLGROUP_ID`: ID of the tool group

## Datasets Management

Manage datasets.

### `llama-stack-client datasets list`

Show available datasets on distribution endpoint

```bash
llama-stack-client datasets list
```

### `llama-stack-client datasets register`

Register a new dataset

```bash
llama-stack-client datasets register --dataset_id <dataset_id> --purpose <purpose> [--url <url>] [--dataset-path <dataset-path>] [--dataset-id <dataset-id>] [--metadata <metadata>]
```

Required arguments:

- `--dataset_id`: ID of the dataset
- `--purpose`: Purpose of the dataset

Optional arguments:

- `--metadata`: Metadata of the dataset
- `--url`: URL of the dataset
- `--dataset-path`: Local file path to the dataset. If specified, the file is uploaded rather than fetched from a URL
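
A hypothetical registration of an evaluation dataset by URL (the dataset ID, purpose string, and URL are illustrative; the set of valid purpose values depends on your distribution):

```bash
llama-stack-client datasets register \
  --dataset_id my_qa_dataset \
  --purpose eval/question-answer \
  --url https://huggingface.co/datasets/llamastack/simpleqa
```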
### `llama-stack-client datasets unregister`

Remove a dataset

```bash
llama-stack-client datasets unregister <dataset-id>
```

Required arguments:

- `DATASET_ID`: ID of the dataset

## Scoring Functions Management

Manage scoring functions.

### `llama-stack-client scoring_functions list`

Show available scoring functions on distribution endpoint

```bash
llama-stack-client scoring_functions list
```

```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ description ┃ type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ basic::docvqa │ basic │ DocVQA Visual Question & Answer scoring function │ scoring_function │
│ basic::equality │ basic │ Returns 1.0 if the input is equal to the target, 0.0 │ scoring_function │
│ │ │ otherwise. │ │
└────────────────────────────────────────────┴──────────────┴───────────────────────────────────────────────────────────────┴──────────────────┘
```

### `llama-stack-client scoring_functions register`

Register a new scoring function

```bash
llama-stack-client scoring_functions register --scoring-fn-id <scoring-fn-id> --description <description> --return-type <return-type> [--provider-id <provider-id>] [--provider-scoring-fn-id <provider-scoring-fn-id>] [--params <params>]
```

Required arguments:

- `--scoring-fn-id`: ID of the scoring function
- `--description`: Description of the scoring function
- `--return-type`: Return type of the scoring function

Optional arguments:

- `--provider-id`: Provider ID for the scoring function
- `--provider-scoring-fn-id`: Provider's scoring function ID
- `--params`: Parameters for the scoring function in JSON format
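
A hypothetical registration of a custom scoring function (the ID and return type are illustrative; the accepted return types depend on your distribution):

```bash
llama-stack-client scoring_functions register \
  --scoring-fn-id custom::exact-match \
  --description "Returns 1.0 if the output matches the expected answer" \
  --return-type number
```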
## Post Training Management

Manage post-training jobs.

### `llama-stack-client post_training list`

Show the list of available post training jobs

```bash
llama-stack-client post_training list
```

```bash
["job-1", "job-2", "job-3"]
```
### `llama-stack-client post_training artifacts`

Get the training artifacts of a specific post training job

```bash
llama-stack-client post_training artifacts --job-uuid <job-uuid>
```

```bash
JobArtifactsResponse(checkpoints=[], job_uuid='job-1')
```

Required arguments:

- `--job-uuid`: Job UUID

### `llama-stack-client post_training supervised_fine_tune`

Kick off a supervised fine-tuning job

```bash
llama-stack-client post_training supervised_fine_tune --job-uuid <job-uuid> --model <model> --algorithm-config <algorithm-config> --training-config <training-config> [--checkpoint-dir <checkpoint-dir>]
```

Required arguments:

- `--job-uuid`: Job UUID
- `--model`: Model ID
- `--algorithm-config`: Algorithm config
- `--training-config`: Training config

Optional arguments:

- `--checkpoint-dir`: Checkpoint directory
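
A hypothetical invocation (the job UUID and model ID are illustrative, and the JSON payloads for the algorithm and training configs are assumed shapes; the real schemas depend on the post-training provider):

```bash
llama-stack-client post_training supervised_fine_tune \
  --job-uuid sft-job-001 \
  --model Llama3.2-3B-Instruct \
  --algorithm-config '{"type": "LoRA"}' \
  --training-config '{"n_epochs": 1}'
```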
### `llama-stack-client post_training status`

Show the status of a specific post training job

```bash
llama-stack-client post_training status --job-uuid <job-uuid>
```

```bash
JobStatusResponse(
    checkpoints=[],
    job_uuid='job-1',
    status='completed',
    completed_at="",
    resources_allocated="",
    scheduled_at="",
    started_at=""
)
```

Required arguments:

- `--job-uuid`: Job UUID
### `llama-stack-client post_training cancel`

Cancel the training job

```bash
llama-stack-client post_training cancel --job-uuid <job-uuid>
```

```bash
# This functionality is not yet implemented for llama-stack-client
╭────────────────────────────────────────────────────────────╮
│ Failed to post_training cancel_training_job                │
│                                                            │
│ Error Type: InternalServerError                            │
│ Details: Error code: 501 - {'detail': 'Not implemented: '} │
╰────────────────────────────────────────────────────────────╯
```

Required arguments:

- `--job-uuid`: Job UUID
# Python SDK Reference

## Shared Types

```python
from llama_stack_client.types import (
    AgentConfig,
    BatchCompletion,
    CompletionMessage,
    ContentDelta,
    Document,
    InterleavedContent,
    InterleavedContentItem,
    Message,
    ParamType,
    QueryConfig,
    QueryResult,
    ReturnType,
    SafetyViolation,
    SamplingParams,
    ScoringResult,
    SystemMessage,
    ToolCall,
    ToolParamDefinition,
    ToolResponseMessage,
    URL,
    UserMessage,
)
```

## Toolgroups

Types:

```python
from llama_stack_client.types import (
    ListToolGroupsResponse,
    ToolGroup,
    ToolgroupListResponse,
)
```

Methods:

- <code title="get /v1/toolgroups">client.toolgroups.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/toolgroups.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/toolgroup_list_response.py">ToolgroupListResponse</a></code>
- <code title="get /v1/toolgroups/{toolgroup_id}">client.toolgroups.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/toolgroups.py">get</a>(toolgroup_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_group.py">ToolGroup</a></code>
- <code title="post /v1/toolgroups">client.toolgroups.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/toolgroups.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/toolgroup_register_params.py">params</a>) -> None</code>
- <code title="delete /v1/toolgroups/{toolgroup_id}">client.toolgroups.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/toolgroups.py">unregister</a>(toolgroup_id) -> None</code>

## Tools

Types:

```python
from llama_stack_client.types import ListToolsResponse, Tool, ToolListResponse
```

Methods:

- <code title="get /v1/tools">client.tools.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/tools.py">list</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_list_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_list_response.py">ToolListResponse</a></code>
- <code title="get /v1/tools/{tool_name}">client.tools.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/tools.py">get</a>(tool_name) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool.py">Tool</a></code>

## ToolRuntime

Types:

```python
from llama_stack_client.types import ToolDef, ToolInvocationResult
```

Methods:

- <code title="post /v1/tool-runtime/invoke">client.tool_runtime.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/tool_runtime/tool_runtime.py">invoke_tool</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_runtime_invoke_tool_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_invocation_result.py">ToolInvocationResult</a></code>
- <code title="get /v1/tool-runtime/list-tools">client.tool_runtime.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/tool_runtime/tool_runtime.py">list_tools</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_runtime_list_tools_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_def.py">JSONLDecoder[ToolDef]</a></code>

### RagTool

Methods:

- <code title="post /v1/tool-runtime/rag-tool/insert">client.tool_runtime.rag_tool.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/tool_runtime/rag_tool.py">insert</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_runtime/rag_tool_insert_params.py">params</a>) -> None</code>
- <code title="post /v1/tool-runtime/rag-tool/query">client.tool_runtime.rag_tool.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/tool_runtime/rag_tool.py">query</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/tool_runtime/rag_tool_query_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/shared/query_result.py">QueryResult</a></code>

## Agents

Types:

```python
from llama_stack_client.types import (
    InferenceStep,
    MemoryRetrievalStep,
    ShieldCallStep,
    ToolExecutionStep,
    ToolResponse,
    AgentCreateResponse,
)
```

Methods:

- <code title="post /v1/agents">client.agents.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/agents.py">create</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agent_create_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agent_create_response.py">AgentCreateResponse</a></code>
- <code title="delete /v1/agents/{agent_id}">client.agents.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/agents.py">delete</a>(agent_id) -> None</code>

### Session

Types:

```python
from llama_stack_client.types.agents import Session, SessionCreateResponse
```

Methods:

- <code title="post /v1/agents/{agent_id}/session">client.agents.session.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/session.py">create</a>(agent_id, \*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/session_create_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/session_create_response.py">SessionCreateResponse</a></code>
- <code title="get /v1/agents/{agent_id}/session/{session_id}">client.agents.session.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/session.py">retrieve</a>(session_id, \*, agent_id, \*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/session_retrieve_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/session.py">Session</a></code>
- <code title="delete /v1/agents/{agent_id}/session/{session_id}">client.agents.session.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/session.py">delete</a>(session_id, \*, agent_id) -> None</code>

### Steps

Types:

```python
from llama_stack_client.types.agents import StepRetrieveResponse
```

Methods:

- <code title="get /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}">client.agents.steps.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/steps.py">retrieve</a>(step_id, \*, agent_id, session_id, turn_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/step_retrieve_response.py">StepRetrieveResponse</a></code>

### Turn

Types:

```python
from llama_stack_client.types.agents import Turn, TurnCreateResponse
```

Methods:

- <code title="post /v1/agents/{agent_id}/session/{session_id}/turn">client.agents.turn.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/turn.py">create</a>(session_id, \*, agent_id, \*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/turn_create_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/turn_create_response.py">TurnCreateResponse</a></code>
- <code title="get /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}">client.agents.turn.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/agents/turn.py">retrieve</a>(turn_id, \*, agent_id, session_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/agents/turn.py">Turn</a></code>

## BatchInference

Types:

```python
from llama_stack_client.types import BatchInferenceChatCompletionResponse
```

Methods:

- <code title="post /v1/batch-inference/chat-completion">client.batch_inference.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/batch_inference.py">chat_completion</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/batch_inference_chat_completion_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/batch_inference_chat_completion_response.py">BatchInferenceChatCompletionResponse</a></code>
- <code title="post /v1/batch-inference/completion">client.batch_inference.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/batch_inference.py">completion</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/batch_inference_completion_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/shared/batch_completion.py">BatchCompletion</a></code>

## Datasets

Types:

```python
from llama_stack_client.types import (
    ListDatasetsResponse,
    DatasetRetrieveResponse,
    DatasetListResponse,
)
```

Methods:

- <code title="get /v1/datasets/{dataset_id}">client.datasets.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/datasets.py">retrieve</a>(dataset_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/dataset_retrieve_response.py">Optional[DatasetRetrieveResponse]</a></code>
- <code title="get /v1/datasets">client.datasets.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/datasets.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/dataset_list_response.py">DatasetListResponse</a></code>
- <code title="post /v1/datasets">client.datasets.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/datasets.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/dataset_register_params.py">params</a>) -> None</code>
- <code title="delete /v1/datasets/{dataset_id}">client.datasets.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/datasets.py">unregister</a>(dataset_id) -> None</code>

## Eval

Types:

```python
from llama_stack_client.types import EvaluateResponse, Job
```

Methods:

- <code title="post /v1/eval/tasks/{benchmark_id}/evaluations">client.eval.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/eval/eval.py">evaluate_rows</a>(benchmark_id, \*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/eval_evaluate_rows_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/evaluate_response.py">EvaluateResponse</a></code>
- <code title="post /v1/eval/tasks/{benchmark_id}/jobs">client.eval.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/eval/eval.py">run_eval</a>(benchmark_id, \*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/eval_run_eval_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/job.py">Job</a></code>

### Jobs

Types:

```python
from llama_stack_client.types.eval import JobStatusResponse
```

Methods:

- <code title="get /v1/eval/tasks/{benchmark_id}/jobs/{job_id}/result">client.eval.jobs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/eval/jobs.py">retrieve</a>(job_id, \*, benchmark_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/evaluate_response.py">EvaluateResponse</a></code>
- <code title="delete /v1/eval/tasks/{benchmark_id}/jobs/{job_id}">client.eval.jobs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/eval/jobs.py">cancel</a>(job_id, \*, benchmark_id) -> None</code>
- <code title="get /v1/eval/tasks/{benchmark_id}/jobs/{job_id}">client.eval.jobs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/eval/jobs.py">status</a>(job_id, \*, benchmark_id) -> Optional[JobStatusResponse]</code>

## Inspect

Types:

```python
from llama_stack_client.types import HealthInfo, ProviderInfo, RouteInfo, VersionInfo
```

Methods:

- <code title="get /v1/health">client.inspect.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/inspect.py">health</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/health_info.py">HealthInfo</a></code>
- <code title="get /v1/version">client.inspect.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/inspect.py">version</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/version_info.py">VersionInfo</a></code>

## Inference

Types:

```python
from llama_stack_client.types import (
    CompletionResponse,
    EmbeddingsResponse,
    TokenLogProbs,
    InferenceChatCompletionResponse,
    InferenceCompletionResponse,
)
```

Methods:

- <code title="post /v1/inference/chat-completion">client.inference.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/inference.py">chat_completion</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/inference_chat_completion_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/inference_chat_completion_response.py">InferenceChatCompletionResponse</a></code>
- <code title="post /v1/inference/completion">client.inference.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/inference.py">completion</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/inference_completion_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/inference_completion_response.py">InferenceCompletionResponse</a></code>
- <code title="post /v1/inference/embeddings">client.inference.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/inference.py">embeddings</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/inference_embeddings_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/embeddings_response.py">EmbeddingsResponse</a></code>

## VectorIo

Types:

```python
from llama_stack_client.types import QueryChunksResponse
```

Methods:

- <code title="post /v1/vector-io/insert">client.vector_io.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/vector_io.py">insert</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/vector_io_insert_params.py">params</a>) -> None</code>
- <code title="post /v1/vector-io/query">client.vector_io.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/vector_io.py">query</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/vector_io_query_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/query_chunks_response.py">QueryChunksResponse</a></code>

## VectorDBs

Types:

```python
from llama_stack_client.types import (
    ListVectorDBsResponse,
    VectorDBRetrieveResponse,
    VectorDBListResponse,
    VectorDBRegisterResponse,
)
```

Methods:

- <code title="get /v1/vector-dbs/{vector_db_id}">client.vector_dbs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/vector_dbs.py">retrieve</a>(vector_db_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/vector_db_retrieve_response.py">Optional[VectorDBRetrieveResponse]</a></code>
- <code title="get /v1/vector-dbs">client.vector_dbs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/vector_dbs.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/vector_db_list_response.py">VectorDBListResponse</a></code>
- <code title="post /v1/vector-dbs">client.vector_dbs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/vector_dbs.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/vector_db_register_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/vector_db_register_response.py">VectorDBRegisterResponse</a></code>
- <code title="delete /v1/vector-dbs/{vector_db_id}">client.vector_dbs.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/vector_dbs.py">unregister</a>(vector_db_id) -> None</code>

## Models

Types:

```python
from llama_stack_client.types import ListModelsResponse, Model, ModelListResponse
```

Methods:

- <code title="get /v1/models/{model_id}">client.models.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/models.py">retrieve</a>(model_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/model.py">Optional[Model]</a></code>
- <code title="get /v1/models">client.models.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/models.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/model_list_response.py">ModelListResponse</a></code>
- <code title="post /v1/models">client.models.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/models.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/model_register_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/model.py">Model</a></code>
- <code title="delete /v1/models/{model_id}">client.models.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/models.py">unregister</a>(model_id) -> None</code>

## PostTraining

Types:

```python
from llama_stack_client.types import ListPostTrainingJobsResponse, PostTrainingJob
```

Methods:

- <code title="post /v1/post-training/preference-optimize">client.post_training.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/post_training/post_training.py">preference_optimize</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training_preference_optimize_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training_job.py">PostTrainingJob</a></code>
- <code title="post /v1/post-training/supervised-fine-tune">client.post_training.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/post_training/post_training.py">supervised_fine_tune</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training_supervised_fine_tune_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training_job.py">PostTrainingJob</a></code>

### Job

Types:

```python
from llama_stack_client.types.post_training import (
    JobListResponse,
    JobArtifactsResponse,
    JobStatusResponse,
)
```

Methods:

- <code title="get /v1/post-training/jobs">client.post_training.job.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/post_training/job.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training/job_list_response.py">JobListResponse</a></code>
- <code title="get /v1/post-training/job/artifacts">client.post_training.job.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/post_training/job.py">artifacts</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training/job_artifacts_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training/job_artifacts_response.py">Optional[JobArtifactsResponse]</a></code>
- <code title="post /v1/post-training/job/cancel">client.post_training.job.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/post_training/job.py">cancel</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training/job_cancel_params.py">params</a>) -> None</code>
- <code title="get /v1/post-training/job/status">client.post_training.job.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/post_training/job.py">status</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training/job_status_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/post_training/job_status_response.py">Optional[JobStatusResponse]</a></code>

## Providers

Types:

```python
from llama_stack_client.types import ListProvidersResponse, ProviderListResponse
```

Methods:

- <code title="get /v1/inspect/providers">client.providers.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/providers.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/provider_list_response.py">ProviderListResponse</a></code>

## Routes

Types:

```python
from llama_stack_client.types import ListRoutesResponse, RouteListResponse
```

Methods:

- <code title="get /v1/inspect/routes">client.routes.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/routes.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/route_list_response.py">RouteListResponse</a></code>

## Safety

Types:

```python
from llama_stack_client.types import RunShieldResponse
```

Methods:

- <code title="post /v1/safety/run-shield">client.safety.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/safety.py">run_shield</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/safety_run_shield_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/run_shield_response.py">RunShieldResponse</a></code>

## Shields

Types:

```python
from llama_stack_client.types import ListShieldsResponse, Shield, ShieldListResponse
```

Methods:

- <code title="get /v1/shields/{identifier}">client.shields.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/shields.py">retrieve</a>(identifier) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/shield.py">Optional[Shield]</a></code>
- <code title="get /v1/shields">client.shields.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/shields.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/shield_list_response.py">ShieldListResponse</a></code>
- <code title="post /v1/shields">client.shields.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/shields.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/shield_register_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/shield.py">Shield</a></code>

## SyntheticDataGeneration

Types:

```python
from llama_stack_client.types import SyntheticDataGenerationResponse
```

Methods:

- <code title="post /v1/synthetic-data-generation/generate">client.synthetic_data_generation.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/synthetic_data_generation.py">generate</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/synthetic_data_generation_generate_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/synthetic_data_generation_response.py">SyntheticDataGenerationResponse</a></code>

## Telemetry

Types:

```python
from llama_stack_client.types import (
    QuerySpansResponse,
    SpanWithStatus,
    Trace,
    TelemetryGetSpanResponse,
    TelemetryGetSpanTreeResponse,
    TelemetryQuerySpansResponse,
    TelemetryQueryTracesResponse,
)
```

Methods:

- <code title="get /v1/telemetry/traces/{trace_id}/spans/{span_id}">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">get_span</a>(span_id, \*, trace_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_get_span_response.py">TelemetryGetSpanResponse</a></code>
- <code title="get /v1/telemetry/spans/{span_id}/tree">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">get_span_tree</a>(span_id, \*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_get_span_tree_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_get_span_tree_response.py">TelemetryGetSpanTreeResponse</a></code>
- <code title="get /v1/telemetry/traces/{trace_id}">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">get_trace</a>(trace_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/trace.py">Trace</a></code>
- <code title="post /v1/telemetry/events">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">log_event</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_log_event_params.py">params</a>) -> None</code>
- <code title="get /v1/telemetry/spans">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">query_spans</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_query_spans_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_query_spans_response.py">TelemetryQuerySpansResponse</a></code>
- <code title="get /v1/telemetry/traces">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">query_traces</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_query_traces_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_query_traces_response.py">TelemetryQueryTracesResponse</a></code>
- <code title="post /v1/telemetry/spans/export">client.telemetry.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/telemetry.py">save_spans_to_dataset</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/telemetry_save_spans_to_dataset_params.py">params</a>) -> None</code>

## Datasetio

Types:

```python
from llama_stack_client.types import PaginatedRowsResult
```

Methods:

- <code title="post /v1/datasetio/rows">client.datasetio.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/datasetio.py">append_rows</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/datasetio_append_rows_params.py">params</a>) -> None</code>
- <code title="get /v1/datasetio/rows">client.datasetio.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/datasetio.py">get_rows_paginated</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/datasetio_get_rows_paginated_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/paginated_rows_result.py">PaginatedRowsResult</a></code>

## Scoring

Types:

```python
from llama_stack_client.types import ScoringScoreResponse, ScoringScoreBatchResponse
```

Methods:

- <code title="post /v1/scoring/score">client.scoring.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/scoring.py">score</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_score_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_score_response.py">ScoringScoreResponse</a></code>
- <code title="post /v1/scoring/score-batch">client.scoring.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/scoring.py">score_batch</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_score_batch_params.py">params</a>) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_score_batch_response.py">ScoringScoreBatchResponse</a></code>

## ScoringFunctions

Types:

```python
from llama_stack_client.types import (
    ListScoringFunctionsResponse,
    ScoringFn,
    ScoringFunctionListResponse,
)
```

Methods:

- <code title="get /v1/scoring-functions/{scoring_fn_id}">client.scoring_functions.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/scoring_functions.py">retrieve</a>(scoring_fn_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_fn.py">Optional[ScoringFn]</a></code>
- <code title="get /v1/scoring-functions">client.scoring_functions.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/scoring_functions.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_function_list_response.py">ScoringFunctionListResponse</a></code>
- <code title="post /v1/scoring-functions">client.scoring_functions.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/scoring_functions.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/scoring_function_register_params.py">params</a>) -> None</code>

## Benchmarks

Types:

```python
from llama_stack_client.types import (
    Benchmark,
    ListBenchmarksResponse,
    BenchmarkListResponse,
)
```

Methods:

- <code title="get /v1/eval-tasks/{benchmark_id}">client.benchmarks.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/benchmarks.py">retrieve</a>(benchmark_id) -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/benchmark.py">Optional[Benchmark]</a></code>
- <code title="get /v1/eval-tasks">client.benchmarks.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/benchmarks.py">list</a>() -> <a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/benchmark_list_response.py">BenchmarkListResponse</a></code>
- <code title="post /v1/eval-tasks">client.benchmarks.<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/resources/benchmarks.py">register</a>(\*\*<a href="https://github.com/meta-llama/llama-stack-client-python/tree/main/src/llama_stack_client/types/benchmark_register_params.py">params</a>) -> None</code>