MDX leftover fixes

Alexey Rybak 2025-09-23 16:40:39 -07:00 committed by raghotham
parent aebd728c81
commit cfc8357930
11 changed files with 96 additions and 110 deletions

View file

@@ -75,7 +75,7 @@ outlined on that page and do not file a public issue.
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
Complete your CLA here: [https://code.facebook.com/cla](https://code.facebook.com/cla)
**I'd like to contribute!**

View file

@@ -12,9 +12,9 @@ This guide will walk you through the process of adding a new API provider to Lla
- Begin by reviewing the [core concepts](../concepts/index.md) of Llama Stack and choose the API your provider belongs to (Inference, Safety, VectorIO, etc.)
- Determine the provider type ({repopath}`Remote::llama_stack/providers/remote` or {repopath}`Inline::llama_stack/providers/inline`). Remote providers make requests to external services, while inline providers execute implementation locally.
- Add your provider to the appropriate {repopath}`Registry::llama_stack/providers/registry/`. Specify pip dependencies necessary.
- Update any distribution {repopath}`Templates::llama_stack/distributions/` `build.yaml` and `run.yaml` files if they should include your provider by default. Run {repopath}`./scripts/distro_codegen.py` if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
- Determine the provider type ([Remote](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/remote) or [Inline](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline)). Remote providers make requests to external services, while inline providers run the implementation locally.
- Add your provider to the appropriate [Registry](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/registry/). Specify the necessary pip dependencies.
- Update any distribution [Templates](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/distributions/) `build.yaml` and `run.yaml` files if they should include your provider by default. Run [./scripts/distro_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/distro_codegen.py) if necessary. Note that `distro_codegen.py` will fail if the new provider causes any distribution template to attempt to import provider-specific dependencies. This usually means the distribution's `get_distribution_template()` code path should only import any necessary Config or model alias definitions from each provider and not the provider's actual implementation.
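For orientation, a remote provider's registry entry looks roughly like the sketch below. The spec classes and fields are assumptions modeled on the registry files linked above and may differ between versions; `my_provider` and its module path are placeholders.
```python
# Hypothetical registry entry for a remote inference provider (a sketch,
# not the exact current API; class names and fields may vary by version).
from llama_stack.providers.datatypes import AdapterSpec, Api, remote_provider_spec

my_provider_entry = remote_provider_spec(
    api=Api.inference,
    adapter=AdapterSpec(
        adapter_type="my_provider",  # placeholder provider name
        pip_packages=["my-provider-sdk"],  # the pip dependencies mentioned above
        module="llama_stack.providers.remote.inference.my_provider",
        config_class="llama_stack.providers.remote.inference.my_provider.config.MyProviderConfig",
    ),
)
```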
Here are some example PRs to help you get started:
@@ -71,9 +71,9 @@ Before running tests, you must have required dependencies installed. This depend
### 1. Integration Testing
Integration tests are located in {repopath}`tests/integration`. These tests use the python client-SDK APIs (from the `llama_stack_client` package) to test functionality. Since these tests use client APIs, they can be run either by pointing to an instance of the Llama Stack server or "inline" by using `LlamaStackAsLibraryClient`.
Integration tests are located in [tests/integration](https://github.com/meta-llama/llama-stack/tree/main/tests/integration). These tests use the Python client SDK APIs (from the `llama_stack_client` package) to test functionality. Since these tests use client APIs, they can be run either by pointing to an instance of the Llama Stack server or "inline" by using `LlamaStackAsLibraryClient`.
Consult {repopath}`tests/integration/README.md` for more details on how to run the tests.
Consult [tests/integration/README.md](https://github.com/meta-llama/llama-stack/blob/main/tests/integration/README.md) for more details on how to run the tests.
Note that each provider's `sample_run_config()` method (in the configuration class for that provider)
typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass them via the `--env` flag to the test command.
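As a rough sketch of the "inline" route (the import path, distro name, and environment variable here are assumptions that vary by version and provider):
```python
# Sketch: exercising a stack "inline" the way the integration tests can,
# without a running server. Import path and "starter" distro are assumptions.
import os

from llama_stack.core.library_client import LlamaStackAsLibraryClient

os.environ.setdefault("OLLAMA_URL", "http://localhost:11434")  # provider env var

client = LlamaStackAsLibraryClient("starter")
client.initialize()
print([m.identifier for m in client.models.list()])
```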
@@ -81,9 +81,9 @@ Note that each provider's `sample_run_config()` method (in the configuration cla
### 2. Unit Testing
Unit tests are located in {repopath}`tests/unit`. Provider-specific unit tests are located in {repopath}`tests/unit/providers`. These tests are all run automatically as part of the CI process.
Unit tests are located in [tests/unit](https://github.com/meta-llama/llama-stack/tree/main/tests/unit). Provider-specific unit tests are located in [tests/unit/providers](https://github.com/meta-llama/llama-stack/tree/main/tests/unit/providers). These tests are all run automatically as part of the CI process.
Consult {repopath}`tests/unit/README.md` for more details on how to run the tests manually.
Consult [tests/unit/README.md](https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md) for more details on how to run the tests manually.
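These are plain pytest tests; a self-contained sketch in that style (the helper under test is hypothetical, shown only to illustrate the shape):
```python
# Minimal pytest-style sketch; `chunk_text` is a made-up stand-in for a
# provider helper. Real tests import the provider code under test.
def chunk_text(text: str, size: int) -> list[str]:
    return [text[i : i + size] for i in range(0, len(text), size)]


def test_chunk_text_splits_in_order():
    assert chunk_text("abcdef", 4) == ["abcd", "ef"]
```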
### 3. Additional end-to-end testing

View file

@@ -39,7 +39,7 @@ filtering, sorting, and aggregating vectors.
- `YourVectorIOAdapter.query_chunks()`
- `YourVectorIOAdapter.delete_chunks()`
3. **Add to Registry**: Register your provider in the appropriate registry file.
- Update {repopath}`llama_stack/providers/registry/vector_io.py` to include your new provider.
- Update [llama_stack/providers/registry/vector_io.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/registry/vector_io.py) to include your new provider.
```python
from llama_stack.providers.registry.specs import InlineProviderSpec
from llama_stack.providers.registry.api import Api
@@ -65,7 +65,7 @@ InlineProviderSpec(
5. Add your provider to the `vector_io_providers` fixture dictionary.
- Please follow the naming convention of `your_vectorprovider_index` and `your_vectorprovider_adapter` as the tests require this to execute properly.
- Integration Tests
- Integration tests are located in {repopath}`tests/integration`. These tests use the python client-SDK APIs (from the `llama_stack_client` package) to test functionality.
- Integration tests are located in [tests/integration](https://github.com/meta-llama/llama-stack/tree/main/tests/integration). These tests use the Python client SDK APIs (from the `llama_stack_client` package) to test functionality.
- The two sets of integration tests are:
- `tests/integration/vector_io/test_vector_io.py`: This file tests registration, insertion, and retrieval.
- `tests/integration/vector_io/test_openai_vector_stores.py`: These tests are for OpenAI-compatible vector stores and test the OpenAI API compatibility.
@@ -79,5 +79,5 @@ InlineProviderSpec(
- If you are adding tests for the `remote` provider you will have to update the `test` group, which is used in the GitHub CI for integration tests.
- `uv add new_pip_package --group test`
5. **Update Documentation**: Please update the documentation for end users
- Generate the provider documentation by running {repopath}`./scripts/provider_codegen.py`.
- Update the autogenerated content in the registry/vector_io.py file with information about your provider. Please see other providers for examples.
- Generate the provider documentation by running [./scripts/provider_codegen.py](https://github.com/meta-llama/llama-stack/blob/main/scripts/provider_codegen.py).
- Update the autogenerated content in the registry/vector_io.py file with information about your provider. Please see other providers for examples.
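To make step 2 above concrete, an adapter skeleton might look like the following. The method names come from the steps above, but the signatures and chunk types are simplified assumptions rather than the exact interface:
```python
# Hypothetical VectorIO adapter skeleton; treat signatures as a shape sketch.
from typing import Any


class YourVectorIOAdapter:
    async def initialize(self) -> None:
        ...  # open client connections, create collections, etc.

    async def insert_chunks(self, vector_store_id: str, chunks: list[Any]) -> None:
        ...  # write embeddings and metadata to the backing store

    async def query_chunks(self, vector_store_id: str, query: Any) -> list[Any]:
        ...  # nearest-neighbor search with filtering/sorting as needed

    async def delete_chunks(self, vector_store_id: str, chunk_ids: list[str]) -> None:
        ...  # remove chunks by id
```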

View file

@@ -86,8 +86,11 @@ options:
After this step is complete, a file named `<name>-build.yaml` and a template file `<name>-run.yaml` will be generated and saved at the output file path specified at the end of the command.
::::{tab-set}
:::{tab-item} Building from a template
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<Tabs>
<TabItem value="template" label="Building from a template">
To build with alternative API providers, we provide distribution templates to get you started building a distribution backed by different providers.
The following command lists the available templates and their corresponding providers.
@@ -160,8 +163,8 @@ You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and
```{tip}
The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](customizing_run_yaml.md).
```
:::
:::{tab-item} Building from Scratch
</TabItem>
<TabItem value="scratch" label="Building from Scratch">
If the provided templates do not fit your use case, you can start by running `llama stack build`, which launches an interactive wizard that prompts you for build configurations.
@@ -190,9 +193,8 @@ Tip: use <TAB> to see options for the providers.
You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml`
```
:::
:::{tab-item} Building from a pre-existing build config file
</TabItem>
<TabItem value="config" label="Building from a pre-existing build config file">
- In addition to templates, you may customize the build to your liking by editing config files and building from them with the following command.
- The config file will have contents like the ones in `llama_stack/distributions/*build.yaml`.
@@ -200,9 +202,8 @@ You can now edit ~/.llama/distributions/llamastack-my-local-stack/my-local-stack
```
llama stack build --config llama_stack/distributions/starter/build.yaml
```
:::
:::{tab-item} Building with External Providers
</TabItem>
<TabItem value="external" label="Building with External Providers">
Llama Stack supports external providers that live outside of the main codebase. This allows you to create and maintain your own providers independently or use community-provided providers.
@@ -251,15 +252,12 @@ llama stack build --config my-external-stack.yaml
```
For more information on external providers, including directory structure, provider types, and implementation requirements, see the [External Providers documentation](../providers/external.md).
:::
:::{tab-item} Building Container
```{admonition} Podman Alternative
:class: tip
</TabItem>
<TabItem value="container" label="Building Container">
:::tip Podman Alternative
Podman is supported as an alternative to Docker. Set `CONTAINER_BINARY` to `podman` in your environment to use Podman.
```
:::
To build a container image, you may start from a template and use the `--image-type container` flag to specify `container` as the build image type.
@@ -278,7 +276,8 @@ You can now edit ~/meta-llama/llama-stack/tmp/configs/ollama-run.yaml and run `l
```
Now set some environment variables for the inference model ID and the Llama Stack port, and create a local directory to mount into the container's file system.
```
```bash
export INFERENCE_MODEL="llama3.2:3b"
export LLAMA_STACK_PORT=8321
mkdir -p ~/.llama
@@ -312,9 +311,8 @@ Here are the docker flags and their uses:
* `--env OLLAMA_URL=http://host.docker.internal:11434`: Configures the URL for the Ollama service
:::
::::
</TabItem>
</Tabs>
### Running your Stack server

View file

@@ -478,12 +478,12 @@ A rule may also specify a condition, either a 'when' or an 'unless',
with additional constraints as to where the rule applies. The
constraints supported at present are:
- 'user with <attr-value> in <attr-name>'
- 'user with <attr-value> not in <attr-name>'
- 'user with `<attr-value>` in `<attr-name>`'
- 'user with `<attr-value>` not in `<attr-name>`'
- 'user is owner'
- 'user is not owner'
- 'user in owners <attr-name>'
- 'user not in owners <attr-name>'
- 'user in owners `<attr-name>`'
- 'user not in owners `<attr-name>`'
The attributes defined for a user will depend on how the auth
configuration is defined.
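For illustration, a conditioned rule might look like the following, written here as the Python structure such a config could load into; the field names are assumptions, and only the condition string follows the grammar above:
```python
# Illustrative only: a "when"-conditioned rule as a plain data structure.
# Field names (permit/actions/resource) are assumed, not the exact schema.
rule = {
    "permit": {"actions": ["read"], "resource": "vector_db::*"},
    "when": "user with admin in roles",
}
```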

View file

@@ -31,23 +31,21 @@ ollama run llama3.2:3b --keepalive 60m
Install [uv](https://docs.astral.sh/uv/) to set up your virtual environment
::::{tab-set}
:::{tab-item} macOS and Linux
<Tabs>
<TabItem value="unix" label="macOS and Linux">
Use `curl` to download the script and execute it with `sh`:
```console
curl -LsSf https://astral.sh/uv/install.sh | sh
```
:::
:::{tab-item} Windows
</TabItem>
<TabItem value="windows" label="Windows">
Use `irm` to download the script and execute it with `iex`:
```console
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
:::
::::
</TabItem>
</Tabs>
Set up your virtual environment.
@@ -58,9 +56,8 @@ source .venv/bin/activate
### Step 2: Run Llama Stack
Llama Stack is a server that exposes multiple APIs; you connect to it using the Llama Stack client SDK.
::::{tab-set}
:::{tab-item} Using `venv`
<Tabs>
<TabItem value="venv" label="Using venv">
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
@@ -71,19 +68,8 @@ We use `starter` as template. By default all providers are disabled, this requir
```bash
llama stack build --distro starter --image-type venv --run
```
:::
:::{tab-item} Using `venv`
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
which defines the providers and their settings.
Now let's build and run the Llama Stack config for Ollama.
```bash
llama stack build --distro starter --image-type venv --run
```
:::
:::{tab-item} Using a Container
</TabItem>
<TabItem value="container" label="Using a Container">
You can use a container image to run the Llama Stack server. We provide several container images for the server
component that work with different inference providers out of the box. For this guide, we will use
`llamastack/distribution-starter` as the container image. If you'd like to build your own image or customize the
@@ -110,9 +96,8 @@ with `host.containers.internal`.
The configuration YAML for the Ollama distribution is available at `distributions/ollama/run.yaml`.
```{tip}
Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the hosts network directly so it can connect to Ollama running on `localhost:11434`.
:::tip
Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host's network directly so it can connect to Ollama running on `localhost:11434`.
Linux users having issues running the above command should instead try the following:
```bash
@@ -126,7 +111,6 @@ docker run -it \
--env OLLAMA_URL=http://localhost:11434
```
:::
::::
You will see output like the following:
```
INFO: Application startup complete.
@@ -137,31 +121,29 @@ Now you can use the Llama Stack client to run inference and build agents!
You can reuse the server setup or use the [Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/).
Note that the client package is already included in the `llama-stack` package.
</TabItem>
</Tabs>
### Step 3: Run Client CLI
Open a new terminal and navigate to the same directory you started the server from. Then set up a new
virtual environment or activate your existing server virtual environment.
::::{tab-set}
:::{tab-item} Reuse Server `venv`
<Tabs>
<TabItem value="reuse" label="Reuse Server venv">
```bash
# The client is included in the llama-stack package so we just activate the server venv
source .venv/bin/activate
```
:::
:::{tab-item} Install with `venv`
</TabItem>
<TabItem value="install" label="Install with venv">
```bash
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client
```
:::
::::
</TabItem>
</Tabs>
Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check
connectivity to the server.
@@ -237,9 +219,8 @@ OpenAIChatCompletion(
Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
::::{tab-set}
:::{tab-item} Basic Inference
<Tabs>
<TabItem value="inference" label="Basic Inference">
Now you can run inference using the Llama Stack client SDK.
#### i. Create the Script
@@ -279,9 +260,8 @@ Which will output:
Model: ollama/llama3.2:3b
OpenAIChatCompletion(id='chatcmpl-30cd0f28-a2ad-4b6d-934b-13707fc60ebf', choices=[OpenAIChatCompletionChoice(finish_reason='stop', index=0, message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(role='assistant', content="Lines of code unfold\nAlgorithms dance with ease\nLogic's gentle kiss", name=None, tool_calls=None, refusal=None, annotations=None, audio=None, function_call=None), logprobs=None)], created=1751732480, model='llama3.2:3b', object='chat.completion', service_tier=None, system_fingerprint='fp_ollama', usage={'completion_tokens': 16, 'prompt_tokens': 37, 'total_tokens': 53, 'completion_tokens_details': None, 'prompt_tokens_details': None})
```
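For reference, the script reduces to something like this sketch (the full listing is elided in this diff; the endpoint and model follow the guide's defaults):
```python
# Sketch of the inference script; assumes the Step 2 server is running on
# localhost:8321 with the ollama/llama3.2:3b model available.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.chat.completions.create(
    model="ollama/llama3.2:3b",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
)
print(f"Model: {response.model}")
print(response.choices[0].message.content)
```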
:::
:::{tab-item} Build a Simple Agent
</TabItem>
<TabItem value="agent" label="Build a Simple Agent">
Next we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
#### i. Create the Script
Create a file `agent.py` and add the following code:
@@ -449,9 +429,8 @@ uv run python agent.py
So, that's me in a nutshell!
```
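The core of `agent.py` is roughly the following sketch; the `Agent` helper's exact constructor and turn API may differ across client versions:
```python
# Sketch of agent.py; the Agent API shown is an approximation.
from llama_stack_client import Agent, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
agent = Agent(client, model="ollama/llama3.2:3b", instructions="You are a helpful assistant.")

session_id = agent.create_session("demo-session")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "Tell me about yourself"}],
    session_id=session_id,
    stream=False,
)
print(turn.output_message.content)
```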
:::
:::{tab-item} Build a RAG Agent
</TabItem>
<TabItem value="rag" label="Build a RAG Agent">
For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents
in a vector database.
@@ -554,9 +533,8 @@ uv run python rag_agent.py
...
Overall, DORA is a powerful reinforcement learning algorithm that can learn complex tasks from human demonstrations. However, it requires careful consideration of the challenges and limitations to achieve optimal results.
```
:::
::::
</TabItem>
</Tabs>
**You're Ready to Build Your Own Apps!**

View file

@@ -11,11 +11,8 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| `device` | `<class 'str'>` | No | cuda | |
| `distributed_backend` | `Literal['fsdp', 'deepspeed']` | No | | |
| `checkpoint_format` | `Literal['full_state', 'huggingface']` | No | huggingface | |
| `chat_template` | `<class 'str'>` | No | <|user|>
{input}
<|assistant|>
{output} | |
| `model_specific_config` | `<class 'dict'>` | No | {'trust_remote_code': True, 'attn_implementation': 'sdpa'} | |
| `chat_template` | `<class 'str'>` | No | `&lt;|user|&gt;`<br/>`{input}`<br/>`&lt;|assistant|&gt;`<br/>`{output}` | |
| `model_specific_config` | `<class 'dict'>` | No | `&#123;'trust_remote_code': True, 'attn_implementation': 'sdpa'&#125;` | |
| `max_seq_length` | `<class 'int'>` | No | 2048 | |
| `gradient_checkpointing` | `<class 'bool'>` | No | False | |
| `save_total_limit` | `<class 'int'>` | No | 3 | |

View file

@@ -17,8 +17,8 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| `device` | `<class 'str'>` | No | cuda | |
| `distributed_backend` | `Literal['fsdp', 'deepspeed']` | No | | |
| `checkpoint_format` | `Literal['full_state', 'huggingface']` | No | huggingface | |
| `chat_template` | `<class 'str'>` | No | &lt;|user|&gt;&lt;br/&gt;&#123;input&#125;&lt;br/&gt;&lt;|assistant|&gt;&lt;br/&gt;&#123;output&#125; | |
| `model_specific_config` | `<class 'dict'>` | No | &#123;'trust_remote_code': True, 'attn_implementation': 'sdpa'&#125; | |
| `chat_template` | `<class 'str'>` | No | `&lt;|user|&gt;`<br/>`{input}`<br/>`&lt;|assistant|&gt;`<br/>`{output}` | |
| `model_specific_config` | `<class 'dict'>` | No | `&#123;'trust_remote_code': True, 'attn_implementation': 'sdpa'&#125;` | |
| `max_seq_length` | `<class 'int'>` | No | 2048 | |
| `gradient_checkpointing` | `<class 'bool'>` | No | False | |
| `save_total_limit` | `<class 'int'>` | No | 3 | |

View file

@@ -11,11 +11,8 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| `device` | `<class 'str'>` | No | cuda | |
| `distributed_backend` | `Literal['fsdp', 'deepspeed']` | No | | |
| `checkpoint_format` | `Literal['full_state', 'huggingface']` | No | huggingface | |
| `chat_template` | `<class 'str'>` | No | <|user|>
{input}
<|assistant|>
{output} | |
| `model_specific_config` | `<class 'dict'>` | No | {'trust_remote_code': True, 'attn_implementation': 'sdpa'} | |
| `chat_template` | `<class 'str'>` | No | `&lt;|user|&gt;`<br/>`{input}`<br/>`&lt;|assistant|&gt;`<br/>`{output}` | |
| `model_specific_config` | `<class 'dict'>` | No | `&#123;'trust_remote_code': True, 'attn_implementation': 'sdpa'&#125;` | |
| `max_seq_length` | `<class 'int'>` | No | 2048 | |
| `gradient_checkpointing` | `<class 'bool'>` | No | False | |
| `save_total_limit` | `<class 'int'>` | No | 3 | |

View file

@@ -409,7 +409,7 @@ For more details on TLS configuration, refer to the [TLS setup guide](https://mi
| `token` | `str \| None` | No | | The token of the Milvus server |
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
| `config` | `dict` | No | &#123;&#125; | This configuration allows additional fields to be passed through to the underlying Milvus client. See the [Milvus](https://milvus.io/docs/install-overview.md) documentation for more details about Milvus in general. |
| `config` | `dict` | No | `{}` | This configuration allows additional fields to be passed through to the underlying Milvus client. See the [Milvus](https://milvus.io/docs/install-overview.md) documentation for more details about Milvus in general. |
:::note
This configuration class accepts additional fields beyond those listed above. You can pass any additional configuration options that will be forwarded to the underlying provider.

View file

@@ -229,17 +229,33 @@ def generate_provider_docs(progress, provider_spec: Any, api_name: str) -> str:
    # Handle multiline default values and escape problematic characters for MDX
    if "\n" in default:
        default = (
            default.replace("\n", "<br/>")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("{", "&#123;")
            .replace("}", "&#125;")
        )
        # For multiline defaults, escape angle brackets and use <br/> for line breaks
        lines = default.split("\n")
        escaped_lines = []
        for line in lines:
            if line.strip():
                # Escape angle brackets and wrap template tokens in backticks
                escaped_line = line.strip().replace("<", "&lt;").replace(">", "&gt;")
                if ("{" in escaped_line and "}" in escaped_line) or (
                    "&lt;|" in escaped_line and "|&gt;" in escaped_line
                ):
                    escaped_lines.append(f"`{escaped_line}`")
                else:
                    escaped_lines.append(escaped_line)
            else:
                escaped_lines.append("")
        default = "<br/>".join(escaped_lines)
    else:
        default = (
            default.replace("<", "&lt;").replace(">", "&gt;").replace("{", "&#123;").replace("}", "&#125;")
        )
        # For single line defaults, escape angle brackets first
        escaped_default = default.replace("<", "&lt;").replace(">", "&gt;")
        # Then wrap template tokens in backticks
        if ("{" in escaped_default and "}" in escaped_default) or (
            "&lt;|" in escaped_default and "|&gt;" in escaped_default
        ):
            default = f"`{escaped_default}`"
        else:
            # Apply additional escaping for curly braces
            default = escaped_default.replace("{", "&#123;").replace("}", "&#125;")
    description_text = field_info["description"] or ""
    # Escape curly braces in description text for MDX compatibility
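As a quick sanity check of the new multiline branch, here is a standalone re-implementation of its logic (for illustration; not the script itself):
```python
# Standalone illustration of the escaping rule added above: lines are
# angle-bracket-escaped, template-looking tokens get backticks, and the
# result is joined with <br/>.
def escape_multiline_default(default: str) -> str:
    escaped_lines = []
    for line in default.split("\n"):
        if line.strip():
            escaped = line.strip().replace("<", "&lt;").replace(">", "&gt;")
            if ("{" in escaped and "}" in escaped) or ("&lt;|" in escaped and "|&gt;" in escaped):
                escaped_lines.append(f"`{escaped}`")
            else:
                escaped_lines.append(escaped)
        else:
            escaped_lines.append("")
    return "<br/>".join(escaped_lines)


print(escape_multiline_default("<|user|>\n{input}"))  # `&lt;|user|&gt;`<br/>`{input}`
```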