Merge branch 'main' into models_api_2

@@ -1,5 +1,4 @@
include requirements.txt
include llama_stack/distribution/*.sh
include llama_stack/cli/scripts/*.sh
include llama_stack/distribution/example_configs/conda/*.yaml
include llama_stack/distribution/example_configs/docker/*.yaml
include llama_stack/distribution/templates/*.yaml
@@ -28,6 +28,7 @@ Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/
```

**`llama stack configure`**
- Run `llama stack configure <name>` with the name you previously defined in the `build` step.
```
llama stack configure my-local-llama-stack
@@ -61,6 +62,7 @@ You can now run `llama stack run my-local-llama-stack --port PORT` or `llama sta
```

**`llama stack run`**
- Run `llama stack run <name>` with the name you have previously defined.
```
llama stack run my-local-llama-stack
@@ -110,74 +112,94 @@ In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instru
- `providers`: specifies the underlying implementation for serving each API endpoint
- `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of a Docker image or a Conda environment.

#### Build a local distribution with conda
The following command and specifications allow you to get started with building.
```
llama stack build <path/to/config>
```
- You will be required to pass in a file path to the build config file (e.g. `./llama_stack/distribution/example_configs/conda/local-conda-example-build.yaml`). We provide some example build config files for configuring different types of distributions in the `./llama_stack/distribution/example_configs/` folder.

The file will be of the contents
```
$ cat ./llama_stack/distribution/example_configs/conda/local-conda-example-build.yaml
At the end of the build command, we will generate a `<name>-build.yaml` file storing the build configurations.

name: 8b-instruct
After this step is complete, a file named `<name>-build.yaml` will be generated and saved at the output file path specified at the end of the command.
#### Building from scratch
- For a new user, we could start off with running `llama stack build`, which will launch an interactive wizard where you will be prompted to enter build configurations.
```
llama stack build
```

Running the command above will allow you to fill in the configuration to build your Llama Stack distribution; you will see output like the following.

```
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-llama-stack
> Enter the image type you want your distribution to be built with (docker or conda): conda

Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference

> (Optional) Enter a short description for your Llama Stack distribution:

Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/my-local-llama-stack-build.yaml
```
#### Building from templates
- To build with alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers.

The following command will allow you to see the available templates and their corresponding providers.
```
llama stack build --list-templates
```

![alt text](resources/list-templates.png)

You may then pick a template to build your distribution with providers suited to your liking.

```
llama stack build --template local-tgi --name my-tgi-stack
```

```
$ llama stack build --template local-tgi --name my-tgi-stack
...
...
Build spec configuration saved at /home/xiyan/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml
You may now run `llama stack configure my-tgi-stack` or `llama stack configure /home/xiyan/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml`
```
#### Building from config file
- In addition to templates, you may customize the build to your liking by editing config files and building from a config file with the following command.

- The config file will have contents like the ones in `llama_stack/distribution/templates/`.

```
$ cat llama_stack/distribution/templates/local-ollama-build.yaml

name: local-ollama
distribution_spec:
  distribution_type: local
  description: Use code from `llama_stack` itself to serve all llama stack APIs
  docker_image: null
  description: Like local, but use ollama for running LLM inference
  providers:
    inference: meta-reference
    memory: meta-reference-faiss
    inference: remote::ollama
    memory: meta-reference
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
    agents: meta-reference
    telemetry: meta-reference
image_type: conda
```

You may run the `llama stack build` command to generate your distribution, passing `--name` to override the distribution name.
```
$ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct
...
...
Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml
llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml
```

After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`.
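For reference, the same kind of `<name>-build.yaml` can also be written programmatically. A minimal sketch in Python; the field names mirror the new template shown above and are illustrative rather than a schema guarantee:

```
import yaml

# Illustrative build spec matching the fields shown in the template above.
build_spec = {
    "name": "local-ollama",
    "distribution_spec": {
        "description": "Like local, but use ollama for running LLM inference",
        "docker_image": None,
        "providers": {
            "inference": "remote::ollama",
            "memory": "meta-reference",
            "safety": "meta-reference",
            "agents": "meta-reference",
            "telemetry": "meta-reference",
        },
    },
    "image_type": "conda",
}

with open("local-ollama-build.yaml", "w") as f:
    yaml.safe_dump(build_spec, f, sort_keys=False)
```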
#### How to build distribution with different API providers using configs
To specify a different API provider, we can change the `distribution_spec` in our `<name>-build.yaml` config. For example, the following build spec allows you to build a distribution using TGI as the inference API provider.

```
$ cat ./llama_stack/distribution/example_configs/conda/local-tgi-conda-example-build.yaml

name: local-tgi-conda-example
distribution_spec:
  description: Use TGI (local or with Hugging Face Inference Endpoints for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint).
  docker_image: null
  providers:
    inference: remote::tgi
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda
```

The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`.
```
llama stack build ./llama_stack/distribution/example_configs/conda/local-tgi-conda-example-build.yaml --name tgi
```

We provide some example build configs to help you get started with building with different API providers.
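The same provider swap can be scripted. A hedged sketch (file names are hypothetical and the schema is assumed from the examples above): load an existing build config, point `inference` at TGI, and write it back out.

```
import yaml

# Assumes a build config shaped like the examples in this guide.
with open("local-conda-example-build.yaml", "r") as f:
    spec = yaml.safe_load(f)

spec["name"] = "local-tgi-conda-example"
spec["distribution_spec"]["providers"]["inference"] = "remote::tgi"

with open("local-tgi-conda-example-build.yaml", "w") as f:
    yaml.safe_dump(spec, f, sort_keys=False)
```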
#### How to build distribution with Docker image
To build a docker image, simply change the `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`.

To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.

```
$ cat ./llama_stack/distribution/example_configs/docker/local-docker-example-build.yaml
llama stack build --template local --image-type docker --name docker-0
```

Alternatively, you may use a config file and set `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`. The `<name>-build.yaml` will have contents like:

```
name: local-docker-example
distribution_spec:
  description: Use code from `llama_stack` itself to serve all llama stack APIs
@@ -191,9 +213,9 @@ distribution_spec:
image_type: docker
```

The following command allows you to build a Docker image with the name `docker-local`
The following command allows you to build a Docker image with the name `<name>`
```
llama stack build ./llama_stack/distribution/example_configs/docker/local-docker-example-build.yaml --name docker-local
llama stack build --config <name>-build.yaml

Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
FROM python:3.10-slim
WORKDIR /app
@@ -203,10 +225,11 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml
```


## Step 2. Configure
After our distribution is built (either in the form of a Docker image or a Conda environment), we will run the following command:
```
llama stack configure [<path/to/name.build.yaml> | <docker-image-name>]
llama stack configure [ <name> | <docker-image-name> | <path/to/name.build.yaml>]
```
- For `conda` environments: `<path/to/name.build.yaml>` would be the generated build spec saved from Step 1.
- For `docker` images downloaded from Dockerhub, you could also use `<docker-image-name>` as the argument.
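As a rough illustration of what the configure step consumes (not the CLI's exact code), the generated build spec is plain YAML that can be loaded and inspected; the `StackConfigure` hunk later in this diff does essentially `BuildConfig(**yaml.safe_load(f))`.

```
import os
import yaml

def load_build_spec(path: str) -> dict:
    """Read a <name>-build.yaml produced in Step 1 and return it as a dict."""
    with open(os.path.expanduser(path), "r") as f:
        return yaml.safe_load(f)

# Hypothetical path, following the locations used in this guide:
# spec = load_build_spec("~/.llama/distributions/conda/8b-instruct-build.yaml")
```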
@@ -298,7 +321,7 @@ INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
```

> [!NOTE]
> Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`.
> Configuration is in `~/.llama/builds/local/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`.

> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.
docs/llama-stack-spec.html (new file, 5850 lines)
docs/llama-stack-spec.yaml (new file, 3695 lines)
@@ -28,4 +28,4 @@ if [ ${#missing_packages[@]} -ne 0 ]; then
  exit 1
fi

PYTHONPATH=$PYTHONPATH:../.. python -m rfcs.openapi_generator.generate $*
PYTHONPATH=$PYTHONPATH:../.. python -m docs.openapi_generator.generate $*
docs/resources/list-templates.png (new binary file, 220 KiB)
|
@ -13,12 +13,12 @@ from typing import Any, Dict, List, Optional
|
|||
|
||||
import fire
|
||||
import httpx
|
||||
|
||||
from llama_stack.distribution.datatypes import RemoteProviderConfig
|
||||
from termcolor import cprint
|
||||
|
||||
from .memory import * # noqa: F403
|
||||
from .common.file_utils import data_url_from_file
|
||||
from llama_stack.distribution.datatypes import RemoteProviderConfig
|
||||
|
||||
from llama_stack.apis.memory import * # noqa: F403
|
||||
from llama_stack.providers.utils.memory.file_utils import data_url_from_file
|
||||
|
||||
|
||||
async def get_client_impl(config: RemoteProviderConfig, _deps: Any) -> Memory:
|
||||
|
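The relocated `data_url_from_file` helper is imported for sending local files inline. As a rough sketch of what such a helper generally does (not necessarily the library's implementation), it base64-encodes the file into a `data:` URL:

```
import base64
import mimetypes

def data_url_from_file_sketch(path: str) -> str:
    # Guess the MIME type and embed the file contents as base64.
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"
```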
@@ -212,7 +212,7 @@ class StackBuild(Subcommand):
providers_for_api = all_providers[api]

api_provider = prompt(
    "> Enter the API provider for the {} API: (default=meta-reference): ".format(
    "> Enter provider for the {} API: (default=meta-reference): ".format(
        api.value
    ),
    validator=Validator.from_callable(
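The call above is truncated by the hunk boundary. For illustration, this is how `prompt_toolkit`'s `Validator.from_callable` is typically wired up; the provider set and error message here are assumptions, not the CLI's exact strings.

```
from prompt_toolkit import prompt
from prompt_toolkit.validation import Validator

available_providers = {"meta-reference", "remote::ollama", "remote::tgi"}  # illustrative

api_provider = prompt(
    "> Enter provider for the inference API: (default=meta-reference): ",
    validator=Validator.from_callable(
        lambda x: x in available_providers,
        error_message="Please enter one of the available providers",
    ),
    default="meta-reference",
)
```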
@@ -53,46 +53,61 @@ class StackConfigure(Subcommand):
from termcolor import cprint

docker_image = None
build_config_file = Path(args.config)

if build_config_file.exists():
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
self._configure_llama_distribution(build_config, args.output_dir)
return

# if we get here, we need to try to find the conda build config file
cprint(
f"Could not find {build_config_file}. Trying conda build name instead...",
color="green",
)
conda_dir = Path(os.getenv("CONDA_PREFIX")).parent / f"llamastack-{args.config}"
build_config_file = Path(conda_dir) / f"{args.config}-build.yaml"

if not build_config_file.exists():
cprint(
f"Could not find {build_config_file}. Trying docker image name instead...",
color="green",
)
docker_image = args.config
if build_config_file.exists():
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))

builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
if args.output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
self._configure_llama_distribution(build_config, args.output_dir)
return

script = pkg_resources.resource_filename(
"llama_stack", "distribution/configure_container.sh"
)
script_args = [script, docker_image, str(builds_dir)]
# if we get here, we need to try to find the docker image
cprint(
f"Could not find {build_config_file}. Trying docker image name instead...",
color="green",
)
docker_image = args.config
builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
if args.output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)

return_code = run_with_pty(script_args)
script = pkg_resources.resource_filename(
"llama_stack", "distribution/configure_container.sh"
)
script_args = [script, docker_image, str(builds_dir)]

# we have regenerated the build config file with script, now check if it exists
if return_code != 0:
self.parser.error(
f"Can not find {build_config_file}. Please run llama stack build first or check if docker image exists"
)
return_code = run_with_pty(script_args)

build_name = docker_image.removeprefix("llamastack-")
saved_file = str(builds_dir / f"{build_name}-run.yaml")
cprint(
f"YAML configuration has been written to {saved_file}. You can now run `llama stack run {saved_file}`",
color="green",
# we have regenerated the build config file with script, now check if it exists
if return_code != 0:
self.parser.error(
f"Failed to configure container {docker_image} with return code {return_code}. Please run `llama stack build first`. "
)
return

with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))

self._configure_llama_distribution(build_config, args.output_dir)
build_name = docker_image.removeprefix("llamastack-")
saved_file = str(builds_dir / f"{build_name}-run.yaml")
cprint(
f"YAML configuration has been written to {saved_file}. You can now run `llama stack run {saved_file}`",
color="green",
)
return

def _configure_llama_distribution(
self,
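Condensed, the lookup order the reworked `StackConfigure` code above appears to follow is: explicit path, then conda build name, then docker image. A hedged sketch (helper name is illustrative, not the actual method):

```
import os
from pathlib import Path

def resolve_configure_target(config_arg: str):
    """Return ("file", path) or ("docker", image) for a `llama stack configure` argument."""
    # 1. Treat the argument as a path to a <name>-build.yaml.
    path = Path(config_arg)
    if path.exists():
        return "file", path
    # 2. Fall back to a conda build: llamastack-<name>/<name>-build.yaml
    conda_prefix = os.getenv("CONDA_PREFIX")
    if conda_prefix:
        candidate = Path(conda_prefix).parent / f"llamastack-{config_arg}" / f"{config_arg}-build.yaml"
        if candidate.exists():
            return "file", candidate
    # 3. Otherwise assume the argument names a docker image.
    return "docker", config_arg
```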
@@ -30,12 +30,8 @@ def make_routing_entry_type(config_class: Any):
def configure_api_providers(
    config: StackRunConfig, spec: DistributionSpec
) -> StackRunConfig:
    cprint("Configuring APIs to serve...", "white", attrs=["bold"])
    print("Enter comma-separated list of APIs to serve:")

    apis = config.apis_to_serve or list(spec.providers.keys())
    config.apis_to_serve = [a for a in apis if a != "telemetry"]
    print("")

    apis = [v.value for v in stack_apis()]
    all_providers = api_providers()
@@ -7,15 +7,14 @@
from typing import List

from llama_models.llama3.api.datatypes import Message, Role, UserMessage
from termcolor import cprint

from llama_stack.apis.safety import (
    OnViolationAction,
    RunShieldRequest,
    Safety,
    ShieldDefinition,
    ShieldResponse,
)
from termcolor import cprint


class SafetyException(Exception): # noqa: N818
@@ -45,10 +44,8 @@ class ShieldRunnerMixin:
messages[0] = UserMessage(content=messages[0].content)

res = await self.safety_api.run_shields(
RunShieldRequest(
messages=messages,
shields=shields,
)
messages=messages,
shields=shields,
)

results = res.responses
@@ -11,10 +11,10 @@ from llama_models.datatypes import ModelFamily
from llama_models.schema_utils import json_schema_type
from llama_models.sku_list import all_registered_models, resolve_model

from llama_stack.apis.inference import QuantizationConfig

from pydantic import BaseModel, Field, field_validator

from llama_stack.apis.inference import QuantizationConfig


@json_schema_type
class MetaReferenceImplConfig(BaseModel):
@@ -24,7 +24,7 @@ class MetaReferenceImplConfig(BaseModel):
    )
    quantization: Optional[QuantizationConfig] = None
    torch_seed: Optional[int] = None
    max_seq_len: int
    max_seq_len: int = 4096
    max_batch_size: int = 1

    @field_validator("model")
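The behavioral effect of `max_seq_len: int` gaining a default of 4096 is that the field becomes optional when the config is parsed. A minimal, stand-alone sketch (not the project's actual class):

```
from pydantic import BaseModel

class ExampleImplConfig(BaseModel):
    max_seq_len: int = 4096   # previously required, now defaulted
    max_batch_size: int = 1

print(ExampleImplConfig())                  # max_seq_len falls back to 4096
print(ExampleImplConfig(max_seq_len=8192))  # an explicit value still wins
```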
@@ -2,7 +2,8 @@ blobfile
fire
httpx
huggingface-hub
llama-models>=0.0.18
llama-models>=0.0.19
prompt-toolkit
python-dotenv
pydantic
requests
@@ -21,7 +21,7 @@ Meta releases weights of both the pretrained and instruction fine-tuned Llama mo

### Model Lifecycle

![Figure 1: Model Life Cycle](../docs/resources/model-lifecycle.png)
![Figure 1: Model Life Cycle](../docs/resources/model-lifecycle.png)


For each of the operations that need to be performed (e.g. fine tuning, inference, evals etc) during the model life cycle, we identified the capabilities as toolchain APIs that are needed. Some of these capabilities are primitive operations like inference while other capabilities like synthetic data generation are composed of other capabilities. The list of APIs we have identified to support the lifecycle of Llama models is below:
@@ -35,7 +35,7 @@ For each of the operations that need to be performed (e.g. fine tuning, inferenc

### Agentic System

![Figure 2: Agentic System](../docs/resources/agentic-system.png)
![Figure 2: Agentic System](../docs/resources/agentic-system.png)

In addition to the model lifecycle, we considered the different components involved in an agentic system. Specifically around tool calling and shields. Since the model may decide to call tools, a single model inference call is not enough. What’s needed is an agentic loop consisting of tool calls and inference. The model provides separate tokens representing end-of-message and end-of-turn. A message represents a possible stopping point for execution where the model can inform the execution environment that a tool call needs to be made. The execution environment, upon execution, adds back the result to the context window and makes another inference call. This process can get repeated until an end-of-turn token is generated.
Note that as of today, in the OSS world, such a “loop” is often coded explicitly via elaborate prompt engineering using a ReAct pattern (typically) or preconstructed execution graph. Llama 3.1 (and future Llamas) attempts to absorb this multi-step reasoning loop inside the main model itself.
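A schematic sketch of the agentic loop described above: infer, execute any requested tool, feed the result back into the context, and repeat until an end-of-turn token. All names here are illustrative, not an actual Llama Stack API.

```
def agentic_loop(infer, execute_tool, context):
    # `infer` returns one model message; `execute_tool` runs a requested tool call.
    while True:
        message = infer(context)                 # one inference call
        context.append(message)
        if message.get("stop_reason") == "end_of_turn":
            return context                       # model finished its turn
        tool_call = message.get("tool_call")     # end-of-message with a tool request
        if tool_call is not None:
            result = execute_tool(tool_call)
            context.append({"role": "tool", "content": result})
```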
@@ -60,12 +60,12 @@ The sequence diagram that details the steps is [here](https://github.com/meta-ll

We define the Llama Stack as a layer cake shown below.

![Figure 3: Llama Stack](../docs/resources/llama-stack.png)
![Figure 3: Llama Stack](../docs/resources/llama-stack.png)


The API is defined in the [YAML](RFC-0001-llama-stack-assets/llama-stack-spec.yaml) and [HTML](RFC-0001-llama-stack-assets/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in (api/datatypes.py and api/endpoints.py) files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
The API is defined in the [YAML](../docs/llama-stack-spec.yaml) and [HTML](../docs/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in (api/datatypes.py and api/endpoints.py) files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
setup.py (2 changed lines)

@@ -16,7 +16,7 @@ def read_requirements():

setup(
    name="llama_stack",
    version="0.0.18",
    version="0.0.20",
    author="Meta Llama",
    author_email="llama-oss@meta.com",
    description="Llama Stack",