Merge branch 'main' into models_api_2

Commit df33e6fbec by Xi Yan, 2024-09-18 22:36:48 -07:00 (committed by GitHub)
30 changed files with 9689 additions and 113 deletions

@ -1,5 +1,4 @@
include requirements.txt
include llama_stack/distribution/*.sh
include llama_stack/cli/scripts/*.sh
include llama_stack/distribution/example_configs/conda/*.yaml
include llama_stack/distribution/example_configs/docker/*.yaml
include llama_stack/distribution/templates/*.yaml

@ -28,6 +28,7 @@ Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/
```
**`llama stack configure`**
- Run `llama stack configure <name>` with the name you have previously defined in the `build` step.
```
llama stack configure my-local-llama-stack
@ -61,6 +62,7 @@ You can now run `llama stack run my-local-llama-stack --port PORT` or `llama sta
```
**`llama stack run`**
- Run `llama stack run <name>` with the name you have previously defined.
```
llama stack run my-local-llama-stack
@ -110,74 +112,94 @@ In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instru
- `providers`: specifies the underlying implementation for serving each API endpoint
- `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment.
#### Build a local distribution with conda
The following command and specifications allow you to get started with building.
```
llama stack build <path/to/config>
```
- You will be required to pass in a file path to the build config file (e.g. `./llama_stack/distribution/example_configs/conda/local-conda-example-build.yaml`). We provide some example build config files for configuring different types of distributions in the `./llama_stack/distribution/example_configs/` folder.
The file will have contents like the following:
```
$ cat ./llama_stack/distribution/example_configs/conda/local-conda-example-build.yaml
name: 8b-instruct
```
At the end of the build command, a `<name>-build.yaml` file storing the build configurations will be generated and saved at the output file path specified at the end of the command.
#### Building from scratch
- For a new user, you can start by running `llama stack build`, which launches an interactive wizard where you will be prompted to enter the build configurations.
```
llama stack build
```
Running the command above lets you fill in the configuration to build your Llama Stack distribution; you will see output like the following.
```
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-llama-stack
> Enter the image type you want your distribution to be built with (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
> (Optional) Enter a short description for your Llama Stack distribution:
Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/my-local-llama-stack-build.yaml
```
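For reference, here is a sketch of what the generated build file might contain, assuming the answers given above (the exact keys and defaults are an assumption and may differ between versions):
```
$ cat ~/.conda/envs/llamastack-my-local-llama-stack/my-local-llama-stack-build.yaml
name: my-local-llama-stack
distribution_spec:
  description: ""
  providers:
    inference: meta-reference
    safety: meta-reference
    agents: meta-reference
    memory: meta-reference
    telemetry: meta-reference
image_type: conda
```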
#### Building from templates
- To build with alternative API providers, we provide distribution templates to help you get started building a distribution backed by different providers.
The following command shows the available templates and their corresponding providers.
```
llama stack build --list-templates
```
![alt text](resources/list-templates.png)
You may then pick a template to build your distribution with the providers of your choice.
```
llama stack build --template local-tgi --name my-tgi-stack
```
```
$ llama stack build --template local-tgi --name my-tgi-stack
...
...
Build spec configuration saved at /home/xiyan/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml
You may now run `llama stack configure my-tgi-stack` or `llama stack configure /home/xiyan/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml`
```
#### Building from config file
- In addition to templates, you may customize the build to your liking by editing config files and building from them with the following command.
- The config file will have contents like the ones in `llama_stack/distribution/templates/`.
```
$ cat llama_stack/distribution/templates/local-ollama-build.yaml
name: local-ollama
distribution_spec:
  description: Like local, but use ollama for running LLM inference
  docker_image: null
  providers:
    inference: remote::ollama
    memory: meta-reference
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference
image_type: conda
```
You may run the `llama stack build` command to generate your distribution, passing `--name` to override the name for your distribution.
```
$ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct
...
...
Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml
llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml
```
After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`.
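You can sanity-check the generated spec and move straight on to the configure step; a minimal sketch, reusing the `8b-instruct` paths above:
```
cat ~/.llama/distributions/conda/8b-instruct-build.yaml
llama stack configure ~/.llama/distributions/conda/8b-instruct-build.yaml
```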
#### How to build distribution with different API providers using configs
To specify a different API provider, we can change the `distribution_spec` in our `<name>-build.yaml` config. For example, the following build spec allows you to build a distribution using TGI as the inference API provider.
```
$ cat ./llama_stack/distribution/example_configs/conda/local-tgi-conda-example-build.yaml
name: local-tgi-conda-example
distribution_spec:
  description: Use TGI (local, or with Hugging Face Inference Endpoints) for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint.
  docker_image: null
  providers:
    inference: remote::tgi
    memory: meta-reference-faiss
    safety: meta-reference
    agentic_system: meta-reference
    telemetry: console
image_type: conda
```
The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`.
```
llama stack build ./llama_stack/distribution/example_configs/conda/local-tgi-conda-example-build.yaml --name tgi
```
We provide some example build configs to help you get started building with different API providers.
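For example, you can browse the example config folders; the listing below is a sketch and only shows the files referenced in this guide:
```
$ ls ./llama_stack/distribution/example_configs/conda/
local-conda-example-build.yaml
local-tgi-conda-example-build.yaml
$ ls ./llama_stack/distribution/example_configs/docker/
local-docker-example-build.yaml
```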
#### How to build distribution with Docker image
To build a docker image, simply change the `image_type` to `docker` in our `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`.
To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type.
```
$ cat ./llama_stack/distribution/example_configs/docker/local-docker-example-build.yaml
llama stack build --template local --image-type docker --name docker-0
```
Alternatively, you may use a config file, set `image_type` to `docker` in your `<name>-build.yaml` file, and run `llama stack build <name>-build.yaml`. The `<name>-build.yaml` will have contents like:
```
name: local-docker-example
distribution_spec:
  description: Use code from `llama_stack` itself to serve all llama stack APIs
@ -191,9 +213,9 @@ distribution_spec:
image_type: docker
```
The following command allows you to build a Docker image with the name `docker-local`
The following command allows you to build a Docker image with the name `<name>`
```
llama stack build ./llama_stack/distribution/example_configs/docker/local-docker-example-build.yaml --name docker-local
llama stack build --config <name>-build.yaml
Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/Dockerfile
FROM python:3.10-slim
WORKDIR /app
@ -203,10 +225,11 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml
```
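Once the Docker image is built, you can start it directly; a minimal sketch reusing the `llamastack-docker-local` image name from the output above (substitute your own image name and port mapping as needed):
```
podman run -p 8000:8000 llamastack-docker-local
# or, if you use docker instead of podman:
docker run -p 8000:8000 llamastack-docker-local
```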
## Step 2. Configure
After our distribution is built (either in the form of a Docker image or a Conda environment), we will run the following command to configure it:
```
llama stack configure [<path/to/name.build.yaml> | <docker-image-name>]
llama stack configure [ <name> | <docker-image-name> | <path/to/name.build.yaml>]
```
- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
- For `docker` images downloaded from Docker Hub, you could also use <docker-image-name> as the argument; see the sketch below.
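Putting the three argument forms together, a sketch that reuses names from earlier in this guide (your build names and image tags will differ):
```
# by build name (conda builds)
llama stack configure my-local-llama-stack
# by path to the generated build spec
llama stack configure ~/.llama/distributions/conda/8b-instruct-build.yaml
# by docker image name
llama stack configure llamastack-docker-local
```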
@ -298,7 +321,7 @@ INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
```
> [!NOTE]
> Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`.
> Configuration is in `~/.llama/builds/local/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`.
> [!IMPORTANT]
> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines.

docs/llama-stack-spec.html (new file, 5850 lines; diff suppressed because it is too large)

docs/llama-stack-spec.yaml (new file, 3695 lines; diff suppressed because it is too large)

@ -28,4 +28,4 @@ if [ ${#missing_packages[@]} -ne 0 ]; then
exit 1
fi
PYTHONPATH=$PYTHONPATH:../.. python -m rfcs.openapi_generator.generate $*
PYTHONPATH=$PYTHONPATH:../.. python -m docs.openapi_generator.generate $*

Binary image files changed (one new 220 KiB image added; existing 128 KiB, 71 KiB, and 17 KiB images updated or moved); image diffs not shown.

@ -13,12 +13,12 @@ from typing import Any, Dict, List, Optional
import fire
import httpx
from llama_stack.distribution.datatypes import RemoteProviderConfig
from termcolor import cprint
from .memory import * # noqa: F403
from .common.file_utils import data_url_from_file
from llama_stack.distribution.datatypes import RemoteProviderConfig
from llama_stack.apis.memory import * # noqa: F403
from llama_stack.providers.utils.memory.file_utils import data_url_from_file
async def get_client_impl(config: RemoteProviderConfig, _deps: Any) -> Memory:

@ -212,7 +212,7 @@ class StackBuild(Subcommand):
providers_for_api = all_providers[api]
api_provider = prompt(
"> Enter the API provider for the {} API: (default=meta-reference): ".format(
"> Enter provider for the {} API: (default=meta-reference): ".format(
api.value
),
validator=Validator.from_callable(

@ -53,46 +53,61 @@ class StackConfigure(Subcommand):
from termcolor import cprint
docker_image = None
build_config_file = Path(args.config)
if build_config_file.exists():
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
self._configure_llama_distribution(build_config, args.output_dir)
return
# if we get here, we need to try to find the conda build config file
cprint(
f"Could not find {build_config_file}. Trying conda build name instead...",
color="green",
)
conda_dir = Path(os.getenv("CONDA_PREFIX")).parent / f"llamastack-{args.config}"
build_config_file = Path(conda_dir) / f"{args.config}-build.yaml"
if not build_config_file.exists():
cprint(
f"Could not find {build_config_file}. Trying docker image name instead...",
color="green",
)
docker_image = args.config
if build_config_file.exists():
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
if args.output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
self._configure_llama_distribution(build_config, args.output_dir)
return
script = pkg_resources.resource_filename(
"llama_stack", "distribution/configure_container.sh"
)
script_args = [script, docker_image, str(builds_dir)]
# if we get here, we need to try to find the docker image
cprint(
f"Could not find {build_config_file}. Trying docker image name instead...",
color="green",
)
docker_image = args.config
builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
if args.output_dir:
builds_dir = Path(output_dir)
os.makedirs(builds_dir, exist_ok=True)
return_code = run_with_pty(script_args)
script = pkg_resources.resource_filename(
"llama_stack", "distribution/configure_container.sh"
)
script_args = [script, docker_image, str(builds_dir)]
# we have regenerated the build config file with script, now check if it exists
if return_code != 0:
self.parser.error(
f"Can not find {build_config_file}. Please run llama stack build first or check if docker image exists"
)
return_code = run_with_pty(script_args)
build_name = docker_image.removeprefix("llamastack-")
saved_file = str(builds_dir / f"{build_name}-run.yaml")
cprint(
f"YAML configuration has been written to {saved_file}. You can now run `llama stack run {saved_file}`",
color="green",
# we have regenerated the build config file with script, now check if it exists
if return_code != 0:
self.parser.error(
f"Failed to configure container {docker_image} with return code {return_code}. Please run `llama stack build first`. "
)
return
with open(build_config_file, "r") as f:
build_config = BuildConfig(**yaml.safe_load(f))
self._configure_llama_distribution(build_config, args.output_dir)
build_name = docker_image.removeprefix("llamastack-")
saved_file = str(builds_dir / f"{build_name}-run.yaml")
cprint(
f"YAML configuration has been written to {saved_file}. You can now run `llama stack run {saved_file}`",
color="green",
)
return
def _configure_llama_distribution(
self,

@ -30,12 +30,8 @@ def make_routing_entry_type(config_class: Any):
def configure_api_providers(
config: StackRunConfig, spec: DistributionSpec
) -> StackRunConfig:
cprint("Configuring APIs to serve...", "white", attrs=["bold"])
print("Enter comma-separated list of APIs to serve:")
apis = config.apis_to_serve or list(spec.providers.keys())
config.apis_to_serve = [a for a in apis if a != "telemetry"]
print("")
apis = [v.value for v in stack_apis()]
all_providers = api_providers()

@ -7,15 +7,14 @@
from typing import List
from llama_models.llama3.api.datatypes import Message, Role, UserMessage
from termcolor import cprint
from llama_stack.apis.safety import (
OnViolationAction,
RunShieldRequest,
Safety,
ShieldDefinition,
ShieldResponse,
)
from termcolor import cprint
class SafetyException(Exception): # noqa: N818
@ -45,10 +44,8 @@ class ShieldRunnerMixin:
messages[0] = UserMessage(content=messages[0].content)
res = await self.safety_api.run_shields(
RunShieldRequest(
messages=messages,
shields=shields,
)
messages=messages,
shields=shields,
)
results = res.responses

@ -11,10 +11,10 @@ from llama_models.datatypes import ModelFamily
from llama_models.schema_utils import json_schema_type
from llama_models.sku_list import all_registered_models, resolve_model
from llama_stack.apis.inference import QuantizationConfig
from pydantic import BaseModel, Field, field_validator
from llama_stack.apis.inference import QuantizationConfig
@json_schema_type
class MetaReferenceImplConfig(BaseModel):
@ -24,7 +24,7 @@ class MetaReferenceImplConfig(BaseModel):
)
quantization: Optional[QuantizationConfig] = None
torch_seed: Optional[int] = None
max_seq_len: int
max_seq_len: int = 4096
max_batch_size: int = 1
@field_validator("model")

@ -2,7 +2,8 @@ blobfile
fire
httpx
huggingface-hub
llama-models>=0.0.18
llama-models>=0.0.19
prompt-toolkit
python-dotenv
pydantic
requests

@ -21,7 +21,7 @@ Meta releases weights of both the pretrained and instruction fine-tuned Llama mo
### Model Lifecycle
![Figure 1: Model Life Cycle](RFC-0001-llama-stack-assets/model-lifecycle.png)
![Figure 1: Model Life Cycle](../docs/resources/model-lifecycle.png)
For each of the operations that need to be performed during the model life cycle (e.g. fine-tuning, inference, evals, etc.), we identified the capabilities that are needed as toolchain APIs. Some of these capabilities are primitive operations like inference, while other capabilities, like synthetic data generation, are composed of other capabilities. The list of APIs we have identified to support the lifecycle of Llama models is below:
@ -35,7 +35,7 @@ For each of the operations that need to be performed (e.g. fine tuning, inferenc
### Agentic System
![Figure 2: Agentic System](RFC-0001-llama-stack-assets/agentic-system.png)
![Figure 2: Agentic System](../docs/resources/agentic-system.png)
In addition to the model lifecycle, we considered the different components involved in an agentic system, specifically around tool calling and shields. Since the model may decide to call tools, a single model inference call is not enough. What's needed is an agentic loop consisting of tool calls and inference. The model provides separate tokens representing end-of-message and end-of-turn. A message represents a possible stopping point for execution, where the model can inform the execution environment that a tool call needs to be made. The execution environment, upon executing the tool call, adds the result back to the context window and makes another inference call. This process can be repeated until an end-of-turn token is generated.
Note that as of today, in the OSS world, such a “loop” is often coded explicitly via elaborate prompt engineering using a ReAct pattern (typically) or preconstructed execution graph. Llama 3.1 (and future Llamas) attempts to absorb this multi-step reasoning loop inside the main model itself.
@ -60,12 +60,12 @@ The sequence diagram that details the steps is [here](https://github.com/meta-ll
We define the Llama Stack as a layer cake shown below.
![Figure 3: Llama Stack](RFC-0001-llama-stack-assets/llama-stack.png)
![Figure 3: Llama Stack](../docs/resources/llama-stack.png)
The API is defined in the [YAML](RFC-0001-llama-stack-assets/llama-stack-spec.yaml) and [HTML](RFC-0001-llama-stack-assets/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in the api/datatypes.py and api/endpoints.py files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
The API is defined in the [YAML](../docs/llama-stack-spec.yaml) and [HTML](../docs/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in the api/datatypes.py and api/endpoints.py files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
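As a hedged sketch of how these spec files are regenerated after the Pydantic definitions change, assuming the `run_openapi_generator.sh` script shown earlier in this diff now lives under `docs/openapi_generator/` (its arguments are simply passed through to the generator module):
```
cd docs/openapi_generator                  # assumed location of the generator script after this change
sh run_openapi_generator.sh <output-dir>   # arguments are forwarded to docs.openapi_generator.generate
```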

@ -16,7 +16,7 @@ def read_requirements():
setup(
name="llama_stack",
version="0.0.18",
version="0.0.20",
author="Meta Llama",
author_email="llama-oss@meta.com",
description="Llama Stack",