config templates restructure, docs (#262)

* wip

* config templates

* readmes
Xi Yan 2024-10-16 23:25:10 -07:00 committed by GitHub
parent a07dfffbbf
commit d787d1e84f
16 changed files with 57 additions and 78 deletions

View file

@@ -90,10 +90,10 @@ The `llama` CLI makes it easy to work with the Llama Stack set of tools. Please
 * [CLI reference](docs/cli_reference.md)
   * Guide using `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution.
 * [Getting Started](docs/getting_started.md)
-  * Guide to build and run a Llama Stack server.
+  * Guide to start a Llama Stack server.
+  * [Jupyter notebook](./docs/getting_started.ipynb) to walk through how to use simple text and vision inference llama_stack_client APIs
 * [Contributing](CONTRIBUTING.md)
 ## Llama Stack Client SDK
 | **Language** | **Client SDK** | **Package** |
@@ -104,3 +104,5 @@ The `llama` CLI makes it easy to work with the Llama Stack set of tools. Please
 | Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) |
 Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
+You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
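Since both SDK paragraphs above stop at links, a minimal hedged sketch of trying the Python client may help. The `llama-stack-client` package name comes from the PyPI/GitHub links above; the `llama_stack.apis.inference.client` test module mirrors the `llama_stack.apis.safety.client` invocation shown later in this diff, and whether an inference client exists at exactly that path is an assumption.

```
# Install the Python client SDK linked in the table above
pip install llama-stack-client

# With a Llama Stack server already listening on localhost:5000, the repo's
# bundled test client gives a quick smoke test (module path is an assumption,
# modeled on the safety client command used in getting_started.md)
python -m llama_stack.apis.inference.client localhost 5000
```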

View file

@@ -1,45 +1,9 @@
-# llama-stack
+# Getting Started with Llama Stack
-[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-stack)](https://pypi.org/project/llama-stack/)
-[![Discord](https://img.shields.io/discord/1257833999603335178)](https://discord.gg/llama-stack)
-This repository contains the specifications and implementations of the APIs which are part of the Llama Stack.
-The Llama Stack defines and standardizes the building blocks needed to bring generative AI applications to market. These blocks span the entire development lifecycle: from model training and fine-tuning, through product evaluation, to invoking AI agents in production. Beyond definition, we're developing open-source versions and partnering with cloud providers, ensuring developers can assemble AI solutions using consistent, interlocking pieces across platforms. The ultimate goal is to accelerate innovation in the AI space.
-The Stack APIs are rapidly improving, but still very much work in progress and we invite feedback as well as direct contributions.
-## APIs
-The Llama Stack consists of the following set of APIs:
-- Inference
-- Safety
-- Memory
-- Agentic System
-- Evaluation
-- Post Training
-- Synthetic Data Generation
-- Reward Scoring
-Each of the APIs themselves is a collection of REST endpoints.
-## API Providers
-A Provider is what makes the API real -- they provide the actual implementation backing the API.
-As an example, for Inference, we could have the implementation be backed by open source libraries like `[ torch | vLLM | TensorRT ]` as possible options.
-A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
-## Llama Stack Distribution
-A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
+This guide will walk you through the steps to get started on an end-to-end flow for LlamaStack. This guide mainly focuses on getting started with building a LlamaStack distribution, and starting up a LlamaStack server.
 ## Installation
+The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
 You can install this repository as a [package](https://pypi.org/project/llama-stack/) with `pip install llama-stack`
@@ -57,17 +21,14 @@ cd llama-stack
 $CONDA_PREFIX/bin/pip install -e .
 ```
-# Getting Started
-The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
+For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md).
+## Starting the Llama Stack Server
+This guide allows you to quickly get started with building and running a Llama Stack server in < 5 minutes!
+You may also check out this [notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) for trying out demo scripts.
 ## Quick Cheatsheet
-This guides allows you to quickly get started with building and running a Llama Stack server in < 5 minutes!
-#### Via docker
+#### Starting up server via docker
 ```
 docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
 ```
@@ -75,8 +36,12 @@ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
 > [!NOTE]
 > `~/.llama` should be the path containing downloaded weights of Llama models.
+> [!TIP]
+> Pro Tip: We may use `docker compose up` for starting up a distribution with remote providers (e.g. TGI). You can check out [these scripts](../llama_stack/distribution/docker/README.md) to help you get started.
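The tip above only points at the compose scripts, so here is a rough, unverified sketch of that flow. The repository path comes from the link in the tip; the presence of a compose file directly in that directory, and its name, are assumptions.

```
# Sketch only: compose scripts live under llama_stack/distribution/docker/
# per the tip above; exact file names and subdirectories may differ.
git clone https://github.com/meta-llama/llama-stack.git
cd llama-stack/llama_stack/distribution/docker

# Bring up the distribution together with its remote providers (e.g. TGI)
docker compose up
```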
-#### Via conda
+#### Build->Configure->Run via conda
+You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack.
 **`llama stack build`**
 - You'll be prompted to enter build information interactively.
 ```
@@ -445,4 +410,7 @@ Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields)
 python -m llama_stack.apis.safety.client localhost 5000
 ```
+Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
 You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
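Read together, the build/configure/run sections above and the test clients here form one end-to-end conda flow. The sketch below strings them together; `my-local-stack` is a placeholder name, and the exact `llama stack configure`/`llama stack run` arguments are assumptions to be checked against the CLI reference.

```
# Build a distribution (interactive prompts ask for a name, image type and providers)
llama stack build

# Configure and start the distribution you just built; "my-local-stack" is a
# placeholder for whatever name you entered during the build prompts.
llama stack configure my-local-stack
llama stack run my-local-stack --port 5000

# Smoke-test the running server with the bundled clients (the safety command is
# taken verbatim from this doc; the inference counterpart is assumed to exist)
python -m llama_stack.apis.inference.client localhost 5000
python -m llama_stack.apis.safety.client localhost 5000
```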

View file

@@ -1,4 +1,4 @@
-name: local
+name: local-gpu
 distribution_spec:
   description: Use code from `llama_stack` itself to serve all llama stack APIs
   providers:
@@ -7,4 +7,4 @@ distribution_spec:
     safety: meta-reference
     agents: meta-reference
     telemetry: meta-reference
-image_type: conda
+image_type: docker
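Since this template's `image_type` changes to `docker`, one plausible way to consume it is to hand the file to `llama stack build`. In the sketch below the `--config` flag and the template's on-disk path are assumptions; the resulting image name is derived from the `name:` field.

```
# Sketch: build a Docker-based distribution from this template.
# Replace the path with wherever this template file lives in your checkout;
# the --config flag is an assumption to verify with `llama stack build --help`.
llama stack build --config ./local-gpu-build.yaml

# The built image can then be run like the prebuilt one in getting_started.md,
# with ~/.llama mounted so the server can find downloaded model weights.
```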

View file

@@ -1,11 +1,11 @@
-name: local-gpu
+name: local-tgi-chroma
 distribution_spec:
-  description: local meta reference
+  description: remote tgi inference + chromadb memory
   docker_image: null
   providers:
-    inference: meta-reference
+    inference: remote::tgi
     safety: meta-reference
     agents: meta-reference
-    memory: meta-reference
+    memory: remote::chromadb
     telemetry: meta-reference
 image_type: docker
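This template delegates inference to a remote TGI endpoint and memory to a remote Chroma instance, so both services need to be reachable before the stack is started. A hedged sketch of bringing them up locally with their stock images follows; the image tags, ports, and model id are assumptions, with the TGI host port chosen to match the `url: http://127.0.0.1:5009` that appears in the TGI run config later in this diff.

```
# Text Generation Inference serving a Llama model on host port 5009
# (the TGI container listens on port 80 by default)
docker run --gpus all -p 5009:80 -v ~/.cache/huggingface:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.1-8B-Instruct

# ChromaDB for the remote::chromadb memory provider (default port 8000)
docker run -p 8000:8000 chromadb/chroma
```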

View file

@ -1,16 +1,16 @@
version: '2' version: '2'
built_at: '2024-10-08T17:42:33.690666' built_at: '2024-10-08T17:40:45.325529'
image_name: local-gpu image_name: local
docker_image: local-gpu docker_image: null
conda_env: null conda_env: local
apis: apis:
- memory
- inference
- agents
- shields - shields
- safety - agents
- models - models
- memory
- memory_banks - memory_banks
- inference
- safety
providers: providers:
inference: inference:
- provider_id: meta-reference - provider_id: meta-reference
@ -25,8 +25,13 @@ providers:
- provider_id: meta-reference - provider_id: meta-reference
provider_type: meta-reference provider_type: meta-reference
config: config:
llama_guard_shield: null llama_guard_shield:
prompt_guard_shield: null model: Llama-Guard-3-1B
excluded_categories: []
disable_input_check: false
disable_output_check: false
prompt_guard_shield:
model: Prompt-Guard-86M
memory: memory:
- provider_id: meta-reference - provider_id: meta-reference
provider_type: meta-reference provider_type: meta-reference
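The expanded shield config above only works if the Llama-Guard-3-1B and Prompt-Guard-86M weights are present under `~/.llama`. Below is a short sketch of preparing and exercising them; the `llama download` flags are assumptions (check `llama download --help`), while the safety client command is the one getting_started.md uses.

```
# Fetch the shield models referenced in the safety provider config
# (exact flags may differ by release; --source meta typically also needs a signed URL)
llama download --source meta --model-id Llama-Guard-3-1B
llama download --source meta --model-id Prompt-Guard-86M

# With the server running, exercise both shields via the bundled safety client
python -m llama_stack.apis.safety.client localhost 5000
```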

View file

@@ -1,29 +1,33 @@
 version: '2'
-built_at: '2024-10-08T17:42:07.505267'
+built_at: '2024-10-08T17:40:45.325529'
-image_name: local-cpu
+image_name: local
-docker_image: local-cpu
+docker_image: null
-conda_env: null
+conda_env: local
 apis:
-- shields
 - agents
-- inference
 - models
 - memory
-- safety
+- shields
 - memory_banks
+- inference
+- safety
 providers:
   inference:
-  - provider_id: remote::ollama
-    provider_type: remote::ollama
+  - provider_id: tgi0
+    provider_type: remote::tgi
     config:
-      host: localhost
-      port: 6000
+      url: http://127.0.0.1:5009
   safety:
   - provider_id: meta-reference
     provider_type: meta-reference
     config:
-      llama_guard_shield: null
-      prompt_guard_shield: null
+      llama_guard_shield:
+        model: Llama-Guard-3-1B
+        excluded_categories: []
+        disable_input_check: false
+        disable_output_check: false
+      prompt_guard_shield:
+        model: Prompt-Guard-86M
   memory:
   - provider_id: meta-reference
     provider_type: meta-reference
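Because the `tgi0` provider above simply points at `http://127.0.0.1:5009`, it is worth confirming that the TGI endpoint answers before starting the stack. The checks below use TGI's standard HTTP API; the prompt and parameters are just illustrative.

```
# Sanity-check the remote TGI endpoint referenced by the tgi0 provider
curl http://127.0.0.1:5009/info

curl http://127.0.0.1:5009/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 16}}'
```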