Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-06-27 18:50:41 +00:00)

config templates restructure, docs (#262)

* wip
* config templates
* readmes

Parent: a07dfffbbf
Commit: d787d1e84f

16 changed files with 57 additions and 78 deletions
@@ -90,10 +90,10 @@ The `llama` CLI makes it easy to work with the Llama Stack set of tools. Please
 * [CLI reference](docs/cli_reference.md)
 * Guide using `llama` CLI to work with Llama models (download, study prompts), and building/starting a Llama Stack distribution.
 * [Getting Started](docs/getting_started.md)
-* Guide to build and run a Llama Stack server.
+* Guide to start a Llama Stack server.
+* [Jupyter notebook](./docs/getting_started.ipynb) to walk through how to use simple text and vision inference llama_stack_client APIs
 * [Contributing](CONTRIBUTING.md)

 ## Llama Stack Client SDK

 | **Language** | **Client SDK** | **Package** |
@@ -104,3 +104,5 @@ The `llama` CLI makes it easy to work with the Llama Stack set of tools. Please
 | Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) |

 Check out our client SDKs for connecting to the Llama Stack server in your preferred language; you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
+
+You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
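For readers who want to see what such a client SDK call looks like, here is a minimal sketch of connecting to a locally running server from Python. The class and method names are assumptions based on the llama-stack-client-python package and are not part of this commit:

```
# Minimal sketch: connect to a local Llama Stack server and run a chat
# completion through the Python client SDK. Class/method names and the model
# identifier are assumptions, not taken from this commit.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Hello! What can you help me with?"}],
)
print(response)
```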
@@ -1,45 +1,9 @@
-# llama-stack
+# Getting Started with Llama Stack
-
-[PyPI](https://pypi.org/project/llama-stack/)
-[Discord](https://discord.gg/llama-stack)
-
-This repository contains the specifications and implementations of the APIs which are part of the Llama Stack.
-
-The Llama Stack defines and standardizes the building blocks needed to bring generative AI applications to market. These blocks span the entire development lifecycle: from model training and fine-tuning, through product evaluation, to invoking AI agents in production. Beyond definition, we're developing open-source versions and partnering with cloud providers, ensuring developers can assemble AI solutions using consistent, interlocking pieces across platforms. The ultimate goal is to accelerate innovation in the AI space.
-
-The Stack APIs are rapidly improving, but still very much work in progress and we invite feedback as well as direct contributions.
-
-## APIs
-
-The Llama Stack consists of the following set of APIs:
-
-- Inference
-- Safety
-- Memory
-- Agentic System
-- Evaluation
-- Post Training
-- Synthetic Data Generation
-- Reward Scoring
-
-Each of the APIs themselves is a collection of REST endpoints.
-
-## API Providers
-
-A Provider is what makes the API real -- they provide the actual implementation backing the API.
-
-As an example, for Inference, we could have the implementation be backed by open source libraries like `[ torch | vLLM | TensorRT ]` as possible options.
-
-A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
-
-## Llama Stack Distribution
-
-A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
+
+This guide will walk you through the steps to get started with an end-to-end flow for Llama Stack. It mainly focuses on building a Llama Stack distribution and starting up a Llama Stack server.

 ## Installation
+
+The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.

 You can install this repository as a [package](https://pypi.org/project/llama-stack/) with `pip install llama-stack`
@@ -57,17 +21,14 @@ cd llama-stack
 $CONDA_PREFIX/bin/pip install -e .
 ```

-# Getting Started
+For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md).

-The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
+## Starting the Llama Stack Server

-This guides allows you to quickly get started with building and running a Llama Stack server in < 5 minutes!
-
-You may also checkout this [notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) for trying out out demo scripts.
-
 ## Quick Cheatsheet
+
+This guide allows you to quickly get started with building and running a Llama Stack server in < 5 minutes!

-#### Via docker
+#### Starting up server via docker
 ```
 docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
 ```
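Once the container is up, a quick way to sanity-check the server is to query it from the Python client SDK. The snippet below is a sketch; the package import and the `models.list()` call are assumptions based on llama-stack-client-python, not something this commit adds:

```
# Minimal sketch: verify the dockerized server above is reachable on port 5000.
# The llama_stack_client package and the models.list() method are assumptions.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Print the models the running distribution is serving.
for model in client.models.list():
    print(model)
```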
@@ -75,8 +36,12 @@ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llama
 > [!NOTE]
 > `~/.llama` should be the path containing downloaded weights of Llama models.

+> [!TIP]
+> Pro Tip: You can use `docker compose up` to start a distribution with remote providers (e.g. TGI). Check out [these scripts](../llama_stack/distribution/docker/README.md) to help you get started.
+
+#### Build->Configure->Run via conda
+You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack.

-#### Via conda
 **`llama stack build`**
 - You'll be prompted to enter build information interactively.
 ```
@@ -445,4 +410,7 @@ Similarly you can test safety (if you configured llama-guard and/or prompt-guard
 python -m llama_stack.apis.safety.client localhost 5000
 ```

+
+Check out our client SDKs for connecting to the Llama Stack server in your preferred language; you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
+
 You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
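If you prefer exercising the safety API from the client SDK rather than the bundled test client shown above, a rough sketch looks like the following. The `run_shield` method name, its arguments, and the shield identifier are assumptions and may differ between SDK versions:

```
# Rough sketch: run a configured Llama Guard shield over a user message via the
# Python client SDK. Method and argument names here are assumptions.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

result = client.safety.run_shield(
    shield_type="llama_guard",  # assumed identifier for the configured shield
    messages=[{"role": "user", "content": "Tell me how to do something unsafe."}],
    params={},
)
print(result)
```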
@@ -1,4 +1,4 @@
-name: local
+name: local-gpu
 distribution_spec:
   description: Use code from `llama_stack` itself to serve all llama stack APIs
   providers:
@@ -7,4 +7,4 @@ distribution_spec:
     safety: meta-reference
     agents: meta-reference
     telemetry: meta-reference
-image_type: conda
+image_type: docker
@@ -1,11 +1,11 @@
-name: local-gpu
+name: local-tgi-chroma
 distribution_spec:
-  description: local meta reference
+  description: remote tgi inference + chromadb memory
   docker_image: null
   providers:
-    inference: meta-reference
+    inference: remote::tgi
     safety: meta-reference
     agents: meta-reference
-    memory: meta-reference
+    memory: remote::chromadb
     telemetry: meta-reference
 image_type: docker
@@ -1,16 +1,16 @@
 version: '2'
-built_at: '2024-10-08T17:42:33.690666'
+built_at: '2024-10-08T17:40:45.325529'
-image_name: local-gpu
+image_name: local
-docker_image: local-gpu
+docker_image: null
-conda_env: null
+conda_env: local
 apis:
-- memory
-- inference
-- agents
 - shields
-- safety
+- agents
 - models
+- memory
 - memory_banks
+- inference
+- safety
 providers:
   inference:
   - provider_id: meta-reference
@@ -25,8 +25,13 @@ providers:
   - provider_id: meta-reference
     provider_type: meta-reference
     config:
-      llama_guard_shield: null
-      prompt_guard_shield: null
+      llama_guard_shield:
+        model: Llama-Guard-3-1B
+        excluded_categories: []
+        disable_input_check: false
+        disable_output_check: false
+      prompt_guard_shield:
+        model: Prompt-Guard-86M
   memory:
   - provider_id: meta-reference
     provider_type: meta-reference
@@ -1,29 +1,33 @@
 version: '2'
-built_at: '2024-10-08T17:42:07.505267'
+built_at: '2024-10-08T17:40:45.325529'
-image_name: local-cpu
+image_name: local
-docker_image: local-cpu
+docker_image: null
-conda_env: null
+conda_env: local
 apis:
+- shields
 - agents
-- inference
 - models
 - memory
-- safety
-- shields
 - memory_banks
+- inference
+- safety
 providers:
   inference:
-  - provider_id: remote::ollama
-    provider_type: remote::ollama
+  - provider_id: tgi0
+    provider_type: remote::tgi
     config:
-      host: localhost
-      port: 6000
+      url: http://127.0.0.1:5009
   safety:
   - provider_id: meta-reference
     provider_type: meta-reference
     config:
-      llama_guard_shield: null
-      prompt_guard_shield: null
+      llama_guard_shield:
+        model: Llama-Guard-3-1B
+        excluded_categories: []
+        disable_input_check: false
+        disable_output_check: false
+      prompt_guard_shield:
+        model: Prompt-Guard-86M
   memory:
   - provider_id: meta-reference
     provider_type: meta-reference
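To sanity-check a run configuration like the templates above, it can help to load it and print which provider backs each API. A small sketch, assuming PyYAML is installed and using an illustrative file name:

```
# Small sketch: load a Llama Stack run configuration (structure as shown in the
# templates above) and list the provider backing each API. Assumes PyYAML is
# installed; the file name "run.yaml" is illustrative.
import yaml

with open("run.yaml") as f:
    run_config = yaml.safe_load(f)

for api, providers in run_config["providers"].items():
    for provider in providers:
        print(f"{api}: {provider['provider_id']} ({provider['provider_type']})")
```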