# What does this PR do?

Our "run this line to get started" pipes into `sh`, but the default shell on Ubuntu (a common setup) is `dash`, which doesn't support `pipefail`:

```
dalton@ollama-test:~$ ls -l /usr/bin/sh
lrwxrwxrwx 1 root root 4 Mar 31  2024 /usr/bin/sh -> dash
```

```
$ curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | sh
sh: 8: set: Illegal option -o pipefail
```

Let's be explicit with `bash`? It covers Linux, WSL, and macOS, and I doubt anyone's trying to run Llama Stack on embedded systems :)

## Test Plan

```
dalton@ollama-test:~/llama-stack$ cat install.sh | sh
This script must be run with bash
dalton@ollama-test:~/llama-stack$ cat install.sh | bash
❌ Docker or Podman is required. Install Docker: https://docs.docker.com/get-docker/ or Podman: https://podman.io/getting-started/installation
```
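For context, here is a minimal sketch of the kind of guard the Test Plan output implies — the actual check in `install.sh` may differ. `BASH_VERSION` is only set when running under bash, so a script launched with `sh`/`dash` bails out before ever reaching `set -o pipefail`:

```bash
#!/bin/sh
# Hypothetical bash guard; the real install.sh may implement this differently.
# BASH_VERSION is unset under dash/sh, so non-bash shells exit here
# instead of choking on `set -o pipefail` below.
if [ -z "${BASH_VERSION:-}" ]; then
    echo "This script must be run with bash" >&2
    exit 1
fi
set -euo pipefail
```
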
# Llama Stack

[PyPI](https://pypi.org/project/llama_stack/) | [Downloads](https://pypi.org/project/llama-stack/) | [License](https://github.com/meta-llama/llama-stack/blob/main/LICENSE) | [Discord](https://discord.gg/llama-stack) | [Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain) | [Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)

[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)

### ✨🎉 Llama 4 Support 🎉✨

We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.

<details>
<summary>👋 Click here to see how to run Llama 4 models on Llama Stack</summary>

*Note: you need an 8xH100 GPU host to run these models.*

```bash
pip install -U llama_stack

MODEL="Llama-4-Scout-17B-16E-Instruct"
# get meta url from llama.com
llama model download --source meta --model-id $MODEL --meta-url <META_URL>

# start a llama stack server
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu

# install client to interact with the server
pip install llama-stack-client
```

### CLI

```bash
# Run a chat completion
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id meta-llama/$MODEL \
  --message "write a haiku for meta's llama 4 models"

ChatCompletionResponse(
    completion_message=CompletionMessage(content="Whispers in code born\nLlama's gentle, wise heartbeat\nFuture's soft unfold", role='assistant', stop_reason='end_of_turn', tool_calls=[]),
    logprobs=None,
    metrics=[Metric(metric='prompt_tokens', value=21.0, unit=None), Metric(metric='completion_tokens', value=28.0, unit=None), Metric(metric='total_tokens', value=49.0, unit=None)]
)
```

### Python SDK

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
prompt = "Write a haiku about coding"

print(f"User> {prompt}")
response = client.inference.chat_completion(
    model_id=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(f"Assistant> {response.completion_message.content}")
```

As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!

</details>

### 🚀 One-Line Installer 🚀

To try Llama Stack locally, run:

```bash
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | bash
```

### Overview

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides:

- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry (see the sketch after this list).
- **Plugin architecture** to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- **Multiple developer interfaces** like CLI and SDKs for Python, TypeScript, iOS, and Android.
- **Standalone applications** as examples of how to build production-grade AI applications with Llama Stack.
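
To make the unified-API point concrete, here is a sketch using the CLI shown in the examples above: the same call works against any running distribution, and only the endpoint changes. Both URLs below are hypothetical placeholders for illustration.

```bash
# Identical client invocation against two different stacks;
# only the --endpoint differs. URLs are placeholders.
llama-stack-client --endpoint http://localhost:8321 \
    inference chat-completion \
    --model-id meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "hello"

llama-stack-client --endpoint https://my-prod-stack.example.com \
    inference chat-completion \
    --model-id meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --message "hello"
```

The request and response shapes stay the same; only the deployment behind the endpoint differs.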

<div style="text-align: center;">
  <img
    src="https://github.com/user-attachments/assets/33d9576d-95ea-468d-95e2-8fa233205a50"
    width="480"
    title="Llama Stack"
    alt="Llama Stack"
  />
</div>

### Llama Stack Benefits

- **Flexible Options**: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- **Consistent Experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- **Robust Ecosystem**: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.

### API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.

| **API Provider Builder** | **Environments** | **Agents** | **Inference** | **Memory** | **Safety** | **Telemetry** | **Post Training** |
|:------------------------:|:----------------------:|:----------:|:-------------:|:----------:|:----------:|:-------------:|:-----------------:|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | |
| SambaNova | Hosted | | ✅ | | ✅ | | |
| Cerebras | Hosted | | ✅ | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | |
| Groq | Hosted | | ✅ | | | | |
| Ollama | Single Node | | ✅ | | | | |
| TGI | Hosted and Single Node | | ✅ | | | | |
| NVIDIA NIM | Hosted and Single Node | | ✅ | | | | |
| Chroma | Single Node | | | ✅ | | | |
| PG Vector | Single Node | | | ✅ | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | |
| vLLM | Hosted and Single Node | | ✅ | | | | |
| OpenAI | Hosted | | ✅ | | | | |
| Anthropic | Hosted | | ✅ | | | | |
| Gemini | Hosted | | ✅ | | | | |
| watsonx | Hosted | | ✅ | | | | |
| HuggingFace | Single Node | | | | | | ✅ |
| TorchTune | Single Node | | | | | | ✅ |
| NVIDIA NEMO | Hosted | | | | | | ✅ |

### Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario: you can begin with a local development setup (e.g., Ollama) and seamlessly transition to production (e.g., Fireworks) without changing your application code. Here are some of the distributions we support (a container-launch sketch follows the table):

| **Distribution** | **Llama Stack Docker** | **Start This Distribution** |
|:----------------:|:----------------------:|:---------------------------:|
| Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html) |
| SambaNova | [llamastack/distribution-sambanova](https://hub.docker.com/repository/docker/llamastack/distribution-sambanova/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/sambanova.html) |
| Cerebras | [llamastack/distribution-cerebras](https://hub.docker.com/repository/docker/llamastack/distribution-cerebras/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/cerebras.html) |
| Ollama | [llamastack/distribution-ollama](https://hub.docker.com/repository/docker/llamastack/distribution-ollama/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/ollama.html) |
| TGI | [llamastack/distribution-tgi](https://hub.docker.com/repository/docker/llamastack/distribution-tgi/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/tgi.html) |
| Together | [llamastack/distribution-together](https://hub.docker.com/repository/docker/llamastack/distribution-together/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/together.html) |
| Fireworks | [llamastack/distribution-fireworks](https://hub.docker.com/repository/docker/llamastack/distribution-fireworks/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/fireworks.html) |
| vLLM | [llamastack/distribution-remote-vllm](https://hub.docker.com/repository/docker/llamastack/distribution-remote-vllm/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/remote-vllm.html) |

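As a rough sketch of what starting a distro looks like in practice — the port matches the examples above, but the flags and environment variable usage here are assumptions, so check the linked guide for the authoritative invocation:

```bash
# Hypothetical launch of the Ollama distro container.
# <MODEL_ID> is a placeholder; flags are assumptions, see the Guide link.
docker run -it -p 8321:8321 \
    -e INFERENCE_MODEL=meta-llama/<MODEL_ID> \
    llamastack/distribution-ollama
```
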
### Documentation

Please check out our [Documentation](https://llama-stack.readthedocs.io/en/latest/index.html) page for more details.

* CLI references
  * [llama (server-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html): Guide for using the `llama` CLI to work with Llama models (download, study prompts) and to build and start a Llama Stack distribution.
  * [llama (client-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_stack_client_cli_reference.html): Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.
* Getting Started
  * [Quick guide to start a Llama Stack server](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).
  * [Jupyter notebook](./docs/getting_started.ipynb) that walks through simple text and vision inference with the llama_stack_client APIs.
  * The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) of the new [Llama 3.2 course on Deeplearning.ai](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack).
  * A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guides you through all the key components of Llama Stack with code samples.
* [Contributing](CONTRIBUTING.md)
  * [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html): a walkthrough of how to add a new API provider.

### Llama Stack Client SDKs

| **Language** | **Client SDK** | **Package** |
| :----: | :----: | :----: |
| Python | [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [llama_stack_client on PyPI](https://pypi.org/project/llama_stack_client/) |
| Swift | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift) | [Swift Package Index](https://swiftpackageindex.com/meta-llama/llama-stack-client-swift) |
| TypeScript | [llama-stack-client-typescript](https://github.com/meta-llama/llama-stack-client-typescript) | [llama-stack-client on npm](https://npmjs.org/package/llama-stack-client) |
| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [Maven Central](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin) |

Check out our client SDKs for connecting to a Llama Stack server in your preferred language: you can choose from [Python](https://github.com/meta-llama/llama-stack-client-python), [TypeScript](https://github.com/meta-llama/llama-stack-client-typescript), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) to quickly build your applications.

You can find more example scripts that use the client SDKs to talk to a Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.