Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-24 16:57:21 +00:00

# What does this PR do?
- Added a coverage badge to the README. - [See my
fork](https://github.com/ChristianZaccaria/llama-stack)
- Added a GitHub Actions workflow that runs the tests and updates the
coverage badge. - [See
run](4574811323)
- Documented steps in `testing.md` for running the tests locally and
viewing the `html` report.
- Excluded non-essential files from coverage reporting to provide a more
accurate measurement.

Automatically created PR to update the coverage badge:
https://github.com/ChristianZaccaria/llama-stack/pull/9
# Note for reviewers
1. The coverage report currently shows 45% coverage. Are there other files
or directories that should also be excluded from the report to increase the
percentage? The directories with the least test coverage are
`llama_stack/cli`, `llama_stack/models`, and `llama_stack/ui`. Should we
exclude these?
2. **[Required]** The `GITHUB_TOKEN` must have write permissions to open a
PR that updates the coverage badge.
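A little arithmetic shows why excluding the least-covered directories raises the headline number. The sketch below assumes plain statement coverage, where covered statements and total statements are each summed across files and then divided. This is how coverage.py totals statement coverage when branch measurement is off. The per-directory counts are made up for illustration and are not measurements from this repository.

```python
# Hypothetical per-directory statement counts: (covered, total).
# Illustrative numbers only; NOT measured from llama-stack.
DIRS = {
    "llama_stack/providers": (3000, 5000),
    "llama_stack/distribution": (1500, 3000),
    "llama_stack/cli": (200, 1500),
    "llama_stack/models": (300, 2000),
    "llama_stack/ui": (0, 500),
}


def total_coverage(dirs, exclude=()):
    """Statement coverage in percent: sum(covered) / sum(total) * 100."""
    covered = sum(c for name, (c, _) in dirs.items() if name not in exclude)
    total = sum(t for name, (_, t) in dirs.items() if name not in exclude)
    return 100.0 * covered / total


before = total_coverage(DIRS)
after = total_coverage(
    DIRS, exclude={"llama_stack/cli", "llama_stack/models", "llama_stack/ui"}
)
print(f"before: {before:.1f}%  after: {after:.1f}%")
```

Excluding directories shrinks the denominator rather than adding tests. The higher figure therefore describes only the code that is still measured.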
# GitHub Issue
Closes #2355 
## Test Plan
The `testing.md` file describes how to run the unit tests locally.

178 lines · 12 KiB · Markdown

# Llama Stack

[PyPI version](https://pypi.org/project/llama_stack/)
[PyPI downloads](https://pypi.org/project/llama-stack/)
[License](https://github.com/meta-llama/llama-stack/blob/main/LICENSE)
[Discord](https://discord.gg/llama-stack)
[Unit Tests](https://github.com/meta-llama/llama-stack/actions/workflows/unit-tests.yml?query=branch%3Amain)
[Integration Tests](https://github.com/meta-llama/llama-stack/actions/workflows/integration-tests.yml?query=branch%3Amain)

[**Quick Start**](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html) | [**Documentation**](https://llama-stack.readthedocs.io/en/latest/index.html) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)

### ✨🎉 Llama 4 Support 🎉✨

We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.

<details>

<summary>👋 Click here to see how to run Llama 4 models on Llama Stack </summary>

\
*Note: you need an 8xH100 GPU host to run these models.*

```bash
pip install -U llama_stack

MODEL="Llama-4-Scout-17B-16E-Instruct"
# get the Meta download URL from llama.com
llama model download --source meta --model-id $MODEL --meta-url <META_URL>

# start a llama stack server
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu

# install client to interact with the server
pip install llama-stack-client
```

### CLI
```bash
# Run a chat completion
MODEL="Llama-4-Scout-17B-16E-Instruct"

llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id meta-llama/$MODEL \
--message "write a haiku for meta's llama 4 models"

# Example output:
ChatCompletionResponse(
    completion_message=CompletionMessage(content="Whispers in code born\nLlama's gentle, wise heartbeat\nFuture's soft unfold", role='assistant', stop_reason='end_of_turn', tool_calls=[]),
    logprobs=None,
    metrics=[Metric(metric='prompt_tokens', value=21.0, unit=None), Metric(metric='completion_tokens', value=28.0, unit=None), Metric(metric='total_tokens', value=49.0, unit=None)]
)
```

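The `metrics` list in the response above reports token counts. `total_tokens` is the sum of `prompt_tokens` and `completion_tokens` (21 + 28 = 49). Below is a minimal sketch that tallies such a list. It uses a hypothetical local `Metric` dataclass with the same `metric`/`value`/`unit` fields shown in the printed output, not the client library's own type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metric:
    """Hypothetical local stand-in mirroring the printed fields; not imported from llama_stack_client."""
    metric: str
    value: float
    unit: Optional[str] = None


def token_usage(metrics):
    """Map metric name -> value and check that total_tokens = prompt + completion."""
    usage = {m.metric: m.value for m in metrics}
    expected_total = usage["prompt_tokens"] + usage["completion_tokens"]
    if usage.get("total_tokens") != expected_total:
        raise ValueError(
            f"total_tokens mismatch: {usage.get('total_tokens')} != {expected_total}"
        )
    return usage


# Values copied from the example response above.
metrics = [
    Metric(metric="prompt_tokens", value=21.0),
    Metric(metric="completion_tokens", value=28.0),
    Metric(metric="total_tokens", value=49.0),
]
usage = token_usage(metrics)
print(usage)
```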
### Python SDK
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
prompt = "Write a haiku about coding"

print(f"User> {prompt}")
response = client.inference.chat_completion(
    model_id=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(f"Assistant> {response.completion_message.content}")
```

As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!


</details>

### 🚀 One-Line Installer 🚀

To try Llama Stack locally, run:

```bash
curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/scripts/install.sh | bash
```

### Overview

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides:

- **Unified API layer** for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
- **Plugin architecture** to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- **Prepackaged verified distributions** which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- **Multiple developer interfaces** like CLI and SDKs for Python, TypeScript, iOS, and Android.
- **Standalone applications** as examples of how to build production-grade AI applications with Llama Stack.

<div style="text-align: center;">
  <img
    src="https://github.com/user-attachments/assets/33d9576d-95ea-468d-95e2-8fa233205a50"
    width="480"
    title="Llama Stack"
    alt="Llama Stack"
  />
</div>

### Llama Stack Benefits
- **Flexible Options**: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- **Consistent Experience**: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- **Robust Ecosystem**: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.

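The unified API layer and plugin architecture described above mean that application code depends on an API, not on a particular backend. The sketch below illustrates that idea with Python's `typing.Protocol`. The `InferenceProvider` protocol, the two provider classes, and `answer` are hypothetical names chosen for illustration. They are not Llama Stack's actual provider interfaces.

```python
from typing import Protocol


class InferenceProvider(Protocol):
    """Hypothetical inference API: any class with this method satisfies it."""

    def chat_completion(self, model_id: str, messages: list[dict]) -> str: ...


class LocalEchoProvider:
    """Hypothetical stand-in for a local backend; echoes the last message."""

    def chat_completion(self, model_id: str, messages: list[dict]) -> str:
        return f"[local:{model_id}] {messages[-1]['content']}"


class HostedUpperProvider:
    """Hypothetical stand-in for a hosted backend; upper-cases the last message."""

    def chat_completion(self, model_id: str, messages: list[dict]) -> str:
        return f"[hosted:{model_id}] {messages[-1]['content'].upper()}"


def answer(provider: InferenceProvider, prompt: str) -> str:
    # Application code is written once against the API and never names a backend.
    messages = [{"role": "user", "content": prompt}]
    return provider.chat_completion("example-model", messages)


for provider in (LocalEchoProvider(), HostedUpperProvider()):
    print(answer(provider, "hello"))
```

Because `Protocol` uses structural typing, a provider only needs the right method signature. It does not have to inherit from a shared base class.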
### API Providers
Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.
See the [full list](https://llama-stack.readthedocs.io/en/latest/providers/index.html) in the documentation.

| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Telemetry | Post Training | Eval | DatasetIO |
|:-------------------:|:------------:|:------:|:---------:|:--------:|:------:|:---------:|:-------------:|:----:|:--------:|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | Hosted | | ✅ | | ✅ | | | | |
| Cerebras | Hosted | | ✅ | | | | | | |
| Fireworks | Hosted | ✅ | ✅ | ✅ | | | | | |
| AWS Bedrock | Hosted | | ✅ | | ✅ | | | | |
| Together | Hosted | ✅ | ✅ | | ✅ | | | | |
| Groq | Hosted | | ✅ | | | | | | |
| Ollama | Single Node | | ✅ | | | | | | |
| TGI | Hosted/Single Node | | ✅ | | | | | | |
| NVIDIA NIM | Hosted/Single Node | | ✅ | | ✅ | | | | |
| ChromaDB | Hosted/Single Node | | | ✅ | | | | | |
| PG Vector | Single Node | | | ✅ | | | | | |
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | | | | | | |
| vLLM | Single Node | | ✅ | | | | | | |
| OpenAI | Hosted | | ✅ | | | | | | |
| Anthropic | Hosted | | ✅ | | | | | | |
| Gemini | Hosted | | ✅ | | | | | | |
| WatsonX | Hosted | | ✅ | | | | | | |
| HuggingFace | Single Node | | | | | | ✅ | | ✅ |
| TorchTune | Single Node | | | | | | ✅ | | |
| NVIDIA NEMO | Hosted | | ✅ | ✅ | | | ✅ | ✅ | ✅ |
| NVIDIA | Hosted | | | | | | ✅ | ✅ | ✅ |

> **Note**: Additional providers are available through external packages. See [External Providers](https://llama-stack.readthedocs.io/en/latest/providers/external.html) documentation.

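To answer questions such as "which providers implement Safety?", the table can be treated as data. Below is a minimal sketch that encodes a few rows copied from the table above and filters them by API. The rows are a subset, not the full list, and the `PROVIDERS` and `providers_for` names are hypothetical.

```python
# Hypothetical subset of the provider table above: provider -> APIs marked ✅.
PROVIDERS = {
    "Meta Reference": {"Agents", "Inference", "VectorIO", "Safety", "Telemetry",
                       "Post Training", "Eval", "DatasetIO"},
    "SambaNova": {"Inference", "Safety"},
    "Fireworks": {"Agents", "Inference", "VectorIO"},
    "AWS Bedrock": {"Inference", "Safety"},
    "Together": {"Agents", "Inference", "Safety"},
    "Ollama": {"Inference"},
    "ChromaDB": {"VectorIO"},
    "HuggingFace": {"Post Training", "DatasetIO"},
}


def providers_for(api: str) -> list[str]:
    """Return the providers in this subset that support `api`, sorted by name."""
    return sorted(name for name, apis in PROVIDERS.items() if api in apis)


print(providers_for("Safety"))
print(providers_for("VectorIO"))
```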
### Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario: you can begin with a local development setup (e.g., Ollama) and seamlessly transition to production (e.g., Fireworks) without changing your application code.
Here are some of the distributions we support:

| **Distribution** | **Llama Stack Docker** | **Start This Distribution** |
|:---:|:---:|:---:|
| Starter Distribution | [llamastack/distribution-starter](https://hub.docker.com/repository/docker/llamastack/distribution-starter/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/starter.html) |
| Meta Reference | [llamastack/distribution-meta-reference-gpu](https://hub.docker.com/repository/docker/llamastack/distribution-meta-reference-gpu/general) | [Guide](https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html) |
| PostgreSQL | [llamastack/distribution-postgres-demo](https://hub.docker.com/repository/docker/llamastack/distribution-postgres-demo/general) | |

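A distribution can be pictured as a mapping from each API to the provider that serves it. Switching distributions swaps the mapping, while the application keeps calling the same API. The sketch below is an illustration only. The distribution names, the `DISTRIBUTIONS` table, `resolve`, and `run_app` are hypothetical and do not reflect Llama Stack's real configuration format; see the distribution guides linked above for that. The provider choices follow the table in the API Providers section.

```python
# Hypothetical distributions: API name -> provider name.
# Illustrative only; NOT Llama Stack's actual configuration format.
DISTRIBUTIONS = {
    "local-dev": {"inference": "ollama", "vector_io": "chromadb"},
    "production": {"inference": "fireworks", "vector_io": "fireworks"},
}


def resolve(distro: str, api: str) -> str:
    """Return the provider that the given distribution uses for `api`."""
    try:
        return DISTRIBUTIONS[distro][api]
    except KeyError:
        raise KeyError(f"distribution {distro!r} has no provider for API {api!r}") from None


def run_app(distro: str) -> str:
    # The application names only the API it needs; the distribution picks the backend.
    return f"inference served by {resolve(distro, 'inference')}"


print(run_app("local-dev"))
print(run_app("production"))
```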
### Documentation

Please check out our [Documentation](https://llama-stack.readthedocs.io/en/latest/index.html) page for more details.

* CLI references
    * [llama (server-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html): Guide for using the `llama` CLI to work with Llama models (download, study prompts) and to build and start a Llama Stack distribution.
    * [llama (client-side) CLI Reference](https://llama-stack.readthedocs.io/en/latest/references/llama_stack_client_cli_reference.html): Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.
* Getting Started
    * [Quick guide to start a Llama Stack server](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).
    * [Jupyter notebook](./docs/getting_started.ipynb) that walks through how to use the simple text and vision inference `llama_stack_client` APIs.
    * The complete Llama Stack lesson [Colab notebook](https://colab.research.google.com/drive/1dtVmxotBsI4cGZQNsJRYPrLiDeT0Wnwt) from the new [Llama 3.2 course on Deeplearning.ai](https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/8/llama-stack).
    * A [Zero-to-Hero Guide](https://github.com/meta-llama/llama-stack/tree/main/docs/zero_to_hero_guide) that guides you through all the key components of Llama Stack with code samples.
* [Contributing](CONTRIBUTING.md)
    * [Adding a new API Provider](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html): a walk-through of how to add a new API provider.

### Llama Stack Client SDKs

| **Language** | **Client SDK** | **Package** |
| :----: | :----: | :----: |
| Python | [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [PyPI](https://pypi.org/project/llama_stack_client/) |
| Swift | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift) | [Swift Package Index](https://swiftpackageindex.com/meta-llama/llama-stack-client-swift) |
| TypeScript | [llama-stack-client-typescript](https://github.com/meta-llama/llama-stack-client-typescript) | [npm](https://npmjs.org/package/llama-stack-client) |
| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [Maven Central](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin) |

Check out our client SDKs for connecting to a Llama Stack server in your preferred language. You can choose from [Python](https://github.com/meta-llama/llama-stack-client-python), [TypeScript](https://github.com/meta-llama/llama-stack-client-typescript), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) to quickly build your applications.

You can find more example scripts that use the client SDKs to talk to the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.