Composable building blocks to build Llama Apps https://llama-stack.readthedocs.io
Find a file
Matthew Farrellee 0e13512dd7
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
UI Tests / ui-tests (22) (push) Successful in 29s
Pre-commit / pre-commit (push) Successful in 1m14s
chore: fix agents tests for non-ollama providers, provide max_tokens (#3657)
# What does this PR do?

closes #3656 


## Test Plan

openai is not enabled in ci, so manual testing with:

```
$ ./scripts/integration-tests.sh --stack-config ci-tests --suite base --setup gpt --subdirs agents --inference-mode live                            
=== Llama Stack Integration Test Runner ===
Stack Config: ci-tests
Setup: gpt
Inference Mode: live
Test Suite: base
Test Subdirs: agents
Test Pattern: 

Checking llama packages
llama-stack                              0.2.23          .../llama-stack
llama-stack-client                       0.3.0a3
ollama                                   0.5.1
=== System Resources Before Tests ===
...
=== Applying Setup Environment Variables ===
Setting up environment variables:
=== Running Integration Tests ===
Test subdirs to run: agents
Added test files from agents: 3 files

=== Running all collected tests in a single pytest command ===
Total test files: 3
+ pytest -s -v tests/integration/agents/test_persistence.py tests/integration/agents/test_openai_responses.py tests/integration/agents/test_agents.py --stack-config=ci-tests --inference-mode=live -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag )' --setup=gpt --color=yes --capture=tee-sys
WARNING  2025-10-02 13:14:32,653 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,043 root:258 uncategorized: Unknown logging category: tests. Falling back to default 'root' level: 20                    
INFO     2025-10-02 13:14:33,063 tests.integration.conftest:86 tests: Applying setup 'gpt'                                                            
========================================= test session starts ==========================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- .../.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'Linux-6.16.7-200.fc42.x86_64-x86_64-with-glibc2.41', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'html': '4.1.1', 'anyio': '4.9.0', 'timeout': '2.4.0', 'cov': '6.2.1', 'asyncio': '1.1.0', 'nbval': '0.11.0', 'socket': '0.7.0', 'json-report': '1.5.0', 'metadata': '3.1.1'}}
rootdir: ...
configfile: pyproject.toml
plugins: html-4.1.1, anyio-4.9.0, timeout-2.4.0, cov-6.2.1, asyncio-1.1.0, nbval-0.11.0, socket-0.7.0, json-report-1.5.0, metadata-3.1.1
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items / 6 deselected / 26 selected                                                        

tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [  3%]
tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [  7%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] 
instantiating llama_stack_client
WARNING  2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20                  
WARNING  2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
INFO     2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20                    
WARNING  2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20                
WARNING  2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20             
WARNING  2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20          
WARNING  2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20        
INFO     2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid   
         concurrency issues                                                                                                                           
WARNING  2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20         
WARNING  2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20            
WARNING  2025-10-02 13:14:36,726 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass    
         Fireworks API Key in the header X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}                                         
WARNING  2025-10-02 13:14:36,727 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass     
         Together API Key in the header X-LlamaStack-Provider-Data as { "together_api_key": <your api key>}                                           
WARNING  2025-10-02 13:14:38,404 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key 
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
WARNING  2025-10-02 13:14:38,406 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is 
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in 
         the provider config.                                                                                                                         
WARNING  2025-10-02 13:14:38,408 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is   
         not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in   
         the provider config.                                                                                                                         
WARNING  2025-10-02 13:14:38,411 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key 
         is not set. Please provide a valid API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, 
         or in the provider config.                                                                                                                   
llama_stack_client instantiated in 5.237s
SKIPPED [ 11%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items[openai_client-txt=openai/gpt-4o] SKIPPED [ 15%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items_with_limit_and_order[txt=openai/gpt-4o] SKIPPED [ 19%]
tests/integration/agents/test_openai_responses.py::test_function_call_output_response[txt=openai/gpt-4o] SKIPPED [ 23%]
tests/integration/agents/test_openai_responses.py::test_function_call_output_response_with_none_arguments[txt=openai/gpt-4o] SKIPPED [ 26%]
tests/integration/agents/test_agents.py::test_agent_simple[openai/gpt-4o] PASSED                 [ 30%]
tests/integration/agents/test_agents.py::test_agent_name[txt=openai/gpt-4o] SKIPPED (this te...) [ 34%]
tests/integration/agents/test_agents.py::test_tool_config[openai/gpt-4o] PASSED                  [ 38%]
tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o] FAILED                  [ 42%]
tests/integration/agents/test_agents.py::test_custom_tool_infinite_loop[openai/gpt-4o] PASSED    [ 46%]
tests/integration/agents/test_agents.py::test_tool_choice_required[openai/gpt-4o] INFO     2025-10-02 13:14:51,559 llama_stack.providers.inline.agents.meta_reference.agent_instance:691 agents::meta_reference: done with MAX          
         iterations (2), exiting.                                                                                                                     
PASSED         [ 50%]
tests/integration/agents/test_agents.py::test_tool_choice_none[openai/gpt-4o] PASSED             [ 53%]
tests/integration/agents/test_agents.py::test_tool_choice_get_boiling_point[openai/gpt-4o] XFAIL [ 57%]
tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools0] PASSED [ 61%]
tests/integration/agents/test_agents.py::test_multi_tool_calls[openai/gpt-4o] PASSED             [ 65%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-False] SKIPPED [ 69%]
tests/integration/agents/test_openai_responses.py::test_list_response_input_items[client_with_models-txt=openai/gpt-4o] PASSED [ 73%]
tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools1] PASSED [ 76%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools1-True] SKIPPED [ 80%]
tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools1-False] SKIPPED [ 84%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools0-True] SKIPPED [ 88%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools0-False] SKIPPED [ 92%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools1-True] SKIPPED [ 96%]
tests/integration/agents/test_openai_responses.py::test_responses_store[client_with_models-txt=openai/gpt-4o-tools1-False] SKIPPED [100%]

=============================================== FAILURES ===============================================
___________________________________ test_custom_tool[openai/gpt-4o] ____________________________________
tests/integration/agents/test_agents.py:370: in test_custom_tool
    assert "-100" in logs_str
E   assert '-100' in "inference> Polyjuice Potion is a fictional substance from the Harry Potter series, and it doesn't have a scientifically defined boiling point. If you have any other real liquid in mind, feel free to ask!"
========================================= slowest 10 durations =========================================
5.47s setup    tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True]
4.78s call     tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o]
3.01s call     tests/integration/agents/test_agents.py::test_tool_choice_required[openai/gpt-4o]
2.97s call     tests/integration/agents/test_agents.py::test_agent_simple[openai/gpt-4o]
2.85s call     tests/integration/agents/test_agents.py::test_tool_choice_none[openai/gpt-4o]
2.06s call     tests/integration/agents/test_agents.py::test_multi_tool_calls[openai/gpt-4o]
1.83s call     tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools0]
1.83s call     tests/integration/agents/test_agents.py::test_custom_tool_infinite_loop[openai/gpt-4o]
1.29s call     tests/integration/agents/test_agents.py::test_create_turn_response[openai/gpt-4o-client_tools1]
0.57s call     tests/integration/agents/test_openai_responses.py::test_list_response_input_items[client_with_models-txt=openai/gpt-4o]
======================================= short test summary info ========================================
FAILED tests/integration/agents/test_agents.py::test_custom_tool[openai/gpt-4o] - assert '-100' in "inference> Polyjuice Potion is a fictional substance from the Harry Potter series...
=========== 1 failed, 9 passed, 15 skipped, 6 deselected, 1 xfailed, 139 warnings in 27.18s ============
```
note: the failure is separate from the issue being fixed
2025-10-02 14:30:13 -04:00
.github fix: re-enable conformance skipping ability (#3651) 2025-10-02 15:04:26 +02:00
benchmarking/k8s-benchmark chore(perf): run guidellm benchmarks (#3421) 2025-09-24 10:18:33 -07:00
docs docs: API spec generation for Stainless (#3655) 2025-10-02 09:25:09 -07:00
llama_stack chore!: add double routes for v1/openai/v1 (#3636) 2025-10-02 16:11:05 +02:00
scripts docs: fix broken links (#3540) 2025-09-24 14:16:31 -07:00
tests chore: fix agents tests for non-ollama providers, provide max_tokens (#3657) 2025-10-02 14:30:13 -04:00
.coveragerc test: Measure and track code coverage (#2636) 2025-07-18 18:08:36 +02:00
.gitignore docs: docusaurus setup (#3541) 2025-09-24 14:11:30 -07:00
.pre-commit-config.yaml fix: distro-codegen pre-commit hook file pattern (#3337) 2025-09-04 17:56:32 +02:00
CHANGELOG.md docs: Update changelog (#3343) 2025-09-08 10:01:41 +02:00
CODE_OF_CONDUCT.md Initial commit 2024-07-23 08:32:33 -07:00
CONTRIBUTING.md docs: fix more broken links (#3649) 2025-10-02 10:43:49 +02:00
coverage.svg test: Measure and track code coverage (#2636) 2025-07-18 18:08:36 +02:00
LICENSE Update LICENSE (#47) 2024-08-29 07:39:50 -07:00
MANIFEST.in chore: MANIFEST maintenance (#3454) 2025-09-27 11:28:11 -07:00
pyproject.toml docs: fix more broken links (#3649) 2025-10-02 10:43:49 +02:00
README.md docs: fix more broken links (#3649) 2025-10-02 10:43:49 +02:00
SECURITY.md Create SECURITY.md 2024-10-08 13:30:40 -04:00
uv.lock build: Bump version to 0.2.23 2025-09-26 21:11:51 +00:00

Llama Stack

PyPI version PyPI - Downloads License Discord Unit Tests Integration Tests

Quick Start | Documentation | Colab Notebook | Discord

🎉 Llama 4 Support 🎉

We released Version 0.2.0 with support for the Llama 4 herd of models released by Meta.

👋 Click here to see how to run Llama 4 models on Llama Stack


Note you need 8xH100 GPU-host to run these models

pip install -U llama_stack

MODEL="Llama-4-Scout-17B-16E-Instruct"
# get meta url from llama.com
llama model download --source meta --model-id $MODEL --meta-url <META_URL>

# start a llama stack server
INFERENCE_MODEL=meta-llama/$MODEL llama stack build --run --template meta-reference-gpu

# install client to interact with the server
pip install llama-stack-client

CLI

# Run a chat completion
MODEL="Llama-4-Scout-17B-16E-Instruct"

llama-stack-client --endpoint http://localhost:8321 \
inference chat-completion \
--model-id meta-llama/$MODEL \
--message "write a haiku for meta's llama 4 models"

OpenAIChatCompletion(
    ...
    choices=[
        OpenAIChatCompletionChoice(
            finish_reason='stop',
            index=0,
            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
                role='assistant',
                content='...**Silent minds awaken,**  \n**Whispers of billions of words,**  \n**Reasoning breaks the night.**  \n\n—  \n*This haiku blends the essence of LLaMA 4\'s capabilities with nature-inspired metaphor, evoking its vast training data and transformative potential.*',
                ...
            ),
            ...
        )
    ],
    ...
)

Python SDK

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=f"http://localhost:8321")

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
prompt = "Write a haiku about coding"

print(f"User> {prompt}")
response = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(f"Assistant> {response.choices[0].message.content}")

As more providers start supporting Llama 4, you can use them in Llama Stack as well. We are adding to the list. Stay tuned!

🚀 One-Line Installer 🚀

To try Llama Stack locally, run:

curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/scripts/install.sh | bash

Overview

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
  • Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
  • Multiple developer interfaces like CLI and SDKs for Python, Typescript, iOS, and Android.
  • Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
Llama Stack

Llama Stack Benefits

  • Flexible Options: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
  • Consistent Experience: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
  • Robust Ecosystem: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.

API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack. Please checkout for full list

API Provider Builder Environments Agents Inference VectorIO Safety Telemetry Post Training Eval DatasetIO
Meta Reference Single Node
SambaNova Hosted
Cerebras Hosted
Fireworks Hosted
AWS Bedrock Hosted
Together Hosted
Groq Hosted
Ollama Single Node
TGI Hosted/Single Node
NVIDIA NIM Hosted/Single Node
ChromaDB Hosted/Single Node
Milvus Hosted/Single Node
Qdrant Hosted/Single Node
Weaviate Hosted/Single Node
SQLite-vec Single Node
PG Vector Single Node
PyTorch ExecuTorch On-device iOS
vLLM Single Node
OpenAI Hosted
Anthropic Hosted
Gemini Hosted
WatsonX Hosted
HuggingFace Single Node
TorchTune Single Node
NVIDIA NEMO Hosted
NVIDIA Hosted

Note

: Additional providers are available through external packages. See External Providers documentation.

Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support:

Distribution Llama Stack Docker Start This Distribution
Starter Distribution llamastack/distribution-starter Guide
Meta Reference llamastack/distribution-meta-reference-gpu Guide
PostgreSQL llamastack/distribution-postgres-demo

Documentation

Please checkout our Documentation page for more details.

Llama Stack Client SDKs

Language Client SDK Package
Python llama-stack-client-python PyPI version
Swift llama-stack-client-swift Swift Package Index
Typescript llama-stack-client-typescript NPM version
Kotlin llama-stack-client-kotlin Maven version

Check out our client SDKs for connecting to a Llama Stack server in your preferred language, you can choose from python, typescript, swift, and kotlin programming languages to quickly build your applications.

You can find more example scripts with client SDKs to talk with the Llama Stack server in our llama-stack-apps repo.

🌟 GitHub Star History

Star History

Star History Chart

Contributors

Thanks to all of our amazing contributors!