# What does this PR do?
Fixed an issue where the code injected `run_config.vector_stores` even
when it was `None`, which overrode the `default_factory` in
`RagToolRuntimeConfig` (a minimal sketch of the behavior follows the lists
below). Currently, _most_ distributions don't have a default configuration
for `vector_stores`:
- nvidia
- meta-reference-gpu
- dell
- oci
- open-benchmark
- postgres-demo
- watsonx
The only ones which do are:
- ci-tests
- starter
- starter-gpu
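
To illustrate the underlying behavior: a Pydantic `default_factory` only runs when the field is absent from the input, so passing an explicit `None` bypasses it and fails validation. The snippet below is a minimal sketch of the idea behind the fix, not the exact resolver code; field and variable names are simplified:

```python
from pydantic import BaseModel, Field


class VectorStoresConfig(BaseModel):
    # Simplified stand-in for the real VectorStoresConfig model.
    default_provider: str | None = None


class RagToolRuntimeConfig(BaseModel):
    # default_factory is used only when the key is absent, not when it is None.
    vector_stores_config: VectorStoresConfig = Field(default_factory=VectorStoresConfig)


provider_config: dict = {}
vector_stores = None  # e.g. a distribution that defines no vector_stores section

# Before the fix: the value was injected unconditionally, so a None from the
# run config overrode the default_factory and pydantic raised a ValidationError.
# provider_config["vector_stores_config"] = vector_stores

# After the fix: only inject when the run config actually provides a value,
# letting the default_factory supply the default otherwise.
if vector_stores is not None:
    provider_config["vector_stores_config"] = vector_stores

print(RagToolRuntimeConfig(**provider_config))  # falls back to the default VectorStoresConfig
```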
## Test Plan
Prior to the change, I could not start llama-stack with the oci
distribution:
```
Traceback (most recent call last):
File "/home/opc/llama-stack/.venv/bin/llama", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/opc/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
parser.run(args)
File "/home/opc/llama-stack/src/llama_stack/cli/llama.py", line 46, in run
args.func(args)
File "/home/opc/llama-stack/src/llama_stack/cli/stack/run.py", line 184, in _run_stack_run_cmd
self._uvicorn_run(config_file, args)
File "/home/opc/llama-stack/src/llama_stack/cli/stack/run.py", line 242, in _uvicorn_run
uvicorn.run("llama_stack.core.server.server:create_app", **uvicorn_config) # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/main.py", line 580, in run
server.run()
File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/server.py", line 67, in run
return asyncio.run(self.serve(sockets=sockets))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/server.py", line 71, in serve
await self._serve(sockets)
File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/server.py", line 78, in _serve
config.load()
File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/uvicorn/config.py", line 442, in load
self.loaded_app = self.loaded_app()
^^^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/src/llama_stack/core/server/server.py", line 403, in create_app
app = StackApp(
^^^^^^^^^
File "/home/opc/llama-stack/src/llama_stack/core/server/server.py", line 161, in __init__
future.result()
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/src/llama_stack/core/stack.py", line 534, in initialize
impls = await resolve_impls(
^^^^^^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/src/llama_stack/core/resolver.py", line 180, in resolve_impls
return await instantiate_providers(sorted_providers, router_apis, dist_registry, run_config, policy, internal_impls)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/src/llama_stack/core/resolver.py", line 321, in instantiate_providers
impl = await instantiate_provider(provider, deps, inner_impls, dist_registry, run_config, policy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/src/llama_stack/core/resolver.py", line 417, in instantiate_provider
config = config_type(**provider_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/opc/llama-stack/.venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for RagToolRuntimeConfig
vector_stores_config
Input should be a valid dictionary or instance of VectorStoresConfig [type=model_type, input_value=None, input_type=NoneType]
For further information visit https://errors.pydantic.dev/2.11/v/model_type
```
After tracing through the code and finding a simple fix, I was able to run
the distribution again. I also ran the integration tests with pytest:
```bash
OCI_COMPARTMENT_OCID="ocid1.compartment.oc1..xxx" OCI_REGION="us-chicago-1" OCI_AUTH_TYPE=instance_principal OCI_CLI_PROFILE=CHICAGO uv run pytest -sv tests/integration/inference/ --stack-config oci --text-model oci/meta.llama-3.3-70b-instruct --inference-mode live
```
# Llama Stack

Quick Start | Documentation | Colab Notebook | Discord

## 🚀 One-Line Installer 🚀

To try Llama Stack locally, run:

`curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash`
## Overview
Llama Stack defines and standardizes the core building blocks that simplify AI application development. It provides a unified set of APIs with implementations from leading service providers. More specifically, it provides:
- Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals.
- Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- Multiple developer interfaces like CLI and SDKs for Python, Typescript, iOS, and Android.
- Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
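
For a flavor of the unified API, here is a minimal, illustrative sketch using the `llama-stack-client` Python SDK against a locally running server; exact method names may vary between client versions, and the model ID is a placeholder:

```python
from llama_stack_client import LlamaStackClient

# Point the client at a locally running Llama Stack server (default port shown).
client = LlamaStackClient(base_url="http://localhost:8321")

# List whatever models the configured providers expose.
for model in client.models.list():
    print(model.identifier)

# OpenAI-compatible chat completion through the unified inference API.
# The model ID below is a placeholder; use one returned by models.list().
response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```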
### Llama Stack Benefits
- Flexibility: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- Consistent Experience: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- Robust Ecosystem: Llama Stack is integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.
For more information, see the Benefits of Llama Stack documentation.
## API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack. Please check out the documentation for the full list of providers.
| API Provider | Environments | Agents | Inference | VectorIO | Safety | Post Training | Eval | DatasetIO |
|---|---|---|---|---|---|---|---|---|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | Hosted | ✅ | ✅ | |||||
| Cerebras | Hosted | ✅ | ||||||
| Fireworks | Hosted | ✅ | ✅ | ✅ | ||||
| AWS Bedrock | Hosted | ✅ | ✅ | |||||
| Together | Hosted | ✅ | ✅ | ✅ | ||||
| Groq | Hosted | ✅ | ||||||
| Ollama | Single Node | ✅ | ||||||
| TGI | Hosted/Single Node | ✅ | ||||||
| NVIDIA NIM | Hosted/Single Node | ✅ | ✅ | |||||
| ChromaDB | Hosted/Single Node | ✅ | ||||||
| Milvus | Hosted/Single Node | ✅ | ||||||
| Qdrant | Hosted/Single Node | ✅ | ||||||
| Weaviate | Hosted/Single Node | ✅ | ||||||
| SQLite-vec | Single Node | ✅ | ||||||
| PG Vector | Single Node | ✅ | ||||||
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | |||||
| vLLM | Single Node | ✅ | ||||||
| OpenAI | Hosted | ✅ | ||||||
| Anthropic | Hosted | ✅ | ||||||
| Gemini | Hosted | ✅ | ||||||
| WatsonX | Hosted | ✅ | ||||||
| HuggingFace | Single Node | ✅ | ✅ | |||||
| TorchTune | Single Node | ✅ | ||||||
| NVIDIA NEMO | Hosted | ✅ | ✅ | ✅ | ✅ | ✅ | ||
| NVIDIA | Hosted | ✅ | ✅ | ✅ |
**Note**: Additional providers are available through external packages. See External Providers documentation.
## Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario. For example, you can begin with a local Ollama setup and seamlessly transition to production with Fireworks, without changing your application code. Here are some of the distributions we support:
| Distribution | Llama Stack Docker | Start This Distribution |
|---|---|---|
| Starter Distribution | llamastack/distribution-starter | Guide |
| Meta Reference | llamastack/distribution-meta-reference-gpu | Guide |
| PostgreSQL | llamastack/distribution-postgres-demo | |
For full documentation on the Llama Stack distributions see the Distributions Overview page.
## Documentation

Please check out our Documentation page for more details.

- CLI references
  - llama (server-side) CLI Reference: Guide for using the `llama` CLI to work with Llama models (download, study prompts) and to build/start a Llama Stack distribution.
  - llama (client-side) CLI Reference: Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.
- Getting Started
  - Quick guide to start a Llama Stack server.
  - Jupyter notebook to walk through how to use simple text and vision inference llama_stack_client APIs.
  - The complete Llama Stack lesson Colab notebook of the new Llama 3.2 course on Deeplearning.ai.
  - A Zero-to-Hero Guide that guides you through all the key components of Llama Stack with code samples.
- Contributing
  - Adding a new API Provider to walk through how to add a new API provider.
## Llama Stack Client SDKs
Check out our client SDKs for connecting to a Llama Stack server in your preferred language.
| Language | Client SDK |
|---|---|
| Python | llama-stack-client-python |
| Swift | llama-stack-client-swift |
| Typescript | llama-stack-client-typescript |
| Kotlin | llama-stack-client-kotlin |
You can find more example scripts with client SDKs to talk with the Llama Stack server in our llama-stack-apps repo.
## 🌟 GitHub Star History
## ✨ Contributors
Thanks to all of our amazing contributors!