---
description: Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different environments.
sidebar_label: Quickstart
sidebar_position: 1
title: Quickstart
---

Get started with Llama Stack in minutes!

Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different environments. You can build and test using a local server first and deploy to a hosted endpoint for production.

In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/) as the inference [provider](/docs/providers/inference) for a Llama model.

**💡 Notebook Version:** You can also follow this quickstart guide in a Jupyter notebook format: [quick_start.ipynb](https://github.com/meta-llama/llama-stack/blob/main/docs/quick_start.ipynb)

#### Step 1: Install and set up

1. Install [uv](https://docs.astral.sh/uv/).
2. Run inference on a Llama model with [Ollama](https://ollama.com/download):

```bash
ollama run llama3.2:3b --keepalive 60m
```

#### Step 2: Run the Llama Stack server

We will use `uv` to install dependencies and run the Llama Stack server.

```bash
# Install dependencies for the starter distribution
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install

# Run the server
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack run starter
```
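
As a side note on the install step: `xargs -L1` invokes the command once per input line, so each dependency printed by `list-deps` gets its own `uv pip install` call. A toy sketch of that pattern, with `echo` standing in for `uv pip install` and two made-up package names:

```shell
# Each input line becomes a separate `echo install <pkg>` invocation,
# mirroring how the list-deps output feeds `uv pip install` line by line.
printf 'numpy\nfastapi\n' | xargs -L1 echo install
# prints:
# install numpy
# install fastapi
```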

#### Step 3: Run the demo

Now open up a new terminal and copy the following script into a file named `demo_script.py`.

```python title="demo_script.py"
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

vector_db_id = "my_demo_vector_db"
client = LlamaStackClient(base_url="http://localhost:8321")

models = client.models.list()

# Select the first LLM and the first embedding model
model_id = next(m for m in models if m.model_type == "llm").identifier
embedding_model = next(m for m in models if m.model_type == "embedding")
embedding_model_id = embedding_model.identifier
embedding_dimension = embedding_model.metadata["embedding_dimension"]

vector_db = client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model=embedding_model_id,
    embedding_dimension=embedding_dimension,
    provider_id="faiss",
)
vector_db_id = vector_db.identifier

source = "https://www.paulgraham.com/greatwork.html"
print("rag_tool> Ingesting document:", source)
document = RAGDocument(
    document_id="document_1",
    content=source,
    mime_type="text/html",
    metadata={},
)
client.tool_runtime.rag_tool.insert(
    documents=[document],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=100,
)

agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": [vector_db_id]},
        }
    ],
)

prompt = "How do you do great work?"
print("prompt>", prompt)

use_stream = True
response = agent.create_turn(
    messages=[{"role": "user", "content": prompt}],
    session_id=agent.create_session("rag_session"),
    stream=use_stream,
)

# Only call `AgentEventLogger().log(response)` for streaming responses.
if use_stream:
    for log in AgentEventLogger().log(response):
        log.print()
else:
    print(response)
```
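
The model-selection lines in the script use `next()` with a generator expression, which returns the first matching item (and raises `StopIteration` if nothing matches). A self-contained illustration of that pattern — the `SimpleNamespace` records below are stand-in mock data, not the client's actual model types:

```python
from types import SimpleNamespace

# Stand-in for client.models.list(); field names mirror the demo script.
models = [
    SimpleNamespace(
        model_type="embedding",
        identifier="all-MiniLM-L6-v2",
        metadata={"embedding_dimension": 384},
    ),
    SimpleNamespace(model_type="llm", identifier="llama3.2:3b", metadata={}),
]

# First LLM and first embedding model, exactly as in demo_script.py.
model_id = next(m for m in models if m.model_type == "llm").identifier
embedding_model = next(m for m in models if m.model_type == "embedding")

print(model_id)  # llama3.2:3b
print(embedding_model.metadata["embedding_dimension"])  # 384
```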

We will use `uv` to run the script:

```bash
uv run --with llama-stack-client,fire,requests demo_script.py
```
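
While that runs, a word on the `chunk_size_in_tokens=100` argument from the script: it controls how the ingested document is split into chunks before embedding. Llama Stack does this server-side; purely as an illustration of the idea, here is a naive chunker that "tokenizes" on whitespace (this is not the server's actual chunking logic):

```python
def chunk_text(text: str, chunk_size_in_tokens: int) -> list[str]:
    """Naive chunker: split on whitespace and group into fixed-size windows."""
    tokens = text.split()
    return [
        " ".join(tokens[i : i + chunk_size_in_tokens])
        for i in range(0, len(tokens), chunk_size_in_tokens)
    ]

chunks = chunk_text("one two three four five six seven", 3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Smaller chunks give the `knowledge_search` tool finer-grained results at the cost of more embedding calls.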

And you should see output like below.

```
rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html

prompt> How do you do great work?

inference> [knowledge_search(query="What is the key to doing great work")]

tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}

tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 2:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]

inference> Based on the search results, it seems that doing great work means doing something important so well that you expand people's ideas of what's possible. However, there is no clear threshold for importance, and it can be difficult to judge at the time.

To further clarify, I would suggest that doing great work involves:

* Completing tasks with high quality and attention to detail
* Expanding on existing knowledge or ideas
* Making a positive impact on others through your work
* Striving for excellence and continuous improvement

Ultimately, great work is about making a meaningful contribution and leaving a lasting impression.
```

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳

:::tip HuggingFace access
If you are getting a **401 Client Error** from HuggingFace for the **all-MiniLM-L6-v2** model, try setting **HF_TOKEN** to a valid HuggingFace token in your environment.
:::

### Next Steps

Now you're ready to dive deeper into Llama Stack!
- Explore the [Detailed Tutorial](./detailed_tutorial).
- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).
- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks).
- Learn about Llama Stack [Concepts](/docs/concepts).
- Discover how to [Build Llama Stacks](/docs/distributions).
- Refer to our [References](/docs/references) for details on the Llama CLI and Python SDK.
- Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials.