
# Quickstart

Get started with Llama Stack in minutes!

Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different environments. You can build and test using a local server first and deploy to a hosted endpoint for production.

In this guide, we'll walk through how to build a RAG application locally using Llama Stack with Ollama as the inference provider for a Llama Model.

💡 Notebook Version: You can also follow this quickstart guide in a Jupyter notebook format: quick_start.ipynb

## Step 1: Install and setup

1. Install [uv](https://docs.astral.sh/uv/)
2. Run inference on a Llama model with [Ollama](https://ollama.com/download) (an optional sanity check follows this list):

   ```bash
   ollama run llama3.2:3b --keepalive 60m
   ```
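
If you want to confirm the model is available before moving on, you can query Ollama's local API. This is a minimal sketch, assuming Ollama's default API port 11434 and the `requests` package; it is not part of the quickstart itself.

```python
import requests

# Ask the local Ollama daemon which models it has pulled (default API port assumed).
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print([m["name"] for m in resp.json().get("models", [])])
```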

## Step 2: Run the Llama Stack server

We will use `uv` to run the Llama Stack server:

```bash
ENABLE_OLLAMA=ollama OLLAMA_INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template starter --image-type venv --run
```
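
Once the server reports that it is running, you can optionally confirm it is reachable. The snippet below is a minimal sketch, assuming the `llama-stack-client` Python SDK and the server's default port 8321:

```python
from llama_stack_client import LlamaStackClient

# Connect to the local Llama Stack server (default port assumed) and list the models it serves.
client = LlamaStackClient(base_url="http://localhost:8321")
for model in client.models.list():
    print(model.model_type, model.identifier)
```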

## Step 3: Run the demo

Now open up a new terminal and copy the following script into a file named `demo_script.py`.

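The full script ships alongside these docs; the listing below is a minimal sketch of what such a RAG demo looks like. It assumes the `llama-stack-client` Python SDK (its `LlamaStackClient`, `Agent`, `AgentEventLogger`, and `RAGDocument` helpers) and a server on the default port 8321; names such as `my_demo_vector_db` and `document_1` are illustrative, and the bundled script may differ in its details.

```python
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient, RAGDocument

# Connect to the locally running Llama Stack server (default port assumed).
client = LlamaStackClient(base_url="http://localhost:8321")

# Pick the first LLM and the first embedding model the server advertises.
models = client.models.list()
model_id = next(m for m in models if m.model_type == "llm").identifier
embedding_model = next(m for m in models if m.model_type == "embedding")

# Register a vector database to hold the document chunks (name is illustrative).
vector_db_id = "my_demo_vector_db"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model=embedding_model.identifier,
    embedding_dimension=embedding_model.metadata["embedding_dimension"],
)

# Ingest the essay into the vector database via the RAG tool runtime.
source = "https://www.paulgraham.com/greatwork.html"
print("rag_tool> Ingesting document:", source)
document = RAGDocument(
    document_id="document_1",
    content=source,
    mime_type="text/html",
    metadata={},
)
client.tool_runtime.rag_tool.insert(
    documents=[document],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

# Create an agent that can call the built-in knowledge_search (RAG) tool.
agent = Agent(
    client,
    model=model_id,
    instructions="You are a helpful assistant",
    tools=[
        {
            "name": "builtin::rag/knowledge_search",
            "args": {"vector_db_ids": [vector_db_id]},
        }
    ],
)

prompt = "How do you do great work?"
print("prompt>", prompt)

# Stream the agent's turn and print each event (tool calls and inference output).
response = agent.create_turn(
    messages=[{"role": "user", "content": prompt}],
    session_id=agent.create_session("rag_session"),
    stream=True,
)
for log in AgentEventLogger().log(response):
    log.print()
```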

We will use `uv` to run the script:

```bash
uv run --with llama-stack-client,fire,requests demo_script.py
```

You should see output similar to the following:

```
rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html

prompt> How do you do great work?

inference> [knowledge_search(query="What is the key to doing great work")]

tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}

tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 2:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent:  work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]

inference> Based on the search results, it seems that doing great work means doing something important so well that you expand people's ideas of what's possible. However, there is no clear threshold for importance, and it can be difficult to judge at the time.

To further clarify, I would suggest that doing great work involves:

* Completing tasks with high quality and attention to detail
* Expanding on existing knowledge or ideas
* Making a positive impact on others through your work
* Striving for excellence and continuous improvement

Ultimately, great work is about making a meaningful contribution and leaving a lasting impression.
```

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳

```{tip}
If you are getting a **401 Client Error** from HuggingFace for the **all-MiniLM-L6-v2** model, try setting **HF_TOKEN** to a valid HuggingFace token in your environment.
```

## Next Steps

Now you're ready to dive deeper into Llama Stack!