# What does this PR do?

Another doc enhancement for https://github.com/meta-llama/llama-stack/issues/1818

Summary of changes:

- `docs/source/distributions/configuration.md`
  - Updated dropdown title to include a more user-friendly description.
- `docs/_static/css/my_theme.css`
  - Added styling for `<h3>` elements to set a normal font weight.
- `docs/source/distributions/starting_llama_stack_server.md`
  - Changed section headers from bold text to proper markdown headers (e.g., `##`).
  - Improved descriptions for starting Llama Stack server using different methods (library, container, conda, Kubernetes).
  - Enhanced clarity and structure by converting instructions into markdown headers and improved formatting.
- `docs/source/getting_started/index.md`
  - Major restructuring of the "Quick Start" guide:
    - Added new introductory section for Llama Stack and its capabilities.
    - Reorganized steps into clearer subsections with proper markdown headers.
    - Replaced dropdowns with tabbed content for OS-specific instructions.
    - Added detailed steps for setting up and running the Llama Stack server and client.
    - Introduced new sections for running basic inference and building agents.
    - Enhanced readability and visual structure with emojis, admonitions, and examples.
- `docs/source/providers/index.md`
  - Updated the list of LLM inference providers to include "Ollama."
  - Expanded the list of vector databases to include "SQLite-Vec."

## Test Plan

Renders locally, included screenshot.

# Documentation

For https://github.com/meta-llama/llama-stack/issues/1818

<img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM" src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b" />

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

# Quickstart

Get started with Llama Stack in minutes!

Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different
environments. You can build and test using a local server first and deploy to a hosted endpoint for production.

In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/)
as the inference [provider](../providers/index.md#inference) for a Llama model.

## Step 1: Install and Set Up

Install [uv](https://docs.astral.sh/uv/), set up your virtual environment, and run inference on a Llama model with
[Ollama](https://ollama.com/download).

```bash
# Create and activate a virtual environment before installing into it
uv venv
source .venv/bin/activate
uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
export INFERENCE_MODEL="llama3.2:3b"
# Run the model with Ollama and keep it loaded for 60 minutes
ollama run llama3.2:3b --keepalive 60m
```
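
Because Step 1 exports `INFERENCE_MODEL`, a Python script can read the same variable instead of hard-coding a model name. The following is a minimal sketch, not part of the original guide: the helper name `resolve_inference_model` and the fallback constant are hypothetical, and the fallback simply mirrors the model name used in the commands above.

```python
import os

# Fallback mirrors the model name used in this guide's shell commands.
DEFAULT_MODEL = "llama3.2:3b"


def resolve_inference_model(env=None) -> str:
    """Return INFERENCE_MODEL from the environment, or the guide's default.

    `env` is any mapping of environment variables; it defaults to os.environ
    and exists only to make the helper easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    value = env.get("INFERENCE_MODEL", "").strip()
    return value or DEFAULT_MODEL
```

Calling `resolve_inference_model()` in a shell where the variable was exported returns the exported value; in a fresh shell it falls back to `llama3.2:3b`.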

## Step 2: Run the Llama Stack Server

```bash
INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
```
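
The server can take a moment to start, so it may help to confirm that something is listening before running the demo. The sketch below only checks that a TCP connection can be opened; it does not call any Llama Stack API. The helper name `wait_for_port` is hypothetical, and port 8321 is taken from the client URL used later in this guide (`http://localhost:8321`).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 8321, timeout_s=60.0)` returns `True` as soon as the port accepts connections. An open port shows only that a process is listening, not that the server has finished loading.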

## Step 3: Run the Demo

Open a new terminal with the same virtual environment activated. Save the code below as `demo_script.py` and run it with `uv run demo_script.py`, or paste it into an interactive Python shell.

```python
from termcolor import cprint
from llama_stack_client.types import Document
from llama_stack_client import LlamaStackClient


vector_db = "faiss"
vector_db_id = "test-vector-db"
model_id = "llama3.2:3b-instruct-fp16"  # defined for reference; not used by the retrieval calls below
query = "Can you give me the arxiv link for Lora Fine Tuning in Pytorch?"
documents = [
    Document(
        document_id="document_1",
        content="https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/lora_finetune.rst",
        mime_type="text/plain",
        metadata={},
    )
]

# Connect to the Llama Stack server started in Step 2
client = LlamaStackClient(base_url="http://localhost:8321")
# Register a vector database with the faiss provider
client.vector_dbs.register(
    provider_id=vector_db,
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Insert the document into the vector database in 50-token chunks
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=50,
)

# Query the vector database for chunks relevant to the question
response = client.tool_runtime.rag_tool.query(
    vector_db_ids=[vector_db_id],
    content=query,
)

cprint("-" * 50, "yellow")
cprint(f"Query> {query}", "red")
cprint("-" * 50, "yellow")
for chunk in response.content:
    cprint(f"Chunk ID> {chunk.text}", "green")
    cprint("-" * 50, "yellow")
```
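
The `chunk_size_in_tokens=50` argument controls how the document is split before it is embedded. The sketch below illustrates the idea only. It assumes whitespace-separated words as stand-in tokens and uses a hypothetical helper named `chunk_by_tokens`; the tokenizer and any chunk overlap that Llama Stack actually applies are not described in this guide and may differ.

```python
def chunk_by_tokens(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` whitespace tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Whitespace splitting is a stand-in for a real tokenizer (an assumption).
    tokens = text.split()
    return [
        " ".join(tokens[i : i + chunk_size])
        for i in range(0, len(tokens), chunk_size)
    ]


# Illustrative input: 120 made-up "tokens" split into chunks of 50.
sample = " ".join(f"w{i}" for i in range(120))
chunks = chunk_by_tokens(sample, 50)
chunk_lengths = [len(c.split()) for c in chunks]  # 50, 50, and a final 20
```

In this simplified model, smaller chunks give more precise matches but carry less surrounding context per result.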

You should see output similar to the following:

```
--------------------------------------------------
Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch?
--------------------------------------------------
Chunk ID> knowledge_search tool found 5 chunks:
BEGIN of knowledge_search tool results.

--------------------------------------------------
Chunk ID> Result 1:
Document_id:docum
Content: .. _lora_finetune_label:

============================
Fine-Tuning Llama2 with LoRA
============================

This guide will teach you about `LoRA <https://arxiv.org/abs/2106.09685>`_, a

--------------------------------------------------
```
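
The query asks for an arXiv link, and the retrieved chunk above contains one inside reStructuredText markup. As a small post-processing sketch, the link can be pulled out of the chunk text with a regular expression. The helper name `extract_arxiv_links` and the URL pattern are assumptions for illustration, and the sample string is copied from the output shown above.

```python
import re

# Matches abstract URLs such as https://arxiv.org/abs/2106.09685 (assumed pattern).
ARXIV_ABS_URL = re.compile(r"https://arxiv\.org/abs/\d{4}\.\d{4,5}")


def extract_arxiv_links(text: str) -> list[str]:
    """Return the arXiv abstract URLs found in `text`, in order, without duplicates."""
    seen: set[str] = set()
    links: list[str] = []
    for url in ARXIV_ABS_URL.findall(text):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Sample text taken from the retrieved chunk in the output above.
sample_chunk = "This guide will teach you about `LoRA <https://arxiv.org/abs/2106.09685>`_, a"
links = extract_arxiv_links(sample_chunk)
```

In the demo script, the same helper could be applied to each `chunk.text` inside the loop over `response.content`.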

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳

## Next Steps

Now you're ready to dive deeper into Llama Stack!
- Explore the [Detailed Tutorial](./detailed_tutorial.md).
- Try the [Getting Started Notebook](https://github.com/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb).
- Browse more [Notebooks on GitHub](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks).
- Learn about Llama Stack [Concepts](../concepts/index.md).
- Discover how to [Build Llama Stacks](../distributions/index.md).
- Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK.
- Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials.

```{toctree}
:maxdepth: 0
:hidden:

detailed_tutorial
```