From 24d70cedcaf2cc373ecf2418da80281b0ca6f9fb Mon Sep 17 00:00:00 2001
From: Francisco Arceo
Date: Fri, 11 Apr 2025 12:50:36 -0600
Subject: [PATCH] docs: Updated docs to show minimal RAG example and some other minor changes (#1935)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

# What does this PR do?
Incorporating some feedback into the docs.

- **`docs/source/getting_started/index.md`:**
  - Demo actually does RAG now
  - Simplified the installation command for dependencies.
  - Updated demo script examples to align with the latest API changes.
  - Replaced manual document manipulation with `RAGDocument` for clarity and maintainability.
  - Introduced new logic for model and embedding selection using the Llama Stack Client SDK.
  - Enhanced examples to showcase proper agent initialization and logging.
- **`docs/source/getting_started/detailed_tutorial.md`:**
  - Updated the section for listing models to include proper code formatting with `bash`.
  - Removed and reorganized the "Run the Demos" section for clarity.
  - Adjusted tab-item structures and added new instructions for demo scripts.
- **`docs/_static/css/my_theme.css`:**
  - Updated heading styles to include `h2`, `h3`, and `h4` for consistent font weight.
  - Added a new style for `pre` tags to wrap text and break long words; this is particularly useful for rendering long output from generation.

## Test Plan
Tested locally. Screenshot for reference: [Screenshot 2025-04-10 at 10 12 12 PM]

---------

Signed-off-by: Francisco Javier Arceo
---
 docs/_static/css/my_theme.css                 |   6 +-
 .../getting_started/detailed_tutorial.md      |  26 ++---
 docs/source/getting_started/index.md          | 101 ++++++++----------
 3 files changed, 62 insertions(+), 71 deletions(-)

diff --git a/docs/_static/css/my_theme.css b/docs/_static/css/my_theme.css
index 6f82f6358..a587f866d 100644
--- a/docs/_static/css/my_theme.css
+++ b/docs/_static/css/my_theme.css
@@ -17,9 +17,13 @@
    display: none;
}

-h3 {
+h2, h3, h4 {
  font-weight: normal;
}

html[data-theme="dark"] .rst-content div[class^="highlight"] {
    background-color: #0b0b0b;
}
+pre {
+    white-space: pre-wrap !important;
+    word-break: break-all;
+}
diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md
index 65582e8d8..911b35437 100644
--- a/docs/source/getting_started/detailed_tutorial.md
+++ b/docs/source/getting_started/detailed_tutorial.md
@@ -173,9 +173,8 @@ You will see the below:
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
```

-#### iii. List Available Models
List the models
-```
+```bash
llama-stack-client models list

Available Models

Total models: 2
```
-
-## Step 4: Run the Demos
-
-Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
-Other SDKs are also available, please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
-
-::::{tab-set}
-
-:::{tab-item} Basic Inference with the CLI
You can test basic Llama inference completion using the CLI.

```bash
llama-stack-client inference chat-completion --message "tell me a joke"
```

```
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='Here is one for you: Why did the scarecrow win an award? Because he was outstanding in his field!',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None,
    metrics=[
        Metric(metric='prompt_tokens', value=14.0, unit=None),
        Metric(metric='completion_tokens', value=27.0, unit=None),
        Metric(metric='total_tokens', value=41.0, unit=None)
    ],
)
```
-:::
-:::{tab-item} Basic Inference with a Script
-Alternatively, you can run inference using the Llama Stack client SDK.
+## Step 4: Run the Demos
+
+Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
+Other SDKs are also available; please refer to the [Client SDK](../index.md#client-sdks) list for the complete options.
+
+::::{tab-set}
+
+:::{tab-item} Basic Inference
+Now you can run inference using the Llama Stack client SDK.

### i. Create the Script
Create a file `inference.py` and add the following code:
@@ -269,7 +265,7 @@ Beauty in the bits
:::
:::{tab-item} Build a Simple Agent
-Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
+Next we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
### i. Create the Script
Create a file `agent.py` and add the following code:
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 63fa5ae6e..ce7dbe973 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -12,9 +12,8 @@ as the inference [provider](../providers/index.md#inference) for a Llama Model.
Install [uv](https://docs.astral.sh/uv/), set up your virtual environment, and run inference on a Llama model with
[Ollama](https://ollama.com/download).
```bash
-uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
+uv pip install llama-stack
source .venv/bin/activate
-export INFERENCE_MODEL="llama3.2:3b"
ollama run llama3.2:3b --keepalive 60m
```
## Step 2: Run the Llama Stack Server
@@ -24,70 +23,62 @@ INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
## Step 3: Run the Demo
Now open up a new terminal using the same virtual environment; you can run this demo as a script with `uv run demo_script.py` or in an interactive shell.
```python
-from termcolor import cprint
-from llama_stack_client.types import Document
-from llama_stack_client import LlamaStackClient
-
-
-vector_db = "faiss"
-vector_db_id = "test-vector-db"
-model_id = "llama3.2:3b-instruct-fp16"
-query = "Can you give me the arxiv link for Lora Fine Tuning in Pytorch?"
-documents = [
-    Document(
-        document_id="document_1",
-        content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/lora_finetune.rst",
-        mime_type="text/plain",
-        metadata={},
-    )
-]
+from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

+vector_db_id = "my_demo_vector_db"
client = LlamaStackClient(base_url="http://localhost:8321")

-client.vector_dbs.register(
-    provider_id=vector_db,
-    vector_db_id=vector_db_id,
-    embedding_model="all-MiniLM-L6-v2",
-    embedding_dimension=384,
-)
+models = client.models.list()
+
+# Select the first LLM and the first embedding model
+model_id = next(m for m in models if m.model_type == "llm").identifier
+embedding_model_id = (
+    em := next(m for m in models if m.model_type == "embedding")
+).identifier
+embedding_dimension = em.metadata["embedding_dimension"]
+
+_ = client.vector_dbs.register(
+    vector_db_id=vector_db_id,
+    embedding_model=embedding_model_id,
+    embedding_dimension=embedding_dimension,
+    provider_id="faiss",
+)
+document = RAGDocument(
+    document_id="document_1",
+    content="https://www.paulgraham.com/greatwork.html",
+    mime_type="text/html",
+    metadata={},
+)
client.tool_runtime.rag_tool.insert(
-    documents=documents,
+    documents=[document],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=50,
)
-
-response = client.tool_runtime.rag_tool.query(
-    vector_db_ids=[vector_db_id],
-    content=query,
+agent = Agent(
+    client,
+    model=model_id,
+    instructions="You are a helpful assistant",
+    tools=[
+        {
+            "name": "builtin::rag/knowledge_search",
+            "args": {"vector_db_ids": [vector_db_id]},
+        }
+    ],
)
-cprint("" + "-" * 50, "yellow")
-cprint(f"Query> {query}", "red")
-cprint("" + "-" * 50, "yellow")
-for chunk in response.content:
-    cprint(f"Chunk ID> {chunk.text}", "green")
-    cprint("" + "-" * 50, "yellow")
+response = agent.create_turn(
+    messages=[{"role": "user", "content": "How do you do great work?"}],
+    session_id=agent.create_session("rag_session"),
+)
+
+for log in AgentEventLogger().log(response):
+    log.print()
```

And you should see output like the following.
-```
---------------------------------------------------
-Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch?
---------------------------------------------------
-Chunk ID> knowledge_search tool found 5 chunks:
-BEGIN of knowledge_search tool results.
-
---------------------------------------------------
-Chunk ID> Result 1:
-Document_id:docum
-Content: .. _lora_finetune_label:
-
-============================
-Fine-Tuning Llama2 with LoRA
-============================
-
-This guide will teach you about `LoRA `_, a
-
---------------------------------------------------
+```bash
+inference> [knowledge_search(query="What does it mean to do great work")]
+tool_execution> Tool:knowledge_search Args:{'query': 'What does it mean to do great work'}
+tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='Result 2:\nDocument_id:docum\nContent: [1]\nI don\'t think you could give a precise definition of what\ncounts as great work. Doing great work means doing something important\nso well\n', type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: . And if so\nyou're already further along than you might realize, because the\nset of people willing to want to is small.\n\nThe factors in doing great work are factors in the literal,\nmathematical sense, and\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: \nincreases your morale and helps you do even better work. But this\ncycle also operates in the other direction: if you're not doing\ngood work, that can demoralize you and make it even harder to. Since\nit matters\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: to try to do\ngreat work. But that's what's going on subconsciously; they shy\naway from the question.\n\nSo I'm going to pull a sneaky trick on you. Do you want to do great\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]
```

Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳
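
If you want to experiment with the model-selection logic the new demo introduces before running the full script, here is a minimal sketch. It assumes a Llama Stack server is already running at `http://localhost:8321` (as in Step 2) with at least one LLM and one embedding model registered; the attribute names (`model_type`, `identifier`, `metadata`) mirror the demo above, and the `print` calls are only for inspection.

```python
from llama_stack_client import LlamaStackClient

# Connect to the locally running Llama Stack server from Step 2.
client = LlamaStackClient(base_url="http://localhost:8321")

models = client.models.list()

# The same selection the demo performs, unrolled without the walrus
# operator: take the first registered LLM and the first registered
# embedding model. next() raises StopIteration if either is missing.
llm = next(m for m in models if m.model_type == "llm")
embedding = next(m for m in models if m.model_type == "embedding")

print("LLM:", llm.identifier)
print("Embedding model:", embedding.identifier)
print("Embedding dimension:", embedding.metadata["embedding_dimension"])
```

The walrus form in the demo keeps the selection to one statement per value; unrolled like this, it is easier to add a fallback or a clearer error message when no embedding model is registered.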