diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 71cd2ef43..eb19454fc 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -1,6 +1,6 @@
 # Quick Start
 
-In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple RAG agent.
+In this guide, we'll walk through how you can use the Llama Stack (server and client SDK) to test a simple agent. A Llama Stack agent is an integrated system that can perform tasks by combining a Llama model for reasoning with tools (e.g., RAG, web search, code execution) for taking actions.
 
 In Llama Stack, we provide a server exposing multiple APIs. These APIs are backed by implementations from different providers.
@@ -58,11 +58,7 @@ Llama Stack is a server that exposes multiple APIs, you connect with it using the client SDK.
 ```bash
 uv pip install llama-stack
 ```
-
-### Install the Llama Stack Client
-```bash
-uv pip install llama-stack-client
-```
+Note that the Llama Stack server includes the client SDK as well.
 
 ## Step 3: Build and Run Llama Stack
 Llama Stack uses a [configuration file](../distributions/configuration.md) to define the stack.
@@ -91,10 +87,10 @@ Setup venv (llama-stack already includes the client package)
 ```bash
 source .venv/bin/activate
 ```
-Let's use the `llama-stack-client` CLI to check the connectivity to the server.
+Now let's use the `llama-stack-client` CLI to check the connectivity to the server.
 ```bash
-llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT --api-key none
+llama-stack-client configure --endpoint http://localhost:8321 --api-key none
 ```
 You will see the below:
 ```
 Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
@@ -105,7 +101,6 @@
 List the models
 ```
 llama-stack-client models list
-```
 
 Available Models
 ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
@@ -143,13 +138,13 @@ ChatCompletionResponse(
     ],
 )
 ```
+### i. Create the Script
-#### 4.1 Basic Inference
 Create a file `inference.py` and add the following code:
 ```python
 from llama_stack_client import LlamaStackClient
 
-client = LlamaStackClient(base_url=f"http://localhost:8321")
+client = LlamaStackClient(base_url="http://localhost:8321")
 
 # List available models
 models = client.models.list()
@@ -169,6 +164,7 @@ response = client.inference.chat_completion(
 )
 print(response.completion_message.content)
 ```
+### ii. Run the Script
 Let's run the script using `uv`
 ```bash
 uv run python inference.py
@@ -183,9 +179,10 @@
 Logic flows through digital night
 Beauty in the bits
 ```
 
-#### 4.2. Basic Agent
-
+## Step 5: Run Your First Agent
+Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
+### i. Create the Script
 Create a file `agent.py` and add the following code:
+
 ```python
 from llama_stack_client import LlamaStackClient
 from llama_stack_client import Agent, AgentEventLogger
@@ -224,12 +221,11 @@ stream = agent.create_turn(
 for event in AgentEventLogger().log(stream):
     event.print()
 ```
-
+### ii. Run the Script
 Let's run the script using `uv`
 ```bash
 uv run python agent.py
 ```
-
 :::{dropdown} `Sample output`
 ```
 Non-streaming ...
@@ -352,8 +348,10 @@ So, who am I? I'm just a computer program designed to help you!
 ```
 :::
 
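+### iii. (Optional) Give the Agent a Tool
+Agents become more useful once they can take actions. As a rough sketch, not part of the tested flow above,
+the snippet below assumes the `builtin::websearch` toolgroup is enabled on your distribution; it typically
+requires a search API key configured on the server side, so skip this step if yours does not have one.
+```python
+from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
+import uuid
+
+client = LlamaStackClient(base_url="http://localhost:8321")
+
+# Pick any available LLM, as in the earlier scripts
+llm = next(m for m in client.models.list() if m.model_type == "llm")
+
+# Assumption: the builtin::websearch toolgroup is registered on this server
+search_agent = Agent(
+    client,
+    model=llm.identifier,
+    instructions="You are a helpful assistant. Use the web search tool when you need fresh information.",
+    tools=["builtin::websearch"],
+)
+
+session_id = search_agent.create_session(session_name=f"s{uuid.uuid4().hex}")
+response = search_agent.create_turn(
+    messages=[{"role": "user", "content": "What is the latest release of Llama Stack?"}],
+    session_id=session_id,
+)
+for event in AgentEventLogger().log(response):
+    event.print()
+```
+The same `tools` list pattern is how the RAG agent below attaches its knowledge-search tool.
+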
-#### 4.3. RAG agent
-
+## Step 6: Build a RAG Agent
+### i. Create the Script
+For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents
+in a vector database.
 Create a file `rag_agent.py` and add the following code:
 ```python
@@ -361,6 +359,7 @@
 from llama_stack_client import LlamaStackClient
 from llama_stack_client import Agent, AgentEventLogger
 from llama_stack_client.types import Document
 import uuid
+from termcolor import cprint
 
 client = LlamaStackClient(base_url=f"http://localhost:8321")
@@ -404,7 +403,7 @@ llm = next(m for m in client.models.list() if m.model_type == "llm")
 model = llm.identifier
 
 # Create RAG agent
-ragagent = Agent(
+rag_agent = Agent(
     client,
     model=model,
     instructions="You are a helpful assistant. Use the RAG tool to answer questions as needed.",
@@ -416,7 +415,7 @@ ragagent = Agent(
     ],
 )
 
-s_id = ragagent.create_session(session_name=f"s{uuid.uuid4().hex}")
+session_id = rag_agent.create_session(session_name=f"s{uuid.uuid4().hex}")
 
 user_prompts = [
     "How to optimize memory usage in torchtune? use the knowledge_search tool to get information.",
 ]
@@ -429,12 +428,13 @@ for prompt in user_prompts:
         messages=[{"role": "user", "content": prompt}],
         session_id=session_id,
     )
-    for event in AgentEventLogger().log(stream):
+    for event in AgentEventLogger().log(response):
         event.print()
 ```
+### ii. Run the Script
 Let's run the script using `uv`
 ```bash
-uv run python lsagent.py
+uv run python rag_agent.py
 ```
 :::{dropdown} `Sample output`
 ```
diff --git a/docs/source/providers/index.md b/docs/source/providers/index.md
index f8997a281..8b6e214e8 100644
--- a/docs/source/providers/index.md
+++ b/docs/source/providers/index.md
@@ -1,8 +1,8 @@
 # Providers Overview
 
 The goal of Llama Stack is to build an ecosystem where users can easily swap out different implementations for the same API. Examples for these include:
-- LLM inference providers (e.g., Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
-- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, etc.),
+- LLM inference providers (e.g., Ollama, Fireworks, Together, AWS Bedrock, Groq, Cerebras, SambaNova, vLLM, etc.),
+- Vector databases (e.g., ChromaDB, Weaviate, Qdrant, Milvus, FAISS, PGVector, SQLite-Vec, etc.),
 - Safety providers (e.g., Meta's Llama Guard, AWS Bedrock Guardrails, etc.)
 
 Providers come in two flavors: