diff --git a/docs/source/distributions/starting_llama_stack_server.md b/docs/source/distributions/starting_llama_stack_server.md
index 9be2e9ec5..f74de6d48 100644
--- a/docs/source/distributions/starting_llama_stack_server.md
+++ b/docs/source/distributions/starting_llama_stack_server.md
@@ -2,22 +2,22 @@
 You can run a Llama Stack server in one of the following ways:
 
-**As a Library**:
+## As a Library:
 This is the simplest way to get started. Using Llama Stack as a library means you do not need to start a server. This is especially useful when you are not running inference locally and relying on an external inference service (eg. fireworks, together, groq, etc.)
 
 See [Using Llama Stack as a Library](importing_as_library)
 
 
-**Container**:
+## Container:
 Another simple way to start interacting with Llama Stack is to just spin up a container (via Docker or Podman) which is pre-built with all the providers you need. We provide a number of pre-built images so you can start a Llama Stack server instantly. You can also build your own custom container. Which distribution to choose depends on the hardware you have. See [Selection of a Distribution](selection) for more details.
 
 
-**Conda**:
+## Conda:
 If you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
 
 
-**Kubernetes**:
+## Kubernetes:
 If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally. See [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
 
 
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 6ba208bf1..5ac8fa2eb 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -75,13 +75,13 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit
 ```
 
 ### ii. Using the Llama Stack Client
-Now you can use the llama stack client to run inference and build agents!
+Now you can use the llama stack client to run inference and build agents! You can reuse the server setup or use the
+[Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/). Note that the client package is already
+included in the `llama-stack` package.
 
-_Note: You can reuse the server setup or the Llama Stack Client_
+Open a new terminal and navigate to the same directory you started the server from. Then set up or activate your
+virtual environment.
 
-Open a new terminal and navigate to the same directory you started the server from.
-
-Setup venv (llama-stack already includes the client package)
 ```bash
 source .venv/bin/activate
 ```
@@ -113,8 +113,8 @@ Total models: 2
 
 ```
 
-## Step 4: Run Inference with Llama Stack
-You can test basic Llama inference completion using the CLI too.
+## Step 4: Run Basic Inference
+You can test basic Llama inference completion using the CLI.
 
 ```bash
 llama-stack-client inference chat-completion --message "tell me a joke"
@@ -136,8 +136,9 @@ ChatCompletionResponse(
     ],
 )
 ```
 
-### i. Create the Script
+Alternatively, you can run inference using the Llama Stack client SDK.
+### i. Create the Script
 Create a file `inference.py` and add the following code:
 ```python
 from llama_stack_client import LlamaStackClient
@@ -177,9 +178,9 @@ Logic flows through digital night
 Beauty in the bits
 ```
 
-## Step 5: Run Your First Agent
-### i. Create the Script
+## Step 5: Build a Simple Agent
 Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.
+### i. Create the Script
 
 Create a file `agent.py` and add the following code:
 ```python
@@ -348,9 +349,9 @@ So, who am I? I'm just a computer program designed to help you!
 :::
 
 ## Step 6: Build a RAG Agent
-### i. Create the Script
 For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents in a vector database.
+### i. Create the Script
 
 Create a file `rag_agent.py` and add the following code:
 ```python
@@ -360,11 +361,11 @@ from llama_stack_client.types import Document
 import uuid
 from termcolor import cprint
 
-client = LlamaStackClient(base_url=f"http://localhost:8321")
+client = LlamaStackClient(base_url="http://localhost:8321")
 
 # Create a vector database instance
-embedlm = next(m for m in client.models.list() if m.model_type == "embedding")
-embedding_model = embedlm.identifier
+embed_lm = next(m for m in client.models.list() if m.model_type == "embedding")
+embedding_model = embed_lm.identifier
 vector_db_id = f"v{uuid.uuid4().hex}"
 client.vector_dbs.register(
     vector_db_id=vector_db_id,
@@ -401,7 +402,7 @@ client.tool_runtime.rag_tool.insert(
 llm = next(m for m in client.models.list() if m.model_type == "llm")
 model = llm.identifier
 
-# Create RAG agent
+# Create the RAG agent
 rag_agent = Agent(
     client,
     model=model,
@@ -416,18 +417,14 @@ rag_agent = Agent(
 
 session_id = rag_agent.create_session(session_name=f"s{uuid.uuid4().hex}")
 
-user_prompts = [
-    "How to optimize memory usage in torchtune? use the knowledge_search tool to get information.",
-]
+turns = ["what is torchtune", "tell me about dora"]
 
-# Run the agent loop by calling the `create_turn` method
-for prompt in user_prompts:
-    cprint(f"User> {prompt}", "green")
-    response = rag_agent.create_turn(
-        messages=[{"role": "user", "content": prompt}],
-        session_id=session_id,
+for t in turns:
+    print("user>", t)
+    stream = rag_agent.create_turn(
+        messages=[{"role": "user", "content": t}], session_id=session_id, stream=True
    )
-    for event in AgentEventLogger().log(response):
+    for event in AgentEventLogger().log(stream):
         event.print()
 ```
 
 ### ii. Run the Script
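
As a minimal sketch of this final step (the run command itself is not part of the excerpt above), assuming the `.venv` activated earlier with `source .venv/bin/activate` is still active and `rag_agent.py` was saved in the current directory, the script can be run directly with Python; the exact command the guide uses may differ:

```bash
# Assumption: the virtual environment from the client setup step is active
# and rag_agent.py sits in the current working directory.
python rag_agent.py
```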