Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-05 10:13:05 +00:00)

added tabs for the tutorial output and rephrased things based on feedback

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

This commit is contained in:
parent 0961987962
commit 5bdd767e8d

1 changed file with 166 additions and 148 deletions
Set up your virtual environment.

```bash
uv venv --python 3.10
source .venv/bin/activate
```
## Step 2: Run Llama Stack

Llama Stack is a server that exposes multiple APIs; you connect to it using the Llama Stack client SDK.
```bash
uv pip install llama-stack
```

Note that the Llama Stack server package includes the client SDK as well.
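As a quick sanity check that the install pulled in both pieces, a tiny snippet like the one below (purely illustrative, not part of the tutorial) can be run in the activated environment:

```python
# Illustrative check: `pip install llama-stack` ships the stack package and the bundled client SDK.
import llama_stack          # the stack/server package
import llama_stack_client   # the client SDK

print("installed:", llama_stack.__name__, "and", llama_stack_client.__name__)
```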
::::{tab-set}

:::{tab-item} Using `venv`
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
which defines the providers and their settings.
Now let's build and run the Llama Stack config for Ollama.

```bash
INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
```
:::
:::{tab-item} Using `conda`
You can use Python to build and run the Llama Stack server, which is useful for testing and development.

Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
which defines the providers and their settings.
Now let's build and run the Llama Stack config for Ollama.

```bash
INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type conda --run
```
:::
:::{tab-item} Using a Container
You can use a container image to run the Llama Stack server. We provide several container images for the server
component that works with different inference providers out of the box. For this guide, we will use […]
The configuration YAML for the Ollama distribution is available at `distributions/ollama/run.yaml`.

```{tip}
Docker containers run in their own isolated network namespaces on Linux. To allow the container to communicate with services running on the host via `localhost`, you need `--network=host`. This makes the container use the host’s network directly so it can connect to Ollama running on `localhost:11434`.
```
[…]

```bash
docker run -it \
  ...
```
:::
::::
You will see output like below:
```
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Now you can use the Llama Stack client to run inference and build agents!

You can reuse the server setup or use the [Llama Stack Client](https://github.com/meta-llama/llama-stack-client-python/).
Note that the client package is already included in the `llama-stack` package.
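You can also poke the server directly from Python, for example from the server's own virtual environment since it bundles the client SDK. The short sketch below (an illustration, assuming the default endpoint `http://localhost:8321`) asks the running stack which providers the Ollama config loaded:

```python
# probe_server.py (illustrative): confirm the stack answers on its default port.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List the providers the Ollama run config wired up (inference, vector_io, etc.).
for provider in client.providers.list():
    print(f"{provider.api}: {provider.provider_id}")
```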
## Step 3: Run Client CLI

Open a new terminal and navigate to the same directory you started the server from. Then set up a new
virtual environment or activate your existing server virtual environment.

::::{tab-set}

:::{tab-item} Reuse Server `venv`
```bash
# The client is included in the llama-stack package, so we just activate the server venv
source .venv/bin/activate
```
:::
:::{tab-item} Install with `venv`
```bash
uv venv client --python 3.10
source client/bin/activate
pip install llama-stack-client
```
:::
:::{tab-item} Install with `conda`
```bash
yes | conda create -n stack-client python=3.10
conda activate stack-client
pip install llama-stack-client
```
:::
::::
Now let's use the `llama-stack-client` [CLI](../references/llama_stack_client_cli_reference.md) to check the
connectivity to the server.

```bash
llama-stack-client configure --endpoint http://localhost:8321 --api-key none
```

[…]

```
Total models: 2
```
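The same connectivity check can be done with the Python SDK instead of the CLI; a minimal sketch (assuming the server is still listening on `http://localhost:8321`) is:

```python
# list_models.py (illustrative): the SDK equivalent of `llama-stack-client models list`.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

models = list(client.models.list())
for model in models:
    print(model.model_type, model.identifier)
print(f"Total models: {len(models)}")
```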
## Step 4: Run the Demos

Note that these demos show the [Python Client SDK](../references/python_sdk_reference/index.md).
Other SDKs are also available; please refer to the [Client SDK](../index.md#client-sdks) list for the complete set of options.
::::{tab-set}

:::{tab-item} Basic Inference with the CLI
You can test basic Llama inference completion using the CLI.

```bash
[…]
```

```
ChatCompletionResponse(
    […]
    ],
)
```
:::
:::{tab-item} Basic Inference with a Script
Alternatively, you can run inference using the Llama Stack client SDK.

### i. Create the Script

[…]

```
Lines of code unfold
Logic flows through digital night
Beauty in the bits
```
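The script itself is elided from this excerpt, but the general shape of an SDK-based inference script looks roughly like the following sketch (the file name, prompt, and model-selection logic are illustrative assumptions, not the tutorial's exact code):

```python
# inference.py (illustrative sketch, not the tutorial's exact script)
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Pick the first LLM registered with the stack instead of hard-coding a model id.
llm = next(m for m in client.models.list() if m.model_type == "llm")

response = client.inference.chat_completion(
    model_id=llm.identifier,
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
)
print(response.completion_message.content)
```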
:::
:::{tab-item} Build a Simple Agent
Now we can move beyond simple inference and build an agent that can perform tasks using the Llama Stack server.

### i. Create the Script

Create a file `agent.py` and add the following code:

[…]

Let's run the script using `uv`:

```bash
uv run python agent.py
```
```{dropdown} 👋 Click here to see the sample output
Non-streaming ...
agent> I'm an artificial intelligence designed to assist and communicate with users like you. I don't have a personal identity, but I'm here to provide information, answer questions, and help with tasks to the best of my abilities.

[…]

I'm a computer program designed to simulate human-like conversations, using natu[…]
Think of me as a virtual companion, a helpful tool designed to make your interactions more efficient and enjoyable. I don't have personal opinions, emotions, or biases, but I'm here to provide accurate and informative responses to the best of my abilities.

So, who am I? I'm just a computer program designed to help you!
```
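The tutorial's `agent.py` is elided from this excerpt as well. As a rough sketch of the pattern (assuming the `Agent` and `AgentEventLogger` helpers exported by `llama_stack_client`), an agent wraps a model and a session, and you drive it turn by turn:

```python
# agent.py (illustrative sketch, not the tutorial's exact script)
from llama_stack_client import Agent, AgentEventLogger, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
llm = next(m for m in client.models.list() if m.model_type == "llm")

# An agent pairs a model with instructions; each session keeps its own conversation state.
agent = Agent(client, model=llm.identifier, instructions="You are a helpful assistant.")
session_id = agent.create_session("demo-session")

response = agent.create_turn(
    messages=[{"role": "user", "content": "Who are you?"}],
    session_id=session_id,
    stream=True,
)
for event in AgentEventLogger().log(response):
    event.print()
```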
:::
:::{tab-item} Build a RAG Agent

For our last demo, we can build a RAG agent that can answer questions about the Torchtune project using the documents
in a vector database.

### i. Create the Script

[…]

Let's run the script using `uv`:

```bash
uv run python rag_agent.py
```
```{dropdown} 👋 Click here to see the sample output
user> what is torchtune
inference> [knowledge_search(query='TorchTune')]
tool_execution> Tool:knowledge_search Args:{'query': 'TorchTune'}

[…]

Overall, DORA is a powerful reinforcement learning algorithm that can learn comp[…]
```
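`rag_agent.py` is also elided here; the core steps it performs are registering a vector database, ingesting documents with the RAG tool, and handing the agent the built-in `knowledge_search` tool. A condensed sketch follows (the vector db id, embedding model, document URL, and chunk size are illustrative assumptions):

```python
# rag_agent.py (condensed, illustrative sketch, not the tutorial's exact script)
import uuid
from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a vector database with the stack's vector_io provider.
vector_db_id = f"demo-vector-db-{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Ingest a Torchtune document; the RAG tool chunks, embeds, and stores it.
client.tool_runtime.rag_tool.insert(
    documents=[
        RAGDocument(
            document_id="torchtune-readme",
            content="https://raw.githubusercontent.com/pytorch/torchtune/main/README.md",
            mime_type="text/plain",
            metadata={},
        )
    ],
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

# Build an agent that can call the builtin knowledge_search tool against our vector db.
llm = next(m for m in client.models.list() if m.model_type == "llm")
agent = Agent(
    client,
    model=llm.identifier,
    instructions="Use the knowledge_search tool to answer questions.",
    tools=[{"name": "builtin::rag/knowledge_search", "args": {"vector_db_ids": [vector_db_id]}}],
)

session_id = agent.create_session("rag-session")
response = agent.create_turn(
    messages=[{"role": "user", "content": "what is torchtune"}],
    session_id=session_id,
    stream=True,
)
for event in AgentEventLogger().log(response):
    event.print()
```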
:::

::::
## Build Your Own Apps!
Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/index)! 🚀