Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-05 10:13:05 +00:00)

Commit 0961987962 (parent 662483f360): adding server example back in and restructuring steps

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

2 changed files with 18 additions and 13 deletions

@@ -2,27 +2,32 @@

Get started with Llama Stack in minutes!

Llama Stack is a stateful service with REST APIs to support the seamless transition of AI applications across different
environments. You can build and test using a local server first and deploy to a hosted endpoint for production.

In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/)
as the inference [provider](../providers/index.md#inference) for a Llama Model.

## Step 1. Install and Setup

Install [uv](https://docs.astral.sh/uv/), set up your virtual environment, and run inference on a Llama model with
[Ollama](https://ollama.com/download).
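
If the `.venv` activated below doesn't exist yet, create it first. A minimal sketch, assuming uv's default virtual-environment location:

```bash
# create a virtual environment at .venv (uv's default path)
uv venv
```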

```bash
uv pip install llama-stack aiosqlite faiss-cpu ollama openai datasets opentelemetry-exporter-otlp-proto-http mcp autoevals
source .venv/bin/activate
export INFERENCE_MODEL="llama3.2:3b"
# make sure to run this in a separate terminal
ollama run llama3.2:3b --keepalive 60m
```
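
To confirm the model stayed loaded, a quick check; this assumes the `ollama` CLI is on your PATH:

```bash
# list the models currently loaded into memory by the Ollama server
ollama ps
```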

## Step 2: Run the Llama Stack Server

```bash
INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
```
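
Once the server is up, you can sanity-check it before running the demo. A minimal sketch, assuming the server is listening on the default port 8321; the same client code can target a hosted endpoint later by swapping the base URL:

```python
from llama_stack_client import LlamaStackClient

# Point the client at the local Llama Stack server started above.
client = LlamaStackClient(base_url="http://localhost:8321")

# List registered models; the Ollama model from INFERENCE_MODEL should appear.
for model in client.models.list():
    print(model.identifier)
```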

## Step 3: Run the Demo

Now open a new terminal using the same virtual environment, and run this demo as a script with `uv run demo_script.py` or in an interactive shell.

```python
from termcolor import cprint
from llama_stack_client.types import Document
from llama_stack_client import LlamaStackClient

vector_db = "faiss"
vector_db_id = "test-vector-db"

@@ -37,8 +42,7 @@ documents = [
    )
]

# Connect to the Llama Stack server started in Step 2.
client = LlamaStackClient(base_url="http://localhost:8321")
client.vector_dbs.register(
    provider_id=vector_db,
    vector_db_id=vector_db_id,

@@ -65,7 +69,7 @@ for chunk in response.content:
    cprint("" + "-" * 50, "yellow")
```

You should see output like the following.

```
--------------------------------------------------
Query> Can you give me the arxiv link for Lora Fine Tuning in Pytorch?
--------------------------------------------------

@@ -1,3 +1,5 @@

# Llama Stack

Welcome to Llama Stack, the open-source framework for building generative AI applications.

```{admonition} Llama 4 is here!
:class: tip

@@ -9,7 +11,6 @@ Check out [Getting Started with Llama 4](https://colab.research.google.com/githu

Llama Stack {{ llama_stack_version }} is now available! See the {{ llama_stack_version_link }} for more details.
```

## What is Llama Stack?