mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-07-30 07:39:38 +00:00)

doc enhancements, converted md into jupyter, reorganize files
commit ecad16b904 (parent 0f08f77565)
13 changed files with 450 additions and 113 deletions

@@ -1,111 +0,0 @@
# Getting Started with Llama Stack

This guide walks you through setting up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our [documentation](../README.md) for more on Llama Stack's capabilities, or visit [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps.

## Installation

The `llama` CLI tool helps you manage the Llama toolchain and agentic systems. After installing the `llama-stack` package, the `llama` command should be available in your path.

You can install this repository in two ways:

1. **Install as a package**:
   Install directly from [PyPI](https://pypi.org/project/llama-stack/) with:
   ```bash
   pip install llama-stack
   ```

2. **Install from source**:
   Follow these steps to install from the source code:
   ```bash
   mkdir -p ~/local
   cd ~/local
   git clone git@github.com:meta-llama/llama-stack.git

   conda create -n stack python=3.10
   conda activate stack

   cd llama-stack
   $CONDA_PREFIX/bin/pip install -e .
   ```

Refer to the [CLI Reference](./cli_reference.md) for details on Llama CLI commands.
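As a quick sanity check that the CLI is on your path, you can ask it for help (assuming the standard `--help` flag; the exact output varies by version):

```bash
llama --help          # lists the available sub-commands
llama stack --help    # the `stack` sub-commands (build, configure, run) are used later in this guide
```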

## Starting Up Llama Stack Server

There are two ways to start the Llama Stack server:

1. **Using Docker**:
   We provide a pre-built Docker image of Llama Stack, available in the [distributions](../distributions/) folder.

   > **Note:** For GPU inference, point the environment variable below at the local directory that holds your model checkpoints; GPU access itself is enabled by the `--gpus=all` flag in the `docker run` command below.
   ```bash
   export LLAMA_CHECKPOINT_DIR=~/.llama
   ```
   Download Llama models with:
   ```bash
   llama download --model-id Llama3.1-8B-Instruct
   ```
   Start a Docker container with:
   ```bash
   cd llama-stack/distributions/meta-reference-gpu
   docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
   ```

   **Tip:** For remote providers, use `docker compose up` with the scripts in the [distributions folder](../distributions/).
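   For example (a sketch only; the folder name below is a placeholder for whichever remote-provider directory exists under `distributions/`):
   ```bash
   cd llama-stack/distributions/<remote-provider>   # substitute the provider's folder name
   docker compose up
   ```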

2. **Build → Configure → Run via Conda**:
   For development, build a Llama Stack distribution from scratch.

   **`llama stack build`**
   Enter the build information interactively:
   ```bash
   llama stack build
   ```

   **`llama stack configure`**
   Run `llama stack configure <name>` using the name from the build step.
   ```bash
   llama stack configure my-local-stack
   ```

   **`llama stack run`**
   Start the server with:
   ```bash
   llama stack run my-local-stack
   ```

## Testing with Client

After setup, test the server with a client:
```bash
cd /path/to/llama-stack
conda activate <env>

python -m llama_stack.apis.inference.client localhost 5000
```

You can also send a POST request:
```bash
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
    "model": "Llama3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
    ],
    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
```

For testing safety, run:
```bash
python -m llama_stack.apis.safety.client localhost 5000
```

Check out our client SDKs for various languages: [Python](https://github.com/meta-llama/llama-stack-client-python), [Node](https://github.com/meta-llama/llama-stack-client-node), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin).
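As a minimal sketch of the Python SDK (mirroring the notebooks added in this commit, and assuming the server from the curl example above is still running on `localhost:5000`):

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import SystemMessage, UserMessage

client = LlamaStackClient(base_url="http://localhost:5000")

# Same request as the curl example above, expressed through the client SDK
response = client.inference.chat_completion(
    messages=[
        SystemMessage(content="You are a helpful assistant.", role="system"),
        UserMessage(content="Write me a 2-sentence poem about the moon", role="user"),
    ],
    model="Llama3.1-8B-Instruct",
)
print(response.completion_message.content)
```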

## Advanced Guides

For more on custom Llama Stack distributions, refer to our [Building a Llama Stack Distribution](./building_distro.md) guide.

docs/zero_to_hero_guide/00_Inference101.ipynb (new file, 247 lines)
@@ -0,0 +1,247 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c1e7571c",
   "metadata": {},
   "source": [
    "# Llama Stack Inference Guide\n",
    "\n",
    "This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).\n",
    "\n",
    "### Table of Contents\n",
    "1. [Quickstart](#quickstart)\n",
    "2. [Building Effective Prompts](#building-effective-prompts)\n",
    "3. [Conversation Loop](#conversation-loop)\n",
    "4. [Conversation History](#conversation-history)\n",
    "5. [Streaming Responses](#streaming-responses)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "414301dc",
   "metadata": {},
   "source": [
    "## Quickstart\n",
    "\n",
    "This section walks through each step to set up and make a simple text generation request.\n",
    "\n",
    "### 1. Set Up the Client\n",
    "\n",
    "Begin by importing the necessary components from Llama Stack’s client library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a573752",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_stack_client import LlamaStackClient\n",
    "from llama_stack_client.types import SystemMessage, UserMessage\n",
    "\n",
    "client = LlamaStackClient(base_url='http://localhost:5000')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86366383",
   "metadata": {},
   "source": [
    "### 2. Create a Chat Completion Request\n",
    "\n",
    "Use the `chat_completion` function to define the conversation context. Each message you include should have a specific role and content:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77c29dba",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.inference.chat_completion(\n",
    "    messages=[\n",
    "        SystemMessage(content='You are a friendly assistant.', role='system'),\n",
    "        UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
    "    ],\n",
    "    model='Llama3.2-11B-Vision-Instruct',\n",
    ")\n",
    "\n",
    "print(response.completion_message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5f16949",
   "metadata": {},
   "source": [
    "## Building Effective Prompts\n",
    "\n",
    "Effective prompt creation (often called 'prompt engineering') is essential for quality responses. Here are best practices for structuring your prompts to get the most out of the Llama Stack model:\n",
    "\n",
    "1. **System Messages**: Use `SystemMessage` to set the model's behavior. This is similar to providing top-level instructions for tone, format, or specific behavior.\n",
    "   - **Example**: `SystemMessage(content='You are a friendly assistant that explains complex topics simply.')`\n",
    "2. **User Messages**: Define the task or question you want to ask the model with a `UserMessage`. The clearer and more direct you are, the better the response.\n",
    "   - **Example**: `UserMessage(content='Explain recursion in programming in simple terms.')`\n",
    "\n",
    "### Sample Prompt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c6812da",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = client.inference.chat_completion(\n",
    "    messages=[\n",
    "        SystemMessage(content='You are shakespeare.', role='system'),\n",
    "        UserMessage(content='Write a two-sentence poem about llama.', role='user')\n",
    "    ],\n",
    "    model='Llama3.2-11B-Vision-Instruct',\n",
    ")\n",
    "\n",
    "print(response.completion_message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c8690ef0",
   "metadata": {},
   "source": [
    "## Conversation Loop\n",
    "\n",
    "To create a continuous conversation loop, where users can input multiple messages in a session, use the following structure. This example runs an asynchronous loop, ending when the user types 'exit,' 'quit,' or 'bye.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02211625",
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "from llama_stack_client import LlamaStackClient\n",
    "from llama_stack_client.types import UserMessage\n",
    "from termcolor import cprint\n",
    "\n",
    "client = LlamaStackClient(base_url='http://localhost:5000')\n",
    "\n",
    "async def chat_loop():\n",
    "    while True:\n",
    "        user_input = input('User> ')\n",
    "        if user_input.lower() in ['exit', 'quit', 'bye']:\n",
    "            cprint('Ending conversation. Goodbye!', 'yellow')\n",
    "            break\n",
    "\n",
    "        message = UserMessage(content=user_input, role='user')\n",
    "        response = client.inference.chat_completion(\n",
    "            messages=[message],\n",
    "            model='Llama3.2-11B-Vision-Instruct',\n",
    "        )\n",
    "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
    "\n",
    "# In a notebook, await the coroutine directly; asyncio.run(chat_loop()) only works\n",
    "# in a standalone script, since Jupyter already runs an event loop.\n",
    "await chat_loop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8cf0d555",
   "metadata": {},
   "source": [
    "## Conversation History\n",
    "\n",
    "Maintaining a conversation history allows the model to retain context from previous interactions. Use a list to accumulate messages, enabling continuity throughout the chat session."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9496f75c",
   "metadata": {},
   "outputs": [],
   "source": [
    "async def chat_loop():\n",
    "    conversation_history = []\n",
    "    while True:\n",
    "        user_input = input('User> ')\n",
    "        if user_input.lower() in ['exit', 'quit', 'bye']:\n",
    "            cprint('Ending conversation. Goodbye!', 'yellow')\n",
    "            break\n",
    "\n",
    "        user_message = UserMessage(content=user_input, role='user')\n",
    "        conversation_history.append(user_message)\n",
    "\n",
    "        response = client.inference.chat_completion(\n",
    "            messages=conversation_history,\n",
    "            model='Llama3.2-11B-Vision-Instruct',\n",
    "        )\n",
    "        cprint(f'> Response: {response.completion_message.content}', 'cyan')\n",
    "\n",
    "        # Append the assistant's reply to the history so the next turn has the full context\n",
    "        assistant_message = UserMessage(content=response.completion_message.content, role='user')\n",
    "        conversation_history.append(assistant_message)\n",
    "\n",
    "await chat_loop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03fcf5e0",
   "metadata": {},
   "source": [
    "## Streaming Responses\n",
    "\n",
    "Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance user experience by providing immediate feedback without waiting for the entire response to be processed.\n",
    "\n",
    "### Example: Streaming Responses"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d119026e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "from llama_stack_client import LlamaStackClient\n",
    "from llama_stack_client.lib.inference.event_logger import EventLogger\n",
    "from llama_stack_client.types import UserMessage\n",
    "from termcolor import cprint\n",
    "\n",
    "async def run_main(stream: bool = True):\n",
    "    client = LlamaStackClient(base_url='http://localhost:5000')\n",
    "\n",
    "    message = UserMessage(\n",
    "        content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
    "    )\n",
    "    cprint(f'User> {message.content}', 'green')\n",
    "\n",
    "    response = client.inference.chat_completion(\n",
    "        messages=[message],\n",
    "        model='Llama3.2-11B-Vision-Instruct',\n",
    "        stream=stream,\n",
    "    )\n",
    "\n",
    "    if not stream:\n",
    "        cprint(f'> Response: {response}', 'cyan')\n",
    "    else:\n",
    "        async for log in EventLogger().log(response):\n",
    "            log.print()\n",
    "\n",
    "    models_response = client.models.list()\n",
    "    print(models_response)\n",
    "\n",
    "# In a notebook, await the coroutine directly (use asyncio.run(run_main()) in a script)\n",
    "await run_main()"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}

docs/zero_to_hero_guide/00_Local_Cloud_Inference101.ipynb (new file, 201 lines)
@@ -0,0 +1,201 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a0ed972d",
   "metadata": {},
   "source": [
    "# Switching between Local and Cloud Model with Llama Stack\n",
    "\n",
    "This guide provides a streamlined setup to switch between local and cloud clients for text generation with Llama Stack’s `chat_completion` API. This setup enables automatic fallback to a cloud instance if the local client is unavailable.\n",
    "\n",
    "### Pre-requisite\n",
    "Before you begin, please ensure Llama Stack is installed and the distribution is set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/). You will need to run two distributions, a local and a cloud distribution, for this demo to work.\n",
    "\n",
    "### Implementation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df89cff7",
   "metadata": {},
   "source": [
    "#### 1. Set Up Local and Cloud Clients\n",
    "\n",
    "Initialize both clients, specifying the `base_url` for each instance. In this case, we have the local distribution running on `http://localhost:5000` and the cloud distribution running on `http://localhost:5001`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f868dfe",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_stack_client import LlamaStackClient\n",
    "\n",
    "# Configure local and cloud clients\n",
    "local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
    "cloud_client = LlamaStackClient(base_url='http://localhost:5001')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "894689c1",
   "metadata": {},
   "source": [
    "#### 2. Client Selection with Fallback\n",
    "\n",
    "The `select_client` function checks if the local client is available using a lightweight `/health` check. If the local client is unavailable, it automatically switches to the cloud client.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff0c8277",
   "metadata": {},
   "outputs": [],
   "source": [
    "import httpx\n",
    "from termcolor import cprint\n",
    "\n",
    "async def select_client() -> LlamaStackClient:\n",
    "    \"\"\"Use local client if available; otherwise, switch to cloud client.\"\"\"\n",
    "    try:\n",
    "        async with httpx.AsyncClient() as http_client:\n",
    "            response = await http_client.get(f'{local_client.base_url}/health')\n",
    "            if response.status_code == 200:\n",
    "                cprint('Using local client.', 'yellow')\n",
    "                return local_client\n",
    "    except httpx.RequestError:\n",
    "        pass\n",
    "    cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
    "    return cloud_client"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ccfe66f",
   "metadata": {},
   "source": [
    "#### 3. Generate a Response\n",
    "\n",
    "After selecting the client, you can generate text using `chat_completion`. This example sends a sample prompt to the model and prints the response.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e19cc20",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_stack_client.lib.inference.event_logger import EventLogger\n",
    "from llama_stack_client.types import UserMessage\n",
    "\n",
    "async def get_llama_response(stream: bool = True):\n",
    "    client = await select_client()  # Selects the available client\n",
    "    message = UserMessage(content='hello world, write me a 2 sentence poem about the moon', role='user')\n",
    "    cprint(f'User> {message.content}', 'green')\n",
    "\n",
    "    response = client.inference.chat_completion(\n",
    "        messages=[message],\n",
    "        model='Llama3.2-11B-Vision-Instruct',\n",
    "        stream=stream,\n",
    "    )\n",
    "\n",
    "    if not stream:\n",
    "        cprint(f'> Response: {response}', 'cyan')\n",
    "    else:\n",
    "        # Stream tokens progressively\n",
    "        async for log in EventLogger().log(response):\n",
    "            log.print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6edf5e57",
   "metadata": {},
   "source": [
    "#### 4. Run the Asynchronous Response Generation\n",
    "\n",
    "Run `get_llama_response` by awaiting it in the notebook (in a standalone script, wrap the call in `asyncio.run()`).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c10f487e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initiate the response generation process. In a notebook, await the coroutine directly;\n",
    "# asyncio.run(get_llama_response()) only works in a standalone script.\n",
    "await get_llama_response()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56aa9a09",
   "metadata": {},
   "source": [
    "### Complete code\n",
    "Summing it up, here's the complete code for the local-cloud model implementation with Llama Stack:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9fd74ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "import asyncio\n",
    "import httpx\n",
    "from llama_stack_client import LlamaStackClient\n",
    "from llama_stack_client.lib.inference.event_logger import EventLogger\n",
    "from llama_stack_client.types import UserMessage\n",
    "from termcolor import cprint\n",
    "\n",
    "local_client = LlamaStackClient(base_url='http://localhost:5000')\n",
    "cloud_client = LlamaStackClient(base_url='http://localhost:5001')\n",
    "\n",
    "async def select_client() -> LlamaStackClient:\n",
    "    try:\n",
    "        async with httpx.AsyncClient() as http_client:\n",
    "            response = await http_client.get(f'{local_client.base_url}/health')\n",
    "            if response.status_code == 200:\n",
    "                cprint('Using local client.', 'yellow')\n",
    "                return local_client\n",
    "    except httpx.RequestError:\n",
    "        pass\n",
    "    cprint('Local client unavailable. Switching to cloud client.', 'yellow')\n",
    "    return cloud_client\n",
    "\n",
    "async def get_llama_response(stream: bool = True):\n",
    "    client = await select_client()\n",
    "    message = UserMessage(\n",
    "        content='hello world, write me a 2 sentence poem about the moon', role='user'\n",
    "    )\n",
    "    cprint(f'User> {message.content}', 'green')\n",
    "\n",
    "    response = client.inference.chat_completion(\n",
    "        messages=[message],\n",
    "        model='Llama3.2-11B-Vision-Instruct',\n",
    "        stream=stream,\n",
    "    )\n",
    "\n",
    "    if not stream:\n",
    "        cprint(f'> Response: {response}', 'cyan')\n",
    "    else:\n",
    "        async for log in EventLogger().log(response):\n",
    "            log.print()\n",
    "\n",
    "# In a notebook, await directly; use asyncio.run(get_llama_response()) when running as a script\n",
    "await get_llama_response()"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}

@@ -1,7 +1,7 @@
-# Llama Stack Text Generation Guide
+# Llama Stack Inference Guide

This document provides instructions on how to use Llama Stack's `chat_completion` function for generating text using the `Llama3.2-11B-Vision-Instruct` model. Before you begin, please ensure Llama Stack is installed and set up by following the [Getting Started Guide](https://llama-stack.readthedocs.io/en/latest/).

### Table of Contents
1. [Quickstart](#quickstart)