refactor notebooks

Kai Wu 2024-11-05 11:10:08 -08:00
parent 6d3769aedd
commit 36d3520bf2
7 changed files with 3 additions and 259 deletions

View file

@@ -9,7 +9,7 @@
"This document talks about the Safety APIs in Llama Stack.\n",
"\n",
"As outlined in our [Responsible Use Guide](https://www.llama.com/docs/how-to-guides/responsible-use-guide-resources/), LLM apps should deploy appropriate system level safeguards to mitigate safety and security risks of LLM system, similar to the following diagram:\n",
"![Figure 1: Safety System](./safety_system.webp)\n",
"![Figure 1: Safety System](./_static/safety_system.webp)\n",
"\n",
"To that goal, Llama Stack uses **Prompt Guard** and **Llama Guard 3** to secure our system. Here are the quick introduction about them."
]
@@ -81,12 +81,12 @@
"\n",
"Now, you can start the server by `llama stack run my-local-stack --port 5000`\n",
"\n",
"After the server started, you can test safety (if you configured llama-guard and/or prompt-guard shields) by:"
"After the server started, you can test safety example using the follow code:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 1,
"metadata": {},
"outputs": [
{

View file

Binary image file: 31 KiB before, 31 KiB after.
View file

@@ -1,256 +0,0 @@
# Getting Started with Llama Stack
This guide will walk you through the steps to get started with an end-to-end flow for Llama Stack. It mainly focuses on building a Llama Stack distribution and starting up a Llama Stack server. Please see our [documentation](../README.md) for what you can do with Llama Stack, and [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps built with Llama Stack.
## Installation
The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
You have two ways to install this repository:
1. **Install as a package**:
You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
```bash
pip install llama-stack
```
2. **Install from source**:
If you prefer to install from the source code, follow these steps:
```bash
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
cd llama-stack
$CONDA_PREFIX/bin/pip install -e .
```
For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md).
## Starting Up Llama Stack Server
You have two ways to start up the Llama Stack server:
1. **Starting up server via docker**:
We provide pre-built Docker images of Llama Stack distributions, which can be found in the [distributions](../distributions/) folder.
> [!NOTE]
> For GPU inference, you need to set the environment variable below to specify the local directory containing your model checkpoints, and enable GPU access when starting the Docker container.
```
export LLAMA_CHECKPOINT_DIR=~/.llama
```
> [!NOTE]
> `~/.llama` should be the path containing downloaded weights of Llama models.
To download Llama models, use:
```
llama download --model-id Llama3.1-8B-Instruct
```
To download and run a pre-built Docker container, you may use the following commands:
```
cd llama-stack/distributions/meta-reference-gpu
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
```
> [!TIP]
> Pro Tip: You may use `docker compose up` to start a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). You can check out [these scripts](../distributions/) to help you get started.
2. **Build->Configure->Run Llama Stack server via conda**:
You may also build a Llama Stack distribution from scratch, configure it, and run it. This is useful for developing on Llama Stack.
**`llama stack build`**
- You'll be prompted to enter build information interactively.
```
llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack): my-local-stack
> Enter the image type you want your Llama Stack to be built as (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's select
the provider types (implementations) you want to use for these APIs.
Tip: use <TAB> to see options for the providers.
> Enter provider for API inference: meta-reference
> Enter provider for API safety: meta-reference
> Enter provider for API agents: meta-reference
> Enter provider for API memory: meta-reference
> Enter provider for API datasetio: meta-reference
> Enter provider for API scoring: meta-reference
> Enter provider for API eval: meta-reference
> Enter provider for API telemetry: meta-reference
> (Optional) Enter a short description for your Llama Stack:
Conda environment 'llamastack-my-local-stack' does not exist. Creating with Python 3.10...
...
Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml
You can now run `llama stack configure my-local-stack`
```
**`llama stack configure`**
- Run `llama stack configure <name>` with the name you defined in the `build` step.
```
llama stack configure <name>
```
- You will be prompted to enter configurations for your Llama Stack
```
$ llama stack configure my-local-stack
Using ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml...
Llama Stack is composed of several APIs working together. For each API served by the Stack,
we need to configure the providers (implementations) you want to use for these APIs.
Configuring API `inference`...
> Configuring provider `(meta-reference)`
Enter value for model (default: Llama3.2-3B-Instruct) (required): Llama3.2-3B-Instruct
Enter value for torch_seed (optional):
Enter value for max_seq_len (default: 4096) (required):
Enter value for max_batch_size (default: 1) (required):
Enter value for create_distributed_process_group (default: True) (required):
Enter value for checkpoint_dir (optional):
Configuring API `safety`...
> Configuring provider `(meta-reference)`
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-1B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for enable_prompt_guard (default: False) (optional):
Configuring API `agents`...
> Configuring provider `(meta-reference)`
Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite):
Configuring SqliteKVStoreConfig:
Enter value for namespace (optional):
Enter value for db_path (default: /home/kaiwu/.llama/runtime/kvstore.db) (required):
Configuring API `memory`...
> Configuring provider `(meta-reference)`
Configuring API `datasetio`...
> Configuring provider `(meta-reference)`
Configuring API `scoring`...
> Configuring provider `(meta-reference)`
Configuring API `eval`...
> Configuring provider `(meta-reference)`
Configuring API `telemetry`...
> Configuring provider `(meta-reference)`
> YAML configuration has been written to `/home/kaiwu/.llama/builds/conda/my-local-stack-run.yaml`.
You can now run `llama stack run my-local-stack --port PORT`
```
**`llama stack run`**
- Run `llama stack run <name>` with the name you have previously defined.
```
llama stack run my-local-stack
...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
...
Finished model load YES READY
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /inference/embeddings
Serving POST /memory_banks/create
Serving DELETE /memory_bank/documents/delete
Serving DELETE /memory_banks/drop
Serving GET /memory_bank/documents/get
Serving GET /memory_banks/get
Serving POST /memory_bank/insert
Serving GET /memory_banks/list
Serving POST /memory_bank/query
Serving POST /memory_bank/update
Serving POST /safety/run_shield
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Serving GET /telemetry/get_trace
Serving POST /telemetry/log_event
Listening on :::5000
INFO: Started server process [587053]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
```
## Testing with client
Once the server is set up, we can test it with a client to see the example outputs.
```
cd /path/to/llama-stack
conda activate <env> # any environment containing the llama-stack pip package will work
python -m llama_stack.apis.inference.client localhost 5000
```
This will run the chat completion client and query the distribution's `/inference/chat_completion` API.
Here is an example output:
```
User>hello world, write me a 2 sentence poem about the moon
Assistant> Here's a 2-sentence poem about the moon:
The moon glows softly in the midnight sky,
A beacon of wonder, as it passes by.
```
You may also send a POST request to the server:
```
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.2-3B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2 sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
Output:
{'completion_message': {'role': 'assistant',
'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
'stop_reason': 'out_of_tokens',
'tool_calls': []},
'logprobs': null}
```
Similarly, you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
```
python -m llama_stack.apis.safety.client localhost 5000
```
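If you prefer to exercise a shield from Python instead of the bundled test client, a minimal sketch using the [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) SDK might look like the following. Treat it as illustrative: the exact argument name (`shield_id` vs. `shield_type`) and the shield identifier depend on your llama-stack version and on the shields configured in your run YAML.
```python
from llama_stack_client import LlamaStackClient

# Point the client at the server started above.
client = LlamaStackClient(base_url="http://localhost:5000")

# Ask the configured Llama Guard shield to screen a user message.
# "llama_guard" is a placeholder identifier; use whatever shield you configured.
response = client.safety.run_shield(
    shield_id="llama_guard",
    messages=[{"role": "user", "content": "Tell me how to pick a lock."}],
    params={},
)

# A violation of None means the message passed the shield.
print(response.violation)
```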
Check out our client SDKs for connecting to the Llama Stack server in your preferred language; you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) to quickly build your applications.
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
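As a rough illustration, a chat completion through the Python SDK might look like the sketch below; parameter names (for example `model` vs. `model_id`) vary across client versions, so treat this as a starting point rather than a definitive reference.
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Mirrors the curl example above, but through the Python client.
response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a 2 sentence poem about the moon"},
    ],
)
print(response.completion_message.content)
```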
## Advanced Guides
Please see our [Building a Llama Stack Distribution](./building_distro.md) guide for more details on how to assemble your own Llama Stack Distribution.