From 36d3520bf2b2a2a1db1fa1051014330a56f84143 Mon Sep 17 00:00:00 2001
From: Kai Wu
Date: Tue, 5 Nov 2024 11:10:08 -0800
Subject: [PATCH] refactor notebooks

---
 ...e_chat101.ipynb => 01_Image_Chat101.ipynb} |   0
 ...ling101.ipynb => 02_Tool_Calling101.ipynb} |   0
 docs/{Memory101.ipynb => 03_Memory101.ipynb}  |   0
 docs/{safety101.ipynb => 04_Safety101.ipynb}  |   6 +-
 docs/{agents101.ipynb => 05_Agents101.ipynb}  |   0
 docs/{ => _static}/safety_system.webp         | Bin
 docs/getting_started.md                       | 256 ------------------
 7 files changed, 3 insertions(+), 259 deletions(-)
 rename docs/{image_chat101.ipynb => 01_Image_Chat101.ipynb} (100%)
 rename docs/{tool_Calling101.ipynb => 02_Tool_Calling101.ipynb} (100%)
 rename docs/{Memory101.ipynb => 03_Memory101.ipynb} (100%)
 rename docs/{safety101.ipynb => 04_Safety101.ipynb} (97%)
 rename docs/{agents101.ipynb => 05_Agents101.ipynb} (100%)
 rename docs/{ => _static}/safety_system.webp (100%)
 delete mode 100644 docs/getting_started.md

diff --git a/docs/image_chat101.ipynb b/docs/01_Image_Chat101.ipynb
similarity index 100%
rename from docs/image_chat101.ipynb
rename to docs/01_Image_Chat101.ipynb
diff --git a/docs/tool_Calling101.ipynb b/docs/02_Tool_Calling101.ipynb
similarity index 100%
rename from docs/tool_Calling101.ipynb
rename to docs/02_Tool_Calling101.ipynb
diff --git a/docs/Memory101.ipynb b/docs/03_Memory101.ipynb
similarity index 100%
rename from docs/Memory101.ipynb
rename to docs/03_Memory101.ipynb
diff --git a/docs/safety101.ipynb b/docs/04_Safety101.ipynb
similarity index 97%
rename from docs/safety101.ipynb
rename to docs/04_Safety101.ipynb
index 095eb0fa2..4d86d9a8a 100644
--- a/docs/safety101.ipynb
+++ b/docs/04_Safety101.ipynb
@@ -9,7 +9,7 @@
     "This document talks about the Safety APIs in Llama Stack.\n",
     "\n",
     "As outlined in our [Responsible Use Guide](https://www.llama.com/docs/how-to-guides/responsible-use-guide-resources/), LLM apps should deploy appropriate system level safeguards to mitigate safety and security risks of LLM system, similar to the following diagram:\n",
-    "![Figure 1: Safety System](./safety_system.webp)\n",
+    "![Figure 1: Safety System](./_static/safety_system.webp)\n",
     "\n",
     "To that goal, Llama Stack uses **Prompt Guard** and **Llama Guard 3** to secure our system. Here are the quick introduction about them."
    ]
   },
@@ -81,12 +81,12 @@
     "\n",
     "Now, you can start the server by `llama stack run my-local-stack --port 5000`\n",
     "\n",
-    "After the server started, you can test safety (if you configured llama-guard and/or prompt-guard shields) by:"
+    "After the server starts, you can test the safety example using the following code:"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 1,
    "metadata": {},
    "outputs": [
    {
diff --git a/docs/agents101.ipynb b/docs/05_Agents101.ipynb
similarity index 100%
rename from docs/agents101.ipynb
rename to docs/05_Agents101.ipynb
diff --git a/docs/safety_system.webp b/docs/_static/safety_system.webp
similarity index 100%
rename from docs/safety_system.webp
rename to docs/_static/safety_system.webp
diff --git a/docs/getting_started.md b/docs/getting_started.md
deleted file mode 100644
index b2c06b54d..000000000
--- a/docs/getting_started.md
+++ /dev/null
@@ -1,256 +0,0 @@
-# Getting Started with Llama Stack
-
-This guide will walk you though the steps to get started on end-to-end flow for LlamaStack. This guide mainly focuses on getting started with building a LlamaStack distribution, and starting up a LlamaStack server.
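For orientation before the step-by-step sections, here is a minimal sketch of the end-to-end flow this guide covers, using only the commands documented below; the stack name `my-local-stack` and port 5000 simply mirror the guide's examples and can be changed.

```bash
# Minimal end-to-end sketch (conda path); each command is documented in the sections below.
pip install llama-stack                      # install the llama CLI and stack tooling
llama stack build                            # interactive: name the stack, e.g. my-local-stack, and pick conda
llama stack configure my-local-stack         # answer the provider configuration prompts
llama stack run my-local-stack --port 5000   # start the Llama Stack server on port 5000
```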
-Please see our [documentations](../README.md) on what you can do with Llama Stack, and [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) on examples apps built with Llama Stack.
-
-## Installation
-The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.
-
-You have two ways to install this repository:
-
-1. **Install as a package**:
-   You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
-   ```bash
-   pip install llama-stack
-   ```
-
-2. **Install from source**:
-   If you prefer to install from the source code, follow these steps:
-   ```bash
-   mkdir -p ~/local
-   cd ~/local
-   git clone git@github.com:meta-llama/llama-stack.git
-
-   conda create -n stack python=3.10
-   conda activate stack
-
-   cd llama-stack
-   $CONDA_PREFIX/bin/pip install -e .
-   ```
-
-For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md).
-
-## Starting Up Llama Stack Server
-
-You have two ways to start up Llama stack server:
-
-1. **Starting up server via docker**:
-
-We provide pre-built Docker image of Llama Stack distribution, which can be found in the following links in the [distributions](../distributions/) folder.
-
-> [!NOTE]
-> For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.
-```
-export LLAMA_CHECKPOINT_DIR=~/.llama
-```
-
-> [!NOTE]
-> `~/.llama` should be the path containing downloaded weights of Llama models.
-
-To download llama models, use
-```
-llama download --model-id Llama3.1-8B-Instruct
-```
-
-To download and start running a pre-built docker container, you may use the following commands:
-
-```
-cd llama-stack/distributions/meta-reference-gpu
-docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
-```
-
-> [!TIP]
-> Pro Tip: We may use `docker compose up` for starting up a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). You can checkout [these scripts](../distributions/) to help you get started.
-
-
-2. **Build->Configure->Run Llama Stack server via conda**:
-
-   You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack.
-
-   **`llama stack build`**
-
-   You'll be prompted to enter build information interactively.
-   ```
-   llama stack build
-   > Enter a name for your Llama Stack (e.g. my-local-stack): my-local-stack
-   > Enter the image type you want your Llama Stack to be built as (docker or conda): conda
-
-   Llama Stack is composed of several APIs working together. Let's select
-   the provider types (implementations) you want to use for these APIs.
-
-   Tip: use to see options for the providers.
-
-   > Enter provider for API inference: meta-reference
-   > Enter provider for API safety: meta-reference
-   > Enter provider for API agents: meta-reference
-   > Enter provider for API memory: meta-reference
-   > Enter provider for API datasetio: meta-reference
-   > Enter provider for API scoring: meta-reference
-   > Enter provider for API eval: meta-reference
-   > Enter provider for API telemetry: meta-reference
-
-   > (Optional) Enter a short description for your Llama Stack:
-   Conda environment 'llamastack-my-local-stack' does not exist. Creating with Python 3.10...
-   ...
-
-   Build spec configuration saved at ~/.conda/envsllamastack-my-local-stack/my-local-stack-build.yaml
-   You can now run `llama stack configure my-local-stack`
-   ```
-
-   **`llama stack configure`**
-
-   Run `llama stack configure ` with the name you have previously defined in `build` step.
-   ```
-   llama stack configure
-   ```
-
-   You will be prompted to enter configurations for your Llama Stack
-
-   ```
-   $ llama stack configure my-local-stack
-
-   llama stack configure my-local-stack
-   Using ~/.conda/envsllamastack-my-local-stack/my-local-stack-build.yaml...
-
-   Llama Stack is composed of several APIs working together. For each API served by the Stack,
-   we need to configure the providers (implementations) you want to use for these APIs.
-
-   Configuring API `inference`...
-   > Configuring provider `(meta-reference)`
-   Enter value for model (default: Llama3.2-3B-Instruct) (required): Llama3.2-3B-Instruct
-   Enter value for torch_seed (optional):
-   Enter value for max_seq_len (default: 4096) (required):
-   Enter value for max_batch_size (default: 1) (required):
-   Enter value for create_distributed_process_group (default: True) (required):
-   Enter value for checkpoint_dir (optional):
-
-   Configuring API `safety`...
-   > Configuring provider `(meta-reference)`
-   Do you want to configure llama_guard_shield? (y/n): y
-   Entering sub-configuration for llama_guard_shield:
-   Enter value for model (default: Llama-Guard-3-1B) (required):
-   Enter value for excluded_categories (default: []) (required):
-   Enter value for enable_prompt_guard (default: False) (optional):
-
-   Configuring API `agents`...
-   > Configuring provider `(meta-reference)`
-   Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite):
-
-   Configuring SqliteKVStoreConfig:
-   Enter value for namespace (optional):
-   Enter value for db_path (default: /home/kaiwu/.llama/runtime/kvstore.db) (required):
-
-   Configuring API `memory`...
-   > Configuring provider `(meta-reference)`
-
-   Configuring API `datasetio`...
-   > Configuring provider `(meta-reference)`
-
-   Configuring API `scoring`...
-   > Configuring provider `(meta-reference)`
-
-   Configuring API `eval`...
-   > Configuring provider `(meta-reference)`
-
-   Configuring API `telemetry`...
-   > Configuring provider `(meta-reference)`
-
-   > YAML configuration has been written to `/home/kaiwu/.llama/builds/conda/my-local-stack-run.yaml`.
-   You can now run `llama stack run my-local-stack --port PORT`
-   ```
-
-   **`llama stack run`**
-
-   Run `llama stack run ` with the name you have previously defined.
-   ```
-   llama stack run my-local-stack
-
-   ...
-   > initializing model parallel with size 1
-   > initializing ddp with size 1
-   > initializing pipeline with size 1
-   ...
-   Finished model load YES READY
-   Serving POST /inference/chat_completion
-   Serving POST /inference/completion
-   Serving POST /inference/embeddings
-   Serving POST /memory_banks/create
-   Serving DELETE /memory_bank/documents/delete
-   Serving DELETE /memory_banks/drop
-   Serving GET /memory_bank/documents/get
-   Serving GET /memory_banks/get
-   Serving POST /memory_bank/insert
-   Serving GET /memory_banks/list
-   Serving POST /memory_bank/query
-   Serving POST /memory_bank/update
-   Serving POST /safety/run_shield
-   Serving POST /agentic_system/create
-   Serving POST /agentic_system/session/create
-   Serving POST /agentic_system/turn/create
-   Serving POST /agentic_system/delete
-   Serving POST /agentic_system/session/delete
-   Serving POST /agentic_system/session/get
-   Serving POST /agentic_system/step/get
-   Serving POST /agentic_system/turn/get
-   Serving GET /telemetry/get_trace
-   Serving POST /telemetry/log_event
-   Listening on :::5000
-   INFO: Started server process [587053]
-   INFO: Waiting for application startup.
-   INFO: Application startup complete.
-   INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
-   ```
-
-
-## Testing with client
-Once the server is setup, we can test it with a client to see the example outputs.
-```
-cd /path/to/llama-stack
-conda activate # any environment containing the llama-stack pip package will work
-
-python -m llama_stack.apis.inference.client localhost 5000
-```
-
-This will run the chat completion client and query the distribution’s `/inference/chat_completion` API.
-
-Here is an example output:
-```
-User>hello world, write me a 2 sentence poem about the moon
-Assistant> Here's a 2-sentence poem about the moon:
-
-The moon glows softly in the midnight sky,
-A beacon of wonder, as it passes by.
-```
-
-You may also send a POST request to the server:
-```
-curl http://localhost:5000/inference/chat_completion \
--H "Content-Type: application/json" \
--d '{
-    "model": "Llama3.2-3B-Instruct",
-    "messages": [
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": "Write me a 2 sentence poem about the moon"}
-    ],
-    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
-}'
-
-Output:
-{'completion_message': {'role': 'assistant',
-  'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
-  'stop_reason': 'out_of_tokens',
-  'tool_calls': []},
- 'logprobs': null}
-
-```
-
-
-Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
-
-```
-python -m llama_stack.apis.safety.client localhost 5000
-```
-
-
-Check out our client SDKs for connecting to Llama Stack server in your preferred language, you can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) programming languages to quickly build your applications.
-
-You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
-
-
-## Advanced Guides
-Please see our [Building a LLama Stack Distribution](./building_distro.md) guide for more details on how to assemble your own Llama Stack Distribution.
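To sanity-check a server brought up with the steps above, here is a short smoke-test sketch that strings together the client commands shown earlier; it assumes the server is listening on localhost:5000 as in the examples, and that the `stack` conda environment from the installation section (or any environment with the `llama-stack` pip package) is available.

```bash
# Smoke-test a running Llama Stack server on localhost:5000
conda activate stack                                         # any env containing the llama-stack pip package works
python -m llama_stack.apis.inference.client localhost 5000   # chat completion round trip against /inference/chat_completion
python -m llama_stack.apis.safety.client localhost 5000      # safety check (requires llama-guard and/or prompt-guard shields)
```

If both clients print responses without errors, the distribution is serving the inference and safety APIs as configured.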