quickstart guide (might be dated)

2025-10-14 22:33:48 +00:00 · 2024-10-31 15:49:05 -07:00 · 2024-10-31 15:49:05 -07:00 · b514f1ec3a
commit b514f1ec3a
parent 703d7ebb6e
2 changed files with 210 additions and 0 deletions
--- a/docs/source/comprehensive-start.md
+++ b/docs/source/comprehensive-start.md
@ -0,0 +1,111 @@
+
+# Getting Started with Llama Stack
+
+This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our [documentation](../README.md) for more on Llama Stack's capabilities, or visit [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps.
+
+## Installation
+
+The `llama` CLI tool helps you manage the Llama toolchain & agentic systems. After installing the `llama-stack` package, the `llama` command should be available in your path.
+
+You can install this repository in two ways:
+
+1. **Install as a package**:
+   Install directly from [PyPI](https://pypi.org/project/llama-stack/) with:
+   ```bash
+   pip install llama-stack
+   ```
+
+2. **Install from source**:
+   Follow these steps to install from the source code:
+   ```bash
+   mkdir -p ~/local
+   cd ~/local
+   git clone git@github.com:meta-llama/llama-stack.git
+
+   conda create -n stack python=3.10
+   conda activate stack
+
+   cd llama-stack
+   $CONDA_PREFIX/bin/pip install -e .
+   ```
+
+Refer to the [CLI Reference](./cli_reference.md) for details on Llama CLI commands.
+
+## Starting Up Llama Stack Server
+
+There are two ways to start the Llama Stack server:
+
+1. **Using Docker**:
+   We provide a pre-built Docker image of Llama Stack, available in the [distributions](../distributions/) folder.
+
+   > **Note:** For GPU inference, set environment variables to specify the local directory with your model checkpoints and enable GPU inference.
+   ```bash
+   export LLAMA_CHECKPOINT_DIR=~/.llama
+   ```
+   Download Llama models with:
+   ```
+   llama download --model-id Llama3.1-8B-Instruct
+   ```
+   Start a Docker container with:
+   ```bash
+   cd llama-stack/distributions/meta-reference-gpu
+   docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
+   ```
+
+   **Tip:** For remote providers, use `docker compose up` with scripts in the [distributions folder](../distributions/).
+
+2. **Build->Configure->Run via Conda**:
+   For development, build a LlamaStack distribution from scratch.
+
+   **`llama stack build`**
+   Enter build information interactively:
+   ```bash
+   llama stack build
+   ```
+
+   **`llama stack configure`**
+   Run `llama stack configure <name>` using the name from the build step.
+   ```bash
+   llama stack configure my-local-stack
+   ```
+
+   **`llama stack run`**
+   Start the server with:
+   ```bash
+   llama stack run my-local-stack
+   ```
+
+## Testing with Client
+
+After setup, test the server with a client:
+```bash
+cd /path/to/llama-stack
+conda activate <env>
+
+python -m llama_stack.apis.inference.client localhost 5000
+```
+
+You can also send a POST request:
+```bash
+curl http://localhost:5000/inference/chat_completion \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "Llama3.1-8B-Instruct",
+    "messages": [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
+    ],
+    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
+}'
+```
+
+For testing safety, run:
+```bash
+python -m llama_stack.apis.safety.client localhost 5000
+```
+
+Check our client SDKs for various languages: [Python](https://github.com/meta-llama/llama-stack-client-python), [Node](https://github.com/meta-llama/llama-stack-client-node), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin).
+
+## Advanced Guides
+
+For more on custom Llama Stack distributions, refer to our [Building a Llama Stack Distribution](./building_distro.md) guide.
--- a/docs/source/quickstart.md
+++ b/docs/source/quickstart.md
@ -0,0 +1,99 @@
+
+# Quickstart
+
+This guide will walk you through the steps to set up an end-to-end workflow with Llama Stack. It focuses on building a Llama Stack distribution and starting up a Llama Stack server. See our [documentation](../README.md) for more on Llama Stack's capabilities, or visit [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main) for example apps.
+
+
+## 0. Prerequsite
+Feel free to skip this step if you already have the prerequsite installed.
+
+1. conda (steps to install)
+2.
+
+
+## 1. Installation
+
+The `llama` CLI tool helps you manage the Llama toolchain & agentic systems. After installing the `llama-stack` package, the `llama` command should be available in your path.
+
+**Install as a package**:
+   Install directly from [PyPI](https://pypi.org/project/llama-stack/) with:
+   ```bash
+   pip install llama-stack
+   ```
+
+## 2. Download Llama models:
+
+
+   ```
+   llama download --model-id Llama3.1-8B-Instruct
+   ```
+   You will have to follow the instructions in the cli to complete the download, get a instant license here: URL to license.
+
+## 3. Build->Configure->Run via Conda:
+   For development, build a LlamaStack distribution from scratch.
+
+   **`llama stack build`**
+   Enter build information interactively:
+   ```bash
+   llama stack build
+   ```
+
+   **`llama stack configure`**
+   Run `llama stack configure <name>` using the name from the build step.
+   ```bash
+   llama stack configure my-local-stack
+   ```
+
+   **`llama stack run`**
+   Start the server with:
+   ```bash
+   llama stack run my-local-stack
+   ```
+
+## 4. Testing with Client
+
+After setup, test the server with a POST request:
+```bash
+curl http://localhost:5000/inference/chat_completion \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "Llama3.1-8B-Instruct",
+    "messages": [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
+    ],
+    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
+}'
+```
+
+
+## 5. Inference
+
+After setup, test the server with a POST request:
+```bash
+curl http://localhost:5000/inference/chat_completion \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "Llama3.1-8B-Instruct",
+    "messages": [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
+    ],
+    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
+}'
+```
+
+
+
+Check our client SDKs for various languages: [Python](https://github.com/meta-llama/llama-stack-client-python), [Node](https://github.com/meta-llama/llama-stack-client-node), [Swift](https://github.com/meta-llama/llama-stack-client-swift), and [Kotlin](https://github.com/meta-llama/llama-stack-client-kotlin).
+
+## Advanced Guides
+
+For more on custom Llama Stack distributions, refer to our [Building a Llama Stack Distribution](./building_distro.md) guide.
+
+
+## Next Steps:
+check out
+
+1.
+2.