	added templates and enhanced readme (#307)
Co-authored-by: Justin Lee <justinai@fb.com>
parent 3e1c3fdb3f
commit b6d8246b82
5 changed files with 293 additions and 136 deletions
		|  | @ -5,163 +5,174 @@ This guide will walk you through the steps to get started on end-to-end flow for | |||
| ## Installation | ||||
| The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package. | ||||
| 
 | ||||
| You can install this repository as a [package](https://pypi.org/project/llama-stack/) with `pip install llama-stack` | ||||
| You have two ways to install this repository: | ||||
| 
 | ||||
| If you want to install from source: | ||||
| 1. **Install as a package**: | ||||
|    You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command: | ||||
|    ```bash | ||||
|    pip install llama-stack | ||||
|    ``` | ||||
| 
 | ||||
| ```bash | ||||
| mkdir -p ~/local | ||||
| cd ~/local | ||||
| git clone git@github.com:meta-llama/llama-stack.git | ||||
| 2. **Install from source**: | ||||
|    If you prefer to install from the source code, follow these steps: | ||||
|    ```bash | ||||
|     mkdir -p ~/local | ||||
|     cd ~/local | ||||
|     git clone git@github.com:meta-llama/llama-stack.git | ||||
| 
 | ||||
| conda create -n stack python=3.10 | ||||
| conda activate stack | ||||
|     conda create -n stack python=3.10 | ||||
|     conda activate stack | ||||
| 
 | ||||
| cd llama-stack | ||||
| $CONDA_PREFIX/bin/pip install -e . | ||||
| ``` | ||||
|     cd llama-stack | ||||
|     $CONDA_PREFIX/bin/pip install -e . | ||||
|    ``` | ||||
| 
 | ||||
| For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md). | ||||
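Since the `llama` CLI should be on your path after installation, a quick check can confirm that before moving on. The following is a minimal sketch, not part of the original guide; the helper name `check_cli` is hypothetical, and it relies only on the POSIX `command -v` lookup.

```shell
# Hypothetical helper: report whether a command is available on PATH.
check_cli() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Check for the `llama` CLI; suggest the package install from this guide if it is missing.
check_cli llama || echo "Install with: pip install llama-stack"
```

If the check fails inside a conda setup, it may simply mean the `stack` environment is not activated in the current shell.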
| 
 | ||||
| ## Starting Up Llama Stack Server | ||||
| #### Starting up server via docker | ||||
| 
 | ||||
| We provide two pre-built Docker images of the Llama Stack distribution, which can be found at the following links. | ||||
| - [llamastack-local-gpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-gpu/general) | ||||
|   - This is a packaged version with our local meta-reference implementations, where you will be running inference locally with downloaded Llama model checkpoints. | ||||
| - [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general) | ||||
|    - This is a lite version with remote inference where you can hook up to your favourite remote inference framework (e.g. ollama, fireworks, together, tgi) for running inference without GPU. | ||||
| You have two ways to start up the Llama Stack server: | ||||
| 
 | ||||
| > [!NOTE] | ||||
| > For GPU inference, set the following environment variable to the local directory containing your model checkpoints, and enable GPU access when starting the Docker container. | ||||
| ``` | ||||
| export LLAMA_CHECKPOINT_DIR=~/.llama | ||||
| ``` | ||||
| 1. **Starting up server via docker**: | ||||
| 
 | ||||
| > [!NOTE] | ||||
| > `~/.llama` should be the path containing downloaded weights of Llama models. | ||||
| 	We provide two pre-built Docker images of the Llama Stack distribution, which can be found at the following links. | ||||
| 	- [llamastack-local-gpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-gpu/general) | ||||
| 	- This is a packaged version with our local meta-reference implementations, where you will be running inference locally with downloaded Llama model checkpoints. | ||||
| 	- [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general) | ||||
| 	- This is a lite version with remote inference where you can hook up to your favourite remote inference framework (e.g. ollama, fireworks, together, tgi) for running inference without GPU. | ||||
| 
 | ||||
| To download llama models, use | ||||
| ``` | ||||
| llama download --model-id Llama3.1-8B-Instruct | ||||
| ``` | ||||
| 	> [!NOTE] | ||||
| 	> For GPU inference, set the following environment variable to the local directory containing your model checkpoints, and enable GPU access when starting the Docker container. | ||||
| 	``` | ||||
| 	export LLAMA_CHECKPOINT_DIR=~/.llama | ||||
| 	``` | ||||
| 
 | ||||
| To download and start running a pre-built docker container, you may use the following commands: | ||||
| 	> [!NOTE] | ||||
| 	> `~/.llama` should be the path containing downloaded weights of Llama models. | ||||
| 
 | ||||
| ``` | ||||
| docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu | ||||
| ``` | ||||
| 	To download llama models, use | ||||
| 	``` | ||||
| 	llama download --model-id Llama3.1-8B-Instruct | ||||
| 	``` | ||||
| 
 | ||||
| > [!TIP] | ||||
| > Pro Tip: You can use `docker compose up` to start a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). Check out [these scripts](../distributions/) to help you get started. | ||||
| 	To download and start running a pre-built docker container, you may use the following commands: | ||||
| 
 | ||||
| #### Build->Configure->Run Llama Stack server via conda | ||||
| You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack. | ||||
| 	``` | ||||
| 	docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu | ||||
| 	``` | ||||
| 
 | ||||
| **`llama stack build`** | ||||
| - You'll be prompted to enter build information interactively. | ||||
| ``` | ||||
| llama stack build | ||||
| 	> [!TIP] | ||||
| 	> Pro Tip: You can use `docker compose up` to start a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). Check out [these scripts](../distributions/) to help you get started. | ||||
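The note in this section sets `LLAMA_CHECKPOINT_DIR`, while the `docker run` command mounts `~/.llama` directly. One way to connect the two is sketched below. It assumes you want the mounted host directory to follow that variable; the helper name `docker_run_cmd` is hypothetical, and the flags and image name are copied from the command in this guide. The command is printed rather than executed so it can be reviewed first.

```shell
# Directory holding downloaded Llama checkpoints; defaults to ~/.llama as in this guide.
LLAMA_CHECKPOINT_DIR="${LLAMA_CHECKPOINT_DIR:-$HOME/.llama}"

# Hypothetical helper: print the docker command instead of running it.
docker_run_cmd() {
    echo "docker run -it -p 5000:5000 -v ${LLAMA_CHECKPOINT_DIR}:/root/.llama --gpus=all llamastack/llamastack-local-gpu"
}

docker_run_cmd
```

Copy the printed line into your shell once the mounted path looks right.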
| 
 | ||||
| > Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack | ||||
| > Enter the image type you want your distribution to be built with (docker or conda): conda | ||||
| 
 | ||||
|  Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. | ||||
| > Enter the API provider for the inference API: (default=meta-reference): meta-reference | ||||
| > Enter the API provider for the safety API: (default=meta-reference): meta-reference | ||||
| > Enter the API provider for the agents API: (default=meta-reference): meta-reference | ||||
| > Enter the API provider for the memory API: (default=meta-reference): meta-reference | ||||
| > Enter the API provider for the telemetry API: (default=meta-reference): meta-reference | ||||
| 2. **Build->Configure->Run Llama Stack server via conda**: | ||||
| 
 | ||||
|  > (Optional) Enter a short description for your Llama Stack distribution: | ||||
| 	You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack. | ||||
| 
 | ||||
| Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml | ||||
| You can now run `llama stack configure my-local-stack` | ||||
| ``` | ||||
| 	**`llama stack build`** | ||||
| 	- You'll be prompted to enter build information interactively. | ||||
| 	``` | ||||
| 	llama stack build | ||||
| 
 | ||||
| **`llama stack configure`** | ||||
| - Run `llama stack configure <name>` with the name you defined in the `build` step. | ||||
| ``` | ||||
| llama stack configure <name> | ||||
| ``` | ||||
| - You will be prompted to enter configurations for your Llama Stack | ||||
| 	> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack | ||||
| 	> Enter the image type you want your distribution to be built with (docker or conda): conda | ||||
| 
 | ||||
| ``` | ||||
| $ llama stack configure my-local-stack | ||||
| 	Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. | ||||
| 	> Enter the API provider for the inference API: (default=meta-reference): meta-reference | ||||
| 	> Enter the API provider for the safety API: (default=meta-reference): meta-reference | ||||
| 	> Enter the API provider for the agents API: (default=meta-reference): meta-reference | ||||
| 	> Enter the API provider for the memory API: (default=meta-reference): meta-reference | ||||
| 	> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference | ||||
| 
 | ||||
| Could not find my-local-stack. Trying conda build name instead... | ||||
| Configuring API `inference`... | ||||
| === Configuring provider `meta-reference` for API inference... | ||||
| Enter value for model (default: Llama3.1-8B-Instruct) (required): | ||||
| Do you want to configure quantization? (y/n): n | ||||
| Enter value for torch_seed (optional): | ||||
| Enter value for max_seq_len (default: 4096) (required): | ||||
| Enter value for max_batch_size (default: 1) (required): | ||||
| 	> (Optional) Enter a short description for your Llama Stack distribution: | ||||
| 
 | ||||
| Configuring API `safety`... | ||||
| === Configuring provider `meta-reference` for API safety... | ||||
| Do you want to configure llama_guard_shield? (y/n): n | ||||
| Do you want to configure prompt_guard_shield? (y/n): n | ||||
| 	Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml | ||||
| 	You can now run `llama stack configure my-local-stack` | ||||
| 	``` | ||||
| 
 | ||||
| Configuring API `agents`... | ||||
| === Configuring provider `meta-reference` for API agents... | ||||
| Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): | ||||
| 	**`llama stack configure`** | ||||
| 	- Run `llama stack configure <name>` with the name you defined in the `build` step. | ||||
| 	``` | ||||
| 	llama stack configure <name> | ||||
| 	``` | ||||
| 	- You will be prompted to enter configurations for your Llama Stack | ||||
| 
 | ||||
| Configuring SqliteKVStoreConfig: | ||||
| Enter value for namespace (optional): | ||||
| Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required): | ||||
| 	``` | ||||
| 	$ llama stack configure my-local-stack | ||||
| 
 | ||||
| Configuring API `memory`... | ||||
| === Configuring provider `meta-reference` for API memory... | ||||
| > Please enter the supported memory bank type your provider has for memory: vector | ||||
| 	Could not find my-local-stack. Trying conda build name instead... | ||||
| 	Configuring API `inference`... | ||||
| 	=== Configuring provider `meta-reference` for API inference... | ||||
| 	Enter value for model (default: Llama3.1-8B-Instruct) (required): | ||||
| 	Do you want to configure quantization? (y/n): n | ||||
| 	Enter value for torch_seed (optional): | ||||
| 	Enter value for max_seq_len (default: 4096) (required): | ||||
| 	Enter value for max_batch_size (default: 1) (required): | ||||
| 
 | ||||
| Configuring API `telemetry`... | ||||
| === Configuring provider `meta-reference` for API telemetry... | ||||
| 	Configuring API `safety`... | ||||
| 	=== Configuring provider `meta-reference` for API safety... | ||||
| 	Do you want to configure llama_guard_shield? (y/n): n | ||||
| 	Do you want to configure prompt_guard_shield? (y/n): n | ||||
| 
 | ||||
| > YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml. | ||||
| You can now run `llama stack run my-local-stack --port PORT` | ||||
| ``` | ||||
| 	Configuring API `agents`... | ||||
| 	=== Configuring provider `meta-reference` for API agents... | ||||
| 	Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): | ||||
| 
 | ||||
| **`llama stack run`** | ||||
| - Run `llama stack run <name>` with the name you have previously defined. | ||||
| ``` | ||||
| llama stack run my-local-stack | ||||
| 	Configuring SqliteKVStoreConfig: | ||||
| 	Enter value for namespace (optional): | ||||
| 	Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required): | ||||
| 
 | ||||
| ... | ||||
| > initializing model parallel with size 1 | ||||
| > initializing ddp with size 1 | ||||
| > initializing pipeline with size 1 | ||||
| ... | ||||
| Finished model load YES READY | ||||
| Serving POST /inference/chat_completion | ||||
| Serving POST /inference/completion | ||||
| Serving POST /inference/embeddings | ||||
| Serving POST /memory_banks/create | ||||
| Serving DELETE /memory_bank/documents/delete | ||||
| Serving DELETE /memory_banks/drop | ||||
| Serving GET /memory_bank/documents/get | ||||
| Serving GET /memory_banks/get | ||||
| Serving POST /memory_bank/insert | ||||
| Serving GET /memory_banks/list | ||||
| Serving POST /memory_bank/query | ||||
| Serving POST /memory_bank/update | ||||
| Serving POST /safety/run_shield | ||||
| Serving POST /agentic_system/create | ||||
| Serving POST /agentic_system/session/create | ||||
| Serving POST /agentic_system/turn/create | ||||
| Serving POST /agentic_system/delete | ||||
| Serving POST /agentic_system/session/delete | ||||
| Serving POST /agentic_system/session/get | ||||
| Serving POST /agentic_system/step/get | ||||
| Serving POST /agentic_system/turn/get | ||||
| Serving GET /telemetry/get_trace | ||||
| Serving POST /telemetry/log_event | ||||
| Listening on :::5000 | ||||
| INFO:     Started server process [587053] | ||||
| INFO:     Waiting for application startup. | ||||
| INFO:     Application startup complete. | ||||
| INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) | ||||
| ``` | ||||
| 	Configuring API `memory`... | ||||
| 	=== Configuring provider `meta-reference` for API memory... | ||||
| 	> Please enter the supported memory bank type your provider has for memory: vector | ||||
| 
 | ||||
| 	Configuring API `telemetry`... | ||||
| 	=== Configuring provider `meta-reference` for API telemetry... | ||||
| 
 | ||||
| 	> YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml. | ||||
| 	You can now run `llama stack run my-local-stack --port PORT` | ||||
| 	``` | ||||
| 
 | ||||
| 	**`llama stack run`** | ||||
| 	- Run `llama stack run <name>` with the name you have previously defined. | ||||
| 	``` | ||||
| 	llama stack run my-local-stack | ||||
| 
 | ||||
| 	... | ||||
| 	> initializing model parallel with size 1 | ||||
| 	> initializing ddp with size 1 | ||||
| 	> initializing pipeline with size 1 | ||||
| 	... | ||||
| 	Finished model load YES READY | ||||
| 	Serving POST /inference/chat_completion | ||||
| 	Serving POST /inference/completion | ||||
| 	Serving POST /inference/embeddings | ||||
| 	Serving POST /memory_banks/create | ||||
| 	Serving DELETE /memory_bank/documents/delete | ||||
| 	Serving DELETE /memory_banks/drop | ||||
| 	Serving GET /memory_bank/documents/get | ||||
| 	Serving GET /memory_banks/get | ||||
| 	Serving POST /memory_bank/insert | ||||
| 	Serving GET /memory_banks/list | ||||
| 	Serving POST /memory_bank/query | ||||
| 	Serving POST /memory_bank/update | ||||
| 	Serving POST /safety/run_shield | ||||
| 	Serving POST /agentic_system/create | ||||
| 	Serving POST /agentic_system/session/create | ||||
| 	Serving POST /agentic_system/turn/create | ||||
| 	Serving POST /agentic_system/delete | ||||
| 	Serving POST /agentic_system/session/delete | ||||
| 	Serving POST /agentic_system/session/get | ||||
| 	Serving POST /agentic_system/step/get | ||||
| 	Serving POST /agentic_system/turn/get | ||||
| 	Serving GET /telemetry/get_trace | ||||
| 	Serving POST /telemetry/log_event | ||||
| 	Listening on :::5000 | ||||
| 	INFO:     Started server process [587053] | ||||
| 	INFO:     Waiting for application startup. | ||||
| 	INFO:     Application startup complete. | ||||
| 	INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) | ||||
| 	``` | ||||
| 
 | ||||
| 
 | ||||
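The build, configure, and run steps above always use the same distribution name, so it can help to see the whole sequence at once. This is a minimal sketch under stated assumptions: the helper name `stack_steps` is hypothetical, the default port 5000 is taken from the sample server output, and the commands are only printed because `build` and `configure` prompt interactively.

```shell
# Hypothetical helper: print the build -> configure -> run sequence for one distribution name.
# Commands are echoed, not executed, since build and configure ask questions interactively.
stack_steps() {
    name="$1"
    port="${2:-5000}"   # assumed default, matching the sample output in this guide
    echo "llama stack build"
    echo "llama stack configure ${name}"
    echo "llama stack run ${name} --port ${port}"
}

stack_steps my-local-stack 5000
```

Run the printed commands one at a time, entering the same name at the `build` prompt that you pass to `configure` and `run`.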
| ## Testing with client | ||||
|  |  | |||