Xi Yan 2024-11-05 11:32:28 -08:00
parent a2ae96c520
commit f5f2936bbb
3 changed files with 4 additions and 89 deletions


@@ -61,49 +61,7 @@
 "```\n",
 "For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.\n",
 "$ export LLAMA_CHECKPOINT_DIR=~/.llama\n",
-"$ llama stack configure llamastack-meta-reference-gpu\n",
 "```\n",
-"Follow the prompts as part of configure.\n",
-"Here is a sample output \n",
-"```\n",
-"$ llama stack configure llamastack-meta-reference-gpu\n",
-"\n",
-"Could not find ~/.conda/envs/llamastack-llamastack-meta-reference-gpu/llamastack-meta-reference-gpu-build.yaml. Trying docker image name instead...\n",
-"+ podman run --network host -it -v ~/.llama/builds/docker:/app/builds llamastack-meta-reference-gpu llama stack configure ./llamastack-build.yaml --output-dir /app/builds\n",
-"\n",
-"Configuring API `inference`...\n",
-"=== Configuring provider `meta-reference` for API inference...\n",
-"Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-11B-Vision-Instruct\n",
-"Do you want to configure quantization? (y/n): n\n",
-"Enter value for torch_seed (optional): \n",
-"Enter value for max_seq_len (default: 4096) (required): \n",
-"Enter value for max_batch_size (default: 1) (required): \n",
-"\n",
-"Configuring API `safety`...\n",
-"=== Configuring provider `meta-reference` for API safety...\n",
-"Do you want to configure llama_guard_shield? (y/n): n\n",
-"Do you want to configure prompt_guard_shield? (y/n): n\n",
-"\n",
-"Configuring API `agents`...\n",
-"=== Configuring provider `meta-reference` for API agents...\n",
-"Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): \n",
-"\n",
-"Configuring SqliteKVStoreConfig:\n",
-"Enter value for namespace (optional): \n",
-"Enter value for db_path (default: /root/.llama/runtime/kvstore.db) (required): \n",
-"\n",
-"Configuring API `memory`...\n",
-"=== Configuring provider `meta-reference` for API memory...\n",
-"> Please enter the supported memory bank type your provider has for memory: vector\n",
-"\n",
-"Configuring API `telemetry`...\n",
-"=== Configuring provider `meta-reference` for API telemetry...\n",
-"\n",
-"> YAML configuration has been written to /app/builds/local-gpu-run.yaml.\n",
-"You can now run `llama stack run local-gpu --port PORT`\n",
-"YAML configuration has been written to /home/hjshah/.llama/builds/docker/local-gpu-run.yaml. You can now run `llama stack run /home/hjshah/.llama/builds/docker/local-gpu-run.yaml`\n",
-"```\n",
-"NOTE: For this example, we use all local meta-reference implementations and have not setup safety. \n",
 "\n",
 "5. Run the Stack Server\n",
 "```\n",


@@ -183,8 +183,7 @@ llama stack build --template tgi
 $ llama stack build --template tgi
 ...
 ...
-Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
-You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
+You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
 ```
 #### Building from config file
@@ -250,49 +249,6 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
 Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
 ```
-## Step 2. Configure
-After our distribution is built (either in form of docker or conda environment), we will run the following command to
-```
-llama stack configure [ <docker-image-name> | <path/to/name.build.yaml>]
-```
-- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
-- For `docker` images downloaded from Dockerhub, you could also use <docker-image-name> as the argument.
-- Run `docker images` to check list of available images on your machine.
-```
-$ llama stack configure tgi
-Configuring API: inference (meta-reference)
-Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
-Enter value for quantization (optional):
-Enter value for torch_seed (optional):
-Enter value for max_seq_len (existing: 4096) (required):
-Enter value for max_batch_size (existing: 1) (required):
-Configuring API: memory (meta-reference-faiss)
-Configuring API: safety (meta-reference)
-Do you want to configure llama_guard_shield? (y/n): y
-Entering sub-configuration for llama_guard_shield:
-Enter value for model (default: Llama-Guard-3-1B) (required):
-Enter value for excluded_categories (default: []) (required):
-Enter value for disable_input_check (default: False) (required):
-Enter value for disable_output_check (default: False) (required):
-Do you want to configure prompt_guard_shield? (y/n): y
-Entering sub-configuration for prompt_guard_shield:
-Enter value for model (default: Prompt-Guard-86M) (required):
-Configuring API: agentic_system (meta-reference)
-Enter value for brave_search_api_key (optional):
-Enter value for bing_search_api_key (optional):
-Enter value for wolfram_api_key (optional):
-Configuring API: telemetry (console)
-YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml
-```
 After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
 As you can see, we did basic configuration above and configured:
@@ -305,8 +261,8 @@ For how these configurations are stored as yaml, checkout the file printed at th
 Note that all configurations as well as models are stored in `~/.llama`
-## Step 3. Run
-Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
+## Step 2. Run
+Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
 ```
 llama stack run 8b-instruct
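Taken together, these doc edits collapse the old build → configure → run flow into build → run: `llama stack build` now writes the run YAML itself. A rough Python sketch of the new two-step flow, driving the CLI through `subprocess`; the `tgi` template and the `tgi-run.yaml` path are copied from the hunk above and stand in for whatever path your own build prints:

```python
# Illustrative sketch of the streamlined workflow after this change:
# `llama stack build` writes a run config YAML, and `llama stack run` consumes it
# directly -- there is no longer a separate `llama stack configure` step.
import os
import subprocess

# Step 1: build a distribution from a template; the output prints where the run YAML was written.
subprocess.run(["llama", "stack", "build", "--template", "tgi"], check=True)

# Step 2: run the server from that YAML (path taken from the build output shown above).
run_config = os.path.expanduser("~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml")
subprocess.run(["llama", "stack", "run", run_config], check=True)
```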


@@ -14,6 +14,7 @@ from pydantic import BaseModel, Field
 class TGIImplConfig(BaseModel):
     url: str = Field(
         description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
+        default="http://localhost:8080",
     )
     api_token: Optional[str] = Field(
         default=None,
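The source change itself is a one-line addition that gives `TGIImplConfig.url` a default value, so the TGI adapter can be instantiated without spelling out the endpoint. A standalone sketch of the updated model, reconstructed from the hunk above (the `api_token` description is omitted, since it is cut off in the diff), showing what the default buys you:

```python
# Minimal reconstruction of the updated config model from the diff above
# (field descriptions abbreviated); not the full llama-stack source file.
from typing import Optional

from pydantic import BaseModel, Field


class TGIImplConfig(BaseModel):
    url: str = Field(
        description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
        default="http://localhost:8080",
    )
    api_token: Optional[str] = Field(default=None)


# With the new default, an empty config now points at a local TGI server.
print(TGIImplConfig().url)                      # http://localhost:8080
print(TGIImplConfig(url="http://tgi:80").url)   # explicit values still override the default
```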