Xi Yan 2024-11-05 11:32:28 -08:00
parent a2ae96c520
commit f5f2936bbb
3 changed files with 4 additions and 89 deletions


@@ -61,49 +61,7 @@
"```\n",
"For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.\n",
"$ export LLAMA_CHECKPOINT_DIR=~/.llama\n",
"$ llama stack configure llamastack-meta-reference-gpu\n",
"```\n",
"Follow the prompts as part of configure.\n",
"Here is a sample output \n",
"```\n",
"$ llama stack configure llamastack-meta-reference-gpu\n",
"\n",
"Could not find ~/.conda/envs/llamastack-llamastack-meta-reference-gpu/llamastack-meta-reference-gpu-build.yaml. Trying docker image name instead...\n",
"+ podman run --network host -it -v ~/.llama/builds/docker:/app/builds llamastack-meta-reference-gpu llama stack configure ./llamastack-build.yaml --output-dir /app/builds\n",
"\n",
"Configuring API `inference`...\n",
"=== Configuring provider `meta-reference` for API inference...\n",
"Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-11B-Vision-Instruct\n",
"Do you want to configure quantization? (y/n): n\n",
"Enter value for torch_seed (optional): \n",
"Enter value for max_seq_len (default: 4096) (required): \n",
"Enter value for max_batch_size (default: 1) (required): \n",
"\n",
"Configuring API `safety`...\n",
"=== Configuring provider `meta-reference` for API safety...\n",
"Do you want to configure llama_guard_shield? (y/n): n\n",
"Do you want to configure prompt_guard_shield? (y/n): n\n",
"\n",
"Configuring API `agents`...\n",
"=== Configuring provider `meta-reference` for API agents...\n",
"Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): \n",
"\n",
"Configuring SqliteKVStoreConfig:\n",
"Enter value for namespace (optional): \n",
"Enter value for db_path (default: /root/.llama/runtime/kvstore.db) (required): \n",
"\n",
"Configuring API `memory`...\n",
"=== Configuring provider `meta-reference` for API memory...\n",
"> Please enter the supported memory bank type your provider has for memory: vector\n",
"\n",
"Configuring API `telemetry`...\n",
"=== Configuring provider `meta-reference` for API telemetry...\n",
"\n",
"> YAML configuration has been written to /app/builds/local-gpu-run.yaml.\n",
"You can now run `llama stack run local-gpu --port PORT`\n",
"YAML configuration has been written to /home/hjshah/.llama/builds/docker/local-gpu-run.yaml. You can now run `llama stack run /home/hjshah/.llama/builds/docker/local-gpu-run.yaml`\n",
"```\n",
"NOTE: For this example, we use all local meta-reference implementations and have not setup safety. \n",
"\n",
"5. Run the Stack Server\n",
"```\n",

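The notebook cell above exports `LLAMA_CHECKPOINT_DIR` before configuring the GPU distribution. As a side note, here is a minimal Python sketch (not part of the notebook; the fallback path and messages are illustrative) for sanity-checking that the directory actually exists before launching the container:

```python
# Illustrative sketch: verify the checkpoint directory referenced by
# LLAMA_CHECKPOINT_DIR exists before starting the GPU container.
import os
from pathlib import Path

# Fall back to ~/.llama, the default used in the notebook above.
checkpoint_dir = Path(os.environ.get("LLAMA_CHECKPOINT_DIR", "~/.llama")).expanduser()

if not checkpoint_dir.is_dir():
    raise SystemExit(f"No checkpoint directory at {checkpoint_dir}; download the model weights first.")

print(f"Using model checkpoints from {checkpoint_dir}")
```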

@@ -183,8 +183,7 @@ llama stack build --template tgi
$ llama stack build --template tgi
...
...
Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
```
#### Building from config file
@@ -250,49 +249,6 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
```
## Step 2. Configure
After our distribution is built (either as a Docker image or a Conda environment), we will run the following command to configure it:
```
llama stack configure [ <docker-image-name> | <path/to/name.build.yaml>]
```
- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
- For `docker` images downloaded from Docker Hub, you could also use <docker-image-name> as the argument.
- Run `docker images` to check the list of available images on your machine.
```
$ llama stack configure tgi
Configuring API: inference (meta-reference)
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (existing: 1) (required):
Configuring API: memory (meta-reference-faiss)
Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-1B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield:
Enter value for model (default: Prompt-Guard-86M) (required):
Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):
Configuring API: telemetry (console)
YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml
```
After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
As you can see, we did basic configuration above and configured:
@@ -305,8 +261,8 @@ For how these configurations are stored as yaml, check out the file printed at th
Note that all configurations as well as models are stored in `~/.llama`
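If you prefer to inspect the generated spec programmatically before editing it, here is a minimal sketch; it assumes PyYAML is installed and that the run config sits at the path printed above (adjust for your machine).
```python
# Illustrative sketch: load the run configuration written by
# `llama stack configure` and list its top-level sections.
from pathlib import Path

import yaml  # provided by the PyYAML package

run_config_path = Path("~/.llama/builds/conda/tgi-run.yaml").expanduser()
with run_config_path.open() as f:
    run_config = yaml.safe_load(f)

# Top-level keys correspond to the APIs/providers configured in the step above.
print(list(run_config))
```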
## Step 3. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
## Step 2. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
```
llama stack run 8b-instruct


@@ -14,6 +14,7 @@ from pydantic import BaseModel, Field
class TGIImplConfig(BaseModel):
url: str = Field(
description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
default="http://localhost:8080",
)
api_token: Optional[str] = Field(
default=None,
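The hunk above shows `url` gaining a default value, so `TGIImplConfig` can now be instantiated without arguments. For illustration only, a standalone sketch assuming standard Pydantic `Field` behavior (the `api_token` description is omitted here):

```python
# Standalone sketch mirroring TGIImplConfig after this change.
from typing import Optional

from pydantic import BaseModel, Field


class TGIImplConfig(BaseModel):
    url: str = Field(
        description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
        default="http://localhost:8080",
    )
    api_token: Optional[str] = Field(default=None)


# With the new default, no arguments are required and the config points
# at a local TGI endpoint unless overridden.
config = TGIImplConfig()
assert config.url == "http://localhost:8080"
assert config.api_token is None
```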