mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-07-30 07:39:38 +00:00)

docs

parent a2ae96c520, commit f5f2936bbb
3 changed files with 4 additions and 89 deletions
@@ -61,49 +61,7 @@
"```\n",
"For GPU inference, set these environment variables to point at the local directory containing your model checkpoints and to enable GPU inference when starting the docker container.\n",
"$ export LLAMA_CHECKPOINT_DIR=~/.llama\n",
"$ llama stack configure llamastack-meta-reference-gpu\n",
"```\n",
"Follow the prompts as part of configure.\n",
"Here is a sample output:\n",
"```\n",
"$ llama stack configure llamastack-meta-reference-gpu\n",
"\n",
"Could not find ~/.conda/envs/llamastack-llamastack-meta-reference-gpu/llamastack-meta-reference-gpu-build.yaml. Trying docker image name instead...\n",
"+ podman run --network host -it -v ~/.llama/builds/docker:/app/builds llamastack-meta-reference-gpu llama stack configure ./llamastack-build.yaml --output-dir /app/builds\n",
"\n",
"Configuring API `inference`...\n",
"=== Configuring provider `meta-reference` for API inference...\n",
"Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-11B-Vision-Instruct\n",
"Do you want to configure quantization? (y/n): n\n",
"Enter value for torch_seed (optional): \n",
"Enter value for max_seq_len (default: 4096) (required): \n",
"Enter value for max_batch_size (default: 1) (required): \n",
"\n",
"Configuring API `safety`...\n",
"=== Configuring provider `meta-reference` for API safety...\n",
"Do you want to configure llama_guard_shield? (y/n): n\n",
"Do you want to configure prompt_guard_shield? (y/n): n\n",
"\n",
"Configuring API `agents`...\n",
"=== Configuring provider `meta-reference` for API agents...\n",
"Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): \n",
"\n",
"Configuring SqliteKVStoreConfig:\n",
"Enter value for namespace (optional): \n",
"Enter value for db_path (default: /root/.llama/runtime/kvstore.db) (required): \n",
"\n",
"Configuring API `memory`...\n",
"=== Configuring provider `meta-reference` for API memory...\n",
"> Please enter the supported memory bank type your provider has for memory: vector\n",
"\n",
"Configuring API `telemetry`...\n",
"=== Configuring provider `meta-reference` for API telemetry...\n",
"\n",
"> YAML configuration has been written to /app/builds/local-gpu-run.yaml.\n",
"You can now run `llama stack run local-gpu --port PORT`\n",
"YAML configuration has been written to /home/hjshah/.llama/builds/docker/local-gpu-run.yaml. You can now run `llama stack run /home/hjshah/.llama/builds/docker/local-gpu-run.yaml`\n",
"```\n",
"NOTE: For this example, we use all local meta-reference implementations and have not set up safety.\n",
"\n",
"5. Run the Stack Server\n",
"```\n",
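The GPU notebook snippet above keys everything off `LLAMA_CHECKPOINT_DIR`. Below is a minimal pre-flight sketch that verifies the variable points at an existing directory before you run `llama stack configure`; only the variable name and the `~/.llama` example value come from the notebook, the rest is an illustrative assumption.

```python
import os
from pathlib import Path

# LLAMA_CHECKPOINT_DIR is the variable exported in the notebook above;
# ~/.llama mirrors the example value used there.
checkpoint_dir = Path(os.environ.get("LLAMA_CHECKPOINT_DIR", "~/.llama")).expanduser()

if not checkpoint_dir.is_dir():
    raise SystemExit(
        f"{checkpoint_dir} does not exist; export LLAMA_CHECKPOINT_DIR so it "
        "points at the directory holding your model checkpoints."
    )

print(f"Using checkpoints from {checkpoint_dir}")
```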
@@ -183,8 +183,7 @@ llama stack build --template tgi
$ llama stack build --template tgi
...
...
Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
```

#### Building from config file
@@ -250,49 +249,6 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
```

## Step 2. Configure
After our distribution is built (either as a docker image or a conda environment), we run the following command to configure it:
```
llama stack configure [ <docker-image-name> | <path/to/name.build.yaml>]
```
- For `conda` environments: <path/to/name.build.yaml> is the generated build spec saved in Step 1.
- For `docker` images downloaded from Docker Hub, you can also pass <docker-image-name> as the argument.
  - Run `docker images` to check the list of available images on your machine.

```
$ llama stack configure tgi

Configuring API: inference (meta-reference)
Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
Enter value for quantization (optional):
Enter value for torch_seed (optional):
Enter value for max_seq_len (existing: 4096) (required):
Enter value for max_batch_size (existing: 1) (required):

Configuring API: memory (meta-reference-faiss)

Configuring API: safety (meta-reference)
Do you want to configure llama_guard_shield? (y/n): y
Entering sub-configuration for llama_guard_shield:
Enter value for model (default: Llama-Guard-3-1B) (required):
Enter value for excluded_categories (default: []) (required):
Enter value for disable_input_check (default: False) (required):
Enter value for disable_output_check (default: False) (required):
Do you want to configure prompt_guard_shield? (y/n): y
Entering sub-configuration for prompt_guard_shield:
Enter value for model (default: Prompt-Guard-86M) (required):

Configuring API: agentic_system (meta-reference)
Enter value for brave_search_api_key (optional):
Enter value for bing_search_api_key (optional):
Enter value for wolfram_api_key (optional):

Configuring API: telemetry (console)

YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml
```

After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.

As you can see, we did basic configuration above and configured:
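Since the configure step above ends by noting that you may hand-edit `~/.llama/builds/conda/tgi-run.yaml`, here is a small sketch for peeking at what was written. The path comes from the `llama stack configure tgi` output; the YAML's internal schema is not shown in this diff, so the snippet only lists top-level keys, and PyYAML is assumed to be installed.

```python
from pathlib import Path

import yaml  # PyYAML, assumed to be available in the build environment

# Path taken from the `llama stack configure tgi` output above.
run_config_path = Path("~/.llama/builds/conda/tgi-run.yaml").expanduser()

with run_config_path.open() as f:
    run_config = yaml.safe_load(f)

# The run config schema is not shown in this diff, so just list what is there
# before editing the file by hand.
for key in run_config:
    print(key)
```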
@@ -305,8 +261,8 @@ For how these configurations are stored as yaml, checkout the file printed at th
Note that all configurations as well as models are stored in `~/.llama`

## Step 3. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
## Step 2. Run
Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.

```
llama stack run 8b-instruct
@@ -14,6 +14,7 @@ from pydantic import BaseModel, Field
class TGIImplConfig(BaseModel):
    url: str = Field(
        description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
        default="http://localhost:8080",
    )
    api_token: Optional[str] = Field(
        default=None,
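For reference, a minimal sketch of how the `TGIImplConfig` shown in the hunk above could be used. The class is re-declared here from the visible fields only, since the hunk is truncated and the real import path is not shown in this diff; the instantiation values are placeholders.

```python
from typing import Optional

from pydantic import BaseModel, Field


# Re-declared from the fields visible in the hunk above; the upstream class
# may define more than what is shown here.
class TGIImplConfig(BaseModel):
    url: str = Field(
        description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
        default="http://localhost:8080",
    )
    api_token: Optional[str] = Field(
        default=None,
    )


# Placeholder values: point the inference provider at a remote TGI server.
config = TGIImplConfig(url="http://my-tgi-host:8080", api_token="hf_xxx")
print(config.url, config.api_token)
```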