Xi Yan 2024-11-05 11:32:28 -08:00
parent a2ae96c520
commit f5f2936bbb
3 changed files with 4 additions and 89 deletions


@@ -61,49 +61,7 @@
 "```\n",
 "For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.\n",
 "$ export LLAMA_CHECKPOINT_DIR=~/.llama\n",
-"$ llama stack configure llamastack-meta-reference-gpu\n",
 "```\n",
-"Follow the prompts as part of configure.\n",
-"Here is a sample output \n",
-"```\n",
-"$ llama stack configure llamastack-meta-reference-gpu\n",
-"\n",
-"Could not find ~/.conda/envs/llamastack-llamastack-meta-reference-gpu/llamastack-meta-reference-gpu-build.yaml. Trying docker image name instead...\n",
-"+ podman run --network host -it -v ~/.llama/builds/docker:/app/builds llamastack-meta-reference-gpu llama stack configure ./llamastack-build.yaml --output-dir /app/builds\n",
-"\n",
-"Configuring API `inference`...\n",
-"=== Configuring provider `meta-reference` for API inference...\n",
-"Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-11B-Vision-Instruct\n",
-"Do you want to configure quantization? (y/n): n\n",
-"Enter value for torch_seed (optional): \n",
-"Enter value for max_seq_len (default: 4096) (required): \n",
-"Enter value for max_batch_size (default: 1) (required): \n",
-"\n",
-"Configuring API `safety`...\n",
-"=== Configuring provider `meta-reference` for API safety...\n",
-"Do you want to configure llama_guard_shield? (y/n): n\n",
-"Do you want to configure prompt_guard_shield? (y/n): n\n",
-"\n",
-"Configuring API `agents`...\n",
-"=== Configuring provider `meta-reference` for API agents...\n",
-"Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): \n",
-"\n",
-"Configuring SqliteKVStoreConfig:\n",
-"Enter value for namespace (optional): \n",
-"Enter value for db_path (default: /root/.llama/runtime/kvstore.db) (required): \n",
-"\n",
-"Configuring API `memory`...\n",
-"=== Configuring provider `meta-reference` for API memory...\n",
-"> Please enter the supported memory bank type your provider has for memory: vector\n",
-"\n",
-"Configuring API `telemetry`...\n",
-"=== Configuring provider `meta-reference` for API telemetry...\n",
-"\n",
-"> YAML configuration has been written to /app/builds/local-gpu-run.yaml.\n",
-"You can now run `llama stack run local-gpu --port PORT`\n",
-"YAML configuration has been written to /home/hjshah/.llama/builds/docker/local-gpu-run.yaml. You can now run `llama stack run /home/hjshah/.llama/builds/docker/local-gpu-run.yaml`\n",
-"```\n",
-"NOTE: For this example, we use all local meta-reference implementations and have not setup safety. \n",
 "\n",
 "5. Run the Stack Server\n",
 "```\n",


@@ -183,8 +183,7 @@ llama stack build --template tgi
 $ llama stack build --template tgi
 ...
 ...
-Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
-You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
+You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
 ```
 #### Building from config file
@@ -250,49 +249,6 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
 Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
 ```
-## Step 2. Configure
-After our distribution is built (either in form of docker or conda environment), we will run the following command to
-```
-llama stack configure [ <docker-image-name> | <path/to/name.build.yaml>]
-```
-- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
-- For `docker` images downloaded from Dockerhub, you could also use <docker-image-name> as the argument.
-- Run `docker images` to check list of available images on your machine.
-```
-$ llama stack configure tgi
-Configuring API: inference (meta-reference)
-Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
-Enter value for quantization (optional):
-Enter value for torch_seed (optional):
-Enter value for max_seq_len (existing: 4096) (required):
-Enter value for max_batch_size (existing: 1) (required):
-Configuring API: memory (meta-reference-faiss)
-Configuring API: safety (meta-reference)
-Do you want to configure llama_guard_shield? (y/n): y
-Entering sub-configuration for llama_guard_shield:
-Enter value for model (default: Llama-Guard-3-1B) (required):
-Enter value for excluded_categories (default: []) (required):
-Enter value for disable_input_check (default: False) (required):
-Enter value for disable_output_check (default: False) (required):
-Do you want to configure prompt_guard_shield? (y/n): y
-Entering sub-configuration for prompt_guard_shield:
-Enter value for model (default: Prompt-Guard-86M) (required):
-Configuring API: agentic_system (meta-reference)
-Enter value for brave_search_api_key (optional):
-Enter value for bing_search_api_key (optional):
-Enter value for wolfram_api_key (optional):
-Configuring API: telemetry (console)
-YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml
-```
 After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
 As you can see, we did basic configuration above and configured:
@@ -305,8 +261,8 @@ For how these configurations are stored as yaml, checkout the file printed at th
 Note that all configurations as well as models are stored in `~/.llama`
-## Step 3. Run
-Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
+## Step 2. Run
+Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
 ```
 llama stack run 8b-instruct
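Taken together, these doc edits collapse the old build → configure → run flow into build → run: `llama stack build` now writes the run YAML itself. A rough Python sketch of the new two-step flow, driving the CLI through `subprocess`; the `tgi` template and the `tgi-run.yaml` path are copied from the hunk above and stand in for whatever path your own build prints:

```python
# Illustrative sketch of the streamlined workflow after this change:
# `llama stack build` writes a run config YAML, and `llama stack run` consumes it
# directly -- there is no longer a separate `llama stack configure` step.
import os
import subprocess

# Step 1: build a distribution from a template; the output prints where the run YAML was written.
subprocess.run(["llama", "stack", "build", "--template", "tgi"], check=True)

# Step 2: run the server from that YAML (path taken from the build output shown above).
run_config = os.path.expanduser("~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml")
subprocess.run(["llama", "stack", "run", run_config], check=True)
```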


@@ -14,6 +14,7 @@ from pydantic import BaseModel, Field
 class TGIImplConfig(BaseModel):
     url: str = Field(
         description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
+        default="http://localhost:8080",
     )
     api_token: Optional[str] = Field(
         default=None,
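The source change itself is a one-line addition that gives `TGIImplConfig.url` a default value, so the TGI adapter can be instantiated without spelling out the endpoint. A standalone sketch of the updated model, reconstructed from the hunk above (the `api_token` description is omitted, since it is cut off in the diff), showing what the default buys you:

```python
# Minimal reconstruction of the updated config model from the diff above
# (field descriptions abbreviated); not the full llama-stack source file.
from typing import Optional

from pydantic import BaseModel, Field


class TGIImplConfig(BaseModel):
    url: str = Field(
        description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
        default="http://localhost:8080",
    )
    api_token: Optional[str] = Field(default=None)


# With the new default, an empty config now points at a local TGI server.
print(TGIImplConfig().url)                      # http://localhost:8080
print(TGIImplConfig(url="http://tgi:80").url)   # explicit values still override the default
```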