diff --git a/docs/getting_started.ipynb b/docs/getting_started.ipynb
index 5a330a598..6c36475d9 100644
--- a/docs/getting_started.ipynb
+++ b/docs/getting_started.ipynb
@@ -61,49 +61,7 @@
     "```\n",
     "For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.\n",
     "$ export LLAMA_CHECKPOINT_DIR=~/.llama\n",
-    "$ llama stack configure llamastack-meta-reference-gpu\n",
     "```\n",
-    "Follow the prompts as part of configure.\n",
-    "Here is a sample output \n",
-    "```\n",
-    "$ llama stack configure llamastack-meta-reference-gpu\n",
-    "\n",
-    "Could not find ~/.conda/envs/llamastack-llamastack-meta-reference-gpu/llamastack-meta-reference-gpu-build.yaml. Trying docker image name instead...\n",
-    "+ podman run --network host -it -v ~/.llama/builds/docker:/app/builds llamastack-meta-reference-gpu llama stack configure ./llamastack-build.yaml --output-dir /app/builds\n",
-    "\n",
-    "Configuring API `inference`...\n",
-    "=== Configuring provider `meta-reference` for API inference...\n",
-    "Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-11B-Vision-Instruct\n",
-    "Do you want to configure quantization? (y/n): n\n",
-    "Enter value for torch_seed (optional): \n",
-    "Enter value for max_seq_len (default: 4096) (required): \n",
-    "Enter value for max_batch_size (default: 1) (required): \n",
-    "\n",
-    "Configuring API `safety`...\n",
-    "=== Configuring provider `meta-reference` for API safety...\n",
-    "Do you want to configure llama_guard_shield? (y/n): n\n",
-    "Do you want to configure prompt_guard_shield? (y/n): n\n",
-    "\n",
-    "Configuring API `agents`...\n",
-    "=== Configuring provider `meta-reference` for API agents...\n",
-    "Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): \n",
-    "\n",
-    "Configuring SqliteKVStoreConfig:\n",
-    "Enter value for namespace (optional): \n",
-    "Enter value for db_path (default: /root/.llama/runtime/kvstore.db) (required): \n",
-    "\n",
-    "Configuring API `memory`...\n",
-    "=== Configuring provider `meta-reference` for API memory...\n",
-    "> Please enter the supported memory bank type your provider has for memory: vector\n",
-    "\n",
-    "Configuring API `telemetry`...\n",
-    "=== Configuring provider `meta-reference` for API telemetry...\n",
-    "\n",
-    "> YAML configuration has been written to /app/builds/local-gpu-run.yaml.\n",
-    "You can now run `llama stack run local-gpu --port PORT`\n",
-    "YAML configuration has been written to /home/hjshah/.llama/builds/docker/local-gpu-run.yaml. You can now run `llama stack run /home/hjshah/.llama/builds/docker/local-gpu-run.yaml`\n",
-    "```\n",
-    "NOTE: For this example, we use all local meta-reference implementations and have not setup safety. \n",
     "\n",
     "5. Run the Stack Server\n",
     "```\n",
diff --git a/docs/source/distribution_dev/building_distro.md b/docs/source/distribution_dev/building_distro.md
index 2f1f1b752..f3a9bd7bb 100644
--- a/docs/source/distribution_dev/building_distro.md
+++ b/docs/source/distribution_dev/building_distro.md
@@ -183,8 +183,7 @@ llama stack build --template tgi
 $ llama stack build --template tgi
 ...
 ...
-Build spec configuration saved at ~/.conda/envs/llamastack-tgi/tgi-build.yaml
-You may now run `llama stack configure tgi` or `llama stack configure ~/.conda/envs/llamastack-tgi/tgi-build.yaml`
+You can now edit ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml and run `llama stack run ~/meta-llama/llama-stack/tmp/configs/tgi-run.yaml`
 ```
 
 #### Building from config file
@@ -250,49 +249,6 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local
 
 Build spec configuration saved at ~/.llama/distributions/docker/docker-local-build.yaml
 ```
-
-## Step 2. Configure
-After our distribution is built (either in form of docker or conda environment), we will run the following command to
-```
-llama stack configure [ <docker-image-name> | <path/to/name.build.yaml> ]
-```
-- For `conda` environments: <path/to/name.build.yaml> would be the generated build spec saved from Step 1.
-- For `docker` images downloaded from Dockerhub, you could also use <docker-image-name> as the argument.
-   - Run `docker images` to check list of available images on your machine.
-
-```
-$ llama stack configure tgi
-
-Configuring API: inference (meta-reference)
-Enter value for model (existing: Meta-Llama3.1-8B-Instruct) (required):
-Enter value for quantization (optional):
-Enter value for torch_seed (optional):
-Enter value for max_seq_len (existing: 4096) (required):
-Enter value for max_batch_size (existing: 1) (required):
-
-Configuring API: memory (meta-reference-faiss)
-
-Configuring API: safety (meta-reference)
-Do you want to configure llama_guard_shield? (y/n): y
-Entering sub-configuration for llama_guard_shield:
-Enter value for model (default: Llama-Guard-3-1B) (required):
-Enter value for excluded_categories (default: []) (required):
-Enter value for disable_input_check (default: False) (required):
-Enter value for disable_output_check (default: False) (required):
-Do you want to configure prompt_guard_shield? (y/n): y
-Entering sub-configuration for prompt_guard_shield:
-Enter value for model (default: Prompt-Guard-86M) (required):
-
-Configuring API: agentic_system (meta-reference)
-Enter value for brave_search_api_key (optional):
-Enter value for bing_search_api_key (optional):
-Enter value for wolfram_api_key (optional):
-
-Configuring API: telemetry (console)
-
-YAML configuration has been written to ~/.llama/builds/conda/tgi-run.yaml
-```
-
 After this step is successful, you should be able to find a run configuration spec in `~/.llama/builds/conda/tgi-run.yaml` with the following contents. You may edit this file to change the settings.
 
 As you can see, we did basic configuration above and configured:
@@ -305,8 +261,8 @@ For how these configurations are stored as yaml, checkout the file printed at th
 
 Note that all configurations as well as models are stored in `~/.llama`
 
-## Step 3. Run
-Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack configure` step.
+## Step 2. Run
+Now, let's start the Llama Stack Distribution Server. You will need the YAML configuration file which was written out at the end by the `llama stack build` step.
 
 ```
 llama stack run 8b-instruct
diff --git a/llama_stack/providers/adapters/inference/tgi/config.py b/llama_stack/providers/adapters/inference/tgi/config.py
index 6ce2b9dc6..8245c2c17 100644
--- a/llama_stack/providers/adapters/inference/tgi/config.py
+++ b/llama_stack/providers/adapters/inference/tgi/config.py
@@ -14,6 +14,7 @@ from pydantic import BaseModel, Field
 class TGIImplConfig(BaseModel):
     url: str = Field(
         description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
+        default="http://localhost:8080",
     )
     api_token: Optional[str] = Field(
         default=None,
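
For context on the `tgi/config.py` hunk above: adding `default="http://localhost:8080"` to the `url` field means a `TGIImplConfig` can now be constructed with no arguments and will point at a local TGI endpoint by default. The snippet below is a minimal standalone sketch that mirrors the patched class only as far as the hunk shows it (the `api_token` field is truncated in the diff, so only its `default=None` is reproduced); it is not the actual `llama_stack` module, and the example URL in the last call is hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, Field


class TGIImplConfig(BaseModel):
    # Mirrors the patched class in llama_stack/providers/adapters/inference/tgi/config.py
    url: str = Field(
        description="The URL for the TGI endpoint (e.g. 'http://localhost:8080')",
        default="http://localhost:8080",
    )
    # The hunk is cut off after `default=None`; any remaining kwargs are omitted here.
    api_token: Optional[str] = Field(default=None)


# With the new default, no arguments are required:
print(TGIImplConfig())  # url='http://localhost:8080' api_token=None

# An explicit URL (hypothetical endpoint) still overrides the default:
print(TGIImplConfig(url="http://my-tgi-host:8080"))
```

Presumably the point of the change is that a distribution's run YAML can omit the TGI `url` entirely and still resolve to a sensible local default.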