chore(package): migrate to src/ layout (#3920)

Migrates package structure to src/ layout following Python packaging best practices.

All code moved from `llama_stack/` to `src/llama_stack/`. Public API unchanged: imports remain `import llama_stack.*`.

Updated build configs, pre-commit hooks, scripts, and GitHub workflows accordingly. All hooks pass, package builds cleanly.

**Developer note**: Reinstall after pulling: `pip install -e .`
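
To illustrate why the import name does not change under a src/ layout, here is a minimal sketch. It builds a throwaway project with a hypothetical package `demo_pkg` under `src/` and imports it by putting `src/` on the import path, which is roughly what an editable install arranges. The package name and helper functions are assumptions for illustration and are not part of this commit.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_src_layout(project_root: Path, package: str):
    """Import `package` from `<project_root>/src` by temporarily adding it to sys.path."""
    src_dir = project_root / "src"
    sys.path.insert(0, str(src_dir))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(package)
    finally:
        sys.path.remove(str(src_dir))


def demo():
    """Create src/demo_pkg in a temp dir and show that the import name has no `src` prefix."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg_dir = root / "src" / "demo_pkg"  # hypothetical package, stands in for llama_stack
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text("VALUE = 42\n")
        module = import_from_src_layout(root, "demo_pkg")
        # The module is named `demo_pkg`, while its files live under `src/`.
        return module.__name__, Path(module.__file__).parent.parent.name, module.VALUE


if __name__ == "__main__":
    print(demo())
```

The directory that contains the package moves, but the importable name stays the same, which is why downstream code keeps using `import llama_stack`.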
2025-12-04 10:10:36 +00:00 · 2025-10-27 12:02:21 -07:00 · 2025-10-27 12:02:21 -07:00 · 471b1b248b
commit 471b1b248b
parent 98a5047f9d
791 changed files with 2983 additions and 456 deletions
--- a/llama_stack/distributions/nvidia/__init__.py
+++ b/llama_stack/distributions/nvidia/__init__.py
@@ -1,7 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the terms described in the LICENSE file in
-# the root directory of this source tree.
-
-from .nvidia import get_distribution_template  # noqa: F401
--- a/llama_stack/distributions/nvidia/doc_template.md
+++ b/llama_stack/distributions/nvidia/doc_template.md
@@ -1,141 +0,0 @@
----
-orphan: true
----
-# NVIDIA Distribution
-
-The `llamastack/distribution-{{ name }}` distribution consists of the following provider configurations.
-
-{{ providers_table }}
-
-{% if run_config_env_vars %}
-### Environment Variables
-
-The following environment variables can be configured:
-
-{% for var, (default_value, description) in run_config_env_vars.items() %}
-- `{{ var }}`: {{ description }} (default: `{{ default_value }}`)
-{% endfor %}
-{% endif %}
-
-{% if default_models %}
-### Models
-
-The following models are available by default:
-
-{% for model in default_models %}
-- `{{ model.model_id }} {{ model.doc_string }}`
-{% endfor %}
-{% endif %}
-
-
-## Prerequisites
-### NVIDIA API Keys
-
-Make sure you have access to an NVIDIA API key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/). Use this key for the `NVIDIA_API_KEY` environment variable.
-
-### Deploy NeMo Microservices Platform
-The NVIDIA NeMo microservices platform supports end-to-end microservice deployment of a complete AI flywheel on your Kubernetes cluster through the NeMo Microservices Helm Chart. Please reference the [NVIDIA NeMo Microservices documentation](https://docs.nvidia.com/nemo/microservices/latest/about/index.html) for platform prerequisites and instructions to install and deploy the platform.
-
-## Supported Services
-Each Llama Stack API corresponds to a specific NeMo microservice. The core microservices (Customizer, Evaluator, Guardrails) are exposed by the same endpoint. The platform components (Data Store) are each exposed by separate endpoints.
-
-### Inference: NVIDIA NIM
-NVIDIA NIM is used for running inference with registered models. There are two ways to access NVIDIA NIMs:
-  1. Hosted (default): Preview APIs hosted at https://integrate.api.nvidia.com (Requires an API key)
-  2. Self-hosted: NVIDIA NIMs that run on your own infrastructure.
-
-The deployed platform includes the NIM Proxy microservice, which provides access to your NIMs (for example, to run inference on a model). Set the `NVIDIA_BASE_URL` environment variable to use your NVIDIA NIM Proxy deployment.
-
-### Datasetio API: NeMo Data Store
-The NeMo Data Store microservice serves as the default file storage solution for the NeMo microservices platform. It exposes APIs compatible with the Hugging Face Hub client (`HfApi`), so you can use the client to interact with Data Store. The `NVIDIA_DATASETS_URL` environment variable should point to your NeMo Data Store endpoint.
-
-See the [NVIDIA Datasetio docs](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/datasetio/nvidia/README.md) for supported features and example usage.
-
-### Eval API: NeMo Evaluator
-The NeMo Evaluator microservice supports evaluation of LLMs. Launching an Evaluation job with NeMo Evaluator requires an Evaluation Config (an object that contains metadata needed by the job). A Llama Stack Benchmark maps to an Evaluation Config, so registering a Benchmark creates an Evaluation Config in NeMo Evaluator. The `NVIDIA_EVALUATOR_URL` environment variable should point to your NeMo Microservices endpoint.
-
-See the [NVIDIA Eval docs](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/eval/nvidia/README.md) for supported features and example usage.
-
-### Post-Training API: NeMo Customizer
-The NeMo Customizer microservice supports fine-tuning models. You can reference [this list of supported models](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/post_training/nvidia/models.py) that can be fine-tuned using Llama Stack. The `NVIDIA_CUSTOMIZER_URL` environment variable should point to your NeMo Microservices endpoint.
-
-See the [NVIDIA Post-Training docs](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/post_training/nvidia/README.md) for supported features and example usage.
-
-### Safety API: NeMo Guardrails
-The NeMo Guardrails microservice sits between your application and the LLM, and adds checks and content moderation to a model. The `GUARDRAILS_SERVICE_URL` environment variable should point to your NeMo Microservices endpoint.
-
-See the [NVIDIA Safety docs](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/safety/nvidia/README.md) for supported features and example usage.
-
-## Deploying models
-In order to use a registered model with the Llama Stack APIs, ensure the corresponding NIM is deployed to your environment. For example, you can use the NIM Proxy microservice to deploy `meta/llama-3.2-1b-instruct`.
-
-Note: For improved inference speeds, we need to use NIM with `fast_outlines` guided decoding system (specified in the request body). This is the default if you deployed the platform with the NeMo Microservices Helm Chart.
-```sh
-# URL to NeMo NIM Proxy service
-export NEMO_URL="http://nemo.test"
-
-curl --location "$NEMO_URL/v1/deployment/model-deployments" \
-   -H 'accept: application/json' \
-   -H 'Content-Type: application/json' \
-   -d '{
-      "name": "llama-3.2-1b-instruct",
-      "namespace": "meta",
-      "config": {
-         "model": "meta/llama-3.2-1b-instruct",
-         "nim_deployment": {
-            "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
-            "image_tag": "1.8.3",
-            "pvc_size": "25Gi",
-            "gpu": 1,
-            "additional_envs": {
-               "NIM_GUIDED_DECODING_BACKEND": "fast_outlines"
-            }
-         }
-      }
-   }'
-```
-This NIM deployment should take approximately 10 minutes to go live. [See the docs](https://docs.nvidia.com/nemo/microservices/latest/get-started/tutorials/deploy-nims.html) for more information on how to deploy a NIM and verify it's available for inference.
-
-You can also remove a deployed NIM to free up GPU resources, if needed.
-```sh
-export NEMO_URL="http://nemo.test"
-
-curl -X DELETE "$NEMO_URL/v1/deployment/model-deployments/meta/llama-3.1-8b-instruct"
-```
-
-## Running Llama Stack with NVIDIA
-
-You can run Llama Stack either in a venv (building the code locally) or with Docker, which has a pre-built image.
-
-### Via Docker
-
-This method allows you to get started quickly without having to build the distribution code.
-
-```bash
-LLAMA_STACK_PORT=8321
-docker run \
-  -it \
-  --pull always \
-  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run.yaml:/root/my-run.yaml \
-  -e NVIDIA_API_KEY=$NVIDIA_API_KEY \
-  llamastack/distribution-{{ name }} \
-  --config /root/my-run.yaml \
-  --port $LLAMA_STACK_PORT
-```
-
-### Via venv
-
-If you've set up your local development environment, you can also install the distribution dependencies using your local virtual environment.
-
-```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
-llama stack list-deps nvidia | xargs -L1 uv pip install
-NVIDIA_API_KEY=$NVIDIA_API_KEY \
-INFERENCE_MODEL=$INFERENCE_MODEL \
-llama stack run ./run.yaml \
-  --port 8321
-```
-
-## Example Notebooks
-For examples of how to use the NVIDIA Distribution to run inference, fine-tune, evaluate, and run safety checks on your LLMs, you can reference the example notebooks in [docs/notebooks/nvidia](https://github.com/meta-llama/llama-stack/tree/main/docs/notebooks/nvidia).
--- a/llama_stack/distributions/nvidia/nvidia.py
+++ b/llama_stack/distributions/nvidia/nvidia.py
@@ -1,154 +0,0 @@
-# Copyright (c) Meta Platforms, Inc. and affiliates.
-# All rights reserved.
-#
-# This source code is licensed under the terms described in the LICENSE file in
-# the root directory of this source tree.
-
-from pathlib import Path
-
-from llama_stack.core.datatypes import BuildProvider, ModelInput, Provider, ShieldInput, ToolGroupInput
-from llama_stack.distributions.template import DistributionTemplate, RunConfigSettings
-from llama_stack.providers.inline.files.localfs.config import LocalfsFilesImplConfig
-from llama_stack.providers.remote.datasetio.nvidia import NvidiaDatasetIOConfig
-from llama_stack.providers.remote.eval.nvidia import NVIDIAEvalConfig
-from llama_stack.providers.remote.inference.nvidia import NVIDIAConfig
-from llama_stack.providers.remote.safety.nvidia import NVIDIASafetyConfig
-
-
-def get_distribution_template(name: str = "nvidia") -> DistributionTemplate:
-    providers = {
-        "inference": [BuildProvider(provider_type="remote::nvidia")],
-        "vector_io": [BuildProvider(provider_type="inline::faiss")],
-        "safety": [BuildProvider(provider_type="remote::nvidia")],
-        "agents": [BuildProvider(provider_type="inline::meta-reference")],
-        "eval": [BuildProvider(provider_type="remote::nvidia")],
-        "post_training": [BuildProvider(provider_type="remote::nvidia")],
-        "datasetio": [
-            BuildProvider(provider_type="inline::localfs"),
-            BuildProvider(provider_type="remote::nvidia"),
-        ],
-        "scoring": [BuildProvider(provider_type="inline::basic")],
-        "tool_runtime": [BuildProvider(provider_type="inline::rag-runtime")],
-        "files": [BuildProvider(provider_type="inline::localfs")],
-    }
-
-    inference_provider = Provider(
-        provider_id="nvidia",
-        provider_type="remote::nvidia",
-        config=NVIDIAConfig.sample_run_config(),
-    )
-    safety_provider = Provider(
-        provider_id="nvidia",
-        provider_type="remote::nvidia",
-        config=NVIDIASafetyConfig.sample_run_config(),
-    )
-    datasetio_provider = Provider(
-        provider_id="nvidia",
-        provider_type="remote::nvidia",
-        config=NvidiaDatasetIOConfig.sample_run_config(),
-    )
-    eval_provider = Provider(
-        provider_id="nvidia",
-        provider_type="remote::nvidia",
-        config=NVIDIAEvalConfig.sample_run_config(),
-    )
-    files_provider = Provider(
-        provider_id="meta-reference-files",
-        provider_type="inline::localfs",
-        config=LocalfsFilesImplConfig.sample_run_config(f"~/.llama/distributions/{name}"),
-    )
-    inference_model = ModelInput(
-        model_id="${env.INFERENCE_MODEL}",
-        provider_id="nvidia",
-    )
-    safety_model = ModelInput(
-        model_id="${env.SAFETY_MODEL}",
-        provider_id="nvidia",
-    )
-
-    default_tool_groups = [
-        ToolGroupInput(
-            toolgroup_id="builtin::rag",
-            provider_id="rag-runtime",
-        ),
-    ]
-
-    return DistributionTemplate(
-        name=name,
-        distro_type="self_hosted",
-        description="Use NVIDIA NIM for running LLM inference, evaluation and safety",
-        container_image=None,
-        template_path=Path(__file__).parent / "doc_template.md",
-        providers=providers,
-        run_configs={
-            "run.yaml": RunConfigSettings(
-                provider_overrides={
-                    "inference": [inference_provider],
-                    "datasetio": [datasetio_provider],
-                    "eval": [eval_provider],
-                    "files": [files_provider],
-                },
-                default_tool_groups=default_tool_groups,
-            ),
-            "run-with-safety.yaml": RunConfigSettings(
-                provider_overrides={
-                    "inference": [
-                        inference_provider,
-                        safety_provider,
-                    ],
-                    "eval": [eval_provider],
-                    "files": [files_provider],
-                },
-                default_models=[inference_model, safety_model],
-                default_shields=[ShieldInput(shield_id="${env.SAFETY_MODEL}", provider_id="nvidia")],
-                default_tool_groups=default_tool_groups,
-            ),
-        },
-        run_config_env_vars={
-            "NVIDIA_API_KEY": (
-                "",
-                "NVIDIA API Key",
-            ),
-            "NVIDIA_APPEND_API_VERSION": (
-                "True",
-                "Whether to append the API version to the base_url",
-            ),
-            ## Nemo Customizer related variables
-            "NVIDIA_DATASET_NAMESPACE": (
-                "default",
-                "NVIDIA Dataset Namespace",
-            ),
-            "NVIDIA_PROJECT_ID": (
-                "test-project",
-                "NVIDIA Project ID",
-            ),
-            "NVIDIA_CUSTOMIZER_URL": (
-                "https://customizer.api.nvidia.com",
-                "NVIDIA Customizer URL",
-            ),
-            "NVIDIA_OUTPUT_MODEL_DIR": (
-                "test-example-model@v1",
-                "NVIDIA Output Model Directory",
-            ),
-            "GUARDRAILS_SERVICE_URL": (
-                "http://0.0.0.0:7331",
-                "URL for the NeMo Guardrails Service",
-            ),
-            "NVIDIA_GUARDRAILS_CONFIG_ID": (
-                "self-check",
-                "NVIDIA Guardrail Configuration ID",
-            ),
-            "NVIDIA_EVALUATOR_URL": (
-                "http://0.0.0.0:7331",
-                "URL for the NeMo Evaluator Service",
-            ),
-            "INFERENCE_MODEL": (
-                "Llama3.1-8B-Instruct",
-                "Inference model",
-            ),
-            "SAFETY_MODEL": (
-                "meta/llama-3.1-8b-instruct",
-                "Name of the model to use for safety",
-            ),
-        },
-    )
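
The `run_config_env_vars` mapping above pairs each variable name with a `(default_value, description)` tuple, and `doc_template.md` loops over it to produce the "Environment Variables" list. The following is a plain-Python sketch of that rendering, written without Jinja. The function name `render_env_vars` is hypothetical and this is not the project's actual rendering code.

```python
def render_env_vars(run_config_env_vars: dict[str, tuple[str, str]]) -> str:
    """Render one Markdown bullet per variable, mirroring the template's for-loop."""
    lines = []
    for var, (default_value, description) in run_config_env_vars.items():
        lines.append(f"- `{var}`: {description} (default: `{default_value}`)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two entries copied from the mapping in nvidia.py.
    sample = {
        "NVIDIA_PROJECT_ID": ("test-project", "NVIDIA Project ID"),
        "INFERENCE_MODEL": ("Llama3.1-8B-Instruct", "Inference model"),
    }
    print(render_env_vars(sample))
```

Keeping the default and the description together in one tuple means the documentation and the run configuration draw from the same source.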