diff --git a/docs/notebooks/nvidia/Llama_Stack_NVIDIA_E2E_Flow.ipynb b/docs/notebooks/nvidia/Llama_Stack_NVIDIA_E2E_Flow.ipynb
index 17d370ce3..b3b8daf15 100644
--- a/docs/notebooks/nvidia/Llama_Stack_NVIDIA_E2E_Flow.ipynb
+++ b/docs/notebooks/nvidia/Llama_Stack_NVIDIA_E2E_Flow.ipynb
@@ -4,7 +4,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This notebook contains Llama Stack implementation of a common end-to-end workflow for customizing and evaluating LLMs using the NVIDIA provider."
+    "## NVIDIA E2E Flow"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook contains a Llama Stack implementation of an end-to-end workflow for running inference on, customizing, and evaluating LLMs using the NVIDIA provider.\n",
+    "\n",
+    "The NVIDIA provider leverages the NeMo Microservices platform, a collection of microservices that you can use to build AI workflows on your Kubernetes cluster, on-premises or in the cloud.\n",
+    "\n",
+    "This notebook covers the following workflows:\n",
+    "- Creating a dataset and uploading files\n",
+    "- Customizing models\n",
+    "- Evaluating base and customized models, with and without guardrails\n",
+    "- Running inference on base and customized models, with and without guardrails\n",
+    "\n"
+   ]
+  },
   {
@@ -12,7 +28,7 @@
    "metadata": {},
    "source": [
     "## Prerequisites\n",
-    "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
+    "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.1-8b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
    ]
   },
   {
@@ -70,7 +86,7 @@
    "source": [
     "Configure the environment variables for each service.\n",
     "\n",
-    "If needed, update the URLs for each service to point to your deployment.\n",
+    "Ensure the URLs for each service point to your deployment.\n",
     "- NDS_URL: NeMo Data Store URL\n",
     "- NEMO_URL: NeMo Microservices Platform URL\n",
     "- NIM_URL: NIM URL\n",
@@ -94,7 +110,7 @@
     "USER_ID = \"llama-stack-user\"\n",
     "NAMESPACE = \"default\"\n",
     "PROJECT_ID = \"\"\n",
-    "CUSTOMIZED_MODEL_DIR = \"jg-test-llama-stack@v2\"\n",
+    "CUSTOMIZED_MODEL_DIR = \"test-llama-stack@v1\"\n",
     "\n",
     "# Inference env vars\n",
     "os.environ[\"NVIDIA_BASE_URL\"] = NIM_URL\n",
@@ -156,13 +172,19 @@
     "client.initialize()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, we define helper functions that wait for async jobs to complete."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 25,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Helper functions for waiting on jobs\n",
     "from llama_stack.apis.common.job_types import JobStatus\n",
     "\n",
     "def wait_customization_job(job_id: str, polling_interval: int = 10, timeout: int = 6000):\n",
@@ -204,6 +226,8 @@
     "\n",
     "    return job_status\n",
     "\n",
+    "# When creating a customized model, NIM asynchronously loads the model into its model registry.\n",
+    "# After this, we can run inference with the new model. This helper function waits for NIM to pick up the new model.\n",
     "def wait_nim_loads_customized_model(model_id: str, namespace: str, polling_interval: int = 10, timeout: int = 300):\n",
     "    found = False\n",
     "    start_time = time()\n",
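For readability, here is the configuration cell from the `@@ -94,7 +110,7 @@` hunk decoded out of its JSON-string escaping into plain Python. The three URL values are placeholders for your own deployment; only `NVIDIA_BASE_URL` is visible in this excerpt, and any further environment variables the provider needs fall outside the hunk.

```python
import os

# Placeholder URLs -- point these at your own deployment (see the
# NDS_URL / NEMO_URL / NIM_URL descriptions earlier in the diff).
NDS_URL = "http://nemo-data-store.test"   # NeMo Data Store
NEMO_URL = "http://nemo.test"             # NeMo Microservices platform
NIM_URL = "http://nim.test"               # NIM

USER_ID = "llama-stack-user"
NAMESPACE = "default"
PROJECT_ID = ""
CUSTOMIZED_MODEL_DIR = "test-llama-stack@v1"

# Inference env var: the NVIDIA provider reads this to reach the NIM endpoint.
os.environ["NVIDIA_BASE_URL"] = NIM_URL
```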
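The final hunk cuts off inside `wait_nim_loads_customized_model`. As a rough guide to where that helper is headed, here is a minimal polling-loop sketch. It assumes NIM exposes its OpenAI-compatible `/v1/models` listing at `NIM_URL` and registers the customized model as `<namespace>/<model-id>`; both are assumptions for illustration, not the notebook's confirmed implementation.

```python
from time import sleep, time

import requests

NIM_URL = "http://nim.test"  # placeholder; use your deployment's NIM URL

def wait_nim_loads_customized_model(model_id: str, namespace: str,
                                    polling_interval: int = 10, timeout: int = 300):
    # Assumed identifier format -- adjust if your NIM registers models differently.
    model_path = f"{namespace}/{model_id}"
    start_time = time()
    while time() - start_time < timeout:
        # NIM serves an OpenAI-compatible model listing at /v1/models.
        response = requests.get(f"{NIM_URL}/v1/models")
        response.raise_for_status()
        model_ids = [model["id"] for model in response.json().get("data", [])]
        if model_path in model_ids:
            print(f"Model {model_path} available after {time() - start_time:.0f}s.")
            return
        sleep(polling_interval)
    raise TimeoutError(f"NIM did not load {model_path} within {timeout} seconds.")
```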