Add high-level instructions

Jash Gulabrai 2025-04-10 11:14:17 -04:00
parent 7faec2380a
commit 84e85e824a


@@ -4,7 +4,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This notebook contains Llama Stack implementation of a common end-to-end workflow for customizing and evaluating LLMs using the NVIDIA provider."
+    "## NVIDIA E2E Flow"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook contains a Llama Stack implementation of an end-to-end workflow for running inference on, customizing, and evaluating LLMs using the NVIDIA provider.\n",
+    "\n",
+    "The NVIDIA provider leverages the NeMo Microservices platform, a collection of microservices that you can use to build AI workflows on your Kubernetes cluster, on-prem or in the cloud.\n",
+    "\n",
+    "This notebook covers the following workflows:\n",
+    "- Creating a dataset and uploading files\n",
+    "- Customizing models\n",
+    "- Evaluating base and customized models, with and without guardrails\n",
+    "- Running inference on base and customized models, with and without guardrails\n",
+    "\n"
    ]
   },
   {
@@ -12,7 +28,7 @@
    "metadata": {},
    "source": [
     "## Prerequisites\n",
-    "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
+    "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.1-8b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
    ]
   },
   {
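
As a quick sanity check on the prerequisite above, you can confirm that NIM has finished downloading and is serving the base model before running the notebook. A minimal sketch, assuming NIM exposes the standard OpenAI-compatible `/v1/models` endpoint (the URL is illustrative):

```python
import requests

NIM_URL = "http://nim.test"  # illustrative; substitute your deployment's NIM URL

# List the models NIM is currently serving and verify the base model is present.
resp = requests.get(f"{NIM_URL}/v1/models")
resp.raise_for_status()
served = {m["id"] for m in resp.json().get("data", [])}
assert "meta/llama-3.1-8b-instruct" in served, f"Base model not yet available; NIM is serving: {served}"
```
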
@@ -70,7 +86,7 @@
    "source": [
     "Configure the environment variables for each service.\n",
     "\n",
-    "If needed, update the URLs for each service to point to your deployment.\n",
+    "Ensure the URLs for each service point to your deployment.\n",
     "- NDS_URL: NeMo Data Store URL\n",
     "- NEMO_URL: NeMo Microservices Platform URL\n",
     "- NIM_URL: NIM URL\n",
@@ -94,7 +110,7 @@
     "USER_ID = \"llama-stack-user\"\n",
     "NAMESPACE = \"default\"\n",
     "PROJECT_ID = \"\"\n",
-    "CUSTOMIZED_MODEL_DIR = \"jg-test-llama-stack@v2\"\n",
+    "CUSTOMIZED_MODEL_DIR = \"test-llama-stack@v1\"\n",
     "\n",
     "# Inference env vars\n",
     "os.environ[\"NVIDIA_BASE_URL\"] = NIM_URL\n",
@@ -156,13 +172,19 @@
     "client.initialize()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, we define helper functions that wait for async jobs to complete."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 25,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Helper functions for waiting on jobs\n",
     "from llama_stack.apis.common.job_types import JobStatus\n",
     "\n",
     "def wait_customization_job(job_id: str, polling_interval: int = 10, timeout: int = 6000):\n",
@@ -204,6 +226,8 @@
     "\n",
     "    return job_status\n",
     "\n",
+    "# When creating a customized model, NIM asynchronously loads the model in its model registry.\n",
+    "# After this, we can run inference with the new model. This helper function waits for NIM to pick up the new model.\n",
     "def wait_nim_loads_customized_model(model_id: str, namespace: str, polling_interval: int = 10, timeout: int = 300):\n",
     "    found = False\n",
     "    start_time = time()\n",