Add high-level instructions

Jash Gulabrai 2025-04-10 11:14:17 -04:00
parent 7faec2380a
commit 84e85e824a


@@ -4,7 +4,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This notebook contains Llama Stack implementation of a common end-to-end workflow for customizing and evaluating LLMs using the NVIDIA provider."
+    "## NVIDIA E2E Flow"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook contains a Llama Stack implementation of an end-to-end workflow for running inference on, customizing, and evaluating LLMs using the NVIDIA provider.\n",
+    "\n",
+    "The NVIDIA provider leverages the NeMo Microservices platform, a collection of microservices that you can use to build AI workflows on your Kubernetes cluster, on-prem or in the cloud.\n",
+    "\n",
+    "This notebook covers the following workflows:\n",
+    "- Creating a dataset and uploading files\n",
+    "- Customizing models\n",
+    "- Evaluating base and customized models, with and without guardrails\n",
+    "- Running inference on base and customized models, with and without guardrails\n",
+    "\n"
    ]
   },
   {
@@ -12,7 +28,7 @@
    "metadata": {},
    "source": [
     "## Prerequisites\n",
-    "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
+    "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.1-8b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
    ]
   },
   {
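
As a quick sanity check on the prerequisite above, you can confirm that NIM has finished downloading and is serving the base model before running the notebook. A minimal sketch, assuming NIM exposes the standard OpenAI-compatible `/v1/models` endpoint (the URL is illustrative):

```python
import requests

NIM_URL = "http://nim.test"  # illustrative; substitute your deployment's NIM URL

# List the models NIM is currently serving and verify the base model is present.
resp = requests.get(f"{NIM_URL}/v1/models")
resp.raise_for_status()
served = {m["id"] for m in resp.json().get("data", [])}
assert "meta/llama-3.1-8b-instruct" in served, f"Base model not yet available; NIM is serving: {served}"
```
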
@@ -70,7 +86,7 @@
    "source": [
     "Configure the environment variables for each service.\n",
     "\n",
-    "If needed, update the URLs for each service to point to your deployment.\n",
+    "Ensure the URLs for each service point to your deployment.\n",
     "- NDS_URL: NeMo Data Store URL\n",
     "- NEMO_URL: NeMo Microservices Platform URL\n",
     "- NIM_URL: NIM URL\n",
@@ -94,7 +110,7 @@
     "USER_ID = \"llama-stack-user\"\n",
     "NAMESPACE = \"default\"\n",
     "PROJECT_ID = \"\"\n",
-    "CUSTOMIZED_MODEL_DIR = \"jg-test-llama-stack@v2\"\n",
+    "CUSTOMIZED_MODEL_DIR = \"test-llama-stack@v1\"\n",
     "\n",
     "# Inference env vars\n",
     "os.environ[\"NVIDIA_BASE_URL\"] = NIM_URL\n",
@@ -156,13 +172,19 @@
     "client.initialize()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here, we define helper functions that wait for async jobs to complete."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 25,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Helper functions for waiting on jobs\n",
     "from llama_stack.apis.common.job_types import JobStatus\n",
     "\n",
     "def wait_customization_job(job_id: str, polling_interval: int = 10, timeout: int = 6000):\n",
@@ -204,6 +226,8 @@
     "\n",
     "    return job_status\n",
     "\n",
+    "# When creating a customized model, NIM asynchronously loads the model in its model registry.\n",
+    "# After this, we can run inference with the new model. This helper function waits for NIM to pick up the new model.\n",
     "def wait_nim_loads_customized_model(model_id: str, namespace: str, polling_interval: int = 10, timeout: int = 300):\n",
     "    found = False\n",
     "    start_time = time()\n",