Mirror of https://github.com/meta-llama/llama-stack.git
Synced 2025-07-21 12:09:40 +00:00
Add high-level instructions
This commit is contained in: parent 7faec2380a, commit 84e85e824a
1 changed file with 29 additions and 5 deletions
```diff
@@ -4,7 +4,23 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "This notebook contains Llama Stack implementation of a common end-to-end workflow for customizing and evaluating LLMs using the NVIDIA provider."
+   "## NVIDIA E2E Flow"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "This notebook contains a Llama Stack implementation for an end-to-end workflow for running inference, customizing, and evaluating LLMs using the NVIDIA provider.\n",
+   "\n",
+   "The NVIDIA provider leverages the NeMo Microservices platform, a collection of microservices that you can use to build AI workflows on your Kubernetes cluster on-prem or in cloud.\n",
+   "\n",
+   "This notebook covers the following workflows:\n",
+   "- Creating a dataset and uploading files\n",
+   "- Customizing models\n",
+   "- Evaluating base and customized models, with and without guardrails\n",
+   "- Running inference on base and customized models, with and without guardrails\n",
+   "\n"
  ]
 },
 {
```
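The new overview cell describes an end-to-end flow built on a Llama Stack client configured for the NVIDIA provider; the notebook's `client.initialize()` call appears as unchanged context in a later hunk. A minimal sketch of that setup, assuming the in-process library client and the `nvidia` distribution template (both assumptions, since the setup cell itself is not part of this diff):

```python
import os

# Assumption: the notebook uses the in-process library client with the
# "nvidia" distribution template; adjust the import path and template
# name to match your llama-stack version.
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# Point the provider at your NIM deployment (hypothetical URL).
os.environ["NVIDIA_BASE_URL"] = "http://nim.test"

client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
```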
```diff
@@ -12,7 +28,7 @@
   "metadata": {},
   "source": [
    "## Prerequisites\n",
-   "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
+   "First, ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.1-8b-instruct`. See installation instructions: https://aire.gitlab-master-pages.nvidia.com/microservices/documentation/latest/nemo-microservices/latest-internal/set-up/deploy-as-platform/index.html (TODO: Update to public docs)"
  ]
 },
 {
```
```diff
@@ -70,7 +86,7 @@
   "source": [
    "Configure the environment variables for each service.\n",
    "\n",
-   "If needed, update the URLs for each service to point to your deployment.\n",
+   "Ensure the URLs for each service point to your deployment.\n",
    "- NDS_URL: NeMo Data Store URL\n",
    "- NEMO_URL: NeMo Microservices Platform URL\n",
    "- NIM_URL: NIM URL\n",
```
```diff
@@ -94,7 +110,7 @@
    "USER_ID = \"llama-stack-user\"\n",
    "NAMESPACE = \"default\"\n",
    "PROJECT_ID = \"\"\n",
-   "CUSTOMIZED_MODEL_DIR = \"jg-test-llama-stack@v2\"\n",
+   "CUSTOMIZED_MODEL_DIR = \"test-llama-stack@v1\"\n",
    "\n",
    "# Inference env vars\n",
    "os.environ[\"NVIDIA_BASE_URL\"] = NIM_URL\n",
```
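The two hunks above touch the configuration cell that wires service URLs and identifiers into provider environment variables. A sketch of how that cell plausibly reads after this commit; the URLs are deployment-specific placeholders, and only the lines shown in the diff are confirmed:

```python
import os

# Service endpoints; update these to point to your deployment (placeholder values).
NDS_URL = "http://nemo-data-store.test"  # NeMo Data Store URL
NEMO_URL = "http://nemo.test"            # NeMo Microservices Platform URL
NIM_URL = "http://nim.test"              # NIM URL

USER_ID = "llama-stack-user"
NAMESPACE = "default"
PROJECT_ID = ""
CUSTOMIZED_MODEL_DIR = "test-llama-stack@v1"

# Inference env vars read by the NVIDIA provider
os.environ["NVIDIA_BASE_URL"] = NIM_URL
```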
```diff
@@ -156,13 +172,19 @@
    "client.initialize()"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "Here, we define helper functions that wait for async jobs to complete."
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
-   "# Helper functions for waiting on jobs\n",
    "from llama_stack.apis.common.job_types import JobStatus\n",
    "\n",
    "def wait_customization_job(job_id: str, polling_interval: int = 10, timeout: int = 6000):\n",
```
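Only the signature of `wait_customization_job` and the `JobStatus` import are visible in this hunk. A hedged sketch of such a polling helper, assuming `client` is the initialized Llama Stack client from earlier in the notebook and that job status is fetched through the post-training API (the accessor name and the in-progress status values are assumptions, not confirmed by the diff):

```python
from time import sleep, time

from llama_stack.apis.common.job_types import JobStatus  # import shown in the diff


def wait_customization_job(job_id: str, polling_interval: int = 10, timeout: int = 6000):
    """Poll a customization job until it reaches a terminal status or times out."""
    start_time = time()
    # Assumed accessor; the hunk does not show how the status is retrieved.
    job_status = client.post_training.get_training_job_status(job_uuid=job_id)
    while job_status.status in (JobStatus.scheduled, JobStatus.in_progress):
        if time() - start_time > timeout:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout} seconds")
        sleep(polling_interval)
        job_status = client.post_training.get_training_job_status(job_uuid=job_id)
    return job_status
```

The `return job_status` line matches the unchanged context visible in the next hunk.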
```diff
@@ -204,6 +226,8 @@
    "\n",
    "    return job_status\n",
    "\n",
+   "# When creating a customized model, NIM asynchronously loads the model in its model registry.\n",
+   "# After this, we can run inference with the new model. This helper function waits for NIM to pick up the new model.\n",
    "def wait_nim_loads_customized_model(model_id: str, namespace: str, polling_interval: int = 10, timeout: int = 300):\n",
    "    found = False\n",
    "    start_time = time()\n",
```
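The diff ends mid-function, so only the first lines of `wait_nim_loads_customized_model` are confirmed. One plausible continuation, assuming NIM exposes the OpenAI-compatible `/v1/models` listing and registers customized models as `namespace/model_id` (both assumptions):

```python
from time import sleep, time

import requests  # assumption: the notebook polls NIM over HTTP


def wait_nim_loads_customized_model(model_id: str, namespace: str, polling_interval: int = 10, timeout: int = 300):
    """Wait until NIM lists the customized model in its registry."""
    found = False
    start_time = time()
    model_path = f"{namespace}/{model_id}"  # hypothetical registry naming
    while not found:
        if time() - start_time > timeout:
            raise TimeoutError(f"NIM did not load {model_path} within {timeout} seconds")
        # NIM_URL is defined in the notebook's configuration cell.
        response = requests.get(f"{NIM_URL}/v1/models")
        response.raise_for_status()
        models = response.json().get("data", [])
        found = any(model.get("id") == model_path for model in models)
        if not found:
            sleep(polling_interval)
```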