{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1: Preparing Datasets for Fine-tuning and Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook shows how to transform a tool-calling dataset for fine-tuning and evaluating an LLM with NeMo Microservices."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deploy NeMo Microservices\n",
    "Ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. Please refer to the [installation guide](https://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-platform/index.html) for instructions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can verify that `meta/llama-3.1-8b-instruct` is deployed by querying the NIM endpoint. The response should include a model with an `id` of `meta/llama-3.1-8b-instruct`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "# URL to NeMo deployment management service\n",
    "export NEMO_URL=\"http://nemo.test\"\n",
    "\n",
    "curl -X GET \"$NEMO_URL/v1/models\" \\\n",
    "  -H \"Accept: application/json\"\n",
    "```"
   ]
  },
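  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you prefer to run the same check from Python, the following is a minimal sketch that uses only the standard library. It assumes that `GET /v1/models` returns an OpenAI-compatible list payload of the form `{\"data\": [{\"id\": ...}, ...]}`. `list_model_ids` is a hypothetical helper, not part of any NeMo SDK, and `http://nemo.test` is the same placeholder URL as in the `curl` example.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import urllib.request\n",
    "\n",
    "\n",
    "def list_model_ids(nemo_url: str, timeout: float = 10.0) -> list:\n",
    "    \"\"\"Return the model IDs reported by GET {nemo_url}/v1/models.\n",
    "\n",
    "    Assumes an OpenAI-compatible payload: {\"data\": [{\"id\": ...}, ...]}.\n",
    "    \"\"\"\n",
    "    request = urllib.request.Request(\n",
    "        f\"{nemo_url}/v1/models\", headers={\"Accept\": \"application/json\"}\n",
    "    )\n",
    "    with urllib.request.urlopen(request, timeout=timeout) as response:\n",
    "        payload = json.load(response)\n",
    "    # Collect the \"id\" field of every listed model\n",
    "    return [model[\"id\"] for model in payload.get(\"data\", [])]\n",
    "\n",
    "\n",
    "# Example (requires a reachable deployment):\n",
    "# print(\"meta/llama-3.1-8b-instruct\" in list_model_ids(\"http://nemo.test\"))\n",
    "```"
   ]
  },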
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up Developer Environment\n",
    "Set up your development environment on your machine. The project uses `uv` to manage Python dependencies. From the root of the project, install dependencies and create your virtual environment:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "uv sync --extra dev\n",
    "uv pip install -e .\n",
    "source .venv/bin/activate\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build Llama Stack Image\n",
    "Build the Llama Stack image using the virtual environment you just created. For local development, set `LLAMA_STACK_DIR` to ensure your local code is used in the image. To use the production version of `llama-stack`, omit `LLAMA_STACK_DIR`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```bash\n",
    "LLAMA_STACK_DIR=$(pwd) llama stack build --distro nvidia --image-type venv\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, import the necessary libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "import random\n",
    "from pprint import pprint\n",
    "from typing import Any, Dict, List, Union\n",
    "\n",
    "import numpy as np\n",
    "import torch\n",
    "from datasets import load_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set a random seed for reproducibility."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "SEED = 1234\n",
    "\n",
    "# Maximum number of properties a tool may have in the evaluation set (see Step 3)\n",
    "LIMIT_TOOL_PROPERTIES = 8\n",
    "\n",
    "torch.manual_seed(SEED)\n",
    "torch.cuda.manual_seed_all(SEED)\n",
    "np.random.seed(SEED)\n",
    "random.seed(SEED)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the data root directory and create the directories for storing processed data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Processed data will be stored here\n",
    "DATA_ROOT = os.path.join(os.getcwd(), \"sample_data\")\n",
    "CUSTOMIZATION_DATA_ROOT = os.path.join(DATA_ROOT, \"customization\")\n",
    "VALIDATION_DATA_ROOT = os.path.join(DATA_ROOT, \"validation\")\n",
    "EVALUATION_DATA_ROOT = os.path.join(DATA_ROOT, \"evaluation\")\n",
    "\n",
    "os.makedirs(DATA_ROOT, exist_ok=True)\n",
    "os.makedirs(CUSTOMIZATION_DATA_ROOT, exist_ok=True)\n",
    "os.makedirs(VALIDATION_DATA_ROOT, exist_ok=True)\n",
    "os.makedirs(EVALUATION_DATA_ROOT, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Download xLAM Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This step loads the xLAM dataset from Hugging Face.\n",
    "\n",
    "Ensure that you have followed the prerequisites above, obtained a Hugging Face access token, and set it as `HF_TOKEN` in `config.py`. In addition to the access token, you need to request access to the [xLAM dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k); requests are approved instantly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "from config import HF_TOKEN\n",
    "\n",
    "os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
    "os.environ[\"HF_ENDPOINT\"] = \"https://huggingface.co\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download from Hugging Face\n",
    "dataset = load_dataset(\"Salesforce/xlam-function-calling-60k\")\n",
    "\n",
    "# Inspect a sample\n",
    "example = dataset['train'][0]\n",
    "pprint(example)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more details on the structure of this data, refer to the [data structure of the xLAM dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k#structure) in the Hugging Face documentation."
   ]
  },
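  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The conversion helpers in Step 2 read three fields from each xLAM record: `query` (the user request), `answers` (a JSON-encoded list of ground-truth calls, decoded with `json.loads`), and `tools` (accepted either as a JSON string or as an already-decoded list). The sketch below parses a hypothetical record with that shape. The record is made up and only illustrates the fields the code relies on, so treat the dataset card linked above as the authoritative schema.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "# Hypothetical record, made up for illustration; it mirrors the fields that\n",
    "# the conversion helpers in this notebook read: query, tools, answers\n",
    "sample_record = {\n",
    "    \"query\": \"What is the weather in London?\",\n",
    "    \"tools\": json.dumps([\n",
    "        {\n",
    "            \"name\": \"get_weather\",\n",
    "            \"description\": \"Get the current weather for a city.\",\n",
    "            \"parameters\": {\"city\": {\"description\": \"City name\", \"type\": \"str\", \"default\": \"London\"}},\n",
    "        }\n",
    "    ]),\n",
    "    \"answers\": json.dumps([{\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}]),\n",
    "}\n",
    "\n",
    "sample_tools = json.loads(sample_record[\"tools\"])      # list of tool definitions\n",
    "sample_answers = json.loads(sample_record[\"answers\"])  # list of ground-truth calls\n",
    "print([tool[\"name\"] for tool in sample_tools])  # ['get_weather']\n",
    "print(sample_answers[0][\"arguments\"])           # {'city': 'London'}\n",
    "```"
   ]
  },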
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Prepare Data for Customization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For customization, the NeMo Microservices platform uses the OpenAI data format, which consists of `messages` and `tools`:\n",
    "- `messages` includes the user query, as well as the ground truth `assistant` response to the query. This response contains the function name(s) and associated argument(s) in a `tool_calls` list.\n",
    "- `tools` includes a list of functions and parameters available to the LLM to choose from, as well as their descriptions."
   ]
  },
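  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two bullet points above concrete, the sketch below builds a hypothetical record in that shape and runs a small structural check on it. `check_customization_record` is an illustrative helper, not part of NeMo Customizer. The service may enforce different or stricter rules, and the record contents are made up.\n",
    "\n",
    "```python\n",
    "def check_customization_record(record: dict) -> list:\n",
    "    \"\"\"Return a list of structural problems in one training record (empty if none).\"\"\"\n",
    "    problems = []\n",
    "    messages = record.get(\"messages\", [])\n",
    "    # The first turn should be the user query\n",
    "    if not messages or messages[0].get(\"role\") != \"user\":\n",
    "        problems.append(\"first message should be the user query\")\n",
    "    # The last assistant turn should carry the ground-truth tool_calls\n",
    "    assistant_turns = [m for m in messages if m.get(\"role\") == \"assistant\"]\n",
    "    if not assistant_turns or not assistant_turns[-1].get(\"tool_calls\"):\n",
    "        problems.append(\"missing ground-truth assistant tool_calls\")\n",
    "    # Every available tool should be a named function\n",
    "    for tool in record.get(\"tools\", []):\n",
    "        if tool.get(\"type\") != \"function\" or \"name\" not in tool.get(\"function\", {}):\n",
    "            problems.append(f\"malformed tool entry: {tool}\")\n",
    "    return problems\n",
    "\n",
    "\n",
    "# Hypothetical record, for illustration only\n",
    "sample_training_record = {\n",
    "    \"messages\": [\n",
    "        {\"role\": \"user\", \"content\": \"What is the weather in London?\"},\n",
    "        {\n",
    "            \"role\": \"assistant\",\n",
    "            \"content\": \"\",\n",
    "            \"tool_calls\": [\n",
    "                {\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}}\n",
    "            ],\n",
    "        },\n",
    "    ],\n",
    "    \"tools\": [\n",
    "        {\n",
    "            \"type\": \"function\",\n",
    "            \"function\": {\n",
    "                \"name\": \"get_weather\",\n",
    "                \"description\": \"Get the current weather for a city.\",\n",
    "                \"parameters\": {\n",
    "                    \"type\": \"object\",\n",
    "                    \"properties\": {\"city\": {\"description\": \"City name\", \"type\": \"string\"}},\n",
    "                },\n",
    "            },\n",
    "        }\n",
    "    ],\n",
    "}\n",
    "print(check_customization_record(sample_training_record))  # []\n",
    "```"
   ]
  },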
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following helper functions convert a single xLAM JSON data point into OpenAI format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalize_type(param_type: str) -> str:\n",
    "    \"\"\"\n",
    "    Normalize Python type hints and parameter definitions to OpenAI function spec types.\n",
    "\n",
    "    Args:\n",
    "        param_type: Type string that could include default values or complex types\n",
    "\n",
    "    Returns:\n",
    "        Normalized type string according to OpenAI function spec\n",
    "    \"\"\"\n",
    "    # Remove surrounding whitespace\n",
    "    param_type = param_type.strip()\n",
    "\n",
    "    # Handle types with default values (e.g. \"str, default='London'\")\n",
    "    if \",\" in param_type and \"default\" in param_type:\n",
    "        param_type = param_type.split(\",\")[0].strip()\n",
    "\n",
    "    # Handle types with just default values (e.g. \"default='London'\")\n",
    "    if param_type.startswith(\"default=\"):\n",
    "        return \"string\"  # Default to string if only default value is given\n",
    "\n",
    "    # Remove \", optional\" suffix if present\n",
    "    param_type = param_type.replace(\", optional\", \"\").strip()\n",
    "\n",
    "    # Handle complex types\n",
    "    if param_type.startswith(\"Callable\"):\n",
    "        return \"string\"  # Represent callable as string in JSON schema\n",
    "    if param_type.startswith(\"Tuple\"):\n",
    "        return \"array\"  # Represent tuple as array in JSON schema\n",
    "    if param_type.startswith(\"List[\"):\n",
    "        return \"array\"\n",
    "    if param_type.startswith(\"Set\") or param_type == \"set\":\n",
    "        return \"array\"  # Represent set as array in JSON schema\n",
    "\n",
    "    # Map common type variations to OpenAI spec types\n",
    "    type_mapping: Dict[str, str] = {\n",
    "        \"str\": \"string\",\n",
    "        \"int\": \"integer\",\n",
    "        \"float\": \"number\",\n",
    "        \"bool\": \"boolean\",\n",
    "        \"list\": \"array\",\n",
    "        \"dict\": \"object\",\n",
    "        \"List\": \"array\",\n",
    "        \"Dict\": \"object\",\n",
    "        \"set\": \"array\",\n",
    "        \"Set\": \"array\"\n",
    "    }\n",
    "\n",
    "    if param_type in type_mapping:\n",
    "        return type_mapping[param_type]\n",
    "    else:\n",
    "        print(f\"Unknown type: {param_type}\")\n",
    "        return \"string\"  # Default to string for unknown types\n",
    "\n",
    "\n",
    "def convert_tools_to_openai_spec(tools: Union[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:\n",
    "    \"\"\"Convert xLAM tool definitions (a JSON string or a list of dicts) to OpenAI tool specs.\"\"\"\n",
    "    # If tools is a string, try to parse it as JSON\n",
    "    if isinstance(tools, str):\n",
    "        try:\n",
    "            tools = json.loads(tools)\n",
    "        except json.JSONDecodeError as e:\n",
    "            print(f\"Failed to parse tools string as JSON: {e}\")\n",
    "            return []\n",
    "\n",
    "    # Ensure tools is a list\n",
    "    if not isinstance(tools, list):\n",
    "        print(f\"Expected tools to be a list, but got {type(tools)}\")\n",
    "        return []\n",
    "\n",
    "    openai_tools: List[Dict[str, Any]] = []\n",
    "    for tool in tools:\n",
    "        # Check if tool is a dictionary\n",
    "        if not isinstance(tool, dict):\n",
    "            print(f\"Expected tool to be a dictionary, but got {type(tool)}\")\n",
    "            continue\n",
    "\n",
    "        # Check if 'parameters' is a dictionary\n",
    "        if not isinstance(tool.get(\"parameters\"), dict):\n",
    "            print(f\"Expected 'parameters' to be a dictionary, but got {type(tool.get('parameters'))} for tool: {tool}\")\n",
    "            continue\n",
    "\n",
    "        normalized_parameters: Dict[str, Dict[str, Any]] = {}\n",
    "        for param_name, param_info in tool[\"parameters\"].items():\n",
    "            if not isinstance(param_info, dict):\n",
    "                print(\n",
    "                    f\"Expected parameter info to be a dictionary, but got {type(param_info)} for parameter: {param_name}\"\n",
    "                )\n",
    "                continue\n",
    "\n",
    "            # Create parameter info without default first\n",
    "            param_dict = {\n",
    "                \"description\": param_info.get(\"description\", \"\"),\n",
    "                \"type\": normalize_type(param_info.get(\"type\", \"\")),\n",
    "            }\n",
    "\n",
    "            # Only add default if it exists, is not None, and is not an empty string\n",
    "            default_value = param_info.get(\"default\")\n",
    "            if default_value is not None and default_value != \"\":\n",
    "                param_dict[\"default\"] = default_value\n",
    "\n",
    "            normalized_parameters[param_name] = param_dict\n",
    "\n",
    "        openai_tool = {\n",
    "            \"type\": \"function\",\n",
    "            \"function\": {\n",
    "                \"name\": tool[\"name\"],\n",
    "                \"description\": tool[\"description\"],\n",
    "                \"parameters\": {\"type\": \"object\", \"properties\": normalized_parameters},\n",
    "            },\n",
    "        }\n",
    "        openai_tools.append(openai_tool)\n",
    "    return openai_tools\n",
    "\n",
    "\n",
    "def save_jsonl(filename, data):\n",
    "    \"\"\"Write a list of JSON objects to a .jsonl file, one object per line.\"\"\"\n",
    "    with open(filename, \"w\") as f:\n",
    "        for entry in data:\n",
    "            f.write(json.dumps(entry) + \"\\n\")\n",
    "\n",
    "\n",
    "def convert_tool_calls(xlam_tools):\n",
    "    \"\"\"Convert xLAM answers (a JSON-encoded list of calls) to OpenAI-style tool_calls.\"\"\"\n",
    "    tools = []\n",
    "    for tool in json.loads(xlam_tools):\n",
    "        tools.append({\"type\": \"function\", \"function\": {\"name\": tool[\"name\"], \"arguments\": tool.get(\"arguments\", {})}})\n",
    "    return tools\n",
    "\n",
    "\n",
    "def convert_example(example, dataset_type='single'):\n",
    "    \"\"\"Convert an xLAM dataset example to OpenAI format.\n",
    "\n",
    "    With dataset_type='single' (the default), returns None if the example's answers\n",
    "    parse to anything other than exactly one tool call.\n",
    "    \"\"\"\n",
    "    obj = {\"messages\": []}\n",
    "\n",
    "    # User message\n",
    "    obj[\"messages\"].append({\"role\": \"user\", \"content\": example[\"query\"]})\n",
    "\n",
    "    # Tools\n",
    "    if example.get(\"tools\"):\n",
    "        obj[\"tools\"] = convert_tools_to_openai_spec(example[\"tools\"])\n",
    "\n",
    "    # Assistant message\n",
    "    assistant_message = {\"role\": \"assistant\", \"content\": \"\"}\n",
    "    if example.get(\"answers\"):\n",
    "        tool_calls = convert_tool_calls(example[\"answers\"])\n",
    "\n",
    "        if dataset_type == \"single\":\n",
    "            # Only include examples with a single tool call\n",
    "            if len(tool_calls) == 1:\n",
    "                assistant_message[\"tool_calls\"] = tool_calls\n",
    "            else:\n",
    "                return None\n",
    "        else:\n",
    "            # For other dataset types, include all tool calls\n",
    "            assistant_message[\"tool_calls\"] = tool_calls\n",
    "\n",
    "    obj[\"messages\"].append(assistant_message)\n",
    "\n",
    "    return obj"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code cell converts the example data to the OpenAI format required by NeMo Customizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "convert_example(example)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE**: By default, the `convert_example` function only retains data points that have exactly one tool call in the output, because the `llama-3.2-1b-instruct` model does not support parallel tool calls.\n",
    "For more information, refer to the [supported models](https://docs.nvidia.com/nim/large-language-models/latest/function-calling.html#supported-models) in the NIM documentation."
   ]
  },
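  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because of this filter, some xLAM examples are dropped. To gauge roughly how many, you can count ground-truth calls per record. The sketch below uses a hypothetical helper, `tool_call_histogram`, which assumes (as `convert_tool_calls` does) that each `answers` value is a JSON-encoded list. It runs on made-up strings here; on the real data you should be able to pass `dataset[\"train\"][\"answers\"]` instead.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from collections import Counter\n",
    "\n",
    "\n",
    "def tool_call_histogram(answers_column) -> Counter:\n",
    "    \"\"\"Map the number of ground-truth calls in a record to how many records have it.\"\"\"\n",
    "    return Counter(len(json.loads(answers)) for answers in answers_column)\n",
    "\n",
    "\n",
    "# Hypothetical answers strings, for illustration only\n",
    "sample_answers_column = [\n",
    "    json.dumps([{\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}]),\n",
    "    json.dumps([\n",
    "        {\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}},\n",
    "        {\"name\": \"get_weather\", \"arguments\": {\"city\": \"Paris\"}},\n",
    "    ]),\n",
    "    json.dumps([{\"name\": \"get_time\", \"arguments\": {\"tz\": \"UTC\"}}]),\n",
    "]\n",
    "histogram = tool_call_histogram(sample_answers_column)\n",
    "kept = histogram[1]  # records with exactly one call\n",
    "dropped = sum(count for n_calls, count in histogram.items() if n_calls != 1)\n",
    "print(kept, dropped)  # 2 1\n",
    "```"
   ]
  },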
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Process Entire Dataset\n",
    "Convert each example by looping through the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_examples = []\n",
    "with open(os.path.join(DATA_ROOT, \"xlam_openai_format.jsonl\"), \"w\") as f:\n",
    "    for example in dataset[\"train\"]:\n",
    "        converted = convert_example(example)\n",
    "        if converted is not None:\n",
    "            all_examples.append(converted)\n",
    "            f.write(json.dumps(converted) + \"\\n\")"
   ]
  },
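  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each line of the file written above holds one JSON object. A minimal sketch for reading such a file back, for example to spot-check the converted records, is shown below. `load_jsonl` is a hypothetical counterpart to the `save_jsonl` helper defined earlier and is not used elsewhere in this notebook.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "\n",
    "def load_jsonl(path: str) -> list:\n",
    "    \"\"\"Read a .jsonl file into a list of objects, skipping blank lines.\"\"\"\n",
    "    with open(path) as f:\n",
    "        return [json.loads(line) for line in f if line.strip()]\n",
    "\n",
    "\n",
    "# Round-trip demo on a temporary file; the records are made up\n",
    "records = [{\"messages\": [{\"role\": \"user\", \"content\": \"hi\"}]}, {\"messages\": []}]\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    path = os.path.join(tmp_dir, \"demo.jsonl\")\n",
    "    with open(path, \"w\") as f:\n",
    "        for record in records:\n",
    "            f.write(json.dumps(record) + \"\\n\")\n",
    "    assert load_jsonl(path) == records\n",
    "```\n",
    "\n",
    "On this notebook's output, `load_jsonl(os.path.join(DATA_ROOT, \"xlam_openai_format.jsonl\"))` should return the same records as `all_examples`."
   ]
  },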
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Split Dataset\n",
    "This step splits the dataset into training, validation, and test sets. For demonstration, we use a smaller subset of all the examples.\n",
    "You may choose to modify `NUM_EXAMPLES` to use a larger subset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Number of converted examples to use; change this to use a larger or smaller subset\n",
    "NUM_EXAMPLES = 5000\n",
    "\n",
    "assert NUM_EXAMPLES <= len(all_examples), f\"{NUM_EXAMPLES} exceeds the number of available data points ({len(all_examples)})\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Randomly choose a subset\n",
    "sampled_examples = random.sample(all_examples, NUM_EXAMPLES)\n",
    "\n",
    "# Split into 70% training, 15% validation, 15% testing\n",
    "train_size = int(0.7 * len(sampled_examples))\n",
    "val_size = int(0.15 * len(sampled_examples))\n",
    "\n",
    "train_data = sampled_examples[:train_size]\n",
    "val_data = sampled_examples[train_size : train_size + val_size]\n",
    "test_data = sampled_examples[train_size + val_size :]\n",
    "\n",
    "# Save the training and validation splits. The test split is used in the next step.\n",
    "save_jsonl(os.path.join(CUSTOMIZATION_DATA_ROOT, \"training.jsonl\"), train_data)\n",
    "save_jsonl(os.path.join(VALIDATION_DATA_ROOT, \"validation.jsonl\"), val_data)"
   ]
  },
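  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above relies on the global `random` state seeded in Setup, so drawing random numbers in other cells before it runs can change which examples land in each split. One alternative, sketched below, is to use a dedicated `random.Random(seed)` instance. `split_records` is a hypothetical helper that applies the same 70/15/15 slicing.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def split_records(records, num_examples, seed=1234, train_frac=0.7, val_frac=0.15):\n",
    "    \"\"\"Sample num_examples records with a private RNG and split them into train/val/test.\"\"\"\n",
    "    rng = random.Random(seed)  # independent of the global `random` state\n",
    "    sampled = rng.sample(records, num_examples)\n",
    "    train_end = int(train_frac * num_examples)\n",
    "    val_end = train_end + int(val_frac * num_examples)\n",
    "    return sampled[:train_end], sampled[train_end:val_end], sampled[val_end:]\n",
    "\n",
    "\n",
    "# The same seed yields the same split, even if global randomness is consumed in between\n",
    "first = split_records(list(range(1000)), 100)\n",
    "random.random()\n",
    "second = split_records(list(range(1000)), 100)\n",
    "assert first == second\n",
    "assert sum(len(part) for part in first) == 100\n",
    "```\n",
    "\n",
    "With this notebook's variables, the call would be `split_records(all_examples, NUM_EXAMPLES, seed=SEED)`."
   ]
  },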
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Prepare Data for Evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For evaluation, the NeMo Microservices platform uses a slightly modified version of the OpenAI format: `tool_calls` is moved out of `messages` into a separate, parallel field.\n",
    "- `messages` includes the user query.\n",
    "- `tools` includes a list of functions and parameters available to the LLM to choose from, as well as their descriptions.\n",
    "- `tool_calls` is the ground truth response to the user query. It contains the function name(s) and associated argument(s) for each expected call."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following steps transform the test dataset into a format compatible with the NeMo Evaluator microservice.\n",
    "This dataset is used to measure accuracy metrics before and after customization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "def convert_example_eval(entry):\n",
    "    \"\"\"Convert a single entry in the dataset to the evaluator format.\"\"\"\n",
    "    tools = entry.get(\"tools\", [])\n",
    "\n",
    "    # Workaround (WAR) for a known NIM tool-calling bug: skip entries with any tool\n",
    "    # that has more than LIMIT_TOOL_PROPERTIES properties\n",
    "    for tool in tools:\n",
    "        if len(tool[\"function\"][\"parameters\"][\"properties\"]) > LIMIT_TOOL_PROPERTIES:\n",
    "            return None\n",
    "\n",
    "    new_entry = {\n",
    "        \"messages\": [],\n",
    "        \"tools\": tools,\n",
    "        \"tool_calls\": []\n",
    "    }\n",
    "\n",
    "    # Move the ground-truth tool calls out of the assistant message into their own field\n",
    "    for msg in entry[\"messages\"]:\n",
    "        if msg[\"role\"] == \"assistant\" and \"tool_calls\" in msg:\n",
    "            new_entry[\"tool_calls\"] = msg[\"tool_calls\"]\n",
    "        else:\n",
    "            new_entry[\"messages\"].append(msg)\n",
    "\n",
    "    return new_entry\n",
    "\n",
    "\n",
    "def convert_dataset_eval(data):\n",
    "    \"\"\"Convert the entire dataset for evaluation by restructuring the data format.\"\"\"\n",
    "    return [result for entry in data if (result := convert_example_eval(entry)) is not None]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE**: We have implemented a workaround for a known bug where tool calls freeze the NIM if a tool description includes a function with a large number of parameters. As a result, the evaluation dataset only uses examples whose available tools have at most `LIMIT_TOOL_PROPERTIES` (8) parameters. This will be resolved in the next NIM release."
   ]
  },
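  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how much of the test split this workaround removes, you can count the affected records. The self-contained sketch below uses made-up records. `count_oversized` and `make_tool` are hypothetical helpers; `count_oversized` applies the same `> LIMIT_TOOL_PROPERTIES` test as `convert_example_eval`, with the limit passed in as a parameter.\n",
    "\n",
    "```python\n",
    "def count_oversized(records, limit=8):\n",
    "    \"\"\"Count records with at least one tool that has more than `limit` properties.\"\"\"\n",
    "    return sum(\n",
    "        any(\n",
    "            len(tool[\"function\"][\"parameters\"][\"properties\"]) > limit\n",
    "            for tool in record.get(\"tools\", [])\n",
    "        )\n",
    "        for record in records\n",
    "    )\n",
    "\n",
    "\n",
    "def make_tool(num_properties):\n",
    "    \"\"\"Build a made-up tool definition with the given number of properties.\"\"\"\n",
    "    properties = {f\"p{i}\": {\"type\": \"string\"} for i in range(num_properties)}\n",
    "    return {\n",
    "        \"type\": \"function\",\n",
    "        \"function\": {\"name\": \"demo\", \"parameters\": {\"type\": \"object\", \"properties\": properties}},\n",
    "    }\n",
    "\n",
    "\n",
    "sample_records = [\n",
    "    {\"tools\": [make_tool(3)]},\n",
    "    {\"tools\": [make_tool(3), make_tool(9)]},\n",
    "    {\"tools\": []},\n",
    "]\n",
    "print(count_oversized(sample_records))  # 1\n",
    "```\n",
    "\n",
    "On the real data, `count_oversized(test_data, LIMIT_TOOL_PROPERTIES)` gives the number of test records that the workaround excludes."
   ]
  },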
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data_eval = convert_dataset_eval(test_data)\n",
    "save_jsonl(os.path.join(EVALUATION_DATA_ROOT, \"xlam-test-single.jsonl\"), test_data_eval)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}