mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-07-20 19:56:59 +00:00
Clean up instructions and implementation; reorganize notebooks
This commit is contained in:
parent
0d9d333a4e
commit
4131e8146f
29 changed files with 2756 additions and 89 deletions
595
docs/notebooks/nvidia/tool_calling/1_data_preparation.ipynb
Normal file
@ -0,0 +1,595 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 1: Preparing Datasets for Fine-tuning and Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook showcases transforming a dataset for fine-tuning and evaluating an LLM for tool calling with NeMo Microservices."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy NeMo Microservices\n",
"Ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. Please refer to the [installation guide](https://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-platform/index.html) for instructions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can verify that `meta/llama-3.2-1b-instruct` is deployed by querying the NIM endpoint. The response should include a model with an `id` of `meta/llama-3.2-1b-instruct`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"# URL to NeMo deployment management service\n",
"export NEMO_URL=\"http://nemo.test\"\n",
"\n",
"curl -X GET \"$NEMO_URL/v1/models\" \\\n",
"  -H \"Accept: application/json\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up Developer Environment\n",
"Set up your development environment on your machine. The project uses `uv` to manage Python dependencies. From the root of the project, install dependencies and create your virtual environment:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"uv sync --extra dev\n",
"uv pip install -e .\n",
"source .venv/bin/activate\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build Llama Stack Image\n",
"Build the Llama Stack image using the virtual environment you just created. For local development, set `LLAMA_STACK_DIR` to ensure your local code is used in the image. To use the production version of `llama-stack`, omit `LLAMA_STACK_DIR`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"LLAMA_STACK_DIR=$(pwd) llama stack build --template nvidia --image-type venv\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, import the necessary libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import random\n",
"from pprint import pprint\n",
"from typing import Any, Dict, List, Union\n",
"\n",
"import numpy as np\n",
"import torch\n",
"from datasets import load_dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set a random seed for reproducibility."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"SEED = 1234\n",
"\n",
"# Limit to tools with at most N properties\n",
"LIMIT_TOOL_PROPERTIES = 8\n",
"\n",
"torch.manual_seed(SEED)\n",
"torch.cuda.manual_seed_all(SEED)\n",
"np.random.seed(SEED)\n",
"random.seed(SEED)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define the data root directory and create the necessary directories for storing processed data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Processed data will be stored here\n",
"DATA_ROOT = os.path.join(os.getcwd(), \"tmp\")\n",
"CUSTOMIZATION_DATA_ROOT = os.path.join(DATA_ROOT, \"customization\")\n",
"VALIDATION_DATA_ROOT = os.path.join(DATA_ROOT, \"validation\")\n",
"EVALUATION_DATA_ROOT = os.path.join(DATA_ROOT, \"evaluation\")\n",
"\n",
"os.makedirs(DATA_ROOT, exist_ok=True)\n",
"os.makedirs(CUSTOMIZATION_DATA_ROOT, exist_ok=True)\n",
"os.makedirs(VALIDATION_DATA_ROOT, exist_ok=True)\n",
"os.makedirs(EVALUATION_DATA_ROOT, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Download xLAM Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This step loads the xLAM dataset from Hugging Face.\n",
"\n",
"Ensure that you have followed the prerequisites mentioned above, obtained a Hugging Face access token, and configured it in `config.py`. In addition to getting an access token, you need to apply for access to the xLAM dataset [here](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k), which will be approved instantly."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"from config import HF_TOKEN\n",
"\n",
"os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
"os.environ[\"HF_ENDPOINT\"] = \"https://huggingface.co\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download from Hugging Face\n",
"dataset = load_dataset(\"Salesforce/xlam-function-calling-60k\")\n",
"\n",
"# Inspect a sample\n",
"example = dataset['train'][0]\n",
"pprint(example)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more details on the structure of this data, refer to the [data structure of the xLAM dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k#structure) in the Hugging Face documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Prepare Data for Customization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For Customization, the NeMo Microservices platform uses the OpenAI data format, composed of `messages` and `tools`:\n",
"- `messages` includes the user query, as well as the ground-truth `assistant` response to the query. This response contains the function name(s) and associated argument(s) in a `tool_calls` dict.\n",
"- `tools` includes a list of functions and parameters available to the LLM to choose from, as well as their descriptions."
]
},
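{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, a single record in this format looks like the following sketch (the query, function name, and arguments are hypothetical examples, not taken from the dataset):\n",
"\n",
"```json\n",
"{\n",
"  \"messages\": [\n",
"    {\"role\": \"user\", \"content\": \"What is the weather in London?\"},\n",
"    {\"role\": \"assistant\", \"content\": \"\", \"tool_calls\": [{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}}]}\n",
"  ],\n",
"  \"tools\": [{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"Get the current weather for a city\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\", \"description\": \"City name\"}}}}}]\n",
"}\n",
"```"
]
},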
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following helper functions convert a single xLAM JSON data point into the OpenAI format."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def normalize_type(param_type: str) -> str:\n",
"    \"\"\"\n",
"    Normalize Python type hints and parameter definitions to OpenAI function spec types.\n",
"\n",
"    Args:\n",
"        param_type: Type string that could include default values or complex types\n",
"\n",
"    Returns:\n",
"        Normalized type string according to OpenAI function spec\n",
"    \"\"\"\n",
"    # Remove whitespace\n",
"    param_type = param_type.strip()\n",
"\n",
"    # Handle types with default values (e.g. \"str, default='London'\")\n",
"    if \",\" in param_type and \"default\" in param_type:\n",
"        param_type = param_type.split(\",\")[0].strip()\n",
"\n",
"    # Handle types with just default values (e.g. \"default='London'\")\n",
"    if param_type.startswith(\"default=\"):\n",
"        return \"string\"  # Default to string if only default value is given\n",
"\n",
"    # Remove \", optional\" suffix if present\n",
"    param_type = param_type.replace(\", optional\", \"\").strip()\n",
"\n",
"    # Handle complex types\n",
"    if param_type.startswith(\"Callable\"):\n",
"        return \"string\"  # Represent callable as string in JSON schema\n",
"    if param_type.startswith(\"Tuple\"):\n",
"        return \"array\"  # Represent tuple as array in JSON schema\n",
"    if param_type.startswith(\"List[\"):\n",
"        return \"array\"\n",
"    if param_type.startswith(\"Set\") or param_type == \"set\":\n",
"        return \"array\"  # Represent set as array in JSON schema\n",
"\n",
"    # Map common type variations to OpenAI spec types\n",
"    type_mapping: Dict[str, str] = {\n",
"        \"str\": \"string\",\n",
"        \"int\": \"integer\",\n",
"        \"float\": \"number\",\n",
"        \"bool\": \"boolean\",\n",
"        \"list\": \"array\",\n",
"        \"dict\": \"object\",\n",
"        \"List\": \"array\",\n",
"        \"Dict\": \"object\",\n",
"        \"set\": \"array\",\n",
"        \"Set\": \"array\"\n",
"    }\n",
"\n",
"    if param_type in type_mapping:\n",
"        return type_mapping[param_type]\n",
"    else:\n",
"        print(f\"Unknown type: {param_type}\")\n",
"        return \"string\"  # Default to string for unknown types\n",
"\n",
"\n",
|
||||
"def convert_tools_to_openai_spec(tools: Union[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:\n",
|
||||
" # If tools is a string, try to parse it as JSON\n",
|
||||
" if isinstance(tools, str):\n",
|
||||
" try:\n",
|
||||
" tools = json.loads(tools)\n",
|
||||
" except json.JSONDecodeError as e:\n",
|
||||
" print(f\"Failed to parse tools string as JSON: {e}\")\n",
|
||||
" return []\n",
|
||||
"\n",
|
||||
" # Ensure tools is a list\n",
|
||||
" if not isinstance(tools, list):\n",
|
||||
" print(f\"Expected tools to be a list, but got {type(tools)}\")\n",
|
||||
" return []\n",
|
||||
"\n",
|
||||
" openai_tools: List[Dict[str, Any]] = []\n",
|
||||
" for tool in tools:\n",
|
||||
" # Check if tool is a dictionary\n",
|
||||
" if not isinstance(tool, dict):\n",
|
||||
" print(f\"Expected tool to be a dictionary, but got {type(tool)}\")\n",
|
||||
" continue\n",
|
||||
"\n",
|
||||
" # Check if 'parameters' is a dictionary\n",
|
||||
" if not isinstance(tool.get(\"parameters\"), dict):\n",
|
||||
" print(f\"Expected 'parameters' to be a dictionary, but got {type(tool.get('parameters'))} for tool: {tool}\")\n",
|
||||
" continue\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" normalized_parameters: Dict[str, Dict[str, Any]] = {}\n",
|
||||
" for param_name, param_info in tool[\"parameters\"].items():\n",
|
||||
" if not isinstance(param_info, dict):\n",
|
||||
" print(\n",
|
||||
" f\"Expected parameter info to be a dictionary, but got {type(param_info)} for parameter: {param_name}\"\n",
|
||||
" )\n",
|
||||
" continue\n",
|
||||
"\n",
|
||||
" # Create parameter info without default first\n",
|
||||
" param_dict = {\n",
|
||||
" \"description\": param_info.get(\"description\", \"\"),\n",
|
||||
" \"type\": normalize_type(param_info.get(\"type\", \"\")),\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" # Only add default if it exists, is not None, and is not an empty string\n",
|
||||
" default_value = param_info.get(\"default\")\n",
|
||||
" if default_value is not None and default_value != \"\":\n",
|
||||
" param_dict[\"default\"] = default_value\n",
|
||||
"\n",
|
||||
" normalized_parameters[param_name] = param_dict\n",
|
||||
"\n",
|
||||
" openai_tool = {\n",
|
||||
" \"type\": \"function\",\n",
|
||||
" \"function\": {\n",
|
||||
" \"name\": tool[\"name\"],\n",
|
||||
" \"description\": tool[\"description\"],\n",
|
||||
" \"parameters\": {\"type\": \"object\", \"properties\": normalized_parameters},\n",
|
||||
" },\n",
|
||||
" }\n",
|
||||
" openai_tools.append(openai_tool)\n",
|
||||
" return openai_tools\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def save_jsonl(filename, data):\n",
|
||||
" \"\"\"Write a list of json objects to a .jsonl file\"\"\"\n",
|
||||
" with open(filename, \"w\") as f:\n",
|
||||
" for entry in data:\n",
|
||||
" f.write(json.dumps(entry) + \"\\n\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def convert_tool_calls(xlam_tools):\n",
|
||||
" \"\"\"Convert XLAM tool format to OpenAI's tool schema.\"\"\"\n",
|
||||
" tools = []\n",
|
||||
" for tool in json.loads(xlam_tools):\n",
|
||||
" tools.append({\"type\": \"function\", \"function\": {\"name\": tool[\"name\"], \"arguments\": tool.get(\"arguments\", {})}})\n",
|
||||
" return tools\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def convert_example(example, dataset_type='single'):\n",
|
||||
" \"\"\"Convert an XLAM dataset example to OpenAI format.\"\"\"\n",
|
||||
" obj = {\"messages\": []}\n",
|
||||
"\n",
|
||||
" # User message\n",
|
||||
" obj[\"messages\"].append({\"role\": \"user\", \"content\": example[\"query\"]})\n",
|
||||
"\n",
|
||||
" # Tools\n",
|
||||
" if example.get(\"tools\"):\n",
|
||||
" obj[\"tools\"] = convert_tools_to_openai_spec(example[\"tools\"])\n",
|
||||
"\n",
|
||||
" # Assistant message\n",
|
||||
" assistant_message = {\"role\": \"assistant\", \"content\": \"\"}\n",
|
||||
" if example.get(\"answers\"):\n",
|
||||
" tool_calls = convert_tool_calls(example[\"answers\"])\n",
|
||||
" \n",
|
||||
" if dataset_type == \"single\":\n",
|
||||
" # Only include examples with a single tool call\n",
|
||||
" if len(tool_calls) == 1:\n",
|
||||
" assistant_message[\"tool_calls\"] = tool_calls\n",
|
||||
" else:\n",
|
||||
" return None\n",
|
||||
" else:\n",
|
||||
" # For other dataset types, include all tool calls\n",
|
||||
" assistant_message[\"tool_calls\"] = tool_calls\n",
|
||||
" \n",
|
||||
" obj[\"messages\"].append(assistant_message)\n",
|
||||
"\n",
|
||||
" return obj"
|
||||
]
|
||||
},
|
||||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code cell converts the example data point to the OpenAI format required by NeMo Customizer."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"convert_example(example)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE**: By default, the `convert_example` function only retains data points that have exactly one tool call in the output, because the llama-3.2-1b-instruct model does not support parallel tool calls.\n",
"For more information, refer to the [supported models](https://docs.nvidia.com/nim/large-language-models/latest/function-calling.html#supported-models) in the NeMo documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Process Entire Dataset\n",
"Convert each example by looping through the dataset."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"all_examples = []\n",
"with open(os.path.join(DATA_ROOT, \"xlam_openai_format.jsonl\"), \"w\") as f:\n",
"    for example in dataset[\"train\"]:\n",
"        converted = convert_example(example)\n",
"        if converted is not None:\n",
"            all_examples.append(converted)\n",
"            f.write(json.dumps(converted) + \"\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Split Dataset\n",
"This step splits the dataset into train, validation, and test sets. For demonstration, we use a smaller subset of all the examples.\n",
"You may choose to modify `NUM_EXAMPLES` to leverage a larger subset."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Configure the size of the dataset to use\n",
"NUM_EXAMPLES = 5000\n",
"\n",
"assert NUM_EXAMPLES <= len(all_examples), f\"{NUM_EXAMPLES} exceeds the total number of available ({len(all_examples)}) data points\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Randomly choose a subset\n",
"sampled_examples = random.sample(all_examples, NUM_EXAMPLES)\n",
"\n",
"# Split into 70% training, 15% validation, 15% testing\n",
"train_size = int(0.7 * len(sampled_examples))\n",
"val_size = int(0.15 * len(sampled_examples))\n",
"\n",
"train_data = sampled_examples[:train_size]\n",
"val_data = sampled_examples[train_size : train_size + val_size]\n",
"test_data = sampled_examples[train_size + val_size :]\n",
"\n",
"# Save the training and validation splits. We will use the test split in the next section\n",
"save_jsonl(os.path.join(CUSTOMIZATION_DATA_ROOT, \"training.jsonl\"), train_data)\n",
"save_jsonl(os.path.join(VALIDATION_DATA_ROOT, \"validation.jsonl\"), val_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Prepare Data for Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For evaluation, the NeMo Microservices platform uses a format with a minor modification to the OpenAI format: `tool_calls` is moved out of `messages` to create a distinct, parallel field.\n",
"- `messages` includes the user query.\n",
"- `tools` includes a list of functions and parameters available to the LLM to choose from, as well as their descriptions.\n",
"- `tool_calls` is the ground-truth response to the user query, containing the function name(s) and associated argument(s)."
]
},
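{
"cell_type": "markdown",
"metadata": {},
"source": [
"For illustration, a record in the evaluator format looks like the following sketch (the query, function name, and arguments are hypothetical examples). Note that `tool_calls` sits alongside `messages` rather than inside the assistant message:\n",
"\n",
"```json\n",
"{\n",
"  \"messages\": [{\"role\": \"user\", \"content\": \"What is the weather in London?\"}],\n",
"  \"tools\": [{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"Get the current weather for a city\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\", \"description\": \"City name\"}}}}}],\n",
"  \"tool_calls\": [{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}}]\n",
"}\n",
"```"
]
},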
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following steps transform the test dataset into a format compatible with the NeMo Evaluator microservice.\n",
"This dataset is used for measuring accuracy metrics before and after customization."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def convert_example_eval(entry):\n",
"    \"\"\"Convert a single entry in the dataset to the evaluator format\"\"\"\n",
"\n",
"    # Note: This is a workaround (WAR) for a known bug with tool calling in NIM\n",
"    for tool in entry[\"tools\"]:\n",
"        if len(tool[\"function\"][\"parameters\"][\"properties\"]) > LIMIT_TOOL_PROPERTIES:\n",
"            return None\n",
"\n",
"    new_entry = {\n",
"        \"messages\": [],\n",
"        \"tools\": entry[\"tools\"],\n",
"        \"tool_calls\": []\n",
"    }\n",
"\n",
"    for msg in entry[\"messages\"]:\n",
"        if msg[\"role\"] == \"assistant\" and \"tool_calls\" in msg:\n",
"            new_entry[\"tool_calls\"] = msg[\"tool_calls\"]\n",
"        else:\n",
"            new_entry[\"messages\"].append(msg)\n",
"\n",
"    return new_entry\n",
"\n",
"\n",
"def convert_dataset_eval(data):\n",
"    \"\"\"Convert the entire dataset for evaluation by restructuring the data format.\"\"\"\n",
"    return [result for entry in data if (result := convert_example_eval(entry)) is not None]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`NOTE`: We have implemented a workaround for a known bug where tool calls freeze the NIM if a tool description includes a function with a large number of parameters. As such, we have limited the dataset to examples whose available tools have at most 8 parameters. This will be resolved in the next NIM release."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"test_data_eval = convert_dataset_eval(test_data)\n",
"save_jsonl(os.path.join(EVALUATION_DATA_ROOT, \"xlam-test-single.jsonl\"), test_data_eval)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}