# What does this PR do?
This PR contains two sets of notebooks that serve as reference material for developers getting started with Llama Stack using the NVIDIA Provider. Developers should be able to execute these notebooks end-to-end, pointing to their NeMo Microservices deployment.
1. `beginner_e2e/`: Notebook that walks through a beginner end-to-end workflow that covers creating datasets, running inference, customizing and evaluating models, and running safety checks.
2. `tool_calling/`: Notebook that is ported over from the [Data Flywheel & Tool Calling notebook](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/nemo/data-flywheel) that is referenced in the NeMo Microservices docs. I updated the notebook to use the Llama Stack client wherever possible, and added relevant instructions.

## Test Plan
- Both notebook folders contain READMEs with prerequisites. To manually test these notebooks, you'll need a deployment of the NeMo Microservices Platform and to update the `config.py` file with your deployment's information.
- I've run through these notebooks manually end-to-end to verify each step works.

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Part 1: Preparing Datasets for Fine-tuning and Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook showcases transforming a dataset for finetuning and evaluating an LLM for tool calling with NeMo Microservices."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Prerequisites"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Deploy NeMo Microservices\n",
|
|
"Ensure the NeMo Microservices platform is up and running, including the model downloading step for `meta/llama-3.2-1b-instruct`. Please refer to the [installation guide](https://docs.nvidia.com/nemo/microservices/latest/set-up/deploy-as-platform/index.html) for instructions."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can verify the `meta/llama-3.1-8b-instruct` is deployed by querying the NIM endpoint. The response should include a model with an `id` of `meta/llama-3.1-8b-instruct`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"```bash\n",
|
|
"# URL to NeMo deployment management service\n",
|
|
"export NEMO_URL=\"http://nemo.test\"\n",
|
|
"\n",
|
|
"curl -X GET \"$NEMO_URL/v1/models\" \\\n",
|
|
" -H \"Accept: application/json\"\n",
|
|
"```"
|
|
]
|
|
},
|
|
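{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equivalently, you can run the same check from Python. The following is a minimal sketch, assuming the `requests` package is installed and that the endpoint returns an OpenAI-style model list under a `data` key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# URL to NeMo deployment management service (adjust for your deployment)\n",
"NEMO_URL = \"http://nemo.test\"\n",
"\n",
"resp = requests.get(f\"{NEMO_URL}/v1/models\", headers={\"Accept\": \"application/json\"})\n",
"resp.raise_for_status()\n",
"\n",
"# Check that the target model is among the returned ids\n",
"model_ids = [model[\"id\"] for model in resp.json()[\"data\"]]\n",
"print(\"meta/llama-3.2-1b-instruct\" in model_ids)"
]
},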
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up Developer Environment\n",
"Set up your development environment on your machine. The project uses `uv` to manage Python dependencies. From the root of the project, install dependencies and create your virtual environment:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"uv sync --extra dev\n",
"uv pip install -e .\n",
"source .venv/bin/activate\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build Llama Stack Image\n",
"Build the Llama Stack image using the virtual environment you just created. For local development, set `LLAMA_STACK_DIR` to ensure your local code is used in the image. To use the production version of `llama-stack`, omit `LLAMA_STACK_DIR`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```bash\n",
"LLAMA_STACK_DIR=$(pwd) llama stack build --template nvidia --image-type venv\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, import the necessary libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"import random\n",
"from pprint import pprint\n",
"from typing import Any, Dict, List, Union\n",
"\n",
"import numpy as np\n",
"import torch\n",
"from datasets import load_dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set a random seed for reproducibility."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"SEED = 1234\n",
"\n",
"# Limit tools to at most N properties (used in the evaluation step below)\n",
"LIMIT_TOOL_PROPERTIES = 8\n",
"\n",
"torch.manual_seed(SEED)\n",
"torch.cuda.manual_seed_all(SEED)\n",
"np.random.seed(SEED)\n",
"random.seed(SEED)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define the data root directory and create the necessary directories for storing processed data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Processed data will be stored here\n",
"DATA_ROOT = os.path.join(os.getcwd(), \"sample_data\")\n",
"CUSTOMIZATION_DATA_ROOT = os.path.join(DATA_ROOT, \"customization\")\n",
"VALIDATION_DATA_ROOT = os.path.join(DATA_ROOT, \"validation\")\n",
"EVALUATION_DATA_ROOT = os.path.join(DATA_ROOT, \"evaluation\")\n",
"\n",
"os.makedirs(DATA_ROOT, exist_ok=True)\n",
"os.makedirs(CUSTOMIZATION_DATA_ROOT, exist_ok=True)\n",
"os.makedirs(VALIDATION_DATA_ROOT, exist_ok=True)\n",
"os.makedirs(EVALUATION_DATA_ROOT, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Download xLAM Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This step loads the xLAM dataset from Hugging Face.\n",
"\n",
"Ensure that you have followed the prerequisites mentioned above, obtained a Hugging Face access token, and configured it in `config.py`. In addition to getting an access token, you need to apply for access to the xLAM dataset [here](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k), which will be approved instantly."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"from config import HF_TOKEN\n",
"\n",
"os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
"os.environ[\"HF_ENDPOINT\"] = \"https://huggingface.co\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Download from Hugging Face\n",
"dataset = load_dataset(\"Salesforce/xlam-function-calling-60k\")\n",
"\n",
"# Inspect a sample\n",
"example = dataset['train'][0]\n",
"pprint(example)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more details on the structure of this data, refer to the [data structure of the xLAM dataset](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k#structure) in the Hugging Face documentation."
]
},
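{
"cell_type": "markdown",
"metadata": {},
"source": [
"For orientation, each record has roughly the following shape (a schematic sketch with hypothetical values, not an actual record). Note that `tools` and `answers` are stored as JSON-encoded strings, which is why the helpers below parse them with `json.loads`:\n",
"\n",
"```json\n",
"{\n",
"  \"id\": 0,\n",
"  \"query\": \"What is the current weather in London?\",\n",
"  \"tools\": \"[{\\\"name\\\": \\\"get_weather\\\", \\\"description\\\": \\\"Fetch the current weather for a city\\\", \\\"parameters\\\": {\\\"city\\\": {\\\"description\\\": \\\"Name of the city\\\", \\\"type\\\": \\\"str\\\", \\\"default\\\": \\\"London\\\"}}}]\",\n",
"  \"answers\": \"[{\\\"name\\\": \\\"get_weather\\\", \\\"arguments\\\": {\\\"city\\\": \\\"London\\\"}}]\"\n",
"}\n",
"```"
]
},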
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Prepare Data for Customization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For Customization, the NeMo Microservices platform uses the OpenAI data format, which is composed of `messages` and `tools`:\n",
"- `messages` includes the user query, as well as the ground-truth `assistant` response to the query. This response contains the function name(s) and associated argument(s) in a `tool_calls` list.\n",
"- `tools` includes a list of functions and parameters available to the LLM to choose from, as well as their descriptions."
]
},
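{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concretely, a converted record has the following shape (a schematic example with hypothetical values, mirroring what the helper functions below emit):\n",
"\n",
"```json\n",
"{\n",
"  \"messages\": [\n",
"    {\"role\": \"user\", \"content\": \"What is the current weather in London?\"},\n",
"    {\"role\": \"assistant\", \"content\": \"\", \"tool_calls\": [{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}}]}\n",
"  ],\n",
"  \"tools\": [\n",
"    {\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"description\": \"Fetch the current weather for a city\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"description\": \"Name of the city\", \"type\": \"string\", \"default\": \"London\"}}}}}\n",
"  ]\n",
"}\n",
"```"
]
},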
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following helper functions convert a single xLAM JSON data point into OpenAI format."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def normalize_type(param_type: str) -> str:\n",
"    \"\"\"\n",
"    Normalize Python type hints and parameter definitions to OpenAI function spec types.\n",
"\n",
"    Args:\n",
"        param_type: Type string that could include default values or complex types\n",
"\n",
"    Returns:\n",
"        Normalized type string according to OpenAI function spec\n",
"    \"\"\"\n",
"    # Remove whitespace\n",
"    param_type = param_type.strip()\n",
"\n",
"    # Handle types with default values (e.g. \"str, default='London'\")\n",
"    if \",\" in param_type and \"default\" in param_type:\n",
"        param_type = param_type.split(\",\")[0].strip()\n",
"\n",
"    # Handle types with just default values (e.g. \"default='London'\")\n",
"    if param_type.startswith(\"default=\"):\n",
"        return \"string\"  # Default to string if only default value is given\n",
"\n",
"    # Remove \", optional\" suffix if present\n",
"    param_type = param_type.replace(\", optional\", \"\").strip()\n",
"\n",
"    # Handle complex types\n",
"    if param_type.startswith(\"Callable\"):\n",
"        return \"string\"  # Represent callable as string in JSON schema\n",
"    if param_type.startswith(\"Tuple\"):\n",
"        return \"array\"  # Represent tuple as array in JSON schema\n",
"    if param_type.startswith(\"List[\"):\n",
"        return \"array\"\n",
"    if param_type.startswith(\"Set\") or param_type == \"set\":\n",
"        return \"array\"  # Represent set as array in JSON schema\n",
"\n",
"    # Map common type variations to OpenAI spec types\n",
"    type_mapping: Dict[str, str] = {\n",
"        \"str\": \"string\",\n",
"        \"int\": \"integer\",\n",
"        \"float\": \"number\",\n",
"        \"bool\": \"boolean\",\n",
"        \"list\": \"array\",\n",
"        \"dict\": \"object\",\n",
"        \"List\": \"array\",\n",
"        \"Dict\": \"object\",\n",
"        \"set\": \"array\",\n",
"        \"Set\": \"array\"\n",
"    }\n",
"\n",
"    if param_type in type_mapping:\n",
"        return type_mapping[param_type]\n",
"    else:\n",
"        print(f\"Unknown type: {param_type}\")\n",
"        return \"string\"  # Default to string for unknown types\n",
"\n",
"\n",
"def convert_tools_to_openai_spec(tools: Union[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:\n",
"    \"\"\"Convert xLAM tool definitions to the OpenAI function spec.\"\"\"\n",
"    # If tools is a string, try to parse it as JSON\n",
"    if isinstance(tools, str):\n",
"        try:\n",
"            tools = json.loads(tools)\n",
"        except json.JSONDecodeError as e:\n",
"            print(f\"Failed to parse tools string as JSON: {e}\")\n",
"            return []\n",
"\n",
"    # Ensure tools is a list\n",
"    if not isinstance(tools, list):\n",
"        print(f\"Expected tools to be a list, but got {type(tools)}\")\n",
"        return []\n",
"\n",
"    openai_tools: List[Dict[str, Any]] = []\n",
"    for tool in tools:\n",
"        # Check if tool is a dictionary\n",
"        if not isinstance(tool, dict):\n",
"            print(f\"Expected tool to be a dictionary, but got {type(tool)}\")\n",
"            continue\n",
"\n",
"        # Check if 'parameters' is a dictionary\n",
"        if not isinstance(tool.get(\"parameters\"), dict):\n",
"            print(f\"Expected 'parameters' to be a dictionary, but got {type(tool.get('parameters'))} for tool: {tool}\")\n",
"            continue\n",
"\n",
"        normalized_parameters: Dict[str, Dict[str, Any]] = {}\n",
"        for param_name, param_info in tool[\"parameters\"].items():\n",
"            if not isinstance(param_info, dict):\n",
"                print(\n",
"                    f\"Expected parameter info to be a dictionary, but got {type(param_info)} for parameter: {param_name}\"\n",
"                )\n",
"                continue\n",
"\n",
"            # Create parameter info without default first\n",
"            param_dict = {\n",
"                \"description\": param_info.get(\"description\", \"\"),\n",
"                \"type\": normalize_type(param_info.get(\"type\", \"\")),\n",
"            }\n",
"\n",
"            # Only add default if it exists, is not None, and is not an empty string\n",
"            default_value = param_info.get(\"default\")\n",
"            if default_value is not None and default_value != \"\":\n",
"                param_dict[\"default\"] = default_value\n",
"\n",
"            normalized_parameters[param_name] = param_dict\n",
"\n",
"        openai_tool = {\n",
"            \"type\": \"function\",\n",
"            \"function\": {\n",
"                \"name\": tool[\"name\"],\n",
"                \"description\": tool[\"description\"],\n",
"                \"parameters\": {\"type\": \"object\", \"properties\": normalized_parameters},\n",
"            },\n",
"        }\n",
"        openai_tools.append(openai_tool)\n",
"    return openai_tools\n",
"\n",
"\n",
"def save_jsonl(filename, data):\n",
"    \"\"\"Write a list of json objects to a .jsonl file\"\"\"\n",
"    with open(filename, \"w\") as f:\n",
"        for entry in data:\n",
"            f.write(json.dumps(entry) + \"\\n\")\n",
"\n",
"\n",
"def convert_tool_calls(xlam_tools):\n",
"    \"\"\"Convert XLAM tool format to OpenAI's tool schema.\"\"\"\n",
"    tools = []\n",
"    for tool in json.loads(xlam_tools):\n",
"        tools.append({\"type\": \"function\", \"function\": {\"name\": tool[\"name\"], \"arguments\": tool.get(\"arguments\", {})}})\n",
"    return tools\n",
"\n",
"\n",
"def convert_example(example, dataset_type='single'):\n",
"    \"\"\"Convert an XLAM dataset example to OpenAI format.\"\"\"\n",
"    obj = {\"messages\": []}\n",
"\n",
"    # User message\n",
"    obj[\"messages\"].append({\"role\": \"user\", \"content\": example[\"query\"]})\n",
"\n",
"    # Tools\n",
"    if example.get(\"tools\"):\n",
"        obj[\"tools\"] = convert_tools_to_openai_spec(example[\"tools\"])\n",
"\n",
"    # Assistant message\n",
"    assistant_message = {\"role\": \"assistant\", \"content\": \"\"}\n",
"    if example.get(\"answers\"):\n",
"        tool_calls = convert_tool_calls(example[\"answers\"])\n",
"\n",
"        if dataset_type == \"single\":\n",
"            # Only include examples with a single tool call\n",
"            if len(tool_calls) == 1:\n",
"                assistant_message[\"tool_calls\"] = tool_calls\n",
"            else:\n",
"                return None\n",
"        else:\n",
"            # For other dataset types, include all tool calls\n",
"            assistant_message[\"tool_calls\"] = tool_calls\n",
"\n",
"    obj[\"messages\"].append(assistant_message)\n",
"\n",
"    return obj"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code cell converts the example data to the OpenAI format required by NeMo Customizer."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"convert_example(example)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE**: By default, the `convert_example` function only retains data points that have exactly one tool call in the output, because the `llama-3.2-1b-instruct` model does not support parallel tool calls.\n",
"For more information, refer to the [supported models](https://docs.nvidia.com/nim/large-language-models/latest/function-calling.html#supported-models) in the NeMo documentation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Process Entire Dataset\n",
"Convert each example by looping through the dataset."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"all_examples = []\n",
"with open(os.path.join(DATA_ROOT, \"xlam_openai_format.jsonl\"), \"w\") as f:\n",
"    for example in dataset[\"train\"]:\n",
"        converted = convert_example(example)\n",
"        if converted is not None:\n",
"            all_examples.append(converted)\n",
"            f.write(json.dumps(converted) + \"\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Split Dataset\n",
"This step splits the dataset into train, validation, and test sets. For demonstration, we use a smaller subset of all the examples.\n",
"You may choose to modify `NUM_EXAMPLES` to leverage a larger subset."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Configure the size of the dataset to use\n",
"NUM_EXAMPLES = 5000\n",
"\n",
"assert NUM_EXAMPLES <= len(all_examples), f\"{NUM_EXAMPLES} exceeds the total number of available data points ({len(all_examples)})\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Randomly choose a subset\n",
"sampled_examples = random.sample(all_examples, NUM_EXAMPLES)\n",
"\n",
"# Split into 70% training, 15% validation, 15% testing\n",
"train_size = int(0.7 * len(sampled_examples))\n",
"val_size = int(0.15 * len(sampled_examples))\n",
"\n",
"train_data = sampled_examples[:train_size]\n",
"val_data = sampled_examples[train_size : train_size + val_size]\n",
"test_data = sampled_examples[train_size + val_size :]\n",
"\n",
"# Save the training and validation splits. We will use the test split in the next section.\n",
"save_jsonl(os.path.join(CUSTOMIZATION_DATA_ROOT, \"training.jsonl\"), train_data)\n",
"save_jsonl(os.path.join(VALIDATION_DATA_ROOT, \"validation.jsonl\"), val_data)"
]
},
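{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, confirm that the three splits add up to `NUM_EXAMPLES`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify the 70/15/15 split\n",
"print(f\"Train: {len(train_data)}, Validation: {len(val_data)}, Test: {len(test_data)}\")\n",
"assert len(train_data) + len(val_data) + len(test_data) == NUM_EXAMPLES"
]
},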
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Prepare Data for Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For evaluation, the NeMo Microservices platform uses a format with a minor modification to the OpenAI format: `tool_calls` is moved out of `messages` into a distinct top-level field.\n",
"- `messages` includes the user query.\n",
"- `tools` includes a list of functions and parameters available to the LLM to choose from, as well as their descriptions.\n",
"- `tool_calls` is the ground-truth response to the user query, containing the function name(s) and associated argument(s)."
]
},
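{
"cell_type": "markdown",
"metadata": {},
"source": [
"Schematically, the hypothetical record from earlier looks like this after the transformation (the `assistant` message is removed and its `tool_calls` becomes a top-level field):\n",
"\n",
"```json\n",
"{\n",
"  \"messages\": [\n",
"    {\"role\": \"user\", \"content\": \"What is the current weather in London?\"}\n",
"  ],\n",
"  \"tools\": [...],\n",
"  \"tool_calls\": [{\"type\": \"function\", \"function\": {\"name\": \"get_weather\", \"arguments\": {\"city\": \"London\"}}}]\n",
"}\n",
"```"
]
},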
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following steps transform the test dataset into a format compatible with the NeMo Evaluator microservice.\n",
"This dataset is used to measure accuracy metrics before and after customization."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def convert_example_eval(entry):\n",
"    \"\"\"Convert a single entry in the dataset to the evaluator format\"\"\"\n",
"\n",
"    # Note: This is a workaround (WAR) for a known bug with tool calling in NIM\n",
"    for tool in entry[\"tools\"]:\n",
"        if len(tool[\"function\"][\"parameters\"][\"properties\"]) > LIMIT_TOOL_PROPERTIES:\n",
"            return None\n",
"\n",
"    new_entry = {\n",
"        \"messages\": [],\n",
"        \"tools\": entry[\"tools\"],\n",
"        \"tool_calls\": []\n",
"    }\n",
"\n",
"    for msg in entry[\"messages\"]:\n",
"        if msg[\"role\"] == \"assistant\" and \"tool_calls\" in msg:\n",
"            new_entry[\"tool_calls\"] = msg[\"tool_calls\"]\n",
"        else:\n",
"            new_entry[\"messages\"].append(msg)\n",
"\n",
"    return new_entry\n",
"\n",
"def convert_dataset_eval(data):\n",
"    \"\"\"Convert the entire dataset for evaluation by restructuring the data format.\"\"\"\n",
"    return [result for entry in data if (result := convert_example_eval(entry)) is not None]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE**: We have implemented a workaround for a known bug where tool calls freeze the NIM if a tool description includes a function with a large number of parameters. As such, we have limited the dataset to examples whose available tools have at most 8 parameters. This will be resolved in the next NIM release."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"test_data_eval = convert_dataset_eval(test_data)\n",
"save_jsonl(os.path.join(EVALUATION_DATA_ROOT, \"xlam-test-single.jsonl\"), test_data_eval)"
]
},
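{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, spot-check one converted entry to confirm the structure before moving on:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pprint(test_data_eval[0])"
]
}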
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}