llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-04 12:07:34 +00:00

History

ehhuang ac7c35fbe6 fix: don't pass default response format in Responses (#3614 ) # What does this PR do? Fireworks doesn't allow repsonse_format with tool use. The default response format is 'text' anyway, so we can safely omit. ## Test Plan Below script failed without the change, runs after. ``` #!/usr/bin/env python3 """ Script to test Responses API with kubernetes-mcp-server. This script: 1. Connects to the llama stack server 2. Uses the Responses API with MCP tools 3. Asks for the list of Kubernetes namespaces using the kubernetes-mcp-server """ import json from openai import OpenAI # Connect to the llama stack server base_url = "http://localhost:8321/v1" client = OpenAI(base_url=base_url, api_key="fake") # Define the MCP tool pointing to the kubernetes-mcp-server # The kubernetes-mcp-server is running on port 3000 with SSE endpoint at /sse mcp_server_url = "http://localhost:3000/sse" tools = [ { "type": "mcp", "server_label": "k8s", "server_url": mcp_server_url, } ] # Create a response request asking for k8s namespaces print("Sending request to list Kubernetes namespaces...") print(f"Using MCP server at: {mcp_server_url}") print("Available tools will be listed automatically by the MCP server.") print() response = client.responses.create( # model="meta-llama/Llama-3.2-3B-Instruct", # Using the vllm model model="fireworks/accounts/fireworks/models/llama4-scout-instruct-basic", # model="openai/gpt-4o", input="what are all the Kubernetes namespaces? Use tool call to `namespaces_list`. make sure to adhere to the tool calling format UNDER ALL CIRCUMSTANCES.", tools=tools, stream=False, ) print("\n" + "=" * 80) print("RESPONSE OUTPUT:") print("=" * 80) # Print the output for i, output in enumerate(response.output): print(f"\n[Output {i + 1}] Type: {output.type}") if output.type == "mcp_list_tools": print(f" Server: {output.server_label}") print(f" Tools available: {[t.name for t in output.tools]}") elif output.type == "mcp_call": print(f" Tool called: {output.name}") print(f" Arguments: {output.arguments}") print(f" Result: {output.output}") if output.error: print(f" Error: {output.error}") elif output.type == "message": print(f" Role: {output.role}") print(f" Content: {output.content}") print("\n" + "=" * 80) print("FINAL RESPONSE TEXT:") print("=" * 80) print(response.output_text) ```		2025-09-30 14:52:24 -07:00
..
agent	fix: adding mime type of application/json support (#3452 )	2025-09-29 11:27:31 -07:00
agents	fix: don't pass default response format in Responses (#3614 )	2025-09-30 14:52:24 -07:00
batches	feat(batches, completions): add /v1/completions support to /v1/batches (#3309 )	2025-09-05 11:59:57 -07:00
files	feat(files): fix expires_after API shape (#3604 )	2025-09-29 21:29:15 -07:00
inference	feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547 )	2025-09-25 17:17:00 -04:00
inline	fix: mcp tool with array type should include items (#3602 )	2025-09-29 23:11:41 -07:00
nvidia	feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547 )	2025-09-25 17:17:00 -04:00
utils	fix(expires_after): make sure multipart/form-data is properly parsed (#3612 )	2025-09-30 16:14:03 -04:00
vector_io	chore(api): remove deprecated embeddings impls (#3301 )	2025-09-29 14:45:09 -04:00
test_bedrock.py	fix: AWS Bedrock inference profile ID conversion for region-specific endpoints (#3386 )	2025-09-11 11:41:53 +02:00
test_configs.py	chore(rename): move llama_stack.distribution to llama_stack.core (#2975 )	2025-07-30 23:30:53 -07:00