Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-15 22:47:59 +00:00
# What does this PR do?

Completes the refactoring started in the previous commit by:

1. **Fixing the library client** (critical): add logic to detect Pydantic model parameters and construct them properly from request bodies. The key fix is to NOT exclude any params when converting the body for Pydantic models, because every field must be passed to the Pydantic constructor.
   - Before: `_convert_body()` excluded all params, leaving the body empty for Pydantic construction.
   - After: check for Pydantic params first, skip the exclusion, and construct the model from the full body.
2. **Updating the remaining providers** to use the new Pydantic-based signatures:
   - `litellm_openai_mixin`: extract extra fields via `__pydantic_extra__`
   - `databricks`: use a `TYPE_CHECKING` import for the params type
   - `llama_openai_compat`: use a `TYPE_CHECKING` import for the params type
   - `sentence_transformers`: update method signatures to use `params`
3. **Updating the unit tests** to the new Pydantic signature:
   - `test_openai_mixin.py`: use `OpenAIChatCompletionRequestParams`

This fixes test failures where the library client was trying to construct Pydantic models from empty dictionaries. The previous fix had a bug: it called `_convert_body()`, which keeps only fields that match function parameter names. For Pydantic methods with the signature `openai_chat_completion(params: OpenAIChatCompletionRequestParams)`, the signature only has `params` while the body has `model`, `messages`, etc., so `_convert_body()` returned an empty dict and the Pydantic constructor raised a `ValidationError` for missing required fields. The fix is to skip `_convert_body()` entirely for Pydantic params and use the raw body directly to construct the Pydantic model (after stripping `NOT_GIVEN`s). Sketches of the detection logic, the `__pydantic_extra__` extraction, and the `TYPE_CHECKING` pattern follow below.

The streaming code path (`_call_streaming`) had the same issue as the non-streaming one: it called `_convert_body()`, which returned an empty dict for Pydantic params. The same fix as commit 7476c0ae applies:

- Detect Pydantic model parameters before body conversion.
- Skip `_convert_body()` for Pydantic params.
- Construct the Pydantic model directly from the raw body (after stripping `NOT_GIVEN`s).

This fixes streaming endpoints such as `openai_chat_completion` with `stream=True`.
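A minimal sketch of the detection-and-construction logic described above. The helper names (`_find_pydantic_param`, `build_call_kwargs`) and the simplified `NOT_GIVEN` sentinel are assumptions for illustration; the real library client differs in detail:

```python
import inspect
from typing import Any

from pydantic import BaseModel

NOT_GIVEN = object()  # stand-in for the client's "argument omitted" sentinel


def _find_pydantic_param(func) -> tuple[str, type[BaseModel]] | None:
    """Return (param_name, model_cls) if the endpoint takes a Pydantic model."""
    for name, param in inspect.signature(func).parameters.items():
        ann = param.annotation
        if isinstance(ann, type) and issubclass(ann, BaseModel):
            return name, ann
    return None


def _convert_body(func, body: dict[str, Any]) -> dict[str, Any]:
    # Simplified old behavior: keep only keys matching parameter names.
    names = set(inspect.signature(func).parameters)
    return {k: v for k, v in body.items() if k in names}


def build_call_kwargs(func, body: dict[str, Any]) -> dict[str, Any]:
    # Strip NOT_GIVEN sentinels so they never reach Pydantic validation.
    body = {k: v for k, v in body.items() if v is not NOT_GIVEN}
    match = _find_pydantic_param(func)
    if match is not None:
        name, model_cls = match
        # Skip _convert_body() here: it keeps only keys that match parameter
        # names (just "params"), which would leave the body empty and make
        # model_cls(**{}) fail validation on required fields.
        return {name: model_cls(**body)}
    return _convert_body(func, body)  # non-Pydantic endpoints keep the old path
```

The same `build_call_kwargs` path would serve both `_call_non_streaming` and `_call_streaming`, which is why one fix covers both code paths.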
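For the `litellm_openai_mixin` change, `__pydantic_extra__` is the standard Pydantic v2 mechanism for collecting undeclared fields when a model is configured with `extra="allow"`. A sketch with an illustrative stand-in model (the real request class and the `guided_choice` field are assumptions, not the actual API):

```python
from pydantic import BaseModel, ConfigDict


class ChatCompletionParams(BaseModel):
    model_config = ConfigDict(extra="allow")  # undeclared fields are retained
    model: str
    messages: list[dict]
    stream: bool = False


params = ChatCompletionParams(
    model="llama3.2:3b-instruct-fp16",
    messages=[{"role": "user", "content": "Test trace openai 2"}],
    guided_choice=["yes", "no"],  # not a declared field, lands in the extras
)

# Provider-specific knobs end up in __pydantic_extra__ and can be forwarded
# to the backend request as-is.
extra_fields = params.__pydantic_extra__ or {}
assert extra_fields == {"guided_choice": ["yes", "no"]}
```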
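The `databricks` and `llama_openai_compat` changes use the standard `TYPE_CHECKING` pattern so the params type is available to type checkers without a runtime import (avoiding import cost or cycles). A sketch; the import path and adapter class name are illustrative:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by type checkers, never at runtime.
    from llama_stack.apis.inference import OpenAIChatCompletionRequestParams


class DatabricksInferenceAdapter:
    async def openai_chat_completion(
        self, params: "OpenAIChatCompletionRequestParams"
    ):
        # The quoted annotation defers resolution, so no runtime import is needed.
        ...
```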
58 lines · 3.2 KiB · JSON
{
  "test_id": null,
  "request": {
    "method": "POST",
    "url": "http://0.0.0.0:11434/v1/v1/chat/completions",
    "headers": {},
    "body": {
      "model": "llama3.2:3b-instruct-fp16",
      "messages": [
        {
          "role": "user",
          "content": "Test trace openai 2"
        }
      ],
      "stream": false,
      "extra_body": null
    },
    "endpoint": "/v1/chat/completions",
    "model": "llama3.2:3b-instruct-fp16"
  },
  "response": {
    "body": {
      "__type__": "openai.types.chat.chat_completion.ChatCompletion",
      "__data__": {
        "id": "rec-b7bafc118c1e",
        "choices": [
          {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
              "content": "I don't have direct access to OpenAI's systems or data. However, I can provide you with information on how to test the GPT-2 model using a few different methods:\n\n1. **OpenAI's GPT-2 Model**: You can sign up for an OpenAI API key and use it to interact with the GPT-2 model through their API. This will allow you to send text prompts and receive responses from the model.\n2. **Hugging Face Transformers Library**: The Hugging Face library provides a pre-trained GPT-2 model that you can use in Python. You can install the library using pip: `pip install transformers`. Then, you can import the model and use it to generate text.\n3. **Google Colab Notebook**: Google Colab provides a free online platform where you can run Python code, including using the Hugging Face library to interact with GPT-2.\n\nHere's an example of how you could test the GPT-2 model using the Hugging Face library in a Google Colab notebook:\n```\n!pip install transformers\nimport torch\nfrom transformers import GPT2Model, GPT2Tokenizer\n\n# Load pre-trained GPT-2 model and tokenizer\nmodel = GPT2Model.from_pretrained('gpt2-base')\ntokenizer = GPT2Tokenizer.from_pretrained('gpt2-base')\n\n# Define a text prompt\nprompt = \"Write a short story about a cat\"\n\n# Tokenize the prompt\ninputs = tokenizer(prompt, return_tensors='pt')\n\n# Generate text using the model\noutput = model.generate(inputs['input_ids'], max_length=128)\n\n# Print the generated text\nprint(tokenizer.decode(output[0], skip_special_tokens=True))\n```\nThis code will generate a short story about a cat based on the prompt you provided. Keep in mind that this is just an example, and the output may vary depending on the specific model and configuration used.\n\nLet me know if you have any further questions or if there's anything else I can help you with!",
              "refusal": null,
              "role": "assistant",
              "annotations": null,
              "audio": null,
              "function_call": null,
              "tool_calls": null
            }
          }
        ],
        "created": 0,
        "model": "llama3.2:3b-instruct-fp16",
        "object": "chat.completion",
        "service_tier": null,
        "system_fingerprint": "fp_ollama",
        "usage": {
          "completion_tokens": 417,
          "prompt_tokens": 31,
          "total_tokens": 448,
          "completion_tokens_details": null,
          "prompt_tokens_details": null
        }
      }
    },
    "is_streaming": false
  }
}