mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-08-15 22:18:00 +00:00
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do? I found a few issues while adding new metrics for various APIs: currently metrics are only propagated in `chat_completion` and `completion` since most providers use the `openai_..` routes as the default in `llama-stack-client inference chat-completion`, metrics are currently not working as expected. in order to get them working the following had to be done: 1. get the completion as usual 2. use new `openai_` versions of the metric gathering functions which use `.usage` from the `OpenAI..` response types to gather the metrics which are already populated. 3. define a `stream_generator` which counts the tokens and computes the metrics (only for stream=True) 5. add metrics to response NOTE: I could not add metrics to `openai_completion` where stream=True because that ONLY returns an `OpenAICompletion` not an AsyncGenerator that we can manipulate. acquire the lock, and add event to the span as the other `_log_...` methods do some new output: `llama-stack-client inference chat-completion --message hi` <img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM" src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326" /> and in the client: <img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM" src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007" /> these were not previously being recorded nor were they being printed to the server due to the improper console sink handling --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>
This commit is contained in:
parent
c252dfa3ef
commit
0caef40e0d
26 changed files with 1595 additions and 246 deletions
56
tests/integration/recordings/responses/4096743baf8e.json
Normal file
56
tests/integration/recordings/responses/4096743baf8e.json
Normal file
|
@ -0,0 +1,56 @@
|
|||
{
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"url": "http://localhost:11434/v1/v1/completions",
|
||||
"headers": {},
|
||||
"body": {
|
||||
"model": "llama3.2:3b-instruct-fp16",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Test trace openai 0"
|
||||
}
|
||||
],
|
||||
"stream": false
|
||||
},
|
||||
"endpoint": "/v1/completions",
|
||||
"model": "llama3.2:3b-instruct-fp16"
|
||||
},
|
||||
"response": {
|
||||
"body": {
|
||||
"__type__": "openai.types.chat.chat_completion.ChatCompletion",
|
||||
"__data__": {
|
||||
"id": "chatcmpl-695",
|
||||
"choices": [
|
||||
{
|
||||
"finish_reason": "stop",
|
||||
"index": 0,
|
||||
"logprobs": null,
|
||||
"message": {
|
||||
"content": "You want to test the OpenAI API v0, but I need to clarify a few things.\n\nThe OpenAI API has undergone significant changes since its release in 2019. The v0 API was retired in favor of newer versions like v1 \"GPT-2\" and v3 \"GPT-3\".\n\nAfter verifying with OpenAI's Documentation: https://api.openai.com/docs/en/v1/basics, I found that there is no longer an API endpoint for testing with version 0.\n\nHowever, I can guide you through the steps to interact with the latest version of the OpenAI API, which should give you a similar experience:\n\nTo use the OpenAI v3 (GPT-3) API, you'll need to create an account on the OpenAI website and obtain an API key. Here are the general steps:\n\n1. Create an account on the OpenAI website: https://openai.com/\n2. Enable the API feature in your account settings\n3. Obtain an API key: go to your account dashboard \u2192 API\n4. Install a library that supports the v3 API, such as `python-openai` or `transformers`\n5. Use the library to send requests to the OpenAI API\n\nHere's some sample Python code using the `python-openai` library:\n\n```python\nimport openai\n\n# Initialize the OpenAI API client with your access token\naccess_token = \"YOUR_API_KEY_HERE\"\nopenai.api_key = access_token\nassistant = openai.pytorch.GPT3Small()\n\n# Test the assistant with a simple function call\nresponse = assistant.call(\n prompt=\"Hello, how are you?\",\n)\nprint(response)\n```\n\nPlease note that this is just an example, and you should replace `YOUR_API_KEY_HERE` with your actual API key.\n\nIf you're interested in using an older version of the OpenAI API for testing, I can try to provide more guidance on implementing it. However, keep in mind that v0 is no longer supported by OpenAI, and this might lead to limitations or inconsistencies.",
|
||||
"refusal": null,
|
||||
"role": "assistant",
|
||||
"annotations": null,
|
||||
"audio": null,
|
||||
"function_call": null,
|
||||
"tool_calls": null
|
||||
}
|
||||
}
|
||||
],
|
||||
"created": 1754051825,
|
||||
"model": "llama3.2:3b-instruct-fp16",
|
||||
"object": "chat.completion",
|
||||
"service_tier": null,
|
||||
"system_fingerprint": "fp_ollama",
|
||||
"usage": {
|
||||
"completion_tokens": 423,
|
||||
"prompt_tokens": 31,
|
||||
"total_tokens": 454,
|
||||
"completion_tokens_details": null,
|
||||
"prompt_tokens_details": null
|
||||
}
|
||||
}
|
||||
},
|
||||
"is_streaming": false
|
||||
}
|
||||
}
|
Loading…
Add table
Add a link
Reference in a new issue