mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-08-15 22:18:00 +00:00
fix: telemetry fixes (inference and core telemetry) (#2733)
# What does this PR do? I found a few issues while adding new metrics for various APIs: currently metrics are only propagated in `chat_completion` and `completion` since most providers use the `openai_..` routes as the default in `llama-stack-client inference chat-completion`, metrics are currently not working as expected. in order to get them working the following had to be done: 1. get the completion as usual 2. use new `openai_` versions of the metric gathering functions which use `.usage` from the `OpenAI..` response types to gather the metrics which are already populated. 3. define a `stream_generator` which counts the tokens and computes the metrics (only for stream=True) 5. add metrics to response NOTE: I could not add metrics to `openai_completion` where stream=True because that ONLY returns an `OpenAICompletion` not an AsyncGenerator that we can manipulate. acquire the lock, and add event to the span as the other `_log_...` methods do some new output: `llama-stack-client inference chat-completion --message hi` <img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM" src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326" /> and in the client: <img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM" src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007" /> these were not previously being recorded nor were they being printed to the server due to the improper console sink handling --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>
This commit is contained in:
parent
c252dfa3ef
commit
0caef40e0d
26 changed files with 1595 additions and 246 deletions
|
@ -13,12 +13,12 @@
|
|||
"__data__": {
|
||||
"models": [
|
||||
{
|
||||
"model": "llama3.2:3b-instruct-fp16",
|
||||
"name": "llama3.2:3b-instruct-fp16",
|
||||
"digest": "195a8c01d91ec3cb1e0aad4624a51f2602c51fa7d96110f8ab5a20c84081804d",
|
||||
"expires_at": "2025-08-05T14:12:18.480323-07:00",
|
||||
"size": 7919570944,
|
||||
"size_vram": 7919570944,
|
||||
"model": "llama3.2:3b",
|
||||
"name": "llama3.2:3b",
|
||||
"digest": "a80c4f17acd55265feec403c7aef86be0c25983ab279d83f3bcd3abbcb5b8b72",
|
||||
"expires_at": "2025-08-06T15:57:21.573326-04:00",
|
||||
"size": 4030033920,
|
||||
"size_vram": 4030033920,
|
||||
"details": {
|
||||
"parent_model": "",
|
||||
"format": "gguf",
|
||||
|
@ -27,25 +27,7 @@
|
|||
"llama"
|
||||
],
|
||||
"parameter_size": "3.2B",
|
||||
"quantization_level": "F16"
|
||||
}
|
||||
},
|
||||
{
|
||||
"model": "all-minilm:l6-v2",
|
||||
"name": "all-minilm:l6-v2",
|
||||
"digest": "1b226e2802dbb772b5fc32a58f103ca1804ef7501331012de126ab22f67475ef",
|
||||
"expires_at": "2025-08-05T14:10:20.883978-07:00",
|
||||
"size": 590204928,
|
||||
"size_vram": 590204928,
|
||||
"details": {
|
||||
"parent_model": "",
|
||||
"format": "gguf",
|
||||
"family": "bert",
|
||||
"families": [
|
||||
"bert"
|
||||
],
|
||||
"parameter_size": "23M",
|
||||
"quantization_level": "F16"
|
||||
"quantization_level": "Q4_K_M"
|
||||
}
|
||||
}
|
||||
]
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue