llama-stack-mirror/llama_stack/providers/inline
Dinesh Yeduguru ead9397e22
fix: tracing fixes for trace context propogation across coroutines (#1522)
# What does this PR do?
This PR has two fixes needed for correct trace context propagation
across asyncio boundaries.
Fix 1: Store the global trace context in context vars.
This is needed because the same trace context cannot be used across
coroutines: its state would be shared between them. Each coroutine
should have its own trace context so that it can store its own state
correctly.
Fix 2: Start a new span for each coroutine that is started to run
shields, to keep the span tree clean.
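
A minimal sketch of the idea behind both fixes, using only the standard library. The names `SPAN_STACK`, `span`, and `run_shield` are hypothetical and are not taken from the llama-stack telemetry code; the sketch only illustrates how a `contextvars.ContextVar` gives each asyncio task its own view of the trace state, and how opening one span per shield coroutine keeps the resulting tree tidy.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the trace context: an immutable tuple of open
# span names, stored per task rather than in a shared module-level global.
SPAN_STACK = contextvars.ContextVar("span_stack", default=())


class span:
    """Context manager that pushes a span name for the current task only."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        # set() returns a token that restores the previous value on exit.
        self._token = SPAN_STACK.set(SPAN_STACK.get() + (self.name,))
        return self

    def __exit__(self, *exc):
        SPAN_STACK.reset(self._token)
        return False


async def run_shield(shield_id):
    # One new span per shield coroutine (the idea behind Fix 2).
    with span(f"shield:{shield_id}"):
        await asyncio.sleep(0)  # yield so the two shields interleave
        return SPAN_STACK.get()


async def main():
    with span("agent_turn"):
        # gather() wraps each coroutine in a task; each task runs in a copy
        # of the current context, so their span stacks do not interfere.
        results = await asyncio.gather(run_shield("a"), run_shield("b"))
        parent = SPAN_STACK.get()
    return results, parent


results, parent = asyncio.run(main())
print(results)
print(parent)
```

Each shield sees the parent span plus its own span, and the parent's stack is unchanged after the shields finish. With a plain module-level list instead of a `ContextVar`, the interleaved shields would push onto and pop from the same stack.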


## Test Plan

### Integration tests with server
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
server logs:
https://gist.github.com/dineshyv/51ac5d9864ed031d0d89ce77352821fe
test logs:
https://gist.github.com/dineshyv/e66acc1c4648a42f1854600609c467f3
 
### Integration tests with library client
LLAMA_STACK_CONFIG=fireworks pytest -s --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct

logs: https://gist.github.com/dineshyv/ca160696a0b167223378673fb1dcefb8

### Apps test with server:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
server logs:
https://gist.github.com/dineshyv/1717a572d8f7c14279c36123b79c5797
app logs:
https://gist.github.com/dineshyv/44167e9f57806a0ba3b710c32aec02f8
2025-03-11 07:12:48 -07:00
..

| Name | Last commit | Date |
| --- | --- | --- |
| agents | fix: tracing fixes for trace context propogation across coroutines (#1522) | 2025-03-11 07:12:48 -07:00 |
| datasetio | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |
| eval | chore: rename task_config to benchmark_config (#1397) | 2025-03-04 12:44:04 -08:00 |
| inference | feat: updated inline vllm inference provider (#880) | 2025-03-07 13:38:23 -08:00 |
| ios/inference | chore: removed executorch submodule (#1265) | 2025-02-25 21:57:21 -08:00 |
| post_training | fix: replace eval with json decoding for format_adapter (#1328) | 2025-02-28 11:25:23 -08:00 |
| safety | chore: move all Llama Stack types from llama-models to llama-stack (#1098) | 2025-02-14 09:10:59 -08:00 |
| scoring | feat: [new open benchmark] Math 500 (#1538) | 2025-03-10 20:38:28 -07:00 |
| telemetry | fix: Revert "feat: record token usage for inference API (#1300)" (#1476) | 2025-03-07 10:16:47 -08:00 |
| tool_runtime | chore: remove dependency on llama_models completely (#1344) | 2025-03-01 12:48:08 -08:00 |
| vector_io | chore: made inbuilt tools blocking calls into async non blocking calls (#1509) | 2025-03-09 16:59:24 -07:00 |
| `__init__.py` | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |