Add another header so client SDKs can identify their versions which can
be used for immediate detection of possible compatibility issues. A
semver mismatch against the wrong server should be immediately flagged
and requests should be denied.
Also change `X-LlamaStack-ProviderData` to `X-LlamaStack-Provider-Data`
since that hyphenation is better.
# What does this PR do?
- braintrust scoring provider requires OPENAI_API_KEY env variable to be
set
- move this to be able to be set as request headers (e.g. like together
/ fireworks api keys)
- fixes pytest with agents dependency
## Test Plan
**E2E**
```
llama stack run
```
```yaml
scoring:
- provider_id: braintrust-0
provider_type: inline::braintrust
config: {}
```
**Client**
```python
self.client = LlamaStackClient(
base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:5000"),
provider_data={
"openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
},
)
```
- run `llama-stack-client eval run_scoring`
**Unit Test**
```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
```
```
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py --env OPENAI_API_KEY=$OPENAI_API_KEY
```
<img width="745" alt="image"
src="https://github.com/user-attachments/assets/68f5cdda-f6c8-496d-8b4f-1b3dabeca9c2">
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR moves all print statements to use logging. Things changed:
- Had to add `await start_trace("sse_generator")` to server.py to
actually get tracing working. else was not seeing any logs
- If no telemetry provider is provided in the run.yaml, we will write to
stdout
- by default, the logs are going to be in JSON, but we expose an option
to configure to output in a human readable way.
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.
Most important bits:
* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.
* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
- routing model A to ollama and model B to a remote provider like Together
- routing shield A to local impl while shield B to a remote provider like Bedrock
- routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis
* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.