Prevent sensitive information from being logged in telemetry output by
assigning SecretStr type to sensitive fields. API keys, password from
KV store are now covered. All providers have been converted.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
- remove auto-download of ollama embedding models
- add embedding model metadata to dynamic listing w/ unit test
- add support and tests for allowed_models
- removed inference provider models.py files where dynamic listing is
enabled
- store embedding metadata in embedding_model_metadata field on
inference providers
- make model_entries optional on ModelRegistryHelper and
LiteLLMOpenAIMixin
- make OpenAIMixin a ModelRegistryHelper
- skip base64 embedding test for remote::ollama, always returns floats
- only use OpenAI client for ollama model listing
- remove unused build_model_entry function
- remove unused get_huggingface_repo function
## Test Plan
ci w/ new tests
# What does this PR do?
update the Gemini inference provider to use openai-python for the
openai-compat endpoints
partially addresses #3349, does not address /inference/completion or
/inference/chat-completion
## Test Plan
ci
Groq has never supported raw completions anyhow. So this makes it easier
to switch it to LiteLLM. All our test suite passes.
I also updated all the openai-compat providers so they work with api
keys passed from headers. `provider_data`
## Test Plan
```bash
LLAMA_STACK_CONFIG=groq \
pytest -s -v tests/client-sdk/inference/test_text_inference.py \
--inference-model=groq/llama-3.3-70b-versatile --vision-inference-model=""
```
Also tested (openai, anthropic, gemini) providers. No regressions.
# What does this PR do?
This PR introduces more non-llama model support to llama stack.
Providers introduced: openai, anthropic and gemini. All of these
providers use essentially the same piece of code -- the implementation
works via the `litellm` library.
We will expose only specific models for providers we enable making sure
they all work well and pass tests. This setup (instead of automatically
enabling _all_ providers and models allowed by LiteLLM) ensures we can
also perform any needed prompt tuning on a per-model basis as needed
(just like we do it for llama models.)
## Test Plan
```bash
#!/bin/bash
args=("$@")
for model in openai/gpt-4o anthropic/claude-3-5-sonnet-latest gemini/gemini-1.5-flash; do
LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/inference/test_text_inference.py \
--embedding-model=all-MiniLM-L6-v2 \
--vision-inference-model="" \
--inference-model=$model "${args[@]}"
done
```