# What does this PR do?
Remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions as well as from the provider
registry, the router, etc.
Since `setup_logger` is tied pretty strictly to `Api.telemetry` being in
impls, we still need an "instantiated provider" in our implementations.
However, it should not be auto-routed or user-provided. So in
`validate_and_prepare_providers` (called from `resolve_impls`), if
`run_config.telemetry.enabled` is set, we instantiate the meta-reference
"provider" internally so that `log_event` works when called.
I think this is the neatest way to remove telemetry from the provider
configs without ripping apart the whole "telemetry is a provider" logic
just yet; we can do that internally later without disrupting users.
Telemetry is therefore removed from the registry, so if a user puts
`telemetry:` as an API in their build/run config it will error out, but
it can still be used by us internally as we go through this transition.
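Roughly, the internal wiring looks like the following minimal sketch (the
names `TelemetryRunConfig`, `MetaReferenceTelemetry`, and
`setup_internal_telemetry` are illustrative assumptions, not the exact
code, which lives in `validate_and_prepare_providers`):
```python
from dataclasses import dataclass


@dataclass
class TelemetryRunConfig:
    enabled: bool = False


class MetaReferenceTelemetry:
    """Stand-in for the meta-reference telemetry implementation."""

    def log_event(self, event: dict) -> None:
        print(f"telemetry event: {event}")


def setup_internal_telemetry(telemetry: TelemetryRunConfig, impls: dict) -> None:
    # Instantiated directly, bypassing the provider registry and the router,
    # so `telemetry:` never appears in build/run configs.
    if telemetry.enabled:
        impls["telemetry"] = MetaReferenceTelemetry()
```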
relates to #3806
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
When the stack config is set to server in docker
(`STACK_CONFIG_ARG=--stack-config=http://localhost:8321`), the env variable
was not getting set correctly and the test ID was not set, causing:
E openai.BadRequestError: Error code: 400 - {'detail': 'Invalid value:
Test ID is required for file ID allocation'}
This is needed for test-and-cut to work.
## Test Plan
CI
# What does this PR do?
Adds a subpage of the OpenAI compatibility page in the documentation.
This subpage documents known limitations of the Responses API.
Closes #3575
---------
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
As indicated in the title. Our `starter` distribution enables all remote
providers _very intentionally_ because we believe it creates an easier,
more welcoming experience for new folks using the software. If we do
that, and then slam the logs with errors that make them question their
life choices, it is not so good :)
Note that this fix is limited in scope. If you ever try to actually
instantiate the OpenAI client from a code path without an API key being
present, you deserve to fail hard.
## Test Plan
Run `llama stack run starter` with `OPENAI_API_KEY` set. No more wall of
text, just one message saying "listed 96 models".
A bunch of `logger.info()` calls are good for server code to help debug in
production, but we don't want them killing our unit test output :)
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
**!!BREAKING CHANGE!!**
The lookup is also straightforward -- we always look for this identifier
and don't try to find a match for something without the provider_id
prefix.
Note that this ideally means we need to update the `register_model()`
API as well (we should kill "identifier" from there), but I am not doing
that as part of this PR.
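A small sketch of the prefixed-identifier lookup described above
(illustrative names only, not the actual registry code):
```python
def make_identifier(provider_id: str, provider_model_id: str) -> str:
    # The stored identifier always carries the provider_id prefix.
    return f"{provider_id}/{provider_model_id}"


def lookup(models: dict[str, dict], provider_id: str, provider_model_id: str) -> dict | None:
    # Always look up the fully qualified identifier; there is no fallback
    # to a bare model id without the provider_id prefix.
    return models.get(make_identifier(provider_id, provider_model_id))
```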
## Test Plan
Existing unit tests
Wanted to re-enable Responses CI, but it seems to hang for some reason
due to some interaction with `conversations_store` or `responses_store`.
## Test Plan
```
# library client
./scripts/integration-tests.sh --stack-config ci-tests --suite responses
# server
./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses
```
# What does this PR do?
Closed the previous PR due to merge conflicts with multiple PRs.
Addressed all comments from
https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying
them over to this one).
## Test Plan
Added UTs and integration tests
Handle the base case where no stored messages exist because no Response
call has been made yet.
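A tiny sketch of that base case (the store and helper names here are
hypothetical, just to illustrate the fallback):
```python
async def load_previous_messages(store, conversation_id: str) -> list[dict]:
    # On the very first Response call nothing has been stored yet,
    # so fall back to an empty history instead of raising.
    stored = await store.get(conversation_id)  # may return None
    return stored or []
```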
## Test Plan
```
./scripts/integration-tests.sh --stack-config server:ci-tests \
--suite responses --inference-mode record-if-missing --pattern test_conversation_responses
```
Fixed KeyError when chunks don't have document_id in metadata or
chunk_metadata. Updated logging to safely extract document_id using
getattr and RAG memory to handle different document_id locations. Added
test for missing document_id scenarios.
Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError.
# What does this PR do?
Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing
`document_id` fields. The API
was failing even though `document_id` is optional according to the
schema.
Closes #3494
## Test Plan
**Before fix:**
- POST to `/v1/vector-io/insert` with chunks → 500 KeyError
- Happened regardless of where `document_id` was placed
**After fix:**
- Same request works fine → 200 OK
- Tested with Postman using FAISS backend
- Added unit test covering missing `document_id` scenarios
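A minimal sketch of the safer extraction described above (the helper name
is an assumption; the actual change touches the logging and RAG memory
code paths):
```python
def resolve_document_id(chunk) -> str | None:
    # Prefer chunk.metadata, then chunk.chunk_metadata, and tolerate both being absent.
    metadata = getattr(chunk, "metadata", None) or {}
    if "document_id" in metadata:
        return metadata["document_id"]
    chunk_metadata = getattr(chunk, "chunk_metadata", None)
    return getattr(chunk_metadata, "document_id", None)
```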
This PR updates the Conversation item-related types and improves a
couple of critical parts of the implementation:
- it creates a streaming output item for the final assistant message
output by the model. Until now we only added content parts and included
that message in the final response.
- it rewrites the conversation update code completely to account for
items other than messages (tool calls, outputs, etc.); see the sketch
below.
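An illustrative sketch of the second point (the item shapes and function
name are assumptions, not the actual implementation):
```python
def to_conversation_items(output_items: list[dict]) -> list[dict]:
    # Persist not just assistant messages, but also tool calls and tool
    # outputs as their own conversation items.
    items = []
    for item in output_items:
        if item["type"] == "message":
            items.append({"type": "message", "role": item["role"], "content": item["content"]})
        elif item["type"] in ("function_call", "function_call_output"):
            items.append(item)
    return items
```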
## Test Plan
Used the test script from
https://github.com/llamastack/llama-stack-client-python/pull/281 for
this
```
TEST_API_BASE_URL=http://localhost:8321/v1 \
pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs
```
# Add support for the Google Gemini `gemini-embedding-001` embedding model
and correctly register the model type
MR message created with the assistance of Claude-4.5-sonnet
This resolves https://github.com/llamastack/llama-stack/issues/3755
## What does this PR do?
This PR adds support for the `gemini-embedding-001` Google embedding
model to the llama-stack Gemini provider. This model provides
high-dimensional embeddings (3072 dimensions) compared to the existing
`text-embedding-004` model (768 dimensions). Old embedding models (such
as `text-embedding-004`) will be deprecated soon, according to Google
([Link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/))
## Problem
The Gemini provider only supported the `text-embedding-004` embedding
model. The newer `gemini-embedding-001` model, which provides
higher-dimensional embeddings for improved semantic representation, was
not available through llama-stack.
## Solution
This PR consists of three commits that add the model, fix the model
registration, and enable embedding generation:
### Commit 1: Initial addition of gemini-embedding-001
Added metadata for `gemini-embedding-001` to the
`embedding_model_metadata` dictionary:
```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}
```
**Issue discovered:** The model was not being registered correctly
because the dictionary keys didn't match the model IDs returned by
Gemini's API.
### Commit 2: Fix model ID matching with `models/` prefix
Updated both dictionary keys to include the `models/` prefix to match
Gemini's OpenAI-compatible API response format:
```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},  # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}
```
**Root cause:** Gemini's OpenAI-compatible API returns model IDs with
the `models/` prefix (e.g., `models/text-embedding-004`). The
`OpenAIMixin.list_models()` method directly matches these IDs against
the `embedding_model_metadata` dictionary keys. Without the prefix, the
models were being registered as LLMs instead of embedding models.
### Commit 3: Fix embedding generation for providers without usage stats
Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented
embedding generation for providers (like Gemini) that don't return usage
statistics:
```python
# Before (Lines 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← Crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (Lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # Default when not provided
        total_tokens=0,  # Default when not provided
    )
```
**Impact:** This fix enables embedding generation for **all** Gemini
embedding models, not just the newly added one.
## Changes
### Modified Files
**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated `text-embedding-004` key to
`models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata
**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added null check for `response.usage` to handle
providers without usage statistics
## Key Technical Details
### Model ID Matching Flow
1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. API returns model IDs like: `models/text-embedding-004`,
`models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata :=
self.embedding_model_metadata.get(provider_model_id)`
4. If matched, registers the model as `model_type: "embedding"` with
metadata; otherwise registers it as `model_type: "llm"` (see the sketch
below)
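The flow above, condensed into a small runnable sketch (simplified; the
real check lives in `OpenAIMixin.list_models()`):
```python
embedding_model_metadata = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},
}


def classify(provider_model_id: str) -> dict:
    # Keys must match the ids returned by Gemini's API exactly, including the "models/" prefix.
    if metadata := embedding_model_metadata.get(provider_model_id):
        return {"model_type": "embedding", **metadata}
    return {"model_type": "llm"}


print(classify("models/gemini-embedding-001"))  # embedding, 3072 dimensions
print(classify("models/gemini-2.0-flash"))      # llm
```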
### Why Both Keys Needed the Prefix
The `text-embedding-004` model was already working because there was
likely separate configuration or manual registration handling it. For
auto-discovery to work correctly for **both** models, both keys must
match the API's model ID format exactly.
## How to test this PR
Verified the changes by:
1. **Model Auto-Discovery**: Started llama-stack server and confirmed
models are auto-discovered from Gemini API
2. **Model Registration**: Confirmed both embedding models are correctly
registered and visible
```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```
**Results:**
- ✅ `gemini/models/text-embedding-004` - 768 dimensions - `model_type:
"embedding"`
- ✅ `gemini/models/gemini-embedding-001` - 3072 dimensions -
`model_type: "embedding"`
3. **Before Fix (Commit 1)**: Models appeared as `model_type: "llm"`
without embedding metadata
4. **After Fix (Commit 2)**: Models correctly identified as `model_type:
"embedding"` with proper metadata
5. **Generate Embeddings**: Verified embedding generation works
```bash
curl -X POST http://localhost:8325/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
jq '.data[0].embedding | length'
```
# What does this PR do?
Enables automatic embedding model detection for vector stores by using
a `default_configured` boolean that can be defined in the `run.yaml`.
## Test Plan
- Unit tests
- Integration tests
- Simple example below:
Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:
```python
vs = client.vector_stores.create(
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    }
)
```
The `extra_body` is now unnecessary.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>