llama-stack-mirror/llama_stack
Juan Pérez de Algaba add8cd801b
feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys (#3813)
# Add support for Google Gemini `gemini-embedding-001` embedding model
and correctly registers model type

MR message created with the assistance of Claude-4.5-sonnet

This resolves https://github.com/llamastack/llama-stack/issues/3755

## What does this PR do?

This PR adds support for the `gemini-embedding-001` Google embedding
model to the llama-stack Gemini provider. This model provides
high-dimensional embeddings (3072 dimensions) compared to the existing
`text-embedding-004` model (768 dimensions). Old embeddings models (such
as text-embedding-004) will be deprecated soon according to Google
([Link](https://developers.googleblog.com/en/gemini-embedding-available-gemini-api/))

## Problem

The Gemini provider only supported the `text-embedding-004` embedding
model. The newer `gemini-embedding-001` model, which provides
higher-dimensional embeddings for improved semantic representation, was
not available through llama-stack.

## Solution

This PR consists of three commits that implement, fix the model
registration, and enable embedding generation:

### Commit 1: Initial addition of gemini-embedding-001

Added metadata for `gemini-embedding-001` to the
`embedding_model_metadata` dictionary:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},
    "gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # NEW
}
```

**Issue discovered:** The model was not being registered correctly
because the dictionary keys didn't match the model IDs returned by
Gemini's API.

### Commit 2: Fix model ID matching with `models/` prefix

Updated both dictionary keys to include the `models/` prefix to match
Gemini's OpenAI-compatible API response format:

```python
embedding_model_metadata: dict[str, dict[str, int]] = {
    "models/text-embedding-004": {"embedding_dimension": 768, "context_length": 2048},      # UPDATED
    "models/gemini-embedding-001": {"embedding_dimension": 3072, "context_length": 2048},  # UPDATED
}
```

**Root cause:** Gemini's OpenAI-compatible API returns model IDs with
the `models/` prefix (e.g., `models/text-embedding-004`). The
`OpenAIMixin.list_models()` method directly matches these IDs against
the `embedding_model_metadata` dictionary keys. Without the prefix, the
models were being registered as LLMs instead of embedding models.

### Commit 3: Fix embedding generation for providers without usage stats

Fixed a bug in `OpenAIMixin.openai_embeddings()` that prevented
embedding generation for providers (like Gemini) that don't return usage
statistics:

```python
# Before (Line 351-354):
usage = OpenAIEmbeddingUsage(
    prompt_tokens=response.usage.prompt_tokens,  # ← Crashed with AttributeError
    total_tokens=response.usage.total_tokens,
)

# After (Lines 351-362):
if response.usage:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=response.usage.prompt_tokens,
        total_tokens=response.usage.total_tokens,
    )
else:
    usage = OpenAIEmbeddingUsage(
        prompt_tokens=0,  # Default when not provided
        total_tokens=0,   # Default when not provided
    )
```

**Impact:** This fix enables embedding generation for **all** Gemini
embedding models, not just the newly added one.

## Changes

### Modified Files

**`llama_stack/providers/remote/inference/gemini/gemini.py`**
- Line 17: Updated `text-embedding-004` key to
`models/text-embedding-004`
- Line 18: Added `models/gemini-embedding-001` with correct metadata

**`llama_stack/providers/utils/inference/openai_mixin.py`**
- Lines 351-362: Added null check for `response.usage` to handle
providers without usage statistics

## Key Technical Details

### Model ID Matching Flow

1. `list_provider_model_ids()` calls Gemini's `/v1/models` endpoint
2. API returns model IDs like: `models/text-embedding-004`,
`models/gemini-embedding-001`
3. `OpenAIMixin.list_models()` (line 410) checks: `if metadata :=
self.embedding_model_metadata.get(provider_model_id)`
4. If matched, registers as `model_type: "embedding"` with metadata;
otherwise registers as `model_type: "llm"`

### Why Both Keys Needed the Prefix

The `text-embedding-004` model was already working because there was
likely separate configuration or manual registration handling it. For
auto-discovery to work correctly for **both** models, both keys must
match the API's model ID format exactly.

## How to test this PR

Verified the changes by:

1. **Model Auto-Discovery**: Started llama-stack server and confirmed
models are auto-discovered from Gemini API

2. **Model Registration**: Confirmed both embedding models are correctly
registered and visible
```bash
curl http://localhost:8325/v1/models | jq '.data[] | select(.provider_id == "gemini" and .model_type == "embedding")'
```

**Results:**
-  `gemini/models/text-embedding-004` - 768 dimensions - `model_type:
"embedding"`
-  `gemini/models/gemini-embedding-001` - 3072 dimensions -
`model_type: "embedding"`

3. **Before Fix (Commit 1)**: Models appeared as `model_type: "llm"`
without embedding metadata

4. **After Fix (Commit 2)**: Models correctly identified as `model_type:
"embedding"` with proper metadata

5. **Generate Embeddings**: Verified embedding generation works
```bash
curl -X POST http://localhost:8325/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/models/gemini-embedding-001", "input": "test"}' | \
  jq '.data[0].embedding | length'
```
2025-10-15 12:22:10 -04:00
..
apis chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs (#3740) 2025-10-14 13:48:40 -07:00
cli chore!: remove model mgmt from CLI for Hugging Face CLI (#3700) 2025-10-09 16:50:33 -07:00
core feat: Enable setting a default embedding model in the stack (#3803) 2025-10-14 18:25:13 -07:00
distributions refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183) 2025-10-14 10:44:20 -04:00
models chore: remove dead code (#3729) 2025-10-07 20:26:02 -07:00
providers feat(gemini): Support gemini-embedding-001 and fix models/ prefix in metadata keys (#3813) 2025-10-15 12:22:10 -04:00
strong_typing chore: refactor (chat)completions endpoints to use shared params struct (#3761) 2025-10-10 15:46:34 -07:00
testing fix(testing): improve api_recorder error messages for missing recordings (#3760) 2025-10-09 15:04:16 -07:00
ui chore(ui-deps): bump lucide-react from 0.542.0 to 0.545.0 in /llama_stack/ui (#3788) 2025-10-11 21:40:48 -04:00
__init__.py chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
env.py refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
log.py feat: Add support for Conversations in Responses API (#3743) 2025-10-10 11:57:40 -07:00
schema_utils.py fix(auth): allow unauthenticated access to health and version endpoints (#3736) 2025-10-10 13:41:43 -07:00