From 6147321083cc0ecd0d4314eb83c8e55add538d45 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Juan=20P=C3=A9rez=20de=20Algaba?= <124347725+jperezdealgaba@users.noreply.github.com> Date: Sun, 9 Nov 2025 06:05:00 +0100 Subject: [PATCH 1/5] fix: Vector store persistence across server restarts (#3977) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # What does this PR do? This PR fixes a bug in LlamaStack 0.3.0 where vector stores created via the OpenAI-compatible API (`POST /v1/vector_stores`) would fail with `VectorStoreNotFoundError` after server restart when attempting operations like `vector_io.insert()` or `vector_io.query()`. The bug affected **6 vector IO providers**: `pgvector`, `sqlite_vec`, `chroma`, `milvus`, `qdrant`, and `weaviate`. Created with the assistance of: claude-4.5-sonnet ## Root Cause All affected providers had a broken `_get_and_cache_vector_store_index()` method that: 1. Did not load existing vector stores from persistent storage during initialization 2. Attempted to use `vector_store_table` (which was either `None` or a `KVStore` without the required `get_vector_store()` method) 3. Could not reload vector stores after server restart or cache miss ## Solution This PR implements a consistent pattern across all 6 providers: 1. **Load vector stores during initialization** - Pre-populate the cache from KV store on startup 2. **Fix lazy loading** - Modified `_get_and_cache_vector_store_index()` to load directly from KV store instead of relying on `vector_store_table` 3. **Remove broken dependency** - Eliminated reliance on the `vector_store_table` pattern ## Testing steps ### 1.1 Configure the stack Create or use an existing configuration with a vector IO provider. **Example `run.yaml`:** ```yaml vector_io_store: - provider_id: pgvector provider_type: remote::pgvector config: host: localhost port: 5432 db: llamastack user: llamastack password: llamastack inference: - provider_id: sentence-transformers provider_type: inline::sentence-transformers config: model: sentence-transformers/all-MiniLM-L6-v2 ``` ### 1.2 Start the server ```bash llama stack run run.yaml --port 5000 ``` Wait for the server to fully start. You should see: ``` INFO: Started server process INFO: Application startup complete ``` --- ## Step 2: Create a Vector Store ### 2.1 Create via API ```bash curl -X POST http://localhost:5000/v1/vector_stores \ -H "Content-Type: application/json" \ -d '{ "name": "test-persistence-store", "extra_body": { "embedding_model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dimension": 384, "provider_id": "pgvector" } }' | jq ``` ### 2.2 Expected Response ```json { "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d", "object": "vector_store", "name": "test-persistence-store", "status": "completed", "created_at": 1730304000, "file_counts": { "total": 0, "completed": 0, "in_progress": 0, "failed": 0, "cancelled": 0 }, "usage_bytes": 0 } ``` **Save the `id` field** (e.g., `vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d`) — you’ll need it for the next steps. 
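The same call can also be issued from Python with the OpenAI client pointed at the Llama Stack server — a minimal sketch, assuming a recent `openai` package (where vector stores live under `client.vector_stores`; older releases expose them under `client.beta`) and that the server does not require a real API key. The `extra_body` fields simply mirror the curl payload above:

```python
from openai import OpenAI

# Point the OpenAI client at the Llama Stack server's OpenAI-compatible routes.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="none")

vector_store = client.vector_stores.create(
    name="test-persistence-store",
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
        "provider_id": "pgvector",
    },
)

# Keep the id around for the remaining steps (equivalent to the VS_ID export below).
print(vector_store.id)
```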
--- ## Step 3: Insert Data (Before Restart) ### 3.1 Insert chunks into the vector store ```bash export VS_ID="vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d" curl -X POST http://localhost:5000/vector-io/insert \ -H "Content-Type: application/json" \ -d "{ \"vector_store_id\": \"$VS_ID\", \"chunks\": [ { \"content\": \"Python is a high-level programming language known for its readability.\", \"metadata\": {\"source\": \"doc1\", \"page\": 1} }, { \"content\": \"Machine learning enables computers to learn from data without explicit programming.\", \"metadata\": {\"source\": \"doc2\", \"page\": 1} }, { \"content\": \"Neural networks are inspired by biological neurons in the brain.\", \"metadata\": {\"source\": \"doc3\", \"page\": 1} } ] }" ``` ### 3.2 Expected Response Status: **200 OK** Response: *Empty or success confirmation* --- ## Step 4: Query Data (Before Restart – Baseline) ### 4.1 Query the vector store ```bash curl -X POST http://localhost:5000/vector-io/query \ -H "Content-Type: application/json" \ -d "{ \"vector_store_id\": \"$VS_ID\", \"query\": \"What is machine learning?\" }" | jq ``` ### 4.2 Expected Response ```json { "chunks": [ { "content": "Machine learning enables computers to learn from data without explicit programming.", "metadata": {"source": "doc2", "page": 1} }, { "content": "Neural networks are inspired by biological neurons in the brain.", "metadata": {"source": "doc3", "page": 1} } ], "scores": [0.85, 0.72] } ``` **Checkpoint:** Works correctly before restart. --- ## Step 5: Restart the Server (Critical Test) ### 5.1 Stop the server In the terminal where it’s running: ``` Ctrl + C ``` Wait for: ``` Shutting down... ``` ### 5.2 Restart the server ```bash llama stack run run.yaml --port 5000 ``` Wait for: ``` INFO: Started server process INFO: Application startup complete ``` The vector store cache is now empty, but data should persist. --- ## Step 6: Verify Vector Store Exists (After Restart) ### 6.1 List vector stores ```bash curl http://localhost:5000/v1/vector_stores | jq ``` ### 6.2 Expected Response ```json { "object": "list", "data": [ { "id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d", "name": "test-persistence-store", "status": "completed" } ] } ``` **Checkpoint:** Vector store should be listed. --- ## Step 7: Insert Data (After Restart – THE BUG TEST) ### 7.1 Insert new chunks ```bash curl -X POST http://localhost:5000/vector-io/insert \ -H "Content-Type: application/json" \ -d "{ \"vector_store_id\": \"$VS_ID\", \"chunks\": [ { \"content\": \"This chunk was inserted AFTER the server restart.\", \"metadata\": {\"source\": \"post-restart\", \"test\": true} } ] }" ``` ### 7.2 Expected Results **With Fix (Correct):** ``` Status: 200 OK Response: Success ``` **Without Fix (Bug):** ```json { "detail": "VectorStoreNotFoundError: Vector Store 'vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d' not found." } ``` **Critical Test:** If insertion succeeds, the fix works. --- ## Step 8: Query Data (After Restart – Verification) ### 8.1 Query all data ```bash curl -X POST http://localhost:5000/vector-io/query \ -H "Content-Type: application/json" \ -d "{ \"vector_store_id\": \"$VS_ID\", \"query\": \"restart\" }" | jq ``` ### 8.2 Expected Response ```json { "chunks": [ { "content": "This chunk was inserted AFTER the server restart.", "metadata": {"source": "post-restart", "test": true} } ], "scores": [0.95] } ``` **Checkpoint:** Both old and new data are queryable. 
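If you prefer to script the post-restart verification, Steps 7 and 8 can be driven from Python — a sketch using the `requests` library, assuming the endpoints and payloads shown in the curl examples above and the `VS_ID` saved in Step 2:

```python
import os

import requests

BASE_URL = "http://localhost:5000"
VS_ID = os.environ["VS_ID"]  # the id saved in Step 2

# Insert a chunk after the restart (mirrors Step 7).
resp = requests.post(
    f"{BASE_URL}/vector-io/insert",
    json={
        "vector_store_id": VS_ID,
        "chunks": [
            {
                "content": "This chunk was inserted AFTER the server restart.",
                "metadata": {"source": "post-restart", "test": True},
            }
        ],
    },
)
resp.raise_for_status()  # 200 with the fix; without it the server reports VectorStoreNotFoundError

# Query it back (mirrors Step 8).
resp = requests.post(
    f"{BASE_URL}/vector-io/query",
    json={"vector_store_id": VS_ID, "query": "restart"},
)
resp.raise_for_status()
print(resp.json())
```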
--- ## Step 9: Multiple Restart Test (Extra Verification) ### 9.1 Restart again ```bash Ctrl + C llama stack run run.yaml --port 5000 ``` ### 9.2 Query after restart ```bash curl -X POST http://localhost:5000/vector-io/query \ -H "Content-Type: application/json" \ -d "{ \"vector_store_id\": \"$VS_ID\", \"query\": \"programming\" }" | jq ``` **Expected:** Works correctly across multiple restarts. --------- Co-authored-by: Francisco Arceo --- .../providers/inline/vector_io/faiss/faiss.py | 27 +++++- .../inline/vector_io/sqlite_vec/sqlite_vec.py | 21 ++++- .../remote/vector_io/chroma/chroma.py | 12 ++- .../remote/vector_io/milvus/milvus.py | 13 ++- .../remote/vector_io/pgvector/pgvector.py | 40 ++++++-- .../remote/vector_io/qdrant/qdrant.py | 17 ++-- .../remote/vector_io/weaviate/weaviate.py | 13 ++- .../test_vector_io_openai_vector_stores.py | 93 +++++++++++++++++++ 8 files changed, 203 insertions(+), 33 deletions(-) diff --git a/src/llama_stack/providers/inline/vector_io/faiss/faiss.py b/src/llama_stack/providers/inline/vector_io/faiss/faiss.py index b01eb1b5c..96760b834 100644 --- a/src/llama_stack/providers/inline/vector_io/faiss/faiss.py +++ b/src/llama_stack/providers/inline/vector_io/faiss/faiss.py @@ -223,7 +223,8 @@ class FaissVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoco return HealthResponse(status=HealthStatus.ERROR, message=f"Health check failed: {str(e)}") async def register_vector_store(self, vector_store: VectorStore) -> None: - assert self.kvstore is not None + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before registering vector stores.") key = f"{VECTOR_DBS_PREFIX}{vector_store.identifier}" await self.kvstore.set(key=key, value=vector_store.model_dump_json()) @@ -239,7 +240,8 @@ class FaissVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoco return [i.vector_store for i in self.cache.values()] async def unregister_vector_store(self, vector_store_id: str) -> None: - assert self.kvstore is not None + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before unregistering vector stores.") if vector_store_id not in self.cache: return @@ -248,6 +250,27 @@ class FaissVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoco del self.cache[vector_store_id] await self.kvstore.delete(f"{VECTOR_DBS_PREFIX}{vector_store_id}") + async def _get_and_cache_vector_store_index(self, vector_store_id: str) -> VectorStoreWithIndex | None: + if vector_store_id in self.cache: + return self.cache[vector_store_id] + + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. 
Call initialize() before using vector stores.") + + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: + raise VectorStoreNotFoundError(vector_store_id) + + vector_store = VectorStore.model_validate_json(vector_store_data) + index = VectorStoreWithIndex( + vector_store=vector_store, + index=await FaissIndex.create(vector_store.embedding_dimension, self.kvstore, vector_store.identifier), + inference_api=self.inference_api, + ) + self.cache[vector_store_id] = index + return index + async def insert_chunks(self, vector_store_id: str, chunks: list[Chunk], ttl_seconds: int | None = None) -> None: index = self.cache.get(vector_store_id) if index is None: diff --git a/src/llama_stack/providers/inline/vector_io/sqlite_vec/sqlite_vec.py b/src/llama_stack/providers/inline/vector_io/sqlite_vec/sqlite_vec.py index 9cf7d8f44..399800d3e 100644 --- a/src/llama_stack/providers/inline/vector_io/sqlite_vec/sqlite_vec.py +++ b/src/llama_stack/providers/inline/vector_io/sqlite_vec/sqlite_vec.py @@ -412,6 +412,14 @@ class SQLiteVecVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresPro return [v.vector_store for v in self.cache.values()] async def register_vector_store(self, vector_store: VectorStore) -> None: + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before registering vector stores.") + + # Save to kvstore for persistence + key = f"{VECTOR_DBS_PREFIX}{vector_store.identifier}" + await self.kvstore.set(key=key, value=vector_store.model_dump_json()) + + # Create and cache the index index = await SQLiteVecIndex.create( vector_store.embedding_dimension, self.config.db_path, vector_store.identifier ) @@ -421,13 +429,16 @@ class SQLiteVecVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresPro if vector_store_id in self.cache: return self.cache[vector_store_id] - if self.vector_store_table is None: - raise VectorStoreNotFoundError(vector_store_id) - - vector_store = self.vector_store_table.get_vector_store(vector_store_id) - if not vector_store: + # Try to load from kvstore + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. 
Call initialize() before using vector stores.") + + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: raise VectorStoreNotFoundError(vector_store_id) + vector_store = VectorStore.model_validate_json(vector_store_data) index = VectorStoreWithIndex( vector_store=vector_store, index=SQLiteVecIndex( diff --git a/src/llama_stack/providers/remote/vector_io/chroma/chroma.py b/src/llama_stack/providers/remote/vector_io/chroma/chroma.py index a4fd15f77..97e2244b8 100644 --- a/src/llama_stack/providers/remote/vector_io/chroma/chroma.py +++ b/src/llama_stack/providers/remote/vector_io/chroma/chroma.py @@ -131,7 +131,6 @@ class ChromaVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoc async def initialize(self) -> None: self.kvstore = await kvstore_impl(self.config.persistence) - self.vector_store_table = self.kvstore if isinstance(self.config, RemoteChromaVectorIOConfig): log.info(f"Connecting to Chroma server at: {self.config.url}") @@ -190,9 +189,16 @@ class ChromaVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoc if vector_store_id in self.cache: return self.cache[vector_store_id] - vector_store = await self.vector_store_table.get_vector_store(vector_store_id) - if not vector_store: + # Try to load from kvstore + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before using vector stores.") + + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: raise ValueError(f"Vector DB {vector_store_id} not found in Llama Stack") + + vector_store = VectorStore.model_validate_json(vector_store_data) collection = await maybe_await(self.client.get_collection(vector_store_id)) if not collection: raise ValueError(f"Vector DB {vector_store_id} not found in Chroma") diff --git a/src/llama_stack/providers/remote/vector_io/milvus/milvus.py b/src/llama_stack/providers/remote/vector_io/milvus/milvus.py index ace9ab1c4..73339b5be 100644 --- a/src/llama_stack/providers/remote/vector_io/milvus/milvus.py +++ b/src/llama_stack/providers/remote/vector_io/milvus/milvus.py @@ -328,13 +328,16 @@ class MilvusVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoc if vector_store_id in self.cache: return self.cache[vector_store_id] - if self.vector_store_table is None: - raise VectorStoreNotFoundError(vector_store_id) - - vector_store = await self.vector_store_table.get_vector_store(vector_store_id) - if not vector_store: + # Try to load from kvstore + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. 
Call initialize() before using vector stores.") + + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: raise VectorStoreNotFoundError(vector_store_id) + vector_store = VectorStore.model_validate_json(vector_store_data) index = VectorStoreWithIndex( vector_store=vector_store, index=MilvusIndex(client=self.client, collection_name=vector_store.identifier, kvstore=self.kvstore), diff --git a/src/llama_stack/providers/remote/vector_io/pgvector/pgvector.py b/src/llama_stack/providers/remote/vector_io/pgvector/pgvector.py index 29cfd673f..cf10a0e01 100644 --- a/src/llama_stack/providers/remote/vector_io/pgvector/pgvector.py +++ b/src/llama_stack/providers/remote/vector_io/pgvector/pgvector.py @@ -368,6 +368,22 @@ class PGVectorVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProt log.exception("Could not connect to PGVector database server") raise RuntimeError("Could not connect to PGVector database server") from e + # Load existing vector stores from KV store into cache + start_key = VECTOR_DBS_PREFIX + end_key = f"{VECTOR_DBS_PREFIX}\xff" + stored_vector_stores = await self.kvstore.values_in_range(start_key, end_key) + for vector_store_data in stored_vector_stores: + vector_store = VectorStore.model_validate_json(vector_store_data) + pgvector_index = PGVectorIndex( + vector_store=vector_store, + dimension=vector_store.embedding_dimension, + conn=self.conn, + kvstore=self.kvstore, + ) + await pgvector_index.initialize() + index = VectorStoreWithIndex(vector_store, index=pgvector_index, inference_api=self.inference_api) + self.cache[vector_store.identifier] = index + async def shutdown(self) -> None: if self.conn is not None: self.conn.close() @@ -377,7 +393,13 @@ class PGVectorVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProt async def register_vector_store(self, vector_store: VectorStore) -> None: # Persist vector DB metadata in the KV store - assert self.kvstore is not None + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before registering vector stores.") + + # Save to kvstore for persistence + key = f"{VECTOR_DBS_PREFIX}{vector_store.identifier}" + await self.kvstore.set(key=key, value=vector_store.model_dump_json()) + # Upsert model metadata in Postgres upsert_models(self.conn, [(vector_store.identifier, vector_store)]) @@ -396,7 +418,8 @@ class PGVectorVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProt del self.cache[vector_store_id] # Delete vector DB metadata from KV store - assert self.kvstore is not None + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before unregistering vector stores.") await self.kvstore.delete(key=f"{VECTOR_DBS_PREFIX}{vector_store_id}") async def insert_chunks(self, vector_store_id: str, chunks: list[Chunk], ttl_seconds: int | None = None) -> None: @@ -413,13 +436,16 @@ class PGVectorVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProt if vector_store_id in self.cache: return self.cache[vector_store_id] - if self.vector_store_table is None: - raise VectorStoreNotFoundError(vector_store_id) - - vector_store = await self.vector_store_table.get_vector_store(vector_store_id) - if not vector_store: + # Try to load from kvstore + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. 
Call initialize() before using vector stores.") + + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: raise VectorStoreNotFoundError(vector_store_id) + vector_store = VectorStore.model_validate_json(vector_store_data) index = PGVectorIndex(vector_store, vector_store.embedding_dimension, self.conn) await index.initialize() self.cache[vector_store_id] = VectorStoreWithIndex(vector_store, index, self.inference_api) diff --git a/src/llama_stack/providers/remote/vector_io/qdrant/qdrant.py b/src/llama_stack/providers/remote/vector_io/qdrant/qdrant.py index 266e9bf58..7d17c5591 100644 --- a/src/llama_stack/providers/remote/vector_io/qdrant/qdrant.py +++ b/src/llama_stack/providers/remote/vector_io/qdrant/qdrant.py @@ -183,7 +183,8 @@ class QdrantVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoc await super().shutdown() async def register_vector_store(self, vector_store: VectorStore) -> None: - assert self.kvstore is not None + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before registering vector stores.") key = f"{VECTOR_DBS_PREFIX}{vector_store.identifier}" await self.kvstore.set(key=key, value=vector_store.model_dump_json()) @@ -200,20 +201,24 @@ class QdrantVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, VectorStoresProtoc await self.cache[vector_store_id].index.delete() del self.cache[vector_store_id] - assert self.kvstore is not None + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before using vector stores.") await self.kvstore.delete(f"{VECTOR_DBS_PREFIX}{vector_store_id}") async def _get_and_cache_vector_store_index(self, vector_store_id: str) -> VectorStoreWithIndex | None: if vector_store_id in self.cache: return self.cache[vector_store_id] - if self.vector_store_table is None: - raise ValueError(f"Vector DB not found {vector_store_id}") + # Try to load from kvstore + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. Call initialize() before using vector stores.") - vector_store = await self.vector_store_table.get_vector_store(vector_store_id) - if not vector_store: + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: raise VectorStoreNotFoundError(vector_store_id) + vector_store = VectorStore.model_validate_json(vector_store_data) index = VectorStoreWithIndex( vector_store=vector_store, index=QdrantIndex(client=self.client, collection_name=vector_store.identifier), diff --git a/src/llama_stack/providers/remote/vector_io/weaviate/weaviate.py b/src/llama_stack/providers/remote/vector_io/weaviate/weaviate.py index 7813f6e5c..d200662da 100644 --- a/src/llama_stack/providers/remote/vector_io/weaviate/weaviate.py +++ b/src/llama_stack/providers/remote/vector_io/weaviate/weaviate.py @@ -346,13 +346,16 @@ class WeaviateVectorIOAdapter(OpenAIVectorStoreMixin, VectorIO, NeedsRequestProv if vector_store_id in self.cache: return self.cache[vector_store_id] - if self.vector_store_table is None: - raise VectorStoreNotFoundError(vector_store_id) - - vector_store = await self.vector_store_table.get_vector_store(vector_store_id) - if not vector_store: + # Try to load from kvstore + if self.kvstore is None: + raise RuntimeError("KVStore not initialized. 
Call initialize() before using vector stores.") + + key = f"{VECTOR_DBS_PREFIX}{vector_store_id}" + vector_store_data = await self.kvstore.get(key) + if not vector_store_data: raise VectorStoreNotFoundError(vector_store_id) + vector_store = VectorStore.model_validate_json(vector_store_data) client = self._get_client() sanitized_collection_name = sanitize_collection_name(vector_store.identifier, weaviate_format=True) if not client.collections.exists(sanitized_collection_name): diff --git a/tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py b/tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py index 642a7c51f..121623e1b 100644 --- a/tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py +++ b/tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py @@ -92,6 +92,99 @@ async def test_persistence_across_adapter_restarts(vector_io_adapter): await vector_io_adapter.shutdown() +async def test_vector_store_lazy_loading_from_kvstore(vector_io_adapter): + """ + Test that vector stores can be lazy-loaded from KV store when not in cache. + + Verifies that clearing the cache doesn't break vector store access - they + can be loaded on-demand from persistent storage. + """ + await vector_io_adapter.initialize() + + vector_store_id = f"lazy_load_test_{np.random.randint(1e6)}" + vector_store = VectorStore( + identifier=vector_store_id, + provider_id="test_provider", + embedding_model="test_model", + embedding_dimension=128, + ) + await vector_io_adapter.register_vector_store(vector_store) + assert vector_store_id in vector_io_adapter.cache + + vector_io_adapter.cache.clear() + assert vector_store_id not in vector_io_adapter.cache + + loaded_index = await vector_io_adapter._get_and_cache_vector_store_index(vector_store_id) + assert loaded_index is not None + assert loaded_index.vector_store.identifier == vector_store_id + assert vector_store_id in vector_io_adapter.cache + + cached_index = await vector_io_adapter._get_and_cache_vector_store_index(vector_store_id) + assert cached_index is loaded_index + + await vector_io_adapter.shutdown() + + +async def test_vector_store_preloading_on_initialization(vector_io_adapter): + """ + Test that vector stores are preloaded from KV store during initialization. + + Verifies that after restart, all vector stores are automatically loaded into + cache and immediately accessible without requiring lazy loading. + """ + await vector_io_adapter.initialize() + + vector_store_ids = [f"preload_test_{i}_{np.random.randint(1e6)}" for i in range(3)] + for vs_id in vector_store_ids: + vector_store = VectorStore( + identifier=vs_id, + provider_id="test_provider", + embedding_model="test_model", + embedding_dimension=128, + ) + await vector_io_adapter.register_vector_store(vector_store) + + for vs_id in vector_store_ids: + assert vs_id in vector_io_adapter.cache + + await vector_io_adapter.shutdown() + await vector_io_adapter.initialize() + + for vs_id in vector_store_ids: + assert vs_id in vector_io_adapter.cache + + for vs_id in vector_store_ids: + loaded_index = await vector_io_adapter._get_and_cache_vector_store_index(vs_id) + assert loaded_index is not None + assert loaded_index.vector_store.identifier == vs_id + + await vector_io_adapter.shutdown() + + +async def test_kvstore_none_raises_runtime_error(vector_io_adapter): + """ + Test that accessing vector stores with uninitialized kvstore raises RuntimeError. + + Verifies proper RuntimeError is raised instead of assertions when kvstore is None. 
+ """ + await vector_io_adapter.initialize() + + vector_store_id = f"kvstore_none_test_{np.random.randint(1e6)}" + vector_store = VectorStore( + identifier=vector_store_id, + provider_id="test_provider", + embedding_model="test_model", + embedding_dimension=128, + ) + await vector_io_adapter.register_vector_store(vector_store) + + vector_io_adapter.cache.clear() + vector_io_adapter.kvstore = None + + with pytest.raises(RuntimeError, match="KVStore not initialized"): + await vector_io_adapter._get_and_cache_vector_store_index(vector_store_id) + + async def test_register_and_unregister_vector_store(vector_io_adapter): unique_id = f"foo_db_{np.random.randint(1e6)}" dummy = VectorStore( From 4341c4c2aca4842f9ef1ce27fa82d58b9f926cd2 Mon Sep 17 00:00:00 2001 From: Vaishnavi Hire Date: Mon, 10 Nov 2025 09:29:15 -0500 Subject: [PATCH 2/5] docs: Add Llama Stack Operator docs (#3983) # What does this PR do? Add documentation for llama-stack-k8s-operator under kubernetes deployment guide. Signed-off-by: Vaishnavi Hire --- docs/docs/deploying/kubernetes_deployment.mdx | 217 +++++++++++------- 1 file changed, 139 insertions(+), 78 deletions(-) diff --git a/docs/docs/deploying/kubernetes_deployment.mdx b/docs/docs/deploying/kubernetes_deployment.mdx index 8ed1e2756..48d08f0db 100644 --- a/docs/docs/deploying/kubernetes_deployment.mdx +++ b/docs/docs/deploying/kubernetes_deployment.mdx @@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem'; # Kubernetes Deployment Guide -Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers both local development with Kind and production deployment on AWS EKS. +Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers deployment using the Kubernetes operator to manage the Llama Stack server with Kind. The vLLM inference server is deployed manually. ## Prerequisites @@ -110,115 +110,176 @@ spec: EOF ``` -### Step 3: Configure Llama Stack +### Step 3: Install Kubernetes Operator -Update your run configuration: - -```yaml -providers: - inference: - - provider_id: vllm - provider_type: remote::vllm - config: - url: http://vllm-server.default.svc.cluster.local:8000/v1 - max_tokens: 4096 - api_token: fake -``` - -Build container image: +Install the Llama Stack Kubernetes operator to manage Llama Stack deployments: ```bash -tmp_dir=$(mktemp -d) && cat >$tmp_dir/Containerfile.llama-stack-run-k8s <-service`): + +```bash +# List services to find the service name +kubectl get services | grep llamastack + +# Port forward and test (replace SERVICE_NAME with the actual service name) +kubectl port-forward service/llamastack-vllm-service 8321:8321 +``` + +In another terminal, test the deployment: + +```bash +llama-stack-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?" 
``` ## Troubleshooting -**Check pod status:** +### vLLM Server Issues + +**Check vLLM pod status:** ```bash kubectl get pods -l app.kubernetes.io/name=vllm kubectl logs -l app.kubernetes.io/name=vllm ``` -**Test service connectivity:** +**Test vLLM service connectivity:** ```bash kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models ``` +### Llama Stack Server Issues + +**Check LlamaStackDistribution status:** +```bash +# Get detailed status +kubectl describe llamastackdistribution llamastack-vllm + +# Check for events +kubectl get events --sort-by='.lastTimestamp' | grep llamastack-vllm +``` + +**Check operator-managed pods:** +```bash +# List all pods managed by the operator +kubectl get pods -l app.kubernetes.io/name=llama-stack + +# Check pod logs (replace POD_NAME with actual pod name) +kubectl logs -l app.kubernetes.io/name=llama-stack +``` + +**Check operator status:** +```bash +# Verify the operator is running +kubectl get pods -n llama-stack-operator-system + +# Check operator logs if issues persist +kubectl logs -n llama-stack-operator-system -l control-plane=controller-manager +``` + +**Verify service connectivity:** +```bash +# Get the service endpoint +kubectl get svc llamastack-vllm-service + +# Test connectivity from within the cluster +kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://llamastack-vllm-service:8321/health +``` + ## Related Resources - **[Deployment Overview](/docs/deploying/)** - Overview of deployment options - **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions - **[Configuration](/docs/distributions/configuration)** - Detailed configuration options +- **[LlamaStack Operator](https://github.com/llamastack/llama-stack-k8s-operator)** - Overview of llama-stack kubernetes operator +- **[LlamaStackDistribution](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md)** - API Spec of the llama-stack operator Custom Resource. From d4ecbfd092a7502b4b3ffffbbc3df75c8c38862d Mon Sep 17 00:00:00 2001 From: ehhuang Date: Mon, 10 Nov 2025 10:16:35 -0800 Subject: [PATCH 3/5] fix(vector store)!: fix file content API (#4105) # What does this PR do? - changed to match https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml ## Test Plan updated test CI --- client-sdks/stainless/openapi.yml | 48 ++++++++----------- docs/static/llama-stack-spec.yaml | 48 ++++++++----------- docs/static/stainless-llama-stack-spec.yaml | 48 ++++++++----------- src/llama_stack/apis/vector_io/vector_io.py | 24 +++++----- src/llama_stack/core/routers/vector_io.py | 4 +- .../core/routing_tables/vector_stores.py | 4 +- .../utils/memory/openai_vector_store_mixin.py | 15 +++--- .../vector_io/test_openai_vector_stores.py | 16 +++---- 8 files changed, 93 insertions(+), 114 deletions(-) diff --git a/client-sdks/stainless/openapi.yml b/client-sdks/stainless/openapi.yml index d8159be62..adee2f086 100644 --- a/client-sdks/stainless/openapi.yml +++ b/client-sdks/stainless/openapi.yml @@ -2916,11 +2916,11 @@ paths: responses: '200': description: >- - A list of InterleavedContent representing the file contents. + A VectorStoreFileContentResponse representing the file contents. 
content: application/json: schema: - $ref: '#/components/schemas/VectorStoreFileContentsResponse' + $ref: '#/components/schemas/VectorStoreFileContentResponse' '400': $ref: '#/components/responses/BadRequest400' '429': @@ -10465,41 +10465,35 @@ components: title: VectorStoreContent description: >- Content item from a vector store file or search result. - VectorStoreFileContentsResponse: + VectorStoreFileContentResponse: type: object properties: - file_id: + object: type: string - description: Unique identifier for the file - filename: - type: string - description: Name of the file - attributes: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object + const: vector_store.file_content.page + default: vector_store.file_content.page description: >- - Key-value attributes associated with the file - content: + The object type, which is always `vector_store.file_content.page` + data: type: array items: $ref: '#/components/schemas/VectorStoreContent' - description: List of content items from the file + description: Parsed content of the file + has_more: + type: boolean + description: >- + Indicates if there are more content pages to fetch + next_page: + type: string + description: The token for the next page, if any additionalProperties: false required: - - file_id - - filename - - attributes - - content - title: VectorStoreFileContentsResponse + - object + - data + - has_more + title: VectorStoreFileContentResponse description: >- - Response from retrieving the contents of a vector store file. + Represents the parsed content of a vector store file. OpenaiSearchVectorStoreRequest: type: object properties: diff --git a/docs/static/llama-stack-spec.yaml b/docs/static/llama-stack-spec.yaml index ea7fd6eec..72600bf13 100644 --- a/docs/static/llama-stack-spec.yaml +++ b/docs/static/llama-stack-spec.yaml @@ -2913,11 +2913,11 @@ paths: responses: '200': description: >- - A list of InterleavedContent representing the file contents. + A VectorStoreFileContentResponse representing the file contents. content: application/json: schema: - $ref: '#/components/schemas/VectorStoreFileContentsResponse' + $ref: '#/components/schemas/VectorStoreFileContentResponse' '400': $ref: '#/components/responses/BadRequest400' '429': @@ -9749,41 +9749,35 @@ components: title: VectorStoreContent description: >- Content item from a vector store file or search result. 
- VectorStoreFileContentsResponse: + VectorStoreFileContentResponse: type: object properties: - file_id: + object: type: string - description: Unique identifier for the file - filename: - type: string - description: Name of the file - attributes: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object + const: vector_store.file_content.page + default: vector_store.file_content.page description: >- - Key-value attributes associated with the file - content: + The object type, which is always `vector_store.file_content.page` + data: type: array items: $ref: '#/components/schemas/VectorStoreContent' - description: List of content items from the file + description: Parsed content of the file + has_more: + type: boolean + description: >- + Indicates if there are more content pages to fetch + next_page: + type: string + description: The token for the next page, if any additionalProperties: false required: - - file_id - - filename - - attributes - - content - title: VectorStoreFileContentsResponse + - object + - data + - has_more + title: VectorStoreFileContentResponse description: >- - Response from retrieving the contents of a vector store file. + Represents the parsed content of a vector store file. OpenaiSearchVectorStoreRequest: type: object properties: diff --git a/docs/static/stainless-llama-stack-spec.yaml b/docs/static/stainless-llama-stack-spec.yaml index d8159be62..adee2f086 100644 --- a/docs/static/stainless-llama-stack-spec.yaml +++ b/docs/static/stainless-llama-stack-spec.yaml @@ -2916,11 +2916,11 @@ paths: responses: '200': description: >- - A list of InterleavedContent representing the file contents. + A VectorStoreFileContentResponse representing the file contents. content: application/json: schema: - $ref: '#/components/schemas/VectorStoreFileContentsResponse' + $ref: '#/components/schemas/VectorStoreFileContentResponse' '400': $ref: '#/components/responses/BadRequest400' '429': @@ -10465,41 +10465,35 @@ components: title: VectorStoreContent description: >- Content item from a vector store file or search result. - VectorStoreFileContentsResponse: + VectorStoreFileContentResponse: type: object properties: - file_id: + object: type: string - description: Unique identifier for the file - filename: - type: string - description: Name of the file - attributes: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object + const: vector_store.file_content.page + default: vector_store.file_content.page description: >- - Key-value attributes associated with the file - content: + The object type, which is always `vector_store.file_content.page` + data: type: array items: $ref: '#/components/schemas/VectorStoreContent' - description: List of content items from the file + description: Parsed content of the file + has_more: + type: boolean + description: >- + Indicates if there are more content pages to fetch + next_page: + type: string + description: The token for the next page, if any additionalProperties: false required: - - file_id - - filename - - attributes - - content - title: VectorStoreFileContentsResponse + - object + - data + - has_more + title: VectorStoreFileContentResponse description: >- - Response from retrieving the contents of a vector store file. + Represents the parsed content of a vector store file. 
OpenaiSearchVectorStoreRequest: type: object properties: diff --git a/src/llama_stack/apis/vector_io/vector_io.py b/src/llama_stack/apis/vector_io/vector_io.py index 26c961db3..846c6f191 100644 --- a/src/llama_stack/apis/vector_io/vector_io.py +++ b/src/llama_stack/apis/vector_io/vector_io.py @@ -396,19 +396,19 @@ class VectorStoreListFilesResponse(BaseModel): @json_schema_type -class VectorStoreFileContentsResponse(BaseModel): - """Response from retrieving the contents of a vector store file. +class VectorStoreFileContentResponse(BaseModel): + """Represents the parsed content of a vector store file. - :param file_id: Unique identifier for the file - :param filename: Name of the file - :param attributes: Key-value attributes associated with the file - :param content: List of content items from the file + :param object: The object type, which is always `vector_store.file_content.page` + :param data: Parsed content of the file + :param has_more: Indicates if there are more content pages to fetch + :param next_page: The token for the next page, if any """ - file_id: str - filename: str - attributes: dict[str, Any] - content: list[VectorStoreContent] + object: Literal["vector_store.file_content.page"] = "vector_store.file_content.page" + data: list[VectorStoreContent] + has_more: bool + next_page: str | None = None @json_schema_type @@ -732,12 +732,12 @@ class VectorIO(Protocol): self, vector_store_id: str, file_id: str, - ) -> VectorStoreFileContentsResponse: + ) -> VectorStoreFileContentResponse: """Retrieves the contents of a vector store file. :param vector_store_id: The ID of the vector store containing the file to retrieve. :param file_id: The ID of the file to retrieve. - :returns: A list of InterleavedContent representing the file contents. + :returns: A VectorStoreFileContentResponse representing the file contents. """ ... 
diff --git a/src/llama_stack/core/routers/vector_io.py b/src/llama_stack/core/routers/vector_io.py index b54217619..9dac461db 100644 --- a/src/llama_stack/core/routers/vector_io.py +++ b/src/llama_stack/core/routers/vector_io.py @@ -24,7 +24,7 @@ from llama_stack.apis.vector_io import ( VectorStoreChunkingStrategyStaticConfig, VectorStoreDeleteResponse, VectorStoreFileBatchObject, - VectorStoreFileContentsResponse, + VectorStoreFileContentResponse, VectorStoreFileDeleteResponse, VectorStoreFileObject, VectorStoreFilesListInBatchResponse, @@ -338,7 +338,7 @@ class VectorIORouter(VectorIO): self, vector_store_id: str, file_id: str, - ) -> VectorStoreFileContentsResponse: + ) -> VectorStoreFileContentResponse: logger.debug(f"VectorIORouter.openai_retrieve_vector_store_file_contents: {vector_store_id}, {file_id}") provider = await self.routing_table.get_provider_impl(vector_store_id) return await provider.openai_retrieve_vector_store_file_contents( diff --git a/src/llama_stack/core/routing_tables/vector_stores.py b/src/llama_stack/core/routing_tables/vector_stores.py index c6c80a01e..f95a4dbe3 100644 --- a/src/llama_stack/core/routing_tables/vector_stores.py +++ b/src/llama_stack/core/routing_tables/vector_stores.py @@ -15,7 +15,7 @@ from llama_stack.apis.vector_io.vector_io import ( SearchRankingOptions, VectorStoreChunkingStrategy, VectorStoreDeleteResponse, - VectorStoreFileContentsResponse, + VectorStoreFileContentResponse, VectorStoreFileDeleteResponse, VectorStoreFileObject, VectorStoreFileStatus, @@ -195,7 +195,7 @@ class VectorStoresRoutingTable(CommonRoutingTableImpl): self, vector_store_id: str, file_id: str, - ) -> VectorStoreFileContentsResponse: + ) -> VectorStoreFileContentResponse: await self.assert_action_allowed("read", "vector_store", vector_store_id) provider = await self.get_provider_impl(vector_store_id) return await provider.openai_retrieve_vector_store_file_contents( diff --git a/src/llama_stack/providers/utils/memory/openai_vector_store_mixin.py b/src/llama_stack/providers/utils/memory/openai_vector_store_mixin.py index d047d9d12..86e6ea013 100644 --- a/src/llama_stack/providers/utils/memory/openai_vector_store_mixin.py +++ b/src/llama_stack/providers/utils/memory/openai_vector_store_mixin.py @@ -30,7 +30,7 @@ from llama_stack.apis.vector_io import ( VectorStoreContent, VectorStoreDeleteResponse, VectorStoreFileBatchObject, - VectorStoreFileContentsResponse, + VectorStoreFileContentResponse, VectorStoreFileCounts, VectorStoreFileDeleteResponse, VectorStoreFileLastError, @@ -921,22 +921,21 @@ class OpenAIVectorStoreMixin(ABC): self, vector_store_id: str, file_id: str, - ) -> VectorStoreFileContentsResponse: + ) -> VectorStoreFileContentResponse: """Retrieves the contents of a vector store file.""" if vector_store_id not in self.openai_vector_stores: raise VectorStoreNotFoundError(vector_store_id) - file_info = await self._load_openai_vector_store_file(vector_store_id, file_id) dict_chunks = await self._load_openai_vector_store_file_contents(vector_store_id, file_id) chunks = [Chunk.model_validate(c) for c in dict_chunks] content = [] for chunk in chunks: content.extend(self._chunk_to_vector_store_content(chunk)) - return VectorStoreFileContentsResponse( - file_id=file_id, - filename=file_info.get("filename", ""), - attributes=file_info.get("attributes", {}), - content=content, + return VectorStoreFileContentResponse( + object="vector_store.file_content.page", + data=content, + has_more=False, + next_page=None, ) async def openai_update_vector_store_file( diff --git 
a/tests/integration/vector_io/test_openai_vector_stores.py b/tests/integration/vector_io/test_openai_vector_stores.py index 97ce4abe8..20f9d2978 100644 --- a/tests/integration/vector_io/test_openai_vector_stores.py +++ b/tests/integration/vector_io/test_openai_vector_stores.py @@ -907,16 +907,16 @@ def test_openai_vector_store_retrieve_file_contents( ) assert file_contents is not None - assert len(file_contents.content) == 1 - content = file_contents.content[0] + assert file_contents.object == "vector_store.file_content.page" + assert len(file_contents.data) == 1 + content = file_contents.data[0] # llama-stack-client returns a model, openai-python is a badboy and returns a dict if not isinstance(content, dict): content = content.model_dump() assert content["type"] == "text" assert content["text"] == test_content.decode("utf-8") - assert file_contents.filename == file_name - assert file_contents.attributes == attributes + assert file_contents.has_more is False @vector_provider_wrapper @@ -1483,14 +1483,12 @@ def test_openai_vector_store_file_batch_retrieve_contents( ) assert file_contents is not None - assert file_contents.filename == file_data[i][0] - assert len(file_contents.content) > 0 + assert file_contents.object == "vector_store.file_content.page" + assert len(file_contents.data) > 0 # Verify the content matches what we uploaded content_text = ( - file_contents.content[0].text - if hasattr(file_contents.content[0], "text") - else file_contents.content[0]["text"] + file_contents.data[0].text if hasattr(file_contents.data[0], "text") else file_contents.data[0]["text"] ) assert file_data[i][1].decode("utf-8") in content_text From fadf17daf37c1518a5b05adf56bc0939453c0a6e Mon Sep 17 00:00:00 2001 From: Ashwin Bharambe Date: Mon, 10 Nov 2025 10:36:33 -0800 Subject: [PATCH 4/5] feat(api)!: deprecate register/unregister resource APIs (#4099) Mark all register_* / unregister_* APIs as deprecated across models, shields, tool groups, datasets, benchmarks, and scoring functions. This is the first step toward moving resource mutations to an `/admin` namespace as outlined in https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585. The deprecation flag will be reflected in the OpenAPI schema to warn API users that these endpoints are being phased out. Next step will be implementing the `/admin` route namespace for these resource management operations. 
- `register_model` / `unregister_model` - `register_shield` / `unregister_shield` - `register_tool_group` / `unregister_toolgroup` - `register_dataset` / `unregister_dataset` - `register_benchmark` / `unregister_benchmark` - `register_scoring_function` / `unregister_scoring_function` --- client-sdks/stainless/openapi.yml | 603 ++------- docs/static/deprecated-llama-stack-spec.yaml | 1094 ++++++++++++++++- .../static/experimental-llama-stack-spec.yaml | 214 ++-- docs/static/llama-stack-spec.yaml | 389 +----- docs/static/stainless-llama-stack-spec.yaml | 603 ++------- src/llama_stack/apis/benchmarks/benchmarks.py | 4 +- src/llama_stack/apis/datasets/datasets.py | 4 +- src/llama_stack/apis/models/models.py | 4 +- .../scoring_functions/scoring_functions.py | 6 +- src/llama_stack/apis/shields/shields.py | 4 +- src/llama_stack/apis/tools/tools.py | 4 +- 11 files changed, 1454 insertions(+), 1475 deletions(-) diff --git a/client-sdks/stainless/openapi.yml b/client-sdks/stainless/openapi.yml index adee2f086..2b9849535 100644 --- a/client-sdks/stainless/openapi.yml +++ b/client-sdks/stainless/openapi.yml @@ -998,39 +998,6 @@ paths: description: List models using the OpenAI API. parameters: [] deprecated: false - post: - responses: - '200': - description: A Model. - content: - application/json: - schema: - $ref: '#/components/schemas/Model' - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Models - summary: Register model. - description: >- - Register model. - - Register a model. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterModelRequest' - required: true - deprecated: false /v1/models/{model_id}: get: responses: @@ -1065,36 +1032,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Models - summary: Unregister model. - description: >- - Unregister model. - - Unregister a model. - parameters: - - name: model_id - in: path - description: >- - The identifier of the model to unregister. - required: true - schema: - type: string - deprecated: false /v1/moderations: post: responses: @@ -1725,32 +1662,6 @@ paths: description: List all scoring functions. parameters: [] deprecated: false - post: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ScoringFunctions - summary: Register a scoring function. - description: Register a scoring function. 
- parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterScoringFunctionRequest' - required: true - deprecated: false /v1/scoring-functions/{scoring_fn_id}: get: responses: @@ -1782,33 +1693,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ScoringFunctions - summary: Unregister a scoring function. - description: Unregister a scoring function. - parameters: - - name: scoring_fn_id - in: path - description: >- - The ID of the scoring function to unregister. - required: true - schema: - type: string - deprecated: false /v1/scoring/score: post: responses: @@ -1897,36 +1781,6 @@ paths: description: List all shields. parameters: [] deprecated: false - post: - responses: - '200': - description: A Shield. - content: - application/json: - schema: - $ref: '#/components/schemas/Shield' - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Shields - summary: Register a shield. - description: Register a shield. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterShieldRequest' - required: true - deprecated: false /v1/shields/{identifier}: get: responses: @@ -1958,33 +1812,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Shields - summary: Unregister a shield. - description: Unregister a shield. - parameters: - - name: identifier - in: path - description: >- - The identifier of the shield to unregister. - required: true - schema: - type: string - deprecated: false /v1/tool-runtime/invoke: post: responses: @@ -2080,32 +1907,6 @@ paths: description: List tool groups with optional provider. parameters: [] deprecated: false - post: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ToolGroups - summary: Register a tool group. - description: Register a tool group. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterToolGroupRequest' - required: true - deprecated: false /v1/toolgroups/{toolgroup_id}: get: responses: @@ -2137,32 +1938,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ToolGroups - summary: Unregister a tool group. - description: Unregister a tool group. 
- parameters: - - name: toolgroup_id - in: path - description: The ID of the tool group to unregister. - required: true - schema: - type: string - deprecated: false /v1/tools: get: responses: @@ -3171,7 +2946,7 @@ paths: schema: $ref: '#/components/schemas/RegisterDatasetRequest' required: true - deprecated: false + deprecated: true /v1beta/datasets/{dataset_id}: get: responses: @@ -3228,7 +3003,7 @@ paths: required: true schema: type: string - deprecated: false + deprecated: true /v1alpha/eval/benchmarks: get: responses: @@ -3279,7 +3054,7 @@ paths: schema: $ref: '#/components/schemas/RegisterBenchmarkRequest' required: true - deprecated: false + deprecated: true /v1alpha/eval/benchmarks/{benchmark_id}: get: responses: @@ -3336,7 +3111,7 @@ paths: required: true schema: type: string - deprecated: false + deprecated: true /v1alpha/eval/benchmarks/{benchmark_id}/evaluations: post: responses: @@ -6280,46 +6055,6 @@ components: required: - data title: OpenAIListModelsResponse - ModelType: - type: string - enum: - - llm - - embedding - - rerank - title: ModelType - description: >- - Enumeration of supported model types in Llama Stack. - RegisterModelRequest: - type: object - properties: - model_id: - type: string - description: The identifier of the model to register. - provider_model_id: - type: string - description: >- - The identifier of the model in the provider. - provider_id: - type: string - description: The identifier of the provider. - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: Any additional metadata for this model. - model_type: - $ref: '#/components/schemas/ModelType' - description: The type of model to register. - additionalProperties: false - required: - - model_id - title: RegisterModelRequest Model: type: object properties: @@ -6377,6 +6112,15 @@ components: title: Model description: >- A model resource representing an AI model registered in Llama Stack. + ModelType: + type: string + enum: + - llm + - embedding + - rerank + title: ModelType + description: >- + Enumeration of supported model types in Llama Stack. RunModerationRequest: type: object properties: @@ -9115,61 +8859,6 @@ components: required: - data title: ListScoringFunctionsResponse - ParamType: - oneOf: - - $ref: '#/components/schemas/StringType' - - $ref: '#/components/schemas/NumberType' - - $ref: '#/components/schemas/BooleanType' - - $ref: '#/components/schemas/ArrayType' - - $ref: '#/components/schemas/ObjectType' - - $ref: '#/components/schemas/JsonType' - - $ref: '#/components/schemas/UnionType' - - $ref: '#/components/schemas/ChatCompletionInputType' - - $ref: '#/components/schemas/CompletionInputType' - discriminator: - propertyName: type - mapping: - string: '#/components/schemas/StringType' - number: '#/components/schemas/NumberType' - boolean: '#/components/schemas/BooleanType' - array: '#/components/schemas/ArrayType' - object: '#/components/schemas/ObjectType' - json: '#/components/schemas/JsonType' - union: '#/components/schemas/UnionType' - chat_completion_input: '#/components/schemas/ChatCompletionInputType' - completion_input: '#/components/schemas/CompletionInputType' - RegisterScoringFunctionRequest: - type: object - properties: - scoring_fn_id: - type: string - description: >- - The ID of the scoring function to register. - description: - type: string - description: The description of the scoring function. 
- return_type: - $ref: '#/components/schemas/ParamType' - description: The return type of the scoring function. - provider_scoring_fn_id: - type: string - description: >- - The ID of the provider scoring function to use for the scoring function. - provider_id: - type: string - description: >- - The ID of the provider to use for the scoring function. - params: - $ref: '#/components/schemas/ScoringFnParams' - description: >- - The parameters for the scoring function for benchmark eval, these can - be overridden for app eval. - additionalProperties: false - required: - - scoring_fn_id - - description - - return_type - title: RegisterScoringFunctionRequest ScoreRequest: type: object properties: @@ -9345,35 +9034,6 @@ components: required: - data title: ListShieldsResponse - RegisterShieldRequest: - type: object - properties: - shield_id: - type: string - description: >- - The identifier of the shield to register. - provider_shield_id: - type: string - description: >- - The identifier of the shield in the provider. - provider_id: - type: string - description: The identifier of the provider. - params: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: The parameters of the shield. - additionalProperties: false - required: - - shield_id - title: RegisterShieldRequest InvokeToolRequest: type: object properties: @@ -9634,37 +9294,6 @@ components: title: ListToolGroupsResponse description: >- Response containing a list of tool groups. - RegisterToolGroupRequest: - type: object - properties: - toolgroup_id: - type: string - description: The ID of the tool group to register. - provider_id: - type: string - description: >- - The ID of the provider to use for the tool group. - mcp_endpoint: - $ref: '#/components/schemas/URL' - description: >- - The MCP endpoint to use for the tool group. - args: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: >- - A dictionary of arguments to pass to the tool group. - additionalProperties: false - required: - - toolgroup_id - - provider_id - title: RegisterToolGroupRequest Chunk: type: object properties: @@ -10810,68 +10439,6 @@ components: - data title: ListDatasetsResponse description: Response from listing datasets. - DataSource: - oneOf: - - $ref: '#/components/schemas/URIDataSource' - - $ref: '#/components/schemas/RowsDataSource' - discriminator: - propertyName: type - mapping: - uri: '#/components/schemas/URIDataSource' - rows: '#/components/schemas/RowsDataSource' - RegisterDatasetRequest: - type: object - properties: - purpose: - type: string - enum: - - post-training/messages - - eval/question-answer - - eval/messages-answer - description: >- - The purpose of the dataset. One of: - "post-training/messages": The dataset - contains a messages column with list of messages for post-training. { - "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", - "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset - contains a question column and an answer column for evaluation. { "question": - "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": - The dataset contains a messages column with list of messages and an answer - column for evaluation. { "messages": [ {"role": "user", "content": "Hello, - my name is John Doe."}, {"role": "assistant", "content": "Hello, John - Doe. 
How can I help you today?"}, {"role": "user", "content": "What's - my name?"}, ], "answer": "John Doe" } - source: - $ref: '#/components/schemas/DataSource' - description: >- - The data source of the dataset. Ensure that the data source schema is - compatible with the purpose of the dataset. Examples: - { "type": "uri", - "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": - "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" - } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" - } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": - "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] - } ] } - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: >- - The metadata for the dataset. - E.g. {"description": "My dataset"}. - dataset_id: - type: string - description: >- - The ID of the dataset. If not provided, an ID will be generated. - additionalProperties: false - required: - - purpose - - source - title: RegisterDatasetRequest Benchmark: type: object properties: @@ -10939,47 +10506,6 @@ components: required: - data title: ListBenchmarksResponse - RegisterBenchmarkRequest: - type: object - properties: - benchmark_id: - type: string - description: The ID of the benchmark to register. - dataset_id: - type: string - description: >- - The ID of the dataset to use for the benchmark. - scoring_functions: - type: array - items: - type: string - description: >- - The scoring functions to use for the benchmark. - provider_benchmark_id: - type: string - description: >- - The ID of the provider benchmark to use for the benchmark. - provider_id: - type: string - description: >- - The ID of the provider to use for the benchmark. - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: The metadata to use for the benchmark. - additionalProperties: false - required: - - benchmark_id - - dataset_id - - scoring_functions - title: RegisterBenchmarkRequest BenchmarkConfig: type: object properties: @@ -11841,6 +11367,109 @@ components: - hyperparam_search_config - logger_config title: SupervisedFineTuneRequest + DataSource: + oneOf: + - $ref: '#/components/schemas/URIDataSource' + - $ref: '#/components/schemas/RowsDataSource' + discriminator: + propertyName: type + mapping: + uri: '#/components/schemas/URIDataSource' + rows: '#/components/schemas/RowsDataSource' + RegisterDatasetRequest: + type: object + properties: + purpose: + type: string + enum: + - post-training/messages + - eval/question-answer + - eval/messages-answer + description: >- + The purpose of the dataset. One of: - "post-training/messages": The dataset + contains a messages column with list of messages for post-training. { + "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", + "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset + contains a question column and an answer column for evaluation. { "question": + "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": + The dataset contains a messages column with list of messages and an answer + column for evaluation. { "messages": [ {"role": "user", "content": "Hello, + my name is John Doe."}, {"role": "assistant", "content": "Hello, John + Doe. 
How can I help you today?"}, {"role": "user", "content": "What's + my name?"}, ], "answer": "John Doe" } + source: + $ref: '#/components/schemas/DataSource' + description: >- + The data source of the dataset. Ensure that the data source schema is + compatible with the purpose of the dataset. Examples: - { "type": "uri", + "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": + "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" + } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" + } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": + "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] + } ] } + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + The metadata for the dataset. - E.g. {"description": "My dataset"}. + dataset_id: + type: string + description: >- + The ID of the dataset. If not provided, an ID will be generated. + additionalProperties: false + required: + - purpose + - source + title: RegisterDatasetRequest + RegisterBenchmarkRequest: + type: object + properties: + benchmark_id: + type: string + description: The ID of the benchmark to register. + dataset_id: + type: string + description: >- + The ID of the dataset to use for the benchmark. + scoring_functions: + type: array + items: + type: string + description: >- + The scoring functions to use for the benchmark. + provider_benchmark_id: + type: string + description: >- + The ID of the provider benchmark to use for the benchmark. + provider_id: + type: string + description: >- + The ID of the provider to use for the benchmark. + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: The metadata to use for the benchmark. + additionalProperties: false + required: + - benchmark_id + - dataset_id + - scoring_functions + title: RegisterBenchmarkRequest responses: BadRequest400: description: The request was invalid or malformed diff --git a/docs/static/deprecated-llama-stack-spec.yaml b/docs/static/deprecated-llama-stack-spec.yaml index 3bc965eb7..dea2e5bbe 100644 --- a/docs/static/deprecated-llama-stack-spec.yaml +++ b/docs/static/deprecated-llama-stack-spec.yaml @@ -13,7 +13,352 @@ info: migration reference only. servers: - url: http://any-hosted-llama-stack.com -paths: {} +paths: + /v1/models: + post: + responses: + '200': + description: A Model. + content: + application/json: + schema: + $ref: '#/components/schemas/Model' + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Models + summary: Register model. + description: >- + Register model. + + Register a model. 
+ parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RegisterModelRequest' + required: true + deprecated: true + /v1/models/{model_id}: + delete: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Models + summary: Unregister model. + description: >- + Unregister model. + + Unregister a model. + parameters: + - name: model_id + in: path + description: >- + The identifier of the model to unregister. + required: true + schema: + type: string + deprecated: true + /v1/scoring-functions: + post: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - ScoringFunctions + summary: Register a scoring function. + description: Register a scoring function. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RegisterScoringFunctionRequest' + required: true + deprecated: true + /v1/scoring-functions/{scoring_fn_id}: + delete: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - ScoringFunctions + summary: Unregister a scoring function. + description: Unregister a scoring function. + parameters: + - name: scoring_fn_id + in: path + description: >- + The ID of the scoring function to unregister. + required: true + schema: + type: string + deprecated: true + /v1/shields: + post: + responses: + '200': + description: A Shield. + content: + application/json: + schema: + $ref: '#/components/schemas/Shield' + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Shields + summary: Register a shield. + description: Register a shield. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RegisterShieldRequest' + required: true + deprecated: true + /v1/shields/{identifier}: + delete: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Shields + summary: Unregister a shield. + description: Unregister a shield. + parameters: + - name: identifier + in: path + description: >- + The identifier of the shield to unregister. 
+ required: true + schema: + type: string + deprecated: true + /v1/toolgroups: + post: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - ToolGroups + summary: Register a tool group. + description: Register a tool group. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RegisterToolGroupRequest' + required: true + deprecated: true + /v1/toolgroups/{toolgroup_id}: + delete: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - ToolGroups + summary: Unregister a tool group. + description: Unregister a tool group. + parameters: + - name: toolgroup_id + in: path + description: The ID of the tool group to unregister. + required: true + schema: + type: string + deprecated: true + /v1beta/datasets: + post: + responses: + '200': + description: A Dataset. + content: + application/json: + schema: + $ref: '#/components/schemas/Dataset' + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Datasets + summary: Register a new dataset. + description: Register a new dataset. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RegisterDatasetRequest' + required: true + deprecated: true + /v1beta/datasets/{dataset_id}: + delete: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Datasets + summary: Unregister a dataset by its ID. + description: Unregister a dataset by its ID. + parameters: + - name: dataset_id + in: path + description: The ID of the dataset to unregister. + required: true + schema: + type: string + deprecated: true + /v1alpha/eval/benchmarks: + post: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Benchmarks + summary: Register a benchmark. + description: Register a benchmark. + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RegisterBenchmarkRequest' + required: true + deprecated: true + /v1alpha/eval/benchmarks/{benchmark_id}: + delete: + responses: + '200': + description: OK + '400': + $ref: '#/components/responses/BadRequest400' + '429': + $ref: >- + #/components/responses/TooManyRequests429 + '500': + $ref: >- + #/components/responses/InternalServerError500 + default: + $ref: '#/components/responses/DefaultError' + tags: + - Benchmarks + summary: Unregister a benchmark. + description: Unregister a benchmark. 
+ parameters: + - name: benchmark_id + in: path + description: The ID of the benchmark to unregister. + required: true + schema: + type: string + deprecated: true jsonSchemaDialect: >- https://json-schema.org/draft/2020-12/schema components: @@ -46,6 +391,730 @@ components: title: Error description: >- Error response from the API. Roughly follows RFC 7807. + ModelType: + type: string + enum: + - llm + - embedding + - rerank + title: ModelType + description: >- + Enumeration of supported model types in Llama Stack. + RegisterModelRequest: + type: object + properties: + model_id: + type: string + description: The identifier of the model to register. + provider_model_id: + type: string + description: >- + The identifier of the model in the provider. + provider_id: + type: string + description: The identifier of the provider. + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: Any additional metadata for this model. + model_type: + $ref: '#/components/schemas/ModelType' + description: The type of model to register. + additionalProperties: false + required: + - model_id + title: RegisterModelRequest + Model: + type: object + properties: + identifier: + type: string + description: >- + Unique identifier for this resource in llama stack + provider_resource_id: + type: string + description: >- + Unique identifier for this resource in the provider + provider_id: + type: string + description: >- + ID of the provider that owns this resource + type: + type: string + enum: + - model + - shield + - vector_store + - dataset + - scoring_function + - benchmark + - tool + - tool_group + - prompt + const: model + default: model + description: >- + The resource type, always 'model' for model resources + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: Any additional metadata for this model + model_type: + $ref: '#/components/schemas/ModelType' + default: llm + description: >- + The type of model (LLM or embedding model) + additionalProperties: false + required: + - identifier + - provider_id + - type + - metadata + - model_type + title: Model + description: >- + A model resource representing an AI model registered in Llama Stack. + AggregationFunctionType: + type: string + enum: + - average + - weighted_average + - median + - categorical_count + - accuracy + title: AggregationFunctionType + description: >- + Types of aggregation functions for scoring results. + ArrayType: + type: object + properties: + type: + type: string + const: array + default: array + description: Discriminator type. Always "array" + additionalProperties: false + required: + - type + title: ArrayType + description: Parameter type for array values. + BasicScoringFnParams: + type: object + properties: + type: + $ref: '#/components/schemas/ScoringFnParamsType' + const: basic + default: basic + description: >- + The type of scoring function parameters, always basic + aggregation_functions: + type: array + items: + $ref: '#/components/schemas/AggregationFunctionType' + description: >- + Aggregation functions to apply to the scores of each row + additionalProperties: false + required: + - type + - aggregation_functions + title: BasicScoringFnParams + description: >- + Parameters for basic scoring function configuration. 
+ BooleanType: + type: object + properties: + type: + type: string + const: boolean + default: boolean + description: Discriminator type. Always "boolean" + additionalProperties: false + required: + - type + title: BooleanType + description: Parameter type for boolean values. + ChatCompletionInputType: + type: object + properties: + type: + type: string + const: chat_completion_input + default: chat_completion_input + description: >- + Discriminator type. Always "chat_completion_input" + additionalProperties: false + required: + - type + title: ChatCompletionInputType + description: >- + Parameter type for chat completion input. + CompletionInputType: + type: object + properties: + type: + type: string + const: completion_input + default: completion_input + description: >- + Discriminator type. Always "completion_input" + additionalProperties: false + required: + - type + title: CompletionInputType + description: Parameter type for completion input. + JsonType: + type: object + properties: + type: + type: string + const: json + default: json + description: Discriminator type. Always "json" + additionalProperties: false + required: + - type + title: JsonType + description: Parameter type for JSON values. + LLMAsJudgeScoringFnParams: + type: object + properties: + type: + $ref: '#/components/schemas/ScoringFnParamsType' + const: llm_as_judge + default: llm_as_judge + description: >- + The type of scoring function parameters, always llm_as_judge + judge_model: + type: string + description: >- + Identifier of the LLM model to use as a judge for scoring + prompt_template: + type: string + description: >- + (Optional) Custom prompt template for the judge model + judge_score_regexes: + type: array + items: + type: string + description: >- + Regexes to extract the answer from generated response + aggregation_functions: + type: array + items: + $ref: '#/components/schemas/AggregationFunctionType' + description: >- + Aggregation functions to apply to the scores of each row + additionalProperties: false + required: + - type + - judge_model + - judge_score_regexes + - aggregation_functions + title: LLMAsJudgeScoringFnParams + description: >- + Parameters for LLM-as-judge scoring function configuration. + NumberType: + type: object + properties: + type: + type: string + const: number + default: number + description: Discriminator type. Always "number" + additionalProperties: false + required: + - type + title: NumberType + description: Parameter type for numeric values. + ObjectType: + type: object + properties: + type: + type: string + const: object + default: object + description: Discriminator type. Always "object" + additionalProperties: false + required: + - type + title: ObjectType + description: Parameter type for object values. 
+ ParamType: + oneOf: + - $ref: '#/components/schemas/StringType' + - $ref: '#/components/schemas/NumberType' + - $ref: '#/components/schemas/BooleanType' + - $ref: '#/components/schemas/ArrayType' + - $ref: '#/components/schemas/ObjectType' + - $ref: '#/components/schemas/JsonType' + - $ref: '#/components/schemas/UnionType' + - $ref: '#/components/schemas/ChatCompletionInputType' + - $ref: '#/components/schemas/CompletionInputType' + discriminator: + propertyName: type + mapping: + string: '#/components/schemas/StringType' + number: '#/components/schemas/NumberType' + boolean: '#/components/schemas/BooleanType' + array: '#/components/schemas/ArrayType' + object: '#/components/schemas/ObjectType' + json: '#/components/schemas/JsonType' + union: '#/components/schemas/UnionType' + chat_completion_input: '#/components/schemas/ChatCompletionInputType' + completion_input: '#/components/schemas/CompletionInputType' + RegexParserScoringFnParams: + type: object + properties: + type: + $ref: '#/components/schemas/ScoringFnParamsType' + const: regex_parser + default: regex_parser + description: >- + The type of scoring function parameters, always regex_parser + parsing_regexes: + type: array + items: + type: string + description: >- + Regex to extract the answer from generated response + aggregation_functions: + type: array + items: + $ref: '#/components/schemas/AggregationFunctionType' + description: >- + Aggregation functions to apply to the scores of each row + additionalProperties: false + required: + - type + - parsing_regexes + - aggregation_functions + title: RegexParserScoringFnParams + description: >- + Parameters for regex parser scoring function configuration. + ScoringFnParams: + oneOf: + - $ref: '#/components/schemas/LLMAsJudgeScoringFnParams' + - $ref: '#/components/schemas/RegexParserScoringFnParams' + - $ref: '#/components/schemas/BasicScoringFnParams' + discriminator: + propertyName: type + mapping: + llm_as_judge: '#/components/schemas/LLMAsJudgeScoringFnParams' + regex_parser: '#/components/schemas/RegexParserScoringFnParams' + basic: '#/components/schemas/BasicScoringFnParams' + ScoringFnParamsType: + type: string + enum: + - llm_as_judge + - regex_parser + - basic + title: ScoringFnParamsType + description: >- + Types of scoring function parameter configurations. + StringType: + type: object + properties: + type: + type: string + const: string + default: string + description: Discriminator type. Always "string" + additionalProperties: false + required: + - type + title: StringType + description: Parameter type for string values. + UnionType: + type: object + properties: + type: + type: string + const: union + default: union + description: Discriminator type. Always "union" + additionalProperties: false + required: + - type + title: UnionType + description: Parameter type for union values. + RegisterScoringFunctionRequest: + type: object + properties: + scoring_fn_id: + type: string + description: >- + The ID of the scoring function to register. + description: + type: string + description: The description of the scoring function. + return_type: + $ref: '#/components/schemas/ParamType' + description: The return type of the scoring function. + provider_scoring_fn_id: + type: string + description: >- + The ID of the provider scoring function to use for the scoring function. + provider_id: + type: string + description: >- + The ID of the provider to use for the scoring function. 
+ params: + $ref: '#/components/schemas/ScoringFnParams' + description: >- + The parameters for the scoring function for benchmark eval, these can + be overridden for app eval. + additionalProperties: false + required: + - scoring_fn_id + - description + - return_type + title: RegisterScoringFunctionRequest + RegisterShieldRequest: + type: object + properties: + shield_id: + type: string + description: >- + The identifier of the shield to register. + provider_shield_id: + type: string + description: >- + The identifier of the shield in the provider. + provider_id: + type: string + description: The identifier of the provider. + params: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: The parameters of the shield. + additionalProperties: false + required: + - shield_id + title: RegisterShieldRequest + Shield: + type: object + properties: + identifier: + type: string + provider_resource_id: + type: string + provider_id: + type: string + type: + type: string + enum: + - model + - shield + - vector_store + - dataset + - scoring_function + - benchmark + - tool + - tool_group + - prompt + const: shield + default: shield + description: The resource type, always shield + params: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + (Optional) Configuration parameters for the shield + additionalProperties: false + required: + - identifier + - provider_id + - type + title: Shield + description: >- + A safety shield resource that can be used to check content. + URL: + type: object + properties: + uri: + type: string + description: The URL string pointing to the resource + additionalProperties: false + required: + - uri + title: URL + description: A URL reference to external content. + RegisterToolGroupRequest: + type: object + properties: + toolgroup_id: + type: string + description: The ID of the tool group to register. + provider_id: + type: string + description: >- + The ID of the provider to use for the tool group. + mcp_endpoint: + $ref: '#/components/schemas/URL' + description: >- + The MCP endpoint to use for the tool group. + args: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + A dictionary of arguments to pass to the tool group. + additionalProperties: false + required: + - toolgroup_id + - provider_id + title: RegisterToolGroupRequest + DataSource: + oneOf: + - $ref: '#/components/schemas/URIDataSource' + - $ref: '#/components/schemas/RowsDataSource' + discriminator: + propertyName: type + mapping: + uri: '#/components/schemas/URIDataSource' + rows: '#/components/schemas/RowsDataSource' + RowsDataSource: + type: object + properties: + type: + type: string + const: rows + default: rows + rows: + type: array + items: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + The dataset is stored in rows. E.g. - [ {"messages": [{"role": "user", + "content": "Hello, world!"}, {"role": "assistant", "content": "Hello, + world!"}]} ] + additionalProperties: false + required: + - type + - rows + title: RowsDataSource + description: A dataset stored in rows. 
+ URIDataSource: + type: object + properties: + type: + type: string + const: uri + default: uri + uri: + type: string + description: >- + The dataset can be obtained from a URI. E.g. - "https://mywebsite.com/mydata.jsonl" + - "lsfs://mydata.jsonl" - "data:csv;base64,{base64_content}" + additionalProperties: false + required: + - type + - uri + title: URIDataSource + description: >- + A dataset that can be obtained from a URI. + RegisterDatasetRequest: + type: object + properties: + purpose: + type: string + enum: + - post-training/messages + - eval/question-answer + - eval/messages-answer + description: >- + The purpose of the dataset. One of: - "post-training/messages": The dataset + contains a messages column with list of messages for post-training. { + "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", + "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset + contains a question column and an answer column for evaluation. { "question": + "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": + The dataset contains a messages column with list of messages and an answer + column for evaluation. { "messages": [ {"role": "user", "content": "Hello, + my name is John Doe."}, {"role": "assistant", "content": "Hello, John + Doe. How can I help you today?"}, {"role": "user", "content": "What's + my name?"}, ], "answer": "John Doe" } + source: + $ref: '#/components/schemas/DataSource' + description: >- + The data source of the dataset. Ensure that the data source schema is + compatible with the purpose of the dataset. Examples: - { "type": "uri", + "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": + "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" + } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" + } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": + "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] + } ] } + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + The metadata for the dataset. - E.g. {"description": "My dataset"}. + dataset_id: + type: string + description: >- + The ID of the dataset. If not provided, an ID will be generated. 
+ additionalProperties: false + required: + - purpose + - source + title: RegisterDatasetRequest + Dataset: + type: object + properties: + identifier: + type: string + provider_resource_id: + type: string + provider_id: + type: string + type: + type: string + enum: + - model + - shield + - vector_store + - dataset + - scoring_function + - benchmark + - tool + - tool_group + - prompt + const: dataset + default: dataset + description: >- + Type of resource, always 'dataset' for datasets + purpose: + type: string + enum: + - post-training/messages + - eval/question-answer + - eval/messages-answer + description: >- + Purpose of the dataset indicating its intended use + source: + oneOf: + - $ref: '#/components/schemas/URIDataSource' + - $ref: '#/components/schemas/RowsDataSource' + discriminator: + propertyName: type + mapping: + uri: '#/components/schemas/URIDataSource' + rows: '#/components/schemas/RowsDataSource' + description: >- + Data source configuration for the dataset + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: Additional metadata for the dataset + additionalProperties: false + required: + - identifier + - provider_id + - type + - purpose + - source + - metadata + title: Dataset + description: >- + Dataset resource for storing and accessing training or evaluation data. + RegisterBenchmarkRequest: + type: object + properties: + benchmark_id: + type: string + description: The ID of the benchmark to register. + dataset_id: + type: string + description: >- + The ID of the dataset to use for the benchmark. + scoring_functions: + type: array + items: + type: string + description: >- + The scoring functions to use for the benchmark. + provider_benchmark_id: + type: string + description: >- + The ID of the provider benchmark to use for the benchmark. + provider_id: + type: string + description: >- + The ID of the provider to use for the benchmark. + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: The metadata to use for the benchmark. 
+ additionalProperties: false + required: + - benchmark_id + - dataset_id + - scoring_functions + title: RegisterBenchmarkRequest responses: BadRequest400: description: The request was invalid or malformed @@ -93,4 +1162,25 @@ components: detail: An unexpected error occurred security: - Default: [] -tags: [] +tags: + - name: Benchmarks + description: '' + - name: Datasets + description: '' + - name: Models + description: '' + - name: ScoringFunctions + description: '' + - name: Shields + description: '' + - name: ToolGroups + description: '' +x-tagGroups: + - name: Operations + tags: + - Benchmarks + - Datasets + - Models + - ScoringFunctions + - Shields + - ToolGroups diff --git a/docs/static/experimental-llama-stack-spec.yaml b/docs/static/experimental-llama-stack-spec.yaml index 68e2f59be..6f379d17c 100644 --- a/docs/static/experimental-llama-stack-spec.yaml +++ b/docs/static/experimental-llama-stack-spec.yaml @@ -162,7 +162,7 @@ paths: schema: $ref: '#/components/schemas/RegisterDatasetRequest' required: true - deprecated: false + deprecated: true /v1beta/datasets/{dataset_id}: get: responses: @@ -219,7 +219,7 @@ paths: required: true schema: type: string - deprecated: false + deprecated: true /v1alpha/eval/benchmarks: get: responses: @@ -270,7 +270,7 @@ paths: schema: $ref: '#/components/schemas/RegisterBenchmarkRequest' required: true - deprecated: false + deprecated: true /v1alpha/eval/benchmarks/{benchmark_id}: get: responses: @@ -327,7 +327,7 @@ paths: required: true schema: type: string - deprecated: false + deprecated: true /v1alpha/eval/benchmarks/{benchmark_id}/evaluations: post: responses: @@ -936,68 +936,6 @@ components: - data title: ListDatasetsResponse description: Response from listing datasets. - DataSource: - oneOf: - - $ref: '#/components/schemas/URIDataSource' - - $ref: '#/components/schemas/RowsDataSource' - discriminator: - propertyName: type - mapping: - uri: '#/components/schemas/URIDataSource' - rows: '#/components/schemas/RowsDataSource' - RegisterDatasetRequest: - type: object - properties: - purpose: - type: string - enum: - - post-training/messages - - eval/question-answer - - eval/messages-answer - description: >- - The purpose of the dataset. One of: - "post-training/messages": The dataset - contains a messages column with list of messages for post-training. { - "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", - "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset - contains a question column and an answer column for evaluation. { "question": - "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": - The dataset contains a messages column with list of messages and an answer - column for evaluation. { "messages": [ {"role": "user", "content": "Hello, - my name is John Doe."}, {"role": "assistant", "content": "Hello, John - Doe. How can I help you today?"}, {"role": "user", "content": "What's - my name?"}, ], "answer": "John Doe" } - source: - $ref: '#/components/schemas/DataSource' - description: >- - The data source of the dataset. Ensure that the data source schema is - compatible with the purpose of the dataset. 
Examples: - { "type": "uri", - "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": - "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" - } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" - } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": - "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] - } ] } - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: >- - The metadata for the dataset. - E.g. {"description": "My dataset"}. - dataset_id: - type: string - description: >- - The ID of the dataset. If not provided, an ID will be generated. - additionalProperties: false - required: - - purpose - - source - title: RegisterDatasetRequest Benchmark: type: object properties: @@ -1065,47 +1003,6 @@ components: required: - data title: ListBenchmarksResponse - RegisterBenchmarkRequest: - type: object - properties: - benchmark_id: - type: string - description: The ID of the benchmark to register. - dataset_id: - type: string - description: >- - The ID of the dataset to use for the benchmark. - scoring_functions: - type: array - items: - type: string - description: >- - The scoring functions to use for the benchmark. - provider_benchmark_id: - type: string - description: >- - The ID of the provider benchmark to use for the benchmark. - provider_id: - type: string - description: >- - The ID of the provider to use for the benchmark. - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: The metadata to use for the benchmark. - additionalProperties: false - required: - - benchmark_id - - dataset_id - - scoring_functions - title: RegisterBenchmarkRequest AggregationFunctionType: type: string enum: @@ -2254,6 +2151,109 @@ components: - hyperparam_search_config - logger_config title: SupervisedFineTuneRequest + DataSource: + oneOf: + - $ref: '#/components/schemas/URIDataSource' + - $ref: '#/components/schemas/RowsDataSource' + discriminator: + propertyName: type + mapping: + uri: '#/components/schemas/URIDataSource' + rows: '#/components/schemas/RowsDataSource' + RegisterDatasetRequest: + type: object + properties: + purpose: + type: string + enum: + - post-training/messages + - eval/question-answer + - eval/messages-answer + description: >- + The purpose of the dataset. One of: - "post-training/messages": The dataset + contains a messages column with list of messages for post-training. { + "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", + "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset + contains a question column and an answer column for evaluation. { "question": + "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": + The dataset contains a messages column with list of messages and an answer + column for evaluation. { "messages": [ {"role": "user", "content": "Hello, + my name is John Doe."}, {"role": "assistant", "content": "Hello, John + Doe. How can I help you today?"}, {"role": "user", "content": "What's + my name?"}, ], "answer": "John Doe" } + source: + $ref: '#/components/schemas/DataSource' + description: >- + The data source of the dataset. Ensure that the data source schema is + compatible with the purpose of the dataset. 
Examples: - { "type": "uri", + "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": + "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" + } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" + } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": + "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] + } ] } + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + The metadata for the dataset. - E.g. {"description": "My dataset"}. + dataset_id: + type: string + description: >- + The ID of the dataset. If not provided, an ID will be generated. + additionalProperties: false + required: + - purpose + - source + title: RegisterDatasetRequest + RegisterBenchmarkRequest: + type: object + properties: + benchmark_id: + type: string + description: The ID of the benchmark to register. + dataset_id: + type: string + description: >- + The ID of the dataset to use for the benchmark. + scoring_functions: + type: array + items: + type: string + description: >- + The scoring functions to use for the benchmark. + provider_benchmark_id: + type: string + description: >- + The ID of the provider benchmark to use for the benchmark. + provider_id: + type: string + description: >- + The ID of the provider to use for the benchmark. + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: The metadata to use for the benchmark. + additionalProperties: false + required: + - benchmark_id + - dataset_id + - scoring_functions + title: RegisterBenchmarkRequest responses: BadRequest400: description: The request was invalid or malformed diff --git a/docs/static/llama-stack-spec.yaml b/docs/static/llama-stack-spec.yaml index 72600bf13..4680afac9 100644 --- a/docs/static/llama-stack-spec.yaml +++ b/docs/static/llama-stack-spec.yaml @@ -995,39 +995,6 @@ paths: description: List models using the OpenAI API. parameters: [] deprecated: false - post: - responses: - '200': - description: A Model. - content: - application/json: - schema: - $ref: '#/components/schemas/Model' - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Models - summary: Register model. - description: >- - Register model. - - Register a model. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterModelRequest' - required: true - deprecated: false /v1/models/{model_id}: get: responses: @@ -1062,36 +1029,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Models - summary: Unregister model. - description: >- - Unregister model. - - Unregister a model. - parameters: - - name: model_id - in: path - description: >- - The identifier of the model to unregister. 
- required: true - schema: - type: string - deprecated: false /v1/moderations: post: responses: @@ -1722,32 +1659,6 @@ paths: description: List all scoring functions. parameters: [] deprecated: false - post: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ScoringFunctions - summary: Register a scoring function. - description: Register a scoring function. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterScoringFunctionRequest' - required: true - deprecated: false /v1/scoring-functions/{scoring_fn_id}: get: responses: @@ -1779,33 +1690,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ScoringFunctions - summary: Unregister a scoring function. - description: Unregister a scoring function. - parameters: - - name: scoring_fn_id - in: path - description: >- - The ID of the scoring function to unregister. - required: true - schema: - type: string - deprecated: false /v1/scoring/score: post: responses: @@ -1894,36 +1778,6 @@ paths: description: List all shields. parameters: [] deprecated: false - post: - responses: - '200': - description: A Shield. - content: - application/json: - schema: - $ref: '#/components/schemas/Shield' - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Shields - summary: Register a shield. - description: Register a shield. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterShieldRequest' - required: true - deprecated: false /v1/shields/{identifier}: get: responses: @@ -1955,33 +1809,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Shields - summary: Unregister a shield. - description: Unregister a shield. - parameters: - - name: identifier - in: path - description: >- - The identifier of the shield to unregister. - required: true - schema: - type: string - deprecated: false /v1/tool-runtime/invoke: post: responses: @@ -2077,32 +1904,6 @@ paths: description: List tool groups with optional provider. parameters: [] deprecated: false - post: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ToolGroups - summary: Register a tool group. - description: Register a tool group. 
- parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterToolGroupRequest' - required: true - deprecated: false /v1/toolgroups/{toolgroup_id}: get: responses: @@ -2134,32 +1935,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ToolGroups - summary: Unregister a tool group. - description: Unregister a tool group. - parameters: - - name: toolgroup_id - in: path - description: The ID of the tool group to unregister. - required: true - schema: - type: string - deprecated: false /v1/tools: get: responses: @@ -5564,46 +5339,6 @@ components: required: - data title: OpenAIListModelsResponse - ModelType: - type: string - enum: - - llm - - embedding - - rerank - title: ModelType - description: >- - Enumeration of supported model types in Llama Stack. - RegisterModelRequest: - type: object - properties: - model_id: - type: string - description: The identifier of the model to register. - provider_model_id: - type: string - description: >- - The identifier of the model in the provider. - provider_id: - type: string - description: The identifier of the provider. - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: Any additional metadata for this model. - model_type: - $ref: '#/components/schemas/ModelType' - description: The type of model to register. - additionalProperties: false - required: - - model_id - title: RegisterModelRequest Model: type: object properties: @@ -5661,6 +5396,15 @@ components: title: Model description: >- A model resource representing an AI model registered in Llama Stack. + ModelType: + type: string + enum: + - llm + - embedding + - rerank + title: ModelType + description: >- + Enumeration of supported model types in Llama Stack. RunModerationRequest: type: object properties: @@ -8399,61 +8143,6 @@ components: required: - data title: ListScoringFunctionsResponse - ParamType: - oneOf: - - $ref: '#/components/schemas/StringType' - - $ref: '#/components/schemas/NumberType' - - $ref: '#/components/schemas/BooleanType' - - $ref: '#/components/schemas/ArrayType' - - $ref: '#/components/schemas/ObjectType' - - $ref: '#/components/schemas/JsonType' - - $ref: '#/components/schemas/UnionType' - - $ref: '#/components/schemas/ChatCompletionInputType' - - $ref: '#/components/schemas/CompletionInputType' - discriminator: - propertyName: type - mapping: - string: '#/components/schemas/StringType' - number: '#/components/schemas/NumberType' - boolean: '#/components/schemas/BooleanType' - array: '#/components/schemas/ArrayType' - object: '#/components/schemas/ObjectType' - json: '#/components/schemas/JsonType' - union: '#/components/schemas/UnionType' - chat_completion_input: '#/components/schemas/ChatCompletionInputType' - completion_input: '#/components/schemas/CompletionInputType' - RegisterScoringFunctionRequest: - type: object - properties: - scoring_fn_id: - type: string - description: >- - The ID of the scoring function to register. - description: - type: string - description: The description of the scoring function. 
- return_type: - $ref: '#/components/schemas/ParamType' - description: The return type of the scoring function. - provider_scoring_fn_id: - type: string - description: >- - The ID of the provider scoring function to use for the scoring function. - provider_id: - type: string - description: >- - The ID of the provider to use for the scoring function. - params: - $ref: '#/components/schemas/ScoringFnParams' - description: >- - The parameters for the scoring function for benchmark eval, these can - be overridden for app eval. - additionalProperties: false - required: - - scoring_fn_id - - description - - return_type - title: RegisterScoringFunctionRequest ScoreRequest: type: object properties: @@ -8629,35 +8318,6 @@ components: required: - data title: ListShieldsResponse - RegisterShieldRequest: - type: object - properties: - shield_id: - type: string - description: >- - The identifier of the shield to register. - provider_shield_id: - type: string - description: >- - The identifier of the shield in the provider. - provider_id: - type: string - description: The identifier of the provider. - params: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: The parameters of the shield. - additionalProperties: false - required: - - shield_id - title: RegisterShieldRequest InvokeToolRequest: type: object properties: @@ -8918,37 +8578,6 @@ components: title: ListToolGroupsResponse description: >- Response containing a list of tool groups. - RegisterToolGroupRequest: - type: object - properties: - toolgroup_id: - type: string - description: The ID of the tool group to register. - provider_id: - type: string - description: >- - The ID of the provider to use for the tool group. - mcp_endpoint: - $ref: '#/components/schemas/URL' - description: >- - The MCP endpoint to use for the tool group. - args: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: >- - A dictionary of arguments to pass to the tool group. - additionalProperties: false - required: - - toolgroup_id - - provider_id - title: RegisterToolGroupRequest Chunk: type: object properties: diff --git a/docs/static/stainless-llama-stack-spec.yaml b/docs/static/stainless-llama-stack-spec.yaml index adee2f086..2b9849535 100644 --- a/docs/static/stainless-llama-stack-spec.yaml +++ b/docs/static/stainless-llama-stack-spec.yaml @@ -998,39 +998,6 @@ paths: description: List models using the OpenAI API. parameters: [] deprecated: false - post: - responses: - '200': - description: A Model. - content: - application/json: - schema: - $ref: '#/components/schemas/Model' - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Models - summary: Register model. - description: >- - Register model. - - Register a model. 
- parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterModelRequest' - required: true - deprecated: false /v1/models/{model_id}: get: responses: @@ -1065,36 +1032,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Models - summary: Unregister model. - description: >- - Unregister model. - - Unregister a model. - parameters: - - name: model_id - in: path - description: >- - The identifier of the model to unregister. - required: true - schema: - type: string - deprecated: false /v1/moderations: post: responses: @@ -1725,32 +1662,6 @@ paths: description: List all scoring functions. parameters: [] deprecated: false - post: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ScoringFunctions - summary: Register a scoring function. - description: Register a scoring function. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterScoringFunctionRequest' - required: true - deprecated: false /v1/scoring-functions/{scoring_fn_id}: get: responses: @@ -1782,33 +1693,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ScoringFunctions - summary: Unregister a scoring function. - description: Unregister a scoring function. - parameters: - - name: scoring_fn_id - in: path - description: >- - The ID of the scoring function to unregister. - required: true - schema: - type: string - deprecated: false /v1/scoring/score: post: responses: @@ -1897,36 +1781,6 @@ paths: description: List all shields. parameters: [] deprecated: false - post: - responses: - '200': - description: A Shield. - content: - application/json: - schema: - $ref: '#/components/schemas/Shield' - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Shields - summary: Register a shield. - description: Register a shield. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterShieldRequest' - required: true - deprecated: false /v1/shields/{identifier}: get: responses: @@ -1958,33 +1812,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - Shields - summary: Unregister a shield. - description: Unregister a shield. 
- parameters: - - name: identifier - in: path - description: >- - The identifier of the shield to unregister. - required: true - schema: - type: string - deprecated: false /v1/tool-runtime/invoke: post: responses: @@ -2080,32 +1907,6 @@ paths: description: List tool groups with optional provider. parameters: [] deprecated: false - post: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ToolGroups - summary: Register a tool group. - description: Register a tool group. - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/RegisterToolGroupRequest' - required: true - deprecated: false /v1/toolgroups/{toolgroup_id}: get: responses: @@ -2137,32 +1938,6 @@ paths: schema: type: string deprecated: false - delete: - responses: - '200': - description: OK - '400': - $ref: '#/components/responses/BadRequest400' - '429': - $ref: >- - #/components/responses/TooManyRequests429 - '500': - $ref: >- - #/components/responses/InternalServerError500 - default: - $ref: '#/components/responses/DefaultError' - tags: - - ToolGroups - summary: Unregister a tool group. - description: Unregister a tool group. - parameters: - - name: toolgroup_id - in: path - description: The ID of the tool group to unregister. - required: true - schema: - type: string - deprecated: false /v1/tools: get: responses: @@ -3171,7 +2946,7 @@ paths: schema: $ref: '#/components/schemas/RegisterDatasetRequest' required: true - deprecated: false + deprecated: true /v1beta/datasets/{dataset_id}: get: responses: @@ -3228,7 +3003,7 @@ paths: required: true schema: type: string - deprecated: false + deprecated: true /v1alpha/eval/benchmarks: get: responses: @@ -3279,7 +3054,7 @@ paths: schema: $ref: '#/components/schemas/RegisterBenchmarkRequest' required: true - deprecated: false + deprecated: true /v1alpha/eval/benchmarks/{benchmark_id}: get: responses: @@ -3336,7 +3111,7 @@ paths: required: true schema: type: string - deprecated: false + deprecated: true /v1alpha/eval/benchmarks/{benchmark_id}/evaluations: post: responses: @@ -6280,46 +6055,6 @@ components: required: - data title: OpenAIListModelsResponse - ModelType: - type: string - enum: - - llm - - embedding - - rerank - title: ModelType - description: >- - Enumeration of supported model types in Llama Stack. - RegisterModelRequest: - type: object - properties: - model_id: - type: string - description: The identifier of the model to register. - provider_model_id: - type: string - description: >- - The identifier of the model in the provider. - provider_id: - type: string - description: The identifier of the provider. - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: Any additional metadata for this model. - model_type: - $ref: '#/components/schemas/ModelType' - description: The type of model to register. - additionalProperties: false - required: - - model_id - title: RegisterModelRequest Model: type: object properties: @@ -6377,6 +6112,15 @@ components: title: Model description: >- A model resource representing an AI model registered in Llama Stack. 
+ ModelType: + type: string + enum: + - llm + - embedding + - rerank + title: ModelType + description: >- + Enumeration of supported model types in Llama Stack. RunModerationRequest: type: object properties: @@ -9115,61 +8859,6 @@ components: required: - data title: ListScoringFunctionsResponse - ParamType: - oneOf: - - $ref: '#/components/schemas/StringType' - - $ref: '#/components/schemas/NumberType' - - $ref: '#/components/schemas/BooleanType' - - $ref: '#/components/schemas/ArrayType' - - $ref: '#/components/schemas/ObjectType' - - $ref: '#/components/schemas/JsonType' - - $ref: '#/components/schemas/UnionType' - - $ref: '#/components/schemas/ChatCompletionInputType' - - $ref: '#/components/schemas/CompletionInputType' - discriminator: - propertyName: type - mapping: - string: '#/components/schemas/StringType' - number: '#/components/schemas/NumberType' - boolean: '#/components/schemas/BooleanType' - array: '#/components/schemas/ArrayType' - object: '#/components/schemas/ObjectType' - json: '#/components/schemas/JsonType' - union: '#/components/schemas/UnionType' - chat_completion_input: '#/components/schemas/ChatCompletionInputType' - completion_input: '#/components/schemas/CompletionInputType' - RegisterScoringFunctionRequest: - type: object - properties: - scoring_fn_id: - type: string - description: >- - The ID of the scoring function to register. - description: - type: string - description: The description of the scoring function. - return_type: - $ref: '#/components/schemas/ParamType' - description: The return type of the scoring function. - provider_scoring_fn_id: - type: string - description: >- - The ID of the provider scoring function to use for the scoring function. - provider_id: - type: string - description: >- - The ID of the provider to use for the scoring function. - params: - $ref: '#/components/schemas/ScoringFnParams' - description: >- - The parameters for the scoring function for benchmark eval, these can - be overridden for app eval. - additionalProperties: false - required: - - scoring_fn_id - - description - - return_type - title: RegisterScoringFunctionRequest ScoreRequest: type: object properties: @@ -9345,35 +9034,6 @@ components: required: - data title: ListShieldsResponse - RegisterShieldRequest: - type: object - properties: - shield_id: - type: string - description: >- - The identifier of the shield to register. - provider_shield_id: - type: string - description: >- - The identifier of the shield in the provider. - provider_id: - type: string - description: The identifier of the provider. - params: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: The parameters of the shield. - additionalProperties: false - required: - - shield_id - title: RegisterShieldRequest InvokeToolRequest: type: object properties: @@ -9634,37 +9294,6 @@ components: title: ListToolGroupsResponse description: >- Response containing a list of tool groups. - RegisterToolGroupRequest: - type: object - properties: - toolgroup_id: - type: string - description: The ID of the tool group to register. - provider_id: - type: string - description: >- - The ID of the provider to use for the tool group. - mcp_endpoint: - $ref: '#/components/schemas/URL' - description: >- - The MCP endpoint to use for the tool group. 
- args: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: >- - A dictionary of arguments to pass to the tool group. - additionalProperties: false - required: - - toolgroup_id - - provider_id - title: RegisterToolGroupRequest Chunk: type: object properties: @@ -10810,68 +10439,6 @@ components: - data title: ListDatasetsResponse description: Response from listing datasets. - DataSource: - oneOf: - - $ref: '#/components/schemas/URIDataSource' - - $ref: '#/components/schemas/RowsDataSource' - discriminator: - propertyName: type - mapping: - uri: '#/components/schemas/URIDataSource' - rows: '#/components/schemas/RowsDataSource' - RegisterDatasetRequest: - type: object - properties: - purpose: - type: string - enum: - - post-training/messages - - eval/question-answer - - eval/messages-answer - description: >- - The purpose of the dataset. One of: - "post-training/messages": The dataset - contains a messages column with list of messages for post-training. { - "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", - "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset - contains a question column and an answer column for evaluation. { "question": - "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": - The dataset contains a messages column with list of messages and an answer - column for evaluation. { "messages": [ {"role": "user", "content": "Hello, - my name is John Doe."}, {"role": "assistant", "content": "Hello, John - Doe. How can I help you today?"}, {"role": "user", "content": "What's - my name?"}, ], "answer": "John Doe" } - source: - $ref: '#/components/schemas/DataSource' - description: >- - The data source of the dataset. Ensure that the data source schema is - compatible with the purpose of the dataset. Examples: - { "type": "uri", - "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": - "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" - } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" - } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": - "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] - } ] } - metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: >- - The metadata for the dataset. - E.g. {"description": "My dataset"}. - dataset_id: - type: string - description: >- - The ID of the dataset. If not provided, an ID will be generated. - additionalProperties: false - required: - - purpose - - source - title: RegisterDatasetRequest Benchmark: type: object properties: @@ -10939,47 +10506,6 @@ components: required: - data title: ListBenchmarksResponse - RegisterBenchmarkRequest: - type: object - properties: - benchmark_id: - type: string - description: The ID of the benchmark to register. - dataset_id: - type: string - description: >- - The ID of the dataset to use for the benchmark. - scoring_functions: - type: array - items: - type: string - description: >- - The scoring functions to use for the benchmark. - provider_benchmark_id: - type: string - description: >- - The ID of the provider benchmark to use for the benchmark. - provider_id: - type: string - description: >- - The ID of the provider to use for the benchmark. 
- metadata: - type: object - additionalProperties: - oneOf: - - type: 'null' - - type: boolean - - type: number - - type: string - - type: array - - type: object - description: The metadata to use for the benchmark. - additionalProperties: false - required: - - benchmark_id - - dataset_id - - scoring_functions - title: RegisterBenchmarkRequest BenchmarkConfig: type: object properties: @@ -11841,6 +11367,109 @@ components: - hyperparam_search_config - logger_config title: SupervisedFineTuneRequest + DataSource: + oneOf: + - $ref: '#/components/schemas/URIDataSource' + - $ref: '#/components/schemas/RowsDataSource' + discriminator: + propertyName: type + mapping: + uri: '#/components/schemas/URIDataSource' + rows: '#/components/schemas/RowsDataSource' + RegisterDatasetRequest: + type: object + properties: + purpose: + type: string + enum: + - post-training/messages + - eval/question-answer + - eval/messages-answer + description: >- + The purpose of the dataset. One of: - "post-training/messages": The dataset + contains a messages column with list of messages for post-training. { + "messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant", + "content": "Hello, world!"}, ] } - "eval/question-answer": The dataset + contains a question column and an answer column for evaluation. { "question": + "What is the capital of France?", "answer": "Paris" } - "eval/messages-answer": + The dataset contains a messages column with list of messages and an answer + column for evaluation. { "messages": [ {"role": "user", "content": "Hello, + my name is John Doe."}, {"role": "assistant", "content": "Hello, John + Doe. How can I help you today?"}, {"role": "user", "content": "What's + my name?"}, ], "answer": "John Doe" } + source: + $ref: '#/components/schemas/DataSource' + description: >- + The data source of the dataset. Ensure that the data source schema is + compatible with the purpose of the dataset. Examples: - { "type": "uri", + "uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri": + "lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}" + } - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train" + } - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content": + "Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ] + } ] } + metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: >- + The metadata for the dataset. - E.g. {"description": "My dataset"}. + dataset_id: + type: string + description: >- + The ID of the dataset. If not provided, an ID will be generated. + additionalProperties: false + required: + - purpose + - source + title: RegisterDatasetRequest + RegisterBenchmarkRequest: + type: object + properties: + benchmark_id: + type: string + description: The ID of the benchmark to register. + dataset_id: + type: string + description: >- + The ID of the dataset to use for the benchmark. + scoring_functions: + type: array + items: + type: string + description: >- + The scoring functions to use for the benchmark. + provider_benchmark_id: + type: string + description: >- + The ID of the provider benchmark to use for the benchmark. + provider_id: + type: string + description: >- + The ID of the provider to use for the benchmark. 
+ metadata: + type: object + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + description: The metadata to use for the benchmark. + additionalProperties: false + required: + - benchmark_id + - dataset_id + - scoring_functions + title: RegisterBenchmarkRequest responses: BadRequest400: description: The request was invalid or malformed diff --git a/src/llama_stack/apis/benchmarks/benchmarks.py b/src/llama_stack/apis/benchmarks/benchmarks.py index 933205489..9a67269c3 100644 --- a/src/llama_stack/apis/benchmarks/benchmarks.py +++ b/src/llama_stack/apis/benchmarks/benchmarks.py @@ -74,7 +74,7 @@ class Benchmarks(Protocol): """ ... - @webmethod(route="/eval/benchmarks", method="POST", level=LLAMA_STACK_API_V1ALPHA) + @webmethod(route="/eval/benchmarks", method="POST", level=LLAMA_STACK_API_V1ALPHA, deprecated=True) async def register_benchmark( self, benchmark_id: str, @@ -95,7 +95,7 @@ class Benchmarks(Protocol): """ ... - @webmethod(route="/eval/benchmarks/{benchmark_id}", method="DELETE", level=LLAMA_STACK_API_V1ALPHA) + @webmethod(route="/eval/benchmarks/{benchmark_id}", method="DELETE", level=LLAMA_STACK_API_V1ALPHA, deprecated=True) async def unregister_benchmark(self, benchmark_id: str) -> None: """Unregister a benchmark. diff --git a/src/llama_stack/apis/datasets/datasets.py b/src/llama_stack/apis/datasets/datasets.py index ed4ecec22..9bedc6209 100644 --- a/src/llama_stack/apis/datasets/datasets.py +++ b/src/llama_stack/apis/datasets/datasets.py @@ -146,7 +146,7 @@ class ListDatasetsResponse(BaseModel): class Datasets(Protocol): - @webmethod(route="/datasets", method="POST", level=LLAMA_STACK_API_V1BETA) + @webmethod(route="/datasets", method="POST", level=LLAMA_STACK_API_V1BETA, deprecated=True) async def register_dataset( self, purpose: DatasetPurpose, @@ -235,7 +235,7 @@ class Datasets(Protocol): """ ... - @webmethod(route="/datasets/{dataset_id:path}", method="DELETE", level=LLAMA_STACK_API_V1BETA) + @webmethod(route="/datasets/{dataset_id:path}", method="DELETE", level=LLAMA_STACK_API_V1BETA, deprecated=True) async def unregister_dataset( self, dataset_id: str, diff --git a/src/llama_stack/apis/models/models.py b/src/llama_stack/apis/models/models.py index 5c976886c..bbb359b51 100644 --- a/src/llama_stack/apis/models/models.py +++ b/src/llama_stack/apis/models/models.py @@ -136,7 +136,7 @@ class Models(Protocol): """ ... - @webmethod(route="/models", method="POST", level=LLAMA_STACK_API_V1) + @webmethod(route="/models", method="POST", level=LLAMA_STACK_API_V1, deprecated=True) async def register_model( self, model_id: str, @@ -158,7 +158,7 @@ class Models(Protocol): """ ... - @webmethod(route="/models/{model_id:path}", method="DELETE", level=LLAMA_STACK_API_V1) + @webmethod(route="/models/{model_id:path}", method="DELETE", level=LLAMA_STACK_API_V1, deprecated=True) async def unregister_model( self, model_id: str, diff --git a/src/llama_stack/apis/scoring_functions/scoring_functions.py b/src/llama_stack/apis/scoring_functions/scoring_functions.py index fe49723ab..78f4a7541 100644 --- a/src/llama_stack/apis/scoring_functions/scoring_functions.py +++ b/src/llama_stack/apis/scoring_functions/scoring_functions.py @@ -178,7 +178,7 @@ class ScoringFunctions(Protocol): """ ... 
- @webmethod(route="/scoring-functions", method="POST", level=LLAMA_STACK_API_V1) + @webmethod(route="/scoring-functions", method="POST", level=LLAMA_STACK_API_V1, deprecated=True) async def register_scoring_function( self, scoring_fn_id: str, @@ -199,7 +199,9 @@ class ScoringFunctions(Protocol): """ ... - @webmethod(route="/scoring-functions/{scoring_fn_id:path}", method="DELETE", level=LLAMA_STACK_API_V1) + @webmethod( + route="/scoring-functions/{scoring_fn_id:path}", method="DELETE", level=LLAMA_STACK_API_V1, deprecated=True + ) async def unregister_scoring_function(self, scoring_fn_id: str) -> None: """Unregister a scoring function. diff --git a/src/llama_stack/apis/shields/shields.py b/src/llama_stack/apis/shields/shields.py index ca4483828..659ba8b75 100644 --- a/src/llama_stack/apis/shields/shields.py +++ b/src/llama_stack/apis/shields/shields.py @@ -67,7 +67,7 @@ class Shields(Protocol): """ ... - @webmethod(route="/shields", method="POST", level=LLAMA_STACK_API_V1) + @webmethod(route="/shields", method="POST", level=LLAMA_STACK_API_V1, deprecated=True) async def register_shield( self, shield_id: str, @@ -85,7 +85,7 @@ class Shields(Protocol): """ ... - @webmethod(route="/shields/{identifier:path}", method="DELETE", level=LLAMA_STACK_API_V1) + @webmethod(route="/shields/{identifier:path}", method="DELETE", level=LLAMA_STACK_API_V1, deprecated=True) async def unregister_shield(self, identifier: str) -> None: """Unregister a shield. diff --git a/src/llama_stack/apis/tools/tools.py b/src/llama_stack/apis/tools/tools.py index c9bdfcfb6..4e7cf2544 100644 --- a/src/llama_stack/apis/tools/tools.py +++ b/src/llama_stack/apis/tools/tools.py @@ -109,7 +109,7 @@ class ListToolDefsResponse(BaseModel): @runtime_checkable @telemetry_traceable class ToolGroups(Protocol): - @webmethod(route="/toolgroups", method="POST", level=LLAMA_STACK_API_V1) + @webmethod(route="/toolgroups", method="POST", level=LLAMA_STACK_API_V1, deprecated=True) async def register_tool_group( self, toolgroup_id: str, @@ -167,7 +167,7 @@ class ToolGroups(Protocol): """ ... - @webmethod(route="/toolgroups/{toolgroup_id:path}", method="DELETE", level=LLAMA_STACK_API_V1) + @webmethod(route="/toolgroups/{toolgroup_id:path}", method="DELETE", level=LLAMA_STACK_API_V1, deprecated=True) async def unregister_toolgroup( self, toolgroup_id: str, From 209a78b618f5e71b1ff384ba9877c815950ac8e1 Mon Sep 17 00:00:00 2001 From: Dennis Kennetz Date: Mon, 10 Nov 2025 15:16:24 -0600 Subject: [PATCH 5/5] feat: add oci genai service as chat inference provider (#3876) # What does this PR do? Adds OCI GenAI PaaS models for openai chat completion endpoints. ## Test Plan In an OCI tenancy with access to GenAI PaaS, perform the following steps: 1. Ensure you have IAM policies in place to use service (check docs included in this PR) 2. For local development, [setup OCI cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm) and configure the CLI with your region, tenancy, and auth [here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm) 3. Once configured, go through llama-stack setup and run llama-stack (uses config based auth) like: ```bash OCI_AUTH_TYPE=config_file \ OCI_CLI_PROFILE=CHICAGO \ OCI_REGION=us-chicago-1 \ OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \ llama stack run oci ``` 4. Hit the `models` endpoint to list models after server is running: ```bash curl http://localhost:8321/v1/models | jq ... 
{ "identifier": "meta.llama-4-scout-17b-16e-instruct", "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q", "provider_id": "oci", "type": "model", "metadata": { "display_name": "meta.llama-4-scout-17b-16e-instruct", "capabilities": [ "CHAT" ], "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q" }, "model_type": "llm" }, ... ``` 5. Use the "display_name" field to use the model in a `/chat/completions` request: ```bash # Streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": true, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' # Non-streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": false, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' ``` 6. Try out other models from the `/models` endpoint. --- .../distributions/remote_hosted_distro/oci.md | 143 ++++++++++++++++++ docs/docs/providers/inference/remote_oci.mdx | 41 +++++ pyproject.toml | 1 + src/llama_stack/distributions/oci/__init__.py | 7 + src/llama_stack/distributions/oci/build.yaml | 35 +++++ .../distributions/oci/doc_template.md | 140 +++++++++++++++++ src/llama_stack/distributions/oci/oci.py | 108 +++++++++++++ src/llama_stack/distributions/oci/run.yaml | 136 +++++++++++++++++ .../providers/registry/inference.py | 14 ++ .../remote/inference/oci/__init__.py | 17 +++ .../providers/remote/inference/oci/auth.py | 79 ++++++++++ .../providers/remote/inference/oci/config.py | 75 +++++++++ .../providers/remote/inference/oci/oci.py | 140 +++++++++++++++++ .../inference/test_openai_completion.py | 1 + .../inference/test_openai_embeddings.py | 1 + 15 files changed, 938 insertions(+) create mode 100644 docs/docs/distributions/remote_hosted_distro/oci.md create mode 100644 docs/docs/providers/inference/remote_oci.mdx create mode 100644 src/llama_stack/distributions/oci/__init__.py create mode 100644 src/llama_stack/distributions/oci/build.yaml create mode 100644 src/llama_stack/distributions/oci/doc_template.md create mode 100644 src/llama_stack/distributions/oci/oci.py create mode 100644 src/llama_stack/distributions/oci/run.yaml create mode 100644 src/llama_stack/providers/remote/inference/oci/__init__.py create mode 100644 src/llama_stack/providers/remote/inference/oci/auth.py create mode 100644 src/llama_stack/providers/remote/inference/oci/config.py create mode 100644 src/llama_stack/providers/remote/inference/oci/oci.py diff --git a/docs/docs/distributions/remote_hosted_distro/oci.md b/docs/docs/distributions/remote_hosted_distro/oci.md new file mode 100644 index 000000000..b13cf5f73 --- /dev/null +++ b/docs/docs/distributions/remote_hosted_distro/oci.md @@ -0,0 +1,143 @@ +--- +orphan: true +--- + +# OCI Distribution + +The `llamastack/distribution-oci` distribution consists of the following provider configurations. 
+ +| API | Provider(s) | +|-----|-------------| +| agents | `inline::meta-reference` | +| datasetio | `remote::huggingface`, `inline::localfs` | +| eval | `inline::meta-reference` | +| files | `inline::localfs` | +| inference | `remote::oci` | +| safety | `inline::llama-guard` | +| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` | +| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` | +| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` | + + +### Environment Variables + +The following environment variables can be configured: + +- `OCI_AUTH_TYPE`: OCI authentication type (instance_principal or config_file) (default: `instance_principal`) +- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: ``) +- `OCI_COMPARTMENT_OCID`: OCI compartment ID for the Generative AI service (default: ``) +- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if OCI_AUTH_TYPE is config_file) (default: `~/.oci/config`) +- `OCI_CLI_PROFILE`: OCI CLI profile name to use from config file (default: `DEFAULT`) + + +## Prerequisites +### Oracle Cloud Infrastructure Setup + +Before using the OCI Generative AI distribution, ensure you have: + +1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/) +2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy +3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models +4. **Authentication**: Configure authentication using either: + - **Instance Principal** (recommended for cloud-hosted deployments) + - **API Key** (for on-premises or development environments) + +### Authentication Methods + +#### Instance Principal Authentication (Recommended) +Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments. + +Requirements: +- Instance must be running in an Oracle Cloud Infrastructure compartment +- Instance must have appropriate IAM policies to access Generative AI services + +#### API Key Authentication +For development or on-premises deployments, follow [this doc](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm) to learn how to create your API signing key for your config file. + +### Required IAM Policies + +Ensure your OCI user or instance has the following policy statements: + +``` +Allow group to use generative-ai-inference-endpoints in compartment +Allow group to manage generative-ai-inference-endpoints in compartment +``` + +## Supported Services + +### Inference: OCI Generative AI +Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports: + +- **Chat Completions**: Conversational AI with context awareness +- **Text Generation**: Complete prompts and generate text content + +#### Available Models +Common OCI Generative AI models include access to Meta, Cohere, OpenAI, Grok, and more models. 
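+
+Once the server is up, the models discovered from OCI can be exercised through the stack's OpenAI-compatible endpoints. Below is a minimal sketch using the `openai` Python client; it assumes a local server on port 8321, and the model name is purely illustrative — substitute any display name returned by the listing call:
+
+```python
+from openai import OpenAI
+
+# llama-stack exposes OpenAI-compatible routes under /v1; a real API key is not
+# required, but the client expects a non-empty string.
+client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")
+
+# Models registered from OCI are listed by their display_name.
+for model in client.models.list():
+    print(model.id)
+
+# Chat with one of the listed models (example name shown; use any listed display_name).
+response = client.chat.completions.create(
+    model="meta.llama-4-scout-17b-16e-instruct",
+    messages=[{"role": "user", "content": "Say hello from OCI Generative AI."}],
+)
+print(response.choices[0].message.content)
+```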
+ +### Safety: Llama Guard +For content safety and moderation, this distribution uses Meta's LlamaGuard model through the OCI Generative AI service to provide: +- Content filtering and moderation +- Policy compliance checking +- Harmful content detection + +### Vector Storage: Multiple Options +The distribution supports several vector storage providers: +- **FAISS**: Local in-memory vector search +- **ChromaDB**: Distributed vector database +- **PGVector**: PostgreSQL with vector extensions + +### Additional Services +- **Dataset I/O**: Local filesystem and Hugging Face integration +- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities +- **Evaluation**: Meta reference evaluation framework + +## Running Llama Stack with OCI + +You can run the OCI distribution via Docker or local virtual environment. + +### Via venv + +If you've set up your local development environment, you can also build the image using your local virtual environment. + +```bash +OCI_AUTH=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci +``` + +### Configuration Examples + +#### Using Instance Principal (Recommended for Production) +```bash +export OCI_AUTH_TYPE=instance_principal +export OCI_REGION=us-chicago-1 +export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1.. +``` + +#### Using API Key Authentication (Development) +```bash +export OCI_AUTH_TYPE=config_file +export OCI_CONFIG_FILE_PATH=~/.oci/config +export OCI_CLI_PROFILE=DEFAULT +export OCI_REGION=us-chicago-1 +export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..your-compartment-id +``` + +## Regional Endpoints + +OCI Generative AI is available in multiple regions. The service automatically routes to the appropriate regional endpoint based on your configuration. For a full list of regional model availability, visit: + +https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions + +## Troubleshooting + +### Common Issues + +1. **Authentication Errors**: Verify your OCI credentials and IAM policies +2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region +3. **Permission Denied**: Check compartment permissions and Generative AI service access +4. **Region Unavailable**: Verify the specified region supports Generative AI services + +### Getting Help + +For additional support: +- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm) +- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues) diff --git a/docs/docs/providers/inference/remote_oci.mdx b/docs/docs/providers/inference/remote_oci.mdx new file mode 100644 index 000000000..33a201a55 --- /dev/null +++ b/docs/docs/providers/inference/remote_oci.mdx @@ -0,0 +1,41 @@ +--- +description: | + Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models. + Provider documentation + https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm +sidebar_label: Remote - Oci +title: remote::oci +--- + +# remote::oci + +## Description + + +Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models. 
+Provider documentation +https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm + + +## Configuration + +| Field | Type | Required | Default | Description | +|-------|------|----------|---------|-------------| +| `allowed_models` | `list[str \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. | +| `refresh_models` | `` | No | False | Whether to refresh models periodically from the provider | +| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider | +| `oci_auth_type` | `` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) | +| `oci_region` | `` | No | us-ashburn-1 | OCI region (e.g., us-ashburn-1) | +| `oci_compartment_id` | `` | No | | OCI compartment ID for the Generative AI service | +| `oci_config_file_path` | `` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) | +| `oci_config_profile` | `` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) | + +## Sample Configuration + +```yaml +oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal} +oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config} +oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT} +oci_region: ${env.OCI_REGION:=us-ashburn-1} +oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=} +``` diff --git a/pyproject.toml b/pyproject.toml index 4ec83249c..653c6d613 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -298,6 +298,7 @@ exclude = [ "^src/llama_stack/providers/remote/agents/sample/", "^src/llama_stack/providers/remote/datasetio/huggingface/", "^src/llama_stack/providers/remote/datasetio/nvidia/", + "^src/llama_stack/providers/remote/inference/oci/", "^src/llama_stack/providers/remote/inference/bedrock/", "^src/llama_stack/providers/remote/inference/nvidia/", "^src/llama_stack/providers/remote/inference/passthrough/", diff --git a/src/llama_stack/distributions/oci/__init__.py b/src/llama_stack/distributions/oci/__init__.py new file mode 100644 index 000000000..68c0efe44 --- /dev/null +++ b/src/llama_stack/distributions/oci/__init__.py @@ -0,0 +1,7 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +from .oci import get_distribution_template # noqa: F401 diff --git a/src/llama_stack/distributions/oci/build.yaml b/src/llama_stack/distributions/oci/build.yaml new file mode 100644 index 000000000..7e082e1f6 --- /dev/null +++ b/src/llama_stack/distributions/oci/build.yaml @@ -0,0 +1,35 @@ +version: 2 +distribution_spec: + description: Use Oracle Cloud Infrastructure (OCI) Generative AI for running LLM + inference with scalable cloud services + providers: + inference: + - provider_type: remote::oci + vector_io: + - provider_type: inline::faiss + - provider_type: remote::chromadb + - provider_type: remote::pgvector + safety: + - provider_type: inline::llama-guard + agents: + - provider_type: inline::meta-reference + eval: + - provider_type: inline::meta-reference + datasetio: + - provider_type: remote::huggingface + - provider_type: inline::localfs + scoring: + - provider_type: inline::basic + - provider_type: inline::llm-as-judge + - provider_type: inline::braintrust + tool_runtime: + - provider_type: remote::brave-search + - provider_type: remote::tavily-search + - provider_type: inline::rag-runtime + - provider_type: remote::model-context-protocol + files: + - provider_type: inline::localfs +image_type: venv +additional_pip_packages: +- aiosqlite +- sqlalchemy[asyncio] diff --git a/src/llama_stack/distributions/oci/doc_template.md b/src/llama_stack/distributions/oci/doc_template.md new file mode 100644 index 000000000..320530ccd --- /dev/null +++ b/src/llama_stack/distributions/oci/doc_template.md @@ -0,0 +1,140 @@ +--- +orphan: true +--- +# OCI Distribution + +The `llamastack/distribution-{{ name }}` distribution consists of the following provider configurations. + +{{ providers_table }} + +{% if run_config_env_vars %} +### Environment Variables + +The following environment variables can be configured: + +{% for var, (default_value, description) in run_config_env_vars.items() %} +- `{{ var }}`: {{ description }} (default: `{{ default_value }}`) +{% endfor %} +{% endif %} + +{% if default_models %} +### Models + +The following models are available by default: + +{% for model in default_models %} +- `{{ model.model_id }} {{ model.doc_string }}` +{% endfor %} +{% endif %} + +## Prerequisites +### Oracle Cloud Infrastructure Setup + +Before using the OCI Generative AI distribution, ensure you have: + +1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/) +2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy +3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models +4. **Authentication**: Configure authentication using either: + - **Instance Principal** (recommended for cloud-hosted deployments) + - **API Key** (for on-premises or development environments) + +### Authentication Methods + +#### Instance Principal Authentication (Recommended) +Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments. + +Requirements: +- Instance must be running in an Oracle Cloud Infrastructure compartment +- Instance must have appropriate IAM policies to access Generative AI services + +#### API Key Authentication +For development or on-premises deployments, follow [this doc](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm) to learn how to create your API signing key for your config file. 
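+
+If you use config-file authentication, it can help to validate the profile before starting the stack. A minimal sketch using the `oci` SDK (the same `from_file`/`validate_config` calls this provider relies on); the path and profile shown are just the defaults:
+
+```python
+import oci
+
+# Load the profile the OCI provider will read and fail fast on missing or malformed entries.
+config = oci.config.from_file("~/.oci/config", "DEFAULT")
+oci.config.validate_config(config)
+print(f"Using region {config['region']} in tenancy {config['tenancy']}")
+```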
+ +### Required IAM Policies + +Ensure your OCI user or instance has the following policy statements: + +``` +Allow group to use generative-ai-inference-endpoints in compartment +Allow group to manage generative-ai-inference-endpoints in compartment +``` + +## Supported Services + +### Inference: OCI Generative AI +Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports: + +- **Chat Completions**: Conversational AI with context awareness +- **Text Generation**: Complete prompts and generate text content + +#### Available Models +Common OCI Generative AI models include access to Meta, Cohere, OpenAI, Grok, and more models. + +### Safety: Llama Guard +For content safety and moderation, this distribution uses Meta's LlamaGuard model through the OCI Generative AI service to provide: +- Content filtering and moderation +- Policy compliance checking +- Harmful content detection + +### Vector Storage: Multiple Options +The distribution supports several vector storage providers: +- **FAISS**: Local in-memory vector search +- **ChromaDB**: Distributed vector database +- **PGVector**: PostgreSQL with vector extensions + +### Additional Services +- **Dataset I/O**: Local filesystem and Hugging Face integration +- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities +- **Evaluation**: Meta reference evaluation framework + +## Running Llama Stack with OCI + +You can run the OCI distribution via Docker or local virtual environment. + +### Via venv + +If you've set up your local development environment, you can also build the image using your local virtual environment. + +```bash +OCI_AUTH=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci +``` + +### Configuration Examples + +#### Using Instance Principal (Recommended for Production) +```bash +export OCI_AUTH_TYPE=instance_principal +export OCI_REGION=us-chicago-1 +export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1.. +``` + +#### Using API Key Authentication (Development) +```bash +export OCI_AUTH_TYPE=config_file +export OCI_CONFIG_FILE_PATH=~/.oci/config +export OCI_CLI_PROFILE=DEFAULT +export OCI_REGION=us-chicago-1 +export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..your-compartment-id +``` + +## Regional Endpoints + +OCI Generative AI is available in multiple regions. The service automatically routes to the appropriate regional endpoint based on your configuration. For a full list of regional model availability, visit: + +https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions + +## Troubleshooting + +### Common Issues + +1. **Authentication Errors**: Verify your OCI credentials and IAM policies +2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region +3. **Permission Denied**: Check compartment permissions and Generative AI service access +4. 
**Region Unavailable**: Verify the specified region supports Generative AI services + +### Getting Help + +For additional support: +- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm) +- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues) \ No newline at end of file diff --git a/src/llama_stack/distributions/oci/oci.py b/src/llama_stack/distributions/oci/oci.py new file mode 100644 index 000000000..1f21840f1 --- /dev/null +++ b/src/llama_stack/distributions/oci/oci.py @@ -0,0 +1,108 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from pathlib import Path + +from llama_stack.core.datatypes import BuildProvider, Provider, ToolGroupInput +from llama_stack.distributions.template import DistributionTemplate, RunConfigSettings +from llama_stack.providers.inline.files.localfs.config import LocalfsFilesImplConfig +from llama_stack.providers.inline.vector_io.faiss.config import FaissVectorIOConfig +from llama_stack.providers.remote.inference.oci.config import OCIConfig + + +def get_distribution_template(name: str = "oci") -> DistributionTemplate: + providers = { + "inference": [BuildProvider(provider_type="remote::oci")], + "vector_io": [ + BuildProvider(provider_type="inline::faiss"), + BuildProvider(provider_type="remote::chromadb"), + BuildProvider(provider_type="remote::pgvector"), + ], + "safety": [BuildProvider(provider_type="inline::llama-guard")], + "agents": [BuildProvider(provider_type="inline::meta-reference")], + "eval": [BuildProvider(provider_type="inline::meta-reference")], + "datasetio": [ + BuildProvider(provider_type="remote::huggingface"), + BuildProvider(provider_type="inline::localfs"), + ], + "scoring": [ + BuildProvider(provider_type="inline::basic"), + BuildProvider(provider_type="inline::llm-as-judge"), + BuildProvider(provider_type="inline::braintrust"), + ], + "tool_runtime": [ + BuildProvider(provider_type="remote::brave-search"), + BuildProvider(provider_type="remote::tavily-search"), + BuildProvider(provider_type="inline::rag-runtime"), + BuildProvider(provider_type="remote::model-context-protocol"), + ], + "files": [BuildProvider(provider_type="inline::localfs")], + } + + inference_provider = Provider( + provider_id="oci", + provider_type="remote::oci", + config=OCIConfig.sample_run_config(), + ) + + vector_io_provider = Provider( + provider_id="faiss", + provider_type="inline::faiss", + config=FaissVectorIOConfig.sample_run_config(f"~/.llama/distributions/{name}"), + ) + + files_provider = Provider( + provider_id="meta-reference-files", + provider_type="inline::localfs", + config=LocalfsFilesImplConfig.sample_run_config(f"~/.llama/distributions/{name}"), + ) + default_tool_groups = [ + ToolGroupInput( + toolgroup_id="builtin::websearch", + provider_id="tavily-search", + ), + ] + + return DistributionTemplate( + name=name, + distro_type="remote_hosted", + description="Use Oracle Cloud Infrastructure (OCI) Generative AI for running LLM inference with scalable cloud services", + container_image=None, + template_path=Path(__file__).parent / "doc_template.md", + providers=providers, + run_configs={ + "run.yaml": RunConfigSettings( + provider_overrides={ + "inference": [inference_provider], + "vector_io": [vector_io_provider], + "files": [files_provider], + }, + default_tool_groups=default_tool_groups, + ), + }, + run_config_env_vars={ + 
"OCI_AUTH_TYPE": ( + "instance_principal", + "OCI authentication type (instance_principal or config_file)", + ), + "OCI_REGION": ( + "", + "OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1)", + ), + "OCI_COMPARTMENT_OCID": ( + "", + "OCI compartment ID for the Generative AI service", + ), + "OCI_CONFIG_FILE_PATH": ( + "~/.oci/config", + "OCI config file path (required if OCI_AUTH_TYPE is config_file)", + ), + "OCI_CLI_PROFILE": ( + "DEFAULT", + "OCI CLI profile name to use from config file", + ), + }, + ) diff --git a/src/llama_stack/distributions/oci/run.yaml b/src/llama_stack/distributions/oci/run.yaml new file mode 100644 index 000000000..e385ec606 --- /dev/null +++ b/src/llama_stack/distributions/oci/run.yaml @@ -0,0 +1,136 @@ +version: 2 +image_name: oci +apis: +- agents +- datasetio +- eval +- files +- inference +- safety +- scoring +- tool_runtime +- vector_io +providers: + inference: + - provider_id: oci + provider_type: remote::oci + config: + oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal} + oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config} + oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT} + oci_region: ${env.OCI_REGION:=us-ashburn-1} + oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=} + vector_io: + - provider_id: faiss + provider_type: inline::faiss + config: + persistence: + namespace: vector_io::faiss + backend: kv_default + safety: + - provider_id: llama-guard + provider_type: inline::llama-guard + config: + excluded_categories: [] + agents: + - provider_id: meta-reference + provider_type: inline::meta-reference + config: + persistence: + agent_state: + namespace: agents + backend: kv_default + responses: + table_name: responses + backend: sql_default + max_write_queue_size: 10000 + num_writers: 4 + eval: + - provider_id: meta-reference + provider_type: inline::meta-reference + config: + kvstore: + namespace: eval + backend: kv_default + datasetio: + - provider_id: huggingface + provider_type: remote::huggingface + config: + kvstore: + namespace: datasetio::huggingface + backend: kv_default + - provider_id: localfs + provider_type: inline::localfs + config: + kvstore: + namespace: datasetio::localfs + backend: kv_default + scoring: + - provider_id: basic + provider_type: inline::basic + - provider_id: llm-as-judge + provider_type: inline::llm-as-judge + - provider_id: braintrust + provider_type: inline::braintrust + config: + openai_api_key: ${env.OPENAI_API_KEY:=} + tool_runtime: + - provider_id: brave-search + provider_type: remote::brave-search + config: + api_key: ${env.BRAVE_SEARCH_API_KEY:=} + max_results: 3 + - provider_id: tavily-search + provider_type: remote::tavily-search + config: + api_key: ${env.TAVILY_SEARCH_API_KEY:=} + max_results: 3 + - provider_id: rag-runtime + provider_type: inline::rag-runtime + - provider_id: model-context-protocol + provider_type: remote::model-context-protocol + files: + - provider_id: meta-reference-files + provider_type: inline::localfs + config: + storage_dir: ${env.FILES_STORAGE_DIR:=~/.llama/distributions/oci/files} + metadata_store: + table_name: files_metadata + backend: sql_default +storage: + backends: + kv_default: + type: kv_sqlite + db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/oci}/kvstore.db + sql_default: + type: sql_sqlite + db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/oci}/sql_store.db + stores: + metadata: + namespace: registry + backend: kv_default + inference: + table_name: inference_store + backend: sql_default + max_write_queue_size: 10000 
+ num_writers: 4 + conversations: + table_name: openai_conversations + backend: sql_default + prompts: + namespace: prompts + backend: kv_default +registered_resources: + models: [] + shields: [] + vector_dbs: [] + datasets: [] + scoring_fns: [] + benchmarks: [] + tool_groups: + - toolgroup_id: builtin::websearch + provider_id: tavily-search +server: + port: 8321 +telemetry: + enabled: true diff --git a/src/llama_stack/providers/registry/inference.py b/src/llama_stack/providers/registry/inference.py index 1b70182fc..3cbfd408b 100644 --- a/src/llama_stack/providers/registry/inference.py +++ b/src/llama_stack/providers/registry/inference.py @@ -297,6 +297,20 @@ Available Models: Azure OpenAI inference provider for accessing GPT models and other Azure services. Provider documentation https://learn.microsoft.com/en-us/azure/ai-foundry/openai/overview +""", + ), + RemoteProviderSpec( + api=Api.inference, + provider_type="remote::oci", + adapter_type="oci", + pip_packages=["oci"], + module="llama_stack.providers.remote.inference.oci", + config_class="llama_stack.providers.remote.inference.oci.config.OCIConfig", + provider_data_validator="llama_stack.providers.remote.inference.oci.config.OCIProviderDataValidator", + description=""" +Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models. +Provider documentation +https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm """, ), ] diff --git a/src/llama_stack/providers/remote/inference/oci/__init__.py b/src/llama_stack/providers/remote/inference/oci/__init__.py new file mode 100644 index 000000000..280a8c1d2 --- /dev/null +++ b/src/llama_stack/providers/remote/inference/oci/__init__.py @@ -0,0 +1,17 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from llama_stack.apis.inference import InferenceProvider + +from .config import OCIConfig + + +async def get_adapter_impl(config: OCIConfig, _deps) -> InferenceProvider: + from .oci import OCIInferenceAdapter + + adapter = OCIInferenceAdapter(config=config) + await adapter.initialize() + return adapter diff --git a/src/llama_stack/providers/remote/inference/oci/auth.py b/src/llama_stack/providers/remote/inference/oci/auth.py new file mode 100644 index 000000000..f64436eb5 --- /dev/null +++ b/src/llama_stack/providers/remote/inference/oci/auth.py @@ -0,0 +1,79 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from collections.abc import Generator, Mapping +from typing import Any, override + +import httpx +import oci +import requests +from oci.config import DEFAULT_LOCATION, DEFAULT_PROFILE + +OciAuthSigner = type[oci.signer.AbstractBaseSigner] + + +class HttpxOciAuth(httpx.Auth): + """ + Custom HTTPX authentication class that implements OCI request signing. + + This class handles the authentication flow for HTTPX requests by signing them + using the OCI Signer, which adds the necessary authentication headers for + OCI API calls. 
+ + Attributes: + signer (oci.signer.Signer): The OCI signer instance used for request signing + """ + + def __init__(self, signer: OciAuthSigner): + self.signer = signer + + @override + def auth_flow(self, request: httpx.Request) -> Generator[httpx.Request, httpx.Response, None]: + # Read the request content to handle streaming requests properly + try: + content = request.content + except httpx.RequestNotRead: + # For streaming requests, we need to read the content first + content = request.read() + + req = requests.Request( + method=request.method, + url=str(request.url), + headers=dict(request.headers), + data=content, + ) + prepared_request = req.prepare() + + # Sign the request using the OCI Signer + self.signer.do_request_sign(prepared_request) # type: ignore + + # Update the original HTTPX request with the signed headers + request.headers.update(prepared_request.headers) + + yield request + + +class OciInstancePrincipalAuth(HttpxOciAuth): + def __init__(self, **kwargs: Mapping[str, Any]): + self.signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner(**kwargs) + + +class OciUserPrincipalAuth(HttpxOciAuth): + def __init__(self, config_file: str = DEFAULT_LOCATION, profile_name: str = DEFAULT_PROFILE): + config = oci.config.from_file(config_file, profile_name) + oci.config.validate_config(config) # type: ignore + key_content = "" + with open(config["key_file"]) as f: + key_content = f.read() + + self.signer = oci.signer.Signer( + tenancy=config["tenancy"], + user=config["user"], + fingerprint=config["fingerprint"], + private_key_file_location=config.get("key_file"), + pass_phrase="none", # type: ignore + private_key_content=key_content, + ) diff --git a/src/llama_stack/providers/remote/inference/oci/config.py b/src/llama_stack/providers/remote/inference/oci/config.py new file mode 100644 index 000000000..9747b08ea --- /dev/null +++ b/src/llama_stack/providers/remote/inference/oci/config.py @@ -0,0 +1,75 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +import os +from typing import Any + +from pydantic import BaseModel, Field + +from llama_stack.providers.utils.inference.model_registry import RemoteInferenceProviderConfig +from llama_stack.schema_utils import json_schema_type + + +class OCIProviderDataValidator(BaseModel): + oci_auth_type: str = Field( + description="OCI authentication type (must be one of: instance_principal, config_file)", + ) + oci_region: str = Field( + description="OCI region (e.g., us-ashburn-1)", + ) + oci_compartment_id: str = Field( + description="OCI compartment ID for the Generative AI service", + ) + oci_config_file_path: str | None = Field( + default="~/.oci/config", + description="OCI config file path (required if oci_auth_type is config_file)", + ) + oci_config_profile: str | None = Field( + default="DEFAULT", + description="OCI config profile (required if oci_auth_type is config_file)", + ) + + +@json_schema_type +class OCIConfig(RemoteInferenceProviderConfig): + oci_auth_type: str = Field( + description="OCI authentication type (must be one of: instance_principal, config_file)", + default_factory=lambda: os.getenv("OCI_AUTH_TYPE", "instance_principal"), + ) + oci_region: str = Field( + default_factory=lambda: os.getenv("OCI_REGION", "us-ashburn-1"), + description="OCI region (e.g., us-ashburn-1)", + ) + oci_compartment_id: str = Field( + default_factory=lambda: os.getenv("OCI_COMPARTMENT_OCID", ""), + description="OCI compartment ID for the Generative AI service", + ) + oci_config_file_path: str = Field( + default_factory=lambda: os.getenv("OCI_CONFIG_FILE_PATH", "~/.oci/config"), + description="OCI config file path (required if oci_auth_type is config_file)", + ) + oci_config_profile: str = Field( + default_factory=lambda: os.getenv("OCI_CLI_PROFILE", "DEFAULT"), + description="OCI config profile (required if oci_auth_type is config_file)", + ) + + @classmethod + def sample_run_config( + cls, + oci_auth_type: str = "${env.OCI_AUTH_TYPE:=instance_principal}", + oci_config_file_path: str = "${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}", + oci_config_profile: str = "${env.OCI_CLI_PROFILE:=DEFAULT}", + oci_region: str = "${env.OCI_REGION:=us-ashburn-1}", + oci_compartment_id: str = "${env.OCI_COMPARTMENT_OCID:=}", + **kwargs, + ) -> dict[str, Any]: + return { + "oci_auth_type": oci_auth_type, + "oci_config_file_path": oci_config_file_path, + "oci_config_profile": oci_config_profile, + "oci_region": oci_region, + "oci_compartment_id": oci_compartment_id, + } diff --git a/src/llama_stack/providers/remote/inference/oci/oci.py b/src/llama_stack/providers/remote/inference/oci/oci.py new file mode 100644 index 000000000..253dcf2b6 --- /dev/null +++ b/src/llama_stack/providers/remote/inference/oci/oci.py @@ -0,0 +1,140 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
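+
+# OpenAI-compatible inference adapter for OCI Generative AI. Instead of an API key,
+# requests are signed with OCI request signing (instance principal or config-file
+# credentials), and available chat models are discovered from the Generative AI service.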
+ + +from collections.abc import Iterable +from typing import Any + +import httpx +import oci +from oci.generative_ai.generative_ai_client import GenerativeAiClient +from oci.generative_ai.models import ModelCollection +from openai._base_client import DefaultAsyncHttpxClient + +from llama_stack.apis.inference.inference import ( + OpenAIEmbeddingsRequestWithExtraBody, + OpenAIEmbeddingsResponse, +) +from llama_stack.apis.models import ModelType +from llama_stack.log import get_logger +from llama_stack.providers.remote.inference.oci.auth import OciInstancePrincipalAuth, OciUserPrincipalAuth +from llama_stack.providers.remote.inference.oci.config import OCIConfig +from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin + +logger = get_logger(name=__name__, category="inference::oci") + +OCI_AUTH_TYPE_INSTANCE_PRINCIPAL = "instance_principal" +OCI_AUTH_TYPE_CONFIG_FILE = "config_file" +VALID_OCI_AUTH_TYPES = [OCI_AUTH_TYPE_INSTANCE_PRINCIPAL, OCI_AUTH_TYPE_CONFIG_FILE] +DEFAULT_OCI_REGION = "us-ashburn-1" + +MODEL_CAPABILITIES = ["TEXT_GENERATION", "TEXT_SUMMARIZATION", "TEXT_EMBEDDINGS", "CHAT"] + + +class OCIInferenceAdapter(OpenAIMixin): + config: OCIConfig + + async def initialize(self) -> None: + """Initialize and validate OCI configuration.""" + if self.config.oci_auth_type not in VALID_OCI_AUTH_TYPES: + raise ValueError( + f"Invalid OCI authentication type: {self.config.oci_auth_type}." + f"Valid types are one of: {VALID_OCI_AUTH_TYPES}" + ) + + if not self.config.oci_compartment_id: + raise ValueError("OCI_COMPARTMENT_OCID is a required parameter. Either set in env variable or config.") + + def get_base_url(self) -> str: + region = self.config.oci_region or DEFAULT_OCI_REGION + return f"https://inference.generativeai.{region}.oci.oraclecloud.com/20231130/actions/v1" + + def get_api_key(self) -> str | None: + # OCI doesn't use API keys, it uses request signing + return "" + + def get_extra_client_params(self) -> dict[str, Any]: + """ + Get extra parameters for the AsyncOpenAI client, including OCI-specific auth and headers. + """ + auth = self._get_auth() + compartment_id = self.config.oci_compartment_id or "" + + return { + "http_client": DefaultAsyncHttpxClient( + auth=auth, + headers={ + "CompartmentId": compartment_id, + }, + ), + } + + def _get_oci_signer(self) -> oci.signer.AbstractBaseSigner | None: + if self.config.oci_auth_type == OCI_AUTH_TYPE_INSTANCE_PRINCIPAL: + return oci.auth.signers.InstancePrincipalsSecurityTokenSigner() + return None + + def _get_oci_config(self) -> dict: + if self.config.oci_auth_type == OCI_AUTH_TYPE_INSTANCE_PRINCIPAL: + config = {"region": self.config.oci_region} + elif self.config.oci_auth_type == OCI_AUTH_TYPE_CONFIG_FILE: + config = oci.config.from_file(self.config.oci_config_file_path, self.config.oci_config_profile) + if not config.get("region"): + raise ValueError( + "Region not specified in config. Please specify in config or with OCI_REGION env variable." + ) + + return config + + def _get_auth(self) -> httpx.Auth: + if self.config.oci_auth_type == OCI_AUTH_TYPE_INSTANCE_PRINCIPAL: + return OciInstancePrincipalAuth() + elif self.config.oci_auth_type == OCI_AUTH_TYPE_CONFIG_FILE: + return OciUserPrincipalAuth( + config_file=self.config.oci_config_file_path, profile_name=self.config.oci_config_profile + ) + else: + raise ValueError(f"Invalid OCI authentication type: {self.config.oci_auth_type}") + + async def list_provider_model_ids(self) -> Iterable[str]: + """ + List available models from OCI Generative AI service. 
+ """ + oci_config = self._get_oci_config() + oci_signer = self._get_oci_signer() + compartment_id = self.config.oci_compartment_id or "" + + if oci_signer is None: + client = GenerativeAiClient(config=oci_config) + else: + client = GenerativeAiClient(config=oci_config, signer=oci_signer) + + models: ModelCollection = client.list_models( + compartment_id=compartment_id, capability=MODEL_CAPABILITIES, lifecycle_state="ACTIVE" + ).data + + seen_models = set() + model_ids = [] + for model in models.items: + if model.time_deprecated or model.time_on_demand_retired: + continue + + if "CHAT" not in model.capabilities or "FINE_TUNE" in model.capabilities: + continue + + # Use display_name + model_type as the key to avoid conflicts + model_key = (model.display_name, ModelType.llm) + if model_key in seen_models: + continue + + seen_models.add(model_key) + model_ids.append(model.display_name) + + return model_ids + + async def openai_embeddings(self, params: OpenAIEmbeddingsRequestWithExtraBody) -> OpenAIEmbeddingsResponse: + # The constructed url is a mask that hits OCI's "chat" action, which is not supported for embeddings. + raise NotImplementedError("OCI Provider does not (currently) support embeddings") diff --git a/tests/integration/inference/test_openai_completion.py b/tests/integration/inference/test_openai_completion.py index 1568ffbe2..4ce2850b4 100644 --- a/tests/integration/inference/test_openai_completion.py +++ b/tests/integration/inference/test_openai_completion.py @@ -54,6 +54,7 @@ def skip_if_model_doesnt_support_openai_completion(client_with_models, model_id) # {"error":{"message":"Unknown request URL: GET /openai/v1/completions. Please check the URL for typos, # or see the docs at https://console.groq.com/docs/","type":"invalid_request_error","code":"unknown_url"}} "remote::groq", + "remote::oci", "remote::gemini", # https://generativelanguage.googleapis.com/v1beta/openai/completions -> 404 "remote::anthropic", # at least claude-3-{5,7}-{haiku,sonnet}-* / claude-{sonnet,opus}-4-* are not supported "remote::azure", # {'error': {'code': 'OperationNotSupported', 'message': 'The completion operation diff --git a/tests/integration/inference/test_openai_embeddings.py b/tests/integration/inference/test_openai_embeddings.py index 704775716..fe8070162 100644 --- a/tests/integration/inference/test_openai_embeddings.py +++ b/tests/integration/inference/test_openai_embeddings.py @@ -138,6 +138,7 @@ def skip_if_model_doesnt_support_openai_embeddings(client, model_id): "remote::runpod", "remote::sambanova", "remote::tgi", + "remote::oci", ): pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support OpenAI embeddings.")