llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Doug Edgar	e8cd8508b5	fix: handle missing external_providers_dir (#3974 ) # What does this PR do? This PR fixes the handling of the external_providers_dir configuration field to align with its ongoing deprecation, in favor of the provider `module` specification approach. It addresses the issue in #3950, where using the default provided run.yaml config resulted in the `external_providers_dir` parameter being set to the literal string `None`, and crashing the llama-stack server when starting. Closes #3950 ## Test Plan - Built a new container image from `podman build . -f containers/Containerfile --build-arg DISTRO_NAME=starter --tag llama-stack:starter` - Tested it locally with `podman run -it localhost/llama-stack:starter` - Tested it on an OpenShift 4.19 cluster, deployed via the llama-stack-k8s-operator. Signed-off-by: Doug Edgar <dedgar@redhat.com>	2025-10-30 17:01:31 -07:00
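The failure mode described in e8cd8508b5 (a config value arriving as the literal string `None`) can be illustrated with a small normalizer. This is only a sketch under assumptions: `normalize_external_providers_dir` is a hypothetical helper, not the code that the PR actually changed.

```python
from typing import Optional


def normalize_external_providers_dir(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: treat a missing value, an empty string,
    or the literal string "None" as "not configured"."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped == "" or stripped == "None":
        return None
    return stripped
```

A caller would then skip loading external providers whenever the helper returns `None`, instead of trying to open a directory named `None`.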
Derek Higgins	ff2b270e2f	fix: relax structured output test assertions to handle whitespace and… (#3997 ) … case variations The ollama/llama3.2:3b-instruct-fp16 model returns string values with trailing whitespace in structured JSON output. Updated test assertions to use case-insensitive substring matching instead of exact equality. Use .lower() for case-insensitive comparison Check if expected value is contained in actual value (handles whitespace) Closes: #3996 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-10-30 16:55:23 -07:00
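The relaxed assertion style described in ff2b270e2f can be sketched as follows. The helper name `matches_loosely` and the sample values are illustrative assumptions, not taken from the test suite.

```python
def matches_loosely(expected: str, actual: str) -> bool:
    """Case-insensitive substring check: tolerates trailing whitespace
    and case differences in the model's structured output."""
    return expected.lower() in actual.lower()


# Example: a value returned with different casing and trailing whitespace
# still matches, where exact equality would have failed.
sample_actual = "michael jordan  "
assert matches_loosely("Michael Jordan", sample_actual)
```

The trade-off is that substring matching is weaker than equality, so it can also accept values that merely contain the expected text.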
ehhuang	0e384a55a1	feat: support `workers` in run config (#3992 ) # What does this PR do? ## Test Plan Set workers: 4 in run.yaml. Start server and observe logs multiple times.	2025-10-30 16:34:12 -07:00
Ashwin Bharambe	6f90a7af4b	ci: target release-X.Y.x branches instead of release-X.Y.x-maint (#3995 ) We will be updating our release procedure to be more "normal" or "sane". We will - create release branches like normal people - land cherry-picks onto those branches - run releases off of those branches - no more "rc" branch pollution either Given that, this PR cleans things up a bit - Remove `-maint` suffix from release branch patterns in CI workflows - Update branch matching to `release-X.Y.x` format	2025-10-30 16:27:13 -07:00
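To make the branch-name change in 6f90a7af4b concrete, here is a minimal sketch of a matcher for the `release-X.Y.x` format. It assumes X and Y are decimal integers; the actual CI workflows use their own pattern syntax, which is not reproduced here.

```python
import re

# Assumption: X and Y are non-negative integers, and the trailing "x" is literal.
RELEASE_BRANCH = re.compile(r"release-\d+\.\d+\.x")


def is_release_branch(name: str) -> bool:
    """Return True only for names of the exact form release-X.Y.x."""
    return RELEASE_BRANCH.fullmatch(name) is not None
```

With `fullmatch`, the older `release-X.Y.x-maint` names no longer qualify.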
Eric Huang	6a2c68168c	merge commit for archive created by Sapling	2025-10-30 15:48:51 -07:00
Eric Huang	b538c38d49	workers # What does this PR do? ## Test Plan	2025-10-30 15:48:44 -07:00
Ashwin Bharambe	90234d6973	ci: support release branches and match client branch (#3990 ) - Update workflows to trigger on release-X.Y.x-maint branches - When PR targets release branch, fetch matching branch from llama-stack-client-python - Falls back to main if matching client branch doesn't exist - Updated workflows: - integration-tests.yml - integration-auth-tests.yml - integration-sql-store-tests.yml - integration-vector-io-tests.yml - unit-tests.yml - backward-compat.yml - pre-commit.yml	2025-10-30 15:20:34 -07:00
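The fallback rule described in 90234d6973 (use the matching client branch when a PR targets a release branch, otherwise `main`) could look roughly like this. The function name and the idea of passing the client's branch names as a set are assumptions made for illustration.

```python
from typing import AbstractSet


def pick_client_branch(target_branch: str, client_branches: AbstractSet[str]) -> str:
    """Pick which llama-stack-client-python branch to fetch.

    Uses the branch with the same name when the PR targets a release
    branch and that branch exists in the client repo; otherwise "main".
    """
    if target_branch.startswith("release-") and target_branch in client_branches:
        return target_branch
    return "main"
```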
Ashwin Bharambe	c2ae42b343	fix(ci): show pre-commit output easily on failure (#3985 ) Right now, the failed Step which is opened by GH by default tells me to just go up and click and scroll through for no reason.	2025-10-30 11:48:20 -07:00
Ashwin Bharambe	77c8bc6fa7	fix(ci): add back server:ci-tests to replay tests (#3976 ) It is useful for local debugging. If both server and docker are failing, you can just run server locally to debug which is much easier to do.	2025-10-30 11:02:59 -07:00
Eric Huang	c26daaf4af	merge commit for archive created by Sapling	2025-10-30 11:02:41 -07:00
Eric Huang	b71a870f04	chore: fix tele test # What does this PR do? ## Test Plan	2025-10-30 11:02:27 -07:00
ehhuang	5e20938832	fix: remove LLAMA_STACK_TEST_FORCE_SERVER_RESTART setting in fixture (#3982 ) # What does this PR do? this is meant to be a manual flag ## Test Plan CI	2025-10-30 09:13:04 -07:00
Eric Huang	2bb28ec0f6	merge commit for archive created by Sapling	2025-10-30 09:03:41 -07:00
Eric Huang	993e5215a4	fix: remove LLAMA_STACK_TEST_FORCE_SERVER_RESTART setting in fixture # What does this PR do? ## Test Plan	2025-10-30 09:03:27 -07:00
Sébastien Han	b4ea05ada9	chore: add batches to openapi schema (#3980 ) # What does this PR do? While working on https://github.com/llamastack/llama-stack/pull/3944 I realized that the batches API wasn't generated. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-30 07:08:35 -07:00
Derek Higgins	19d85003de	test: Updated test skips that were marked with "inline::vllm" (#3979 ) This should be "remote::vllm". This causes some log probs tests to be skipped with remote vllm. (They fail if run). Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-10-30 14:48:21 +01:00
Ashwin Bharambe	174ef162b3	fix(mypy): add fast and full mypy modes (#3975 ) `mypy` became very slow for the common path. This can make local pre-commit runs very slow. Let's restore that. - restore fast mirrors-mypy hook for local runs - add optional mypy-full hook and docs so devs can match CI - run full mypy in CI with a hint when failures occur ### Test Plan - uv run pre-commit run mypy --all-files - uv run pre-commit run mypy-full --hook-stage manual --all-files - uv run --group dev --group type_checking mypy	2025-10-29 19:02:32 -07:00
Charlie Doern	e8ecc99524	fix!: remove chunk_id property from Chunk class (#3954 ) # What does this PR do? chunk_id in the Chunk class executes actual logic to compute a chunk ID. This sort of logic should not live in the API spec. Instead, the providers should be in charge of calling generate_chunk_id, and pass it to `Chunk`. this removes the incorrect dependency between Provider impl and API impl Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-29 18:59:59 -07:00
Charlie Doern	0ef9166c7e	fix: make integration-tests.sh Mac friendly (#3971 ) # What does this PR do? Running ./scripts/integration-tests.sh --network host on a Mac fails regularly because of how Docker runs on macOS. On a Mac, keep bridge network mode. before: === Starting Docker Container === Using image: localhost/distribution-ci-tests:dev WARNING: Published ports are discarded when using host network mode Waiting for Docker container to start... ❌ Docker container failed to start Container logs: INFO 2025-10-29 18:38:32,180 llama_stack.cli.stack.run:100 cli: Using run configuration: /workspace/src/llama_stack/distributions/ci-tests/run.yaml ... (stack starts but is not reachable on network) after: === Starting Docker Container === Using image: localhost/distribution-ci-tests:dev Using bridge networking with port mapping (non-Linux) Waiting for Docker container to start... ✅ Docker container started successfully === Running Integration Tests === ## Test Plan integration tests pass! Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-29 14:12:09 -07:00
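The platform check behind 0ef9166c7e can be sketched in Python, although the real change lives in a shell script. `docker_network_args` is a hypothetical helper; only the `--network host` and `-p HOST:CONTAINER` Docker flags are real.

```python
import platform
from typing import List, Optional


def docker_network_args(port: int, system: Optional[str] = None) -> List[str]:
    """Choose Docker networking flags for the current platform.

    On Linux, host networking works as expected. Elsewhere (for example
    macOS, where Docker runs inside a VM) fall back to bridge networking
    with an explicit port mapping.
    """
    system = system or platform.system()
    if system == "Linux":
        return ["--network", "host"]
    return ["-p", f"{port}:{port}"]
```

Passing `system` explicitly makes the choice testable without running on each platform.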
Ashwin Bharambe	da8f014b96	feat(models): list models available via provider_data header (#3968 ) ## Summary When users provide API keys via `X-LlamaStack-Provider-Data` header, `models.list()` now returns models they can access from those providers, not just pre-registered models from the registry. This complements the routing fix from `f88416ef8` which enabled inference calls with `provider_id/model_id` format for unregistered models. Users can now discover which models are available to them before making inference requests. The implementation reuses `NeedsRequestProviderData.get_request_provider_data()` to validate credentials, then dynamically fetches models from providers without caching them since they're user-specific. Registry models take precedence to respect any pre-configured aliases. ## Test Script ```python #!/usr/bin/env python3 import json import os from openai import OpenAI # Test 1: Without provider_data header client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy") models = client.models.list() anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id] print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic") # Test 2: With provider_data header containing Anthropic API key anthropic_api_key = os.environ["ANTHROPIC_API_KEY"] client_with_key = OpenAI( base_url="http://localhost:8321/v1/openai/v1", api_key="dummy", default_headers={ "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key}) } ) models_with_key = client_with_key.models.list() anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id] print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic") print(f"Anthropic models: {anthropic_with}") assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key" print("\n✓ Test passed!") ``` Run with a stack that has Anthropic provider configured (but 
without API key in config): ```bash ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py ```	2025-10-29 14:03:03 -07:00
Ashwin Bharambe	c9d4b6c54f	chore(mypy): part-04 resolve mypy errors in meta_reference agents (#3969 ) ## Summary Fixes all mypy type errors in `providers/inline/agents/meta_reference/` and removes exclusions from pyproject.toml. ## Changes - Fix type annotations for Safety API message parameters (OpenAIMessageParam) - Add Action enum usage in access control checks - Correct method signatures to match API supertype (parameter ordering) - Handle optional return types with proper None checks - Remove 3 meta_reference exclusions from mypy config Files fixed: 25 errors across 3 files (safety.py, persistence.py, agents.py)	2025-10-29 13:37:28 -07:00
Omar Abdelwahab	e6b27db30a	docs: A getting started notebook featuring simple agent examples. (#3955 ) # What does this PR do? Getting started notebook featuring simple agent examples. --------- Co-authored-by: Omar Abdelwahab <omara@fb.com>	2025-10-29 14:13:34 -04:00
Ashwin Bharambe	7dc48a75e5	chore: delete openapi.stainless.yaml for now. not source of truth. (#3967 ) This is really not the source of truth yet and is causing more confusion right now.	2025-10-29 10:45:38 -07:00
Nathan Weinberg	b90c6a2c8b	fix(docs): remove leftover telemetry sidebar section (#3961 ) Leftover telemetry section was preventing `npm run build` from completing successfully Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 11:20:13 -04:00
Nathan Weinberg	10977caff3	fix: typo in .gitignore (#3960 ) typo in https://github.com/llamastack/llama-stack/pull/3959 (whoops) Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 11:08:47 -04:00
Ashwin Bharambe	a4f97559d1	fix(mypy): part-03 completely resolve meta reference responses impl typing issues (#3951 ) ## Summary Resolves all mypy errors in meta reference agent OpenAI responses implementation by adding proper type narrowing, None checks, and Sequence type support. ## Changes - Fixed streaming.py, openai_responses.py, utils.py, tool_executor.py, agent_instance.py - Added Sequence type support to schema generator (ensures correct JSON schema generation) - Applied union type narrowing and None checks throughout ## Test plan - All modified files pass mypy type checking (0 errors) - Schema generator produces correct `type: array` for Sequence types --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 08:07:15 -07:00
Ashwin Bharambe	e5c27dbcbf	fix(mypy): part-02 resolve OpenAI compatibility layer type issues (#3947 ) ## Summary Fixes 111 mypy type errors in OpenAI compatibility layer (PR3 in mypy remediation series). Changes: - `litellm_openai_mixin.py`: Added type annotations, None checks for tool_config/model_store access - `openai_compat.py`: Added None checks throughout, fixed TypedDict expansions, proper type conversions for messages/tool_calls Result: 23 → 1 errors in litellm file, 88 → 0 errors in openai_compat file --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 08:06:40 -07:00
Ashwin Bharambe	ce31aa1704	fix(mypy-cleanup): part-01 resolve meta reference agent type issues (126 errors) (#3945 ) Error fixes in Agents implementation (`meta-reference` provider) -- adding proper type annotations and using type narrowing for optional attributes. Essentially a bunch of `if x and x_foo := getattr(x, "foo")` instead of `x.foo` directly Part of ongoing mypy remediation effort. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 07:54:30 -07:00
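The narrowing idiom mentioned in ce31aa1704 needs parentheses around the assignment expression to be valid Python. A minimal sketch, where `ToolConfig` and `describe_tool_choice` are made-up names for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolConfig:
    tool_choice: Optional[str] = None


def describe_tool_choice(config: Optional[ToolConfig]) -> str:
    # The walrus expression must be parenthesized when combined with `and`;
    # it binds `choice` only when config is set and the attribute is truthy.
    if config and (choice := getattr(config, "tool_choice", None)):
        return f"tool_choice={choice}"
    return "tool_choice unset"
```

Inside the `if` body, a type checker can treat `choice` as a non-`None` value, which is the point of the pattern.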
Nathan Weinberg	22bf0d0471	chore: ignore API docs generation (#3959 ) See `1432743473` Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 10:27:53 -04:00
Nathan Weinberg	b6bb8fbf64	ci: add pre-commit check ensuring FIPS compliance (#3899 ) # What does this PR do? this commit adds a new pre-commit hook to scan for non-FIPS compliant function usage within llama-stack Closes #3427 ## Test Plan Ran locally Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 10:21:35 -04:00
Ashwin Bharambe	e809d21357	feat: add backward compatibility tests for run.yaml (#3952 ) This adds automated backward compatibility testing for `run.yaml` files. As we evolve `StackRunConfig`, changes can inadvertently break existing user configurations. This workflow catches those breaks before merge. We test old run.yaml files (from main and the latest release) against the PR's new code. If configs that worked before now fail, the PR is blocked unless explicitly acknowledged as a breaking change. Two test layers: - Schema validation: Quick pytest checks that configs parse without errors - Integration tests: Full test suite execution to catch runtime semantic issues (cross-field validations, provider initialization, etc.) What we test against: - main branch: Breaking changes here block the PR (this is the gate) - Latest release: Informational only - shows if we've drifted from what users have If tests fail, the PR author must acknowledge the breaking change by adding `!:` to the PR title (e.g., `feat!: change xyz`) or including `BREAKING CHANGE:` in a commit message. Once acknowledged, the check passes with a warning. These jobs are run: 1. `check-main-compatibility` - Schema validation of all distribution run.yaml files from main 2. `test-integration-main` - Full integration test suite using main's ci-tests run.yaml 3. `test-integration-release` - Integration tests with latest release config (informational) 4. `check-schema-release-compatibility` - Schema checks against release (informational) The integration tests catch issues that schema validation alone would miss, like assertion failures in `StackRunConfig.validate_server_stores()` or provider-specific runtime logic. Resolves #3311 Related to #3237	2025-10-28 21:51:56 -07:00
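The acknowledgement rule in e809d21357 (`!:` in the PR title or `BREAKING CHANGE:` in a commit message) can be sketched as a small predicate. The function name and its inputs are assumptions; the actual workflow may detect these markers differently.

```python
from typing import Iterable


def acknowledges_breaking_change(pr_title: str, commit_messages: Iterable[str]) -> bool:
    """Return True if the PR explicitly flags a breaking change.

    Two markers count, per the description above: "!:" in the PR title
    (e.g. "feat!: change xyz") or "BREAKING CHANGE:" in any commit message.
    """
    if "!:" in pr_title:
        return True
    return any("BREAKING CHANGE:" in message for message in commit_messages)
```

When this returns `True` for a failing compatibility run, the check would pass with a warning instead of blocking the PR.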
Derek Higgins	c678682cdd	chore: remove unused methods from InferenceRouter (#3953 ) Remove unused methods that became obsolete after `d266c59c`: o _compute_and_log_token_usage o _count_tokens o stream_tokens_and_compute_metrics o count_tokens_and_compute_metrics These methods are no longer referenced anywhere in the codebase following the removal of deprecated inference.chat_completion implementations. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-10-28 17:12:41 -07:00
ehhuang	1aa8979050	test: enable telemetry tests in server mode (#3927 ) # What does this PR do? - added a server-based test OTLP collector ## Test Plan CI	2025-10-28 16:33:48 -07:00
ehhuang	1f9d48cd54	feat: openai files provider (#3946 ) # What does this PR do? - Adds OpenAI files provider - Note that file content retrieval is pretty limited by `purpose` https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com ## Test Plan Modify run yaml to use openai files provider: ``` files: - provider_id: openai provider_type: remote::openai config: api_key: ${env.OPENAI_API_KEY:=} metadata_store: backend: sql_default table_name: openai_files_metadata # Then run files tests ❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files ```	2025-10-28 16:25:03 -07:00
raghotham	feabcdd67b	docs: add documentation on how to use custom run yaml in docker (#3949 ) as title test plan: ```yaml # custom-ollama-run.yaml version: 2 image_name: starter external_providers_dir: /.llama/providers.d apis: - inference - vector_io - files - safety - tool_runtime - agents providers: inference: # Single Ollama provider for all models - provider_id: ollama provider_type: remote::ollama config: url: ${env.OLLAMA_URL:=http://localhost:11434} vector_io: - provider_id: faiss provider_type: inline::faiss config: persistence: namespace: vector_io::faiss backend: kv_default files: - provider_id: meta-reference-files provider_type: inline::localfs config: storage_dir: /.llama/files metadata_store: table_name: files_metadata backend: sql_default safety: - provider_id: llama-guard provider_type: inline::llama-guard config: excluded_categories: [] tool_runtime: - provider_id: rag-runtime provider_type: inline::rag-runtime agents: - provider_id: meta-reference provider_type: inline::meta-reference config: persistence: agent_state: namespace: agents backend: kv_default responses: table_name: responses backend: sql_default max_write_queue_size: 10000 num_writers: 4 storage: backends: kv_default: type: kv_sqlite db_path: /.llama/kvstore.db sql_default: type: sql_sqlite db_path: /.llama/sql_store.db stores: metadata: namespace: registry backend: kv_default inference: table_name: inference_store backend: sql_default max_write_queue_size: 10000 num_writers: 4 conversations: table_name: openai_conversations backend: sql_default registered_resources: models: # All models use the same 'ollama' provider - model_id: llama3.2-vision:latest provider_id: ollama provider_model_id: llama3.2-vision:latest model_type: llm - model_id: llama3.2:3b provider_id: ollama provider_model_id: llama3.2:3b model_type: llm # Embedding models - model_id: nomic-embed-text-v2-moe provider_id: ollama provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K model_type: embedding metadata: 
embedding_dimension: 768 shields: [] vector_dbs: [] datasets: [] scoring_fns: [] benchmarks: [] tool_groups: [] server: port: 8321 telemetry: enabled: true vector_stores: default_provider_id: faiss default_embedding_model: provider_id: ollama model_id: toshk0/nomic-embed-text-v2-moe:Q6_K ``` ```bash docker run -it --pull always -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT -v ~/.llama:/root/.llama -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml -e RUN_CONFIG_PATH=/app/custom-run.yaml -e OLLAMA_URL=http://host.docker.internal:11434/ llamastack/distribution-starter:0.3.0 --port $LLAMA_STACK_PORT ```	2025-10-28 16:05:44 -07:00
ehhuang	e08fc119ba	Merge `e76303c001` into sapling-pr-archive-ehhuang	2025-10-28 13:05:17 -07:00
Eric Huang	e76303c001	feat: openai files provider # What does this PR do? ## Test Plan	2025-10-28 13:05:13 -07:00
Eric Huang	45b77c3a10	merge commit for archive created by Sapling	2025-10-28 12:57:13 -07:00
Eric Huang	9a7c62de48	tele tests # What does this PR do? ## Test Plan	2025-10-28 12:57:03 -07:00
ehhuang	7dd2fb1ed6	Merge `23dfa39137` into sapling-pr-archive-ehhuang	2025-10-28 12:54:39 -07:00
Eric Huang	23dfa39137	feat: openai files provider # What does this PR do? ## Test Plan	2025-10-28 12:54:33 -07:00
Eric Huang	a9d96166cc	merge commit for archive created by Sapling	2025-10-28 12:46:50 -07:00
Eric Huang	889ce058b9	feat: openai files provider # What does this PR do? ## Test Plan	2025-10-28 12:46:45 -07:00
ehhuang	0745d308a8	Merge `a7b506718d` into sapling-pr-archive-ehhuang	2025-10-28 12:43:55 -07:00
Eric Huang	a7b506718d	feat: openai files provider # What does this PR do? ## Test Plan	2025-10-28 12:38:18 -07:00
Eric Huang	6b386db8fc	merge commit for archive created by Sapling	2025-10-28 12:20:02 -07:00
Eric Huang	177958657f	tele tests # What does this PR do? ## Test Plan	2025-10-28 12:19:50 -07:00
Eric Huang	2e468e0d1f	merge commit for archive created by Sapling	2025-10-28 12:08:33 -07:00
Eric Huang	b3c9d5a15f	tele tests # What does this PR do? ## Test Plan	2025-10-28 12:08:27 -07:00
Ashwin Bharambe	f88416ef87	fix(inference): enable routing of models with provider_data alone (#3928 ) This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack. Here's the situation: assume a remote inference provider which works only when users provide their own API keys via `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias. Just that we don't outright fail when we are not able to look it up. Also, updated inference router so that the responses have the _exact_ model that the request had. ## Test Plan Added an integration test Closes #3929 --------- Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>	2025-10-28 11:16:37 -07:00
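The routing behaviour described in f88416ef87 (try the registry first, then fall back to the `provider_id/model_id` prefix) can be sketched as below. `resolve_route` and the dictionary-based registry are illustrative assumptions, not the Stack's routing table API.

```python
from typing import Dict, Tuple


def resolve_route(model: str, registry: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return (provider_id, provider_model_id) for a requested model.

    The registry is consulted first, since it may hold a pre-registered
    alias. On a miss, the fully qualified "provider_id/model_id" form is
    split on the first "/" and the provider is left to accept or reject it.
    """
    if model in registry:
        return registry[model]
    provider_id, sep, model_id = model.partition("/")
    if not sep or not provider_id or not model_id:
        raise ValueError(f"cannot route unregistered model: {model!r}")
    return provider_id, model_id
```

Splitting on the first `/` only means provider-side model names that themselves contain a slash are passed through intact.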


3479 commits