# What does this PR do?
We used to have `host = config.server.host or ["::", "0.0.0.0"]`, but now we
only bind to `host = config.server.host or "0.0.0.0"`.
Revert to the old logic; this lets us curl
http://localhost:8321/v1/models on Fedora, which defaults to using IPv6.
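A minimal sketch of what the restored fallback implies (the stand-in variable is illustrative, not the actual server startup code):

```python
# Illustrative sketch only, not the actual server code.
configured_host: str | None = None  # stand-in for config.server.host
host = configured_host or ["::", "0.0.0.0"]
# Downstream code can then bind one listener per address so both
# http://[::1]:8321 and http://127.0.0.1:8321 are reachable.
hosts = host if isinstance(host, list) else [host]
print(hosts)  # ['::', '0.0.0.0'] when no host is configured
```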
Resolves #4210
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Address all feedback from PR #3962:
**Code Quality Improvements:**
- Rename `_uvicorn_run` → `_run_server` for accurate method naming
- Refactor error handling: move Gunicorn fallback logic from `_run_with_gunicorn` to caller
- Update comments to reflect both Uvicorn and Gunicorn behavior
- Update test mock from `_uvicorn_run` to `_run_server`
**Environment Variable:**
- Change `LLAMA_STACK_DISABLE_GUNICORN` → `LLAMA_STACK_ENABLE_GUNICORN`
- More intuitive positive logic (no double negatives)
- Defaults to `true` on Unix systems (see the sketch below)
- Clearer log messages distinguishing platform limitations vs explicit disable
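A minimal sketch of the positive-logic check, assuming the platform-dependent default described above (the exact parsing in the server code may differ):

```python
import os
import sys

# Hedged sketch: LLAMA_STACK_ENABLE_GUNICORN defaults to true on Unix and
# false elsewhere; a truthy value opts in explicitly.
default = "true" if sys.platform != "win32" else "false"
enable_gunicorn = os.environ.get("LLAMA_STACK_ENABLE_GUNICORN", default).strip().lower() in ("1", "true", "yes")
```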
**Documentation:**
- Remove unnecessary `uv sync --group unit --group test` from user docs
- Clarify SQLite limitations: "SQLite only allows one writer at a time"
- Accurate explanation: WAL mode enables concurrent reads but writes are serialized (illustrated below)
- Strong recommendation for PostgreSQL in production with high traffic
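To illustrate the documented SQLite behavior (standard `sqlite3` usage, not llama-stack code):

```python
import sqlite3

# WAL mode lets readers proceed while a write is in flight, but SQLite still
# permits only one writer at a time, so concurrent writes are serialized.
conn = sqlite3.connect("example.db")
conn.execute("PRAGMA journal_mode=WAL;")
conn.close()
```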
**Architecture:**
- Better separation of concerns: `_run_with_gunicorn` just executes, caller handles fallback
- Exceptions propagate to caller for centralized decision-making (sketched below)
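A rough sketch of the resulting control flow; only the method names quoted above come from this PR, and the caller shown here is illustrative:

```python
def start(server) -> None:
    # Hedged sketch: _run_with_gunicorn just executes and lets exceptions
    # propagate; the caller owns the fallback decision.
    try:
        server._run_with_gunicorn()
    except Exception as exc:
        print(f"Gunicorn unavailable ({exc}); falling back to single-process serving")
        server._run_server()
```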
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
# What does this PR do?
`llama stack run --providers` takes a list of providers in the format
`api1=provider1,api2=provider2`.
This allows users to run with a simple list of providers.
Given the architecture of `create_app`, this run config needs to be
written to disk; use `~/.llama/distribution/providers-run/run.yaml` each
time for consistency.
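A rough sketch of turning the `--providers` value into an api-to-provider mapping (the actual CLI parsing and run.yaml generation may differ):

```python
# Hedged sketch only; illustrates the comma/equals format described above.
def parse_providers(spec: str) -> dict[str, str]:
    providers: dict[str, str] = {}
    for pair in spec.split(","):
        api, _, provider = pair.partition("=")
        providers[api.strip()] = provider.strip()
    return providers

assert parse_providers("api1=provider1,api2=provider2") == {
    "api1": "provider1",
    "api2": "provider2",
}
```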
Resolves #3956
## Test Plan
New unit tests to ensure `--providers` works as expected.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
This PR fixes the handling of the `external_providers_dir` configuration
field to align with its ongoing deprecation, in favor of the provider
`module` specification approach.
It addresses the issue in #3950, where using the default provided
run.yaml config resulted in the `external_providers_dir` parameter being
set to the literal string `None`, which crashed the llama-stack server
on startup.
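A minimal sketch of the kind of guard this fix implies, treating the literal string `None` as unset (names are illustrative, not the exact patch):

```python
from pathlib import Path

# Hedged sketch: None, "", and the literal string "None" coming from a
# templated run.yaml are all treated as "not configured".
def resolve_external_providers_dir(raw: object) -> Path | None:
    if raw in (None, "", "None"):
        return None
    return Path(str(raw))

assert resolve_external_providers_dir("None") is None
```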
Closes #3950
## Test Plan
- Built a new container image with `podman build . -f
containers/Containerfile --build-arg DISTRO_NAME=starter --tag
llama-stack:starter`
- Tested it locally with `podman run -it localhost/llama-stack:starter`
- Tested it on an OpenShift 4.19 cluster, deployed via the
llama-stack-k8s-operator.
Signed-off-by: Doug Edgar <dedgar@redhat.com>
Telemetry tests use an OTLP collector that expects single-process
telemetry spans. Gunicorn's multi-process architecture spawns multiple
workers, each with separate telemetry instrumentation, preventing the
test collector from capturing all spans.
This commit adds `LLAMA_STACK_DISABLE_GUNICORN` environment variable
support and sets it in the telemetry test configuration to ensure
single-process Uvicorn is used during tests while maintaining
production multi-process behavior.
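For example, the test setup could force the single-process path with an autouse fixture (a hypothetical sketch, not the exact test configuration):

```python
import pytest

@pytest.fixture(autouse=True)
def force_single_process(monkeypatch: pytest.MonkeyPatch) -> None:
    # Hedged sketch: disable Gunicorn so the OTLP collector sees spans from a
    # single Uvicorn process during telemetry tests.
    monkeypatch.setenv("LLAMA_STACK_DISABLE_GUNICORN", "1")
```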
Fixes failing tests:
- test_streaming_chunk_count
- test_telemetry_format_completeness
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs
## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)
**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import
**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package
## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.
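A simplified sketch of that pattern (parameter names are illustrative; the real methods in `openai_mixin.py` and `gemini.py` carry more parameters):

```python
from typing import Any

def build_embedding_kwargs(
    model: str,
    texts: list[str],
    dimensions: int | None = None,
    user: str | None = None,
) -> dict[str, Any]:
    # Hedged sketch: optional parameters are omitted entirely instead of being
    # passed as NOT_GIVEN, so the NotGiven/Omit mismatch never arises.
    kwargs: dict[str, Any] = {"model": model, "input": texts}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    if user is not None:
        kwargs["user"] = user
    return kwargs

# e.g. client.embeddings.create(**build_embedding_kwargs(model_id, texts))
```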
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Migrates package structure to src/ layout following Python packaging
best practices.
All code moved from `llama_stack/` to `src/llama_stack/`. Public API
unchanged - imports remain `import llama_stack.*`.
Updated build configs, pre-commit hooks, scripts, and GitHub workflows
accordingly. All hooks pass, package builds cleanly.
**Developer note**: Reinstall after pulling: `pip install -e .`