Changelog
v0.2.23
Published on: 2025-09-26T21:41:23Z
Highlights
- Overhauls documentation with Docusaurus migration and modern formatting.
- Standardizes the Ollama and Fireworks providers on the OpenAI compatibility layer.
- Combines dynamic model discovery with static embedding metadata for better model information.
- Refactors server.main for better code organization.
- Introduces API leveling with post_training and eval promoted to v1alpha.
v0.2.22
Published on: 2025-09-16T20:15:26Z
Highlights
- Migrated to unified "setups" system for test config
- Added default inference store automatically during llama stack build
- Introduced write queue for inference store
- Proposed API leveling framework
- Enhanced Together provider with embedding and dynamic model support
v0.2.21
Published on: 2025-09-08T22:30:47Z
Highlights
- Testing infrastructure improvements and fixes
- Backwards compatibility tests for core APIs
- Added OpenAI Prompts API
- Updated RAG Tool to use Files API and Vector Stores API
- Descriptive MCP server connection errors
v0.2.20
Published on: 2025-08-29T22:25:32Z
Here are some key changes in this release.
Build and Environment
- Environment improvements: fixed env var replacement to preserve types.
- Docker stability: fixed container startup failures for Fireworks AI provider.
- Removed absolute paths in build for better portability.
Features
- UI Enhancements: Implemented file upload and VectorDB creation/configuration directly in UI.
- Vector Store Improvements: Added keyword, vector, and hybrid search inside vector store.
- Added S3 authorization support for file providers.
- SQL Store: Added inequality support to where clause.
Documentation
- Fixed post-training docs.
- Added Contributor Guidelines for creating Internal vs. External providers.
Fixes
- Removed unsupported bfcl scoring function.
- Multiple reliability and configuration fixes for providers and environment handling.
Engineering / Chores
- Cleaner internal development setup with consistent paths.
- Incremental improvements to provider integration and vector store behavior.
New Contributors
- @omertuc made their first contribution in #3270
- @r3v5 made their first contribution in vector store hybrid search
v0.2.19
Published on: 2025-08-26T22:06:55Z
Highlights
- feat: Add CORS configuration support for server by @skamenan7 in https://github.com/llamastack/llama-stack/pull/3201
- feat(api): introduce /rerank by @ehhuang in https://github.com/llamastack/llama-stack/pull/2940
- feat: Add S3 Files Provider by @mattf in https://github.com/llamastack/llama-stack/pull/3202
v0.2.18
Published on: 2025-08-20T01:09:27Z
Highlights
- Add moderations create API
- Hybrid search in Milvus
- Numerous Responses API improvements
- Documentation updates
v0.2.17
Published on: 2025-08-05T01:51:14Z
Highlights
- feat(tests): introduce inference record/replay to increase test reliability by @ashwinb in https://github.com/meta-llama/llama-stack/pull/2941
- fix(library_client): improve initialization error handling and prevent AttributeError by @mattf in https://github.com/meta-llama/llama-stack/pull/2944
- fix: use OLLAMA_URL to activate Ollama provider in starter by @ashwinb in https://github.com/meta-llama/llama-stack/pull/2963
- feat(UI): adding MVP playground UI by @franciscojavierarceo in https://github.com/meta-llama/llama-stack/pull/2828
- Standardization of errors (@nathan-weinberg)
- feat: Enable DPO training with HuggingFace inline provider by @Nehanth in https://github.com/meta-llama/llama-stack/pull/2825
- chore: rename templates to distributions by @ashwinb in https://github.com/meta-llama/llama-stack/pull/3035
v0.2.16
Published on: 2025-07-28T23:35:23Z
Highlights
- Automatic model registration for self-hosted providers (ollama and vllm currently). No need for INFERENCE_MODEL environment variables which need to be updated, etc.
- Much simplified starter distribution. Most ENABLE_ env variables are now gone. When you set VLLM_URL, the vllm provider is auto-enabled. Similar for MILVUS_URL, PGVECTOR_DB, etc. Check the run.yaml for more details.
- All tests migrated to pytest now (thanks @Elbehery)
- DPO implementation in the post-training provider (thanks @Nehanth)
- (Huge!) Support for external APIs and providers thereof (thanks @leseb, @cdoern and others). This is a really big deal -- you can now add more APIs completely out of tree and experiment with them before (optionally) wanting to contribute back.
- The inline::vllm provider is gone, thank you very much
- Several improvements to OpenAI inference implementations and LiteLLM backend (thanks @mattf)
- Chroma now supports Vector Store API (thanks @franciscojavierarceo).
- Authorization improvements: Vector Store/File APIs now support access control (thanks @franciscojavierarceo); Telemetry read APIs are gated according to the logged-in user's roles.
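Based on the notes above, enabling a provider in the starter distribution is now just a matter of setting the corresponding URL variable before starting the server. A minimal sketch (the URLs and ports below are placeholders, not values from the release notes):

```shell
# Setting a provider's URL auto-enables that provider in the starter
# distribution; no ENABLE_* or INFERENCE_MODEL variables are needed.
export VLLM_URL=http://localhost:8000/v1     # auto-enables the vllm provider
export MILVUS_URL=http://localhost:19530     # auto-enables the milvus provider

llama stack run starter
```

Check the distribution's run.yaml for the full list of recognized variables.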
v0.2.15
Published on: 2025-07-16T03:30:01Z
v0.2.14
Published on: 2025-07-04T16:06:48Z
Highlights
- Support for Llama Guard 4
- Added Milvus support to vector-stores API
- Documentation and zero-to-hero updates for latest APIs
v0.2.13
Published on: 2025-06-28T04:28:11Z
Highlights
- search_mode support in OpenAI vector store API
- Security fixes
v0.2.12
Published on: 2025-06-20T22:52:12Z
Highlights
- Filter support in file search
- Support auth attributes in inference and response stores
v0.2.11
Published on: 2025-06-17T20:26:26Z
Highlights
- OpenAI-compatible vector store APIs
- Hybrid Search in Sqlite-vec
- File search tool in Responses API
- Pagination in inference and response stores
- Added suffix to completions API for fill-in-the-middle tasks
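As a sketch of how a suffix is typically used with an OpenAI-compatible completions endpoint: the model fills in the text between the prompt and the suffix. The helper name and model id below are illustrative, not part of the API:

```python
# Build a fill-in-the-middle completion request for an OpenAI-compatible
# /v1/completions endpoint: the model generates the code that belongs
# between `prefix` and `suffix`.
def build_fim_request(model: str, prefix: str, suffix: str,
                      max_tokens: int = 64) -> dict:
    return {
        "model": model,
        "prompt": prefix,      # text before the gap
        "suffix": suffix,      # text after the gap (new in this release)
        "max_tokens": max_tokens,
    }

request = build_fim_request(
    model="my-code-model",                 # placeholder model id
    prefix="def add(a, b):\n    ",
    suffix="\n    return result\n",
)
```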
v0.2.10.1
Published on: 2025-06-06T20:11:02Z
Highlights
- ChromaDB provider fix
v0.2.10
Published on: 2025-06-05T23:21:45Z
Highlights
- OpenAI-compatible embeddings API
- OpenAI-compatible Files API
- Postgres support in starter distro
- Enable ingestion of precomputed embeddings
- Full multi-turn support in Responses API
- Fine-grained access control policy
v0.2.9
Published on: 2025-05-30T20:01:56Z
Highlights
- Added initial streaming support in Responses API
- UI view for Responses
- Postgres inference store support
v0.2.8
Published on: 2025-05-27T21:03:47Z
Highlights
- Server-side MCP with auth firewalls now works in the Stack - both for Agents and Responses
- Get chat completions APIs and UI to show chat completions
- Enable keyword search for sqlite-vec
v0.2.7
Published on: 2025-05-16T20:38:10Z
Highlights
This is a small update, but a couple of highlights:
- feat: function tools in OpenAI Responses by @bbrowning in https://github.com/meta-llama/llama-stack/pull/2094, getting closer to ready. Streaming is the next missing piece.
- feat: Adding support for customizing chunk context in RAG insertion and querying by @franciscojavierarceo in https://github.com/meta-llama/llama-stack/pull/2134
- feat: scaffolding for Llama Stack UI by @ehhuang in https://github.com/meta-llama/llama-stack/pull/2149, more to come in the coming releases.
v0.2.6
Published on: 2025-05-12T18:06:52Z
v0.2.5
Published on: 2025-05-04T20:16:49Z
v0.2.4
Published on: 2025-04-29T17:26:01Z
Highlights
- One-liner to install and run Llama Stack yay! by @reluctantfuturist in https://github.com/meta-llama/llama-stack/pull/1383
- support for NVIDIA NeMo datastore by @raspawar in https://github.com/meta-llama/llama-stack/pull/1852
- (yuge!) Kubernetes authentication by @leseb in https://github.com/meta-llama/llama-stack/pull/1778
- (yuge!) OpenAI Responses API by @bbrowning in https://github.com/meta-llama/llama-stack/pull/1989
- add api.llama provider, llama-guard-4 model by @ashwinb in https://github.com/meta-llama/llama-stack/pull/2058
v0.2.3
Published on: 2025-04-25T22:46:21Z
Highlights
- OpenAI compatible inference endpoints and client-SDK support. client.chat.completions.create() now works.
- significant improvements and functionality added to the NVIDIA distribution
- many improvements to the test verification suite.
- new inference providers: Ramalama, IBM WatsonX
- many improvements to the Playground UI
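As a rough sketch of the call shape that now works against the OpenAI-compatible endpoints (the model id and message contents here are illustrative, not from the release notes):

```python
from typing import Optional

# Build the argument dict accepted by client.chat.completions.create()
# on an OpenAI-compatible endpoint.
def chat_completion_args(model: str, user_message: str,
                         system: Optional[str] = None) -> dict:
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    return {"model": model, "messages": messages}

args = chat_completion_args(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example model id
    user_message="Hello!",
    system="You are a helpful assistant.",
)
```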
v0.2.2
Published on: 2025-04-13T01:19:49Z
Main changes
- Bring Your Own Provider (@leseb) - use out-of-tree provider code to execute the distribution server
- OpenAI compatible inference API in progress (@bbrowning)
- Provider verifications (@ehhuang)
- Many updates and fixes to playground
- Several llama4 related fixes
v0.2.1
Published on: 2025-04-05T23:13:00Z
v0.2.0
Published on: 2025-04-05T19:04:29Z
Llama 4 Support
Check out more at https://www.llama.com
v0.1.9
Published on: 2025-03-29T00:52:23Z
Build and Test Agents
- Agents: Entire document context with attachments
- RAG: Documentation with sqlite-vec vs. faiss comparison
- Getting started: Fixes to getting started notebook.
Agent Evals and Model Customization
- (New) Post-training: Add nemo customizer
Better Engineering
- Moved sqlite-vec to non-blocking calls
- Don't return a payload on file delete
v0.1.8
Published on: 2025-03-24T01:28:50Z
v0.1.8 Release Notes
Build and Test Agents
- Safety: Integrated NVIDIA as a safety provider.
- VectorDB: Added Qdrant as an inline provider.
- Agents: Added support for multiple tool groups in agents.
- Agents: Simplified imports for Agents in client package
Agent Evals and Model Customization
- Introduced DocVQA and IfEval benchmarks.
Deploying and Monitoring Agents
- Introduced a Containerfile and image workflow for the Playground.
- Implemented support for Bearer (API Key) authentication.
- Added attribute-based access control for resources.
- Fixes for Docker deployments: use --pull always; standardized the default port to 8321
- Deprecated: /v1/inspect/providers; use /v1/providers/ instead
Better Engineering
- Consolidated scripts under the ./scripts directory.
- Addressed mypy violations in various modules.
- Added Dependabot scans for Python dependencies.
- Implemented a scheduled workflow to update the changelog automatically.
- Enforced concurrency to reduce CI loads.
New Contributors
- @cmodi-meta made their first contribution in https://github.com/meta-llama/llama-stack/pull/1650
- @jeffmaury made their first contribution in https://github.com/meta-llama/llama-stack/pull/1671
- @derekhiggins made their first contribution in https://github.com/meta-llama/llama-stack/pull/1698
- @Bobbins228 made their first contribution in https://github.com/meta-llama/llama-stack/pull/1745
Full Changelog: https://github.com/meta-llama/llama-stack/compare/v0.1.7...v0.1.8
v0.1.7
Published on: 2025-03-14T22:30:51Z
0.1.7 Release Notes
Build and Test Agents
- Inference: ImageType is now refactored to LlamaStackImageType
- Inference: Added tests to measure TTFT
- Inference: Bring back usage metrics
- Agents: Added endpoint for get agent, list agents and list sessions
- Agents: Automated conversion of type hints in client tool for lite llm format
- Agents: Deprecated ToolResponseMessage in agent.resume API
- Added Provider API for listing and inspecting provider info
Agent Evals and Model Customization
- Eval: Added new eval benchmarks Math 500 and BFCL v3
Deploy and Monitoring of Agents
- Telemetry: Fix tracing to work across coroutines
Better Engineering
- Display code coverage for unit tests
- Updated call sites (inference, tool calls, agents) to move to async non blocking calls
- Unit tests also run on Python 3.11, 3.12, and 3.13
- Added ollama inference to Integration tests CI
- Improved documentation across examples, testing, and the CLI; updated the providers table
v0.1.6
Published on: 2025-03-08T04:35:08Z
0.1.6 Release Notes
Build and Test Agents
- Inference: Fixed support for inline vllm provider
- (New) Agent: Build & Monitor Agent Workflows with Llama Stack + Anthropic's Best Practice Notebook
- (New) Agent: Revamped agent documentation with more details and examples
- Agent: Unify tools and Python SDK Agents API
- Agent: AsyncAgent Python SDK wrapper supporting async client tool calls
- Agent: Support python functions without @client_tool decorator as client tools
- Agent: Deprecated the allow_resume_turn flag and removed the need to specify tool_prompt_format
- VectorIO: MilvusDB support added
Agent Evals and Model Customization
- (New) Agent: Llama Stack RAG Lifecycle Notebook
- Eval: Documentation for eval, scoring, adding new benchmarks
- Eval: Distribution template to run benchmarks on llama & non-llama models
- Eval: Ability to register new custom LLM-as-judge scoring functions
- (New) Looking for contributors for open benchmarks. See documentation for details.
Deploy and Monitoring of Agents
- Better support for different log levels across all components for better monitoring
Better Engineering
- Enhance OpenAPI spec to include Error types across all APIs
- Moved all tests to /tests and created unit tests to run on each PR
- Removed all dependencies on llama-models repo
v0.1.5.1
Published on: 2025-02-28T22:37:44Z
0.1.5.1 Release Notes
- Fixes for security risk in https://github.com/meta-llama/llama-stack/pull/1327 and https://github.com/meta-llama/llama-stack/pull/1328
Full Changelog: https://github.com/meta-llama/llama-stack/compare/v0.1.5...v0.1.5.1