
Changelog

v0.2.23

Published on: 2025-09-26T21:41:23Z

Highlights

  • Overhauls documentation with Docusaurus migration and modern formatting.
  • Standardizes the Ollama and Fireworks providers on an OpenAI compatibility layer.
  • Combines dynamic model discovery with static embedding metadata for better model information.
  • Refactors server.main for better code organization.
  • Introduces API leveling with post_training and eval promoted to v1alpha.

v0.2.22

Published on: 2025-09-16T20:15:26Z

Highlights

  • Migrated to unified "setups" system for test config
  • Added default inference store automatically during llama stack build
  • Introduced write queue for inference store
  • Proposed API leveling framework
  • Enhanced Together provider with embedding and dynamic model support

v0.2.21

Published on: 2025-09-08T22:30:47Z

Highlights

  • Testing infrastructure improvements and fixes
  • Backwards compatibility tests for core APIs
  • Added OpenAI Prompts API
  • Updated RAG Tool to use Files API and Vector Stores API
  • Descriptive MCP server connection errors

v0.2.20

Published on: 2025-08-29T22:25:32Z

Here are some key changes in this release.

Build and Environment

  • Environment improvements: fixed env var replacement to preserve types.
  • Docker stability: fixed container startup failures for Fireworks AI provider.
  • Removed absolute paths in build for better portability.

Features

  • UI Enhancements: Implemented file upload and VectorDB creation/configuration directly in UI.
  • Vector Store Improvements: Added keyword, vector, and hybrid search inside vector store.
  • Added S3 authorization support for file providers.
  • SQL Store: Added inequality support to where clause.

Documentation

  • Fixed post-training docs.
  • Added Contributor Guidelines for creating Internal vs. External providers.

Fixes

  • Removed unsupported bfcl scoring function.
  • Multiple reliability and configuration fixes for providers and environment handling.

Engineering / Chores

  • Cleaner internal development setup with consistent paths.
  • Incremental improvements to provider integration and vector store behavior.

New Contributors

  • @omertuc made their first contribution in #3270
  • @r3v5 made their first contribution in vector store hybrid search

v0.2.19

Published on: 2025-08-26T22:06:55Z

Highlights


v0.2.18

Published on: 2025-08-20T01:09:27Z

Highlights

  • Add moderations create API
  • Hybrid search in Milvus
  • Numerous Responses API improvements
  • Documentation updates
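
The moderations create API mirrors the OpenAI moderations schema. A minimal sketch of the request body it accepts follows; the model id is an assumption, not a documented default:

```python
import json

# Hedged sketch: request body for the moderations create API, following the
# OpenAI moderations schema that this endpoint mirrors. "llama-guard" is an
# assumed safety model id for illustration.
body = {
    "model": "llama-guard",
    "input": "Is this message safe to show to users?",
}
payload = json.dumps(body)
# An actual client would POST this payload to the server's moderations route
# and read `results[0].flagged` from the response.
```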

v0.2.17

Published on: 2025-08-05T01:51:14Z

Highlights


v0.2.16

Published on: 2025-07-28T23:35:23Z

Highlights

  • Automatic model registration for self-hosted providers (currently ollama and vllm). No more INFERENCE_MODEL environment variables that need constant updating.
  • Much simplified starter distribution. Most ENABLE_ environment variables are gone: setting VLLM_URL auto-enables the vllm provider, and likewise for MILVUS_URL, PGVECTOR_DB, etc. Check the run.yaml for details.
  • All tests migrated to pytest now (thanks @Elbehery)
  • DPO implementation in the post-training provider (thanks @Nehanth)
  • (Huge!) Support for external APIs and providers thereof (thanks @leseb, @cdoern and others). This is a really big deal -- you can now add new APIs completely out of tree and experiment with them before (optionally) contributing them back.
  • The inline::vllm provider is gone, thank you very much
  • Several improvements to OpenAI inference implementations and the LiteLLM backend (thanks @mattf)
  • Chroma now supports the Vector Store API (thanks @franciscojavierarceo).
  • Authorization improvements: the Vector Store and File APIs now support access control (thanks @franciscojavierarceo); Telemetry read APIs are gated according to the logged-in user's roles.
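
The URL-based auto-enablement can be sketched as follows; the URLs are placeholders and the run command is illustrative (check the run.yaml for the exact variables):

```shell
# Hedged sketch: setting a provider URL auto-enables that provider in the
# starter distribution; no ENABLE_* variables needed.
export VLLM_URL=http://localhost:8000/v1   # auto-enables the vllm provider
export MILVUS_URL=http://localhost:19530   # auto-enables the milvus provider

# llama stack run starter   # (commented out: requires a built distribution)
echo "vllm provider will use: $VLLM_URL"
```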

v0.2.15

Published on: 2025-07-16T03:30:01Z


v0.2.14

Published on: 2025-07-04T16:06:48Z

Highlights

  • Support for Llama Guard 4
  • Added Milvus support to vector-stores API
  • Documentation and zero-to-hero updates for latest APIs

v0.2.13

Published on: 2025-06-28T04:28:11Z

Highlights

  • search_mode support in OpenAI vector store API
  • Security fixes
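
A sketch of how the new search_mode parameter might appear in a vector store search request; the field names follow the OpenAI vector store search schema, and the mode values (vector/keyword/hybrid) are assumptions drawn from the hybrid-search work noted elsewhere in this changelog:

```python
import json

# Hedged sketch: a vector store search request body using the new
# `search_mode` parameter. Field names and mode values are assumptions.
search_request = {
    "query": "how do I configure providers?",
    "search_mode": "hybrid",   # assumed alternatives: "vector", "keyword"
    "max_num_results": 5,
}
payload = json.dumps(search_request)
```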

v0.2.12

Published on: 2025-06-20T22:52:12Z

Highlights

  • Filter support in file search
  • Support auth attributes in inference and response stores

v0.2.11

Published on: 2025-06-17T20:26:26Z

Highlights

  • OpenAI-compatible vector store APIs
  • Hybrid Search in Sqlite-vec
  • File search tool in Responses API
  • Pagination in inference and response stores
  • Added suffix to completions API for fill-in-the-middle tasks
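
The suffix parameter follows the OpenAI completions schema for fill-in-the-middle: the model generates the text between the prompt and the suffix. A minimal sketch (the model id is an assumption):

```python
import json

# Hedged sketch: a completions request using the new `suffix` parameter for
# fill-in-the-middle tasks. The model id is an assumption.
fim_request = {
    "model": "meta-llama/Llama-3.1-8B",
    "prompt": "def add(a: int, b: int) -> int:\n    return ",
    "suffix": "\n\nprint(add(1, 2))",
    "max_tokens": 16,
}
payload = json.dumps(fim_request)
```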

v0.2.10.1

Published on: 2025-06-06T20:11:02Z

Highlights

  • ChromaDB provider fix

v0.2.10

Published on: 2025-06-05T23:21:45Z

Highlights

  • OpenAI-compatible embeddings API
  • OpenAI-compatible Files API
  • Postgres support in starter distro
  • Enable ingestion of precomputed embeddings
  • Full multi-turn support in Responses API
  • Fine-grained access control policy

v0.2.9

Published on: 2025-05-30T20:01:56Z

Highlights

  • Added initial streaming support in Responses API
  • UI view for Responses
  • Postgres inference store support

v0.2.8

Published on: 2025-05-27T21:03:47Z

Release v0.2.8

Highlights

  • Server-side MCP with auth firewalls now works in the Stack - both for Agents and Responses
  • APIs for retrieving chat completions, plus a UI to display them
  • Enable keyword search for sqlite-vec

v0.2.7

Published on: 2025-05-16T20:38:10Z

Highlights

This is a small update. But a couple highlights:


v0.2.6

Published on: 2025-05-12T18:06:52Z


v0.2.5

Published on: 2025-05-04T20:16:49Z


v0.2.4

Published on: 2025-04-29T17:26:01Z

Highlights


v0.2.3

Published on: 2025-04-25T22:46:21Z

Highlights

  • OpenAI-compatible inference endpoints and client-SDK support. client.chat.completions.create() now works.
  • Significant improvements and added functionality in the NVIDIA distribution
  • Many improvements to the test verification suite
  • New inference providers: Ramalama, IBM WatsonX
  • Many improvements to the Playground UI
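
With OpenAI-compatible endpoints, the standard OpenAI Python SDK can point at a Llama Stack server. A minimal sketch of the client.chat.completions.create() call shape; the base URL path and model id are assumptions:

```python
# Hedged sketch: the arguments a client.chat.completions.create() call would
# take against the new OpenAI-compatible endpoints. Hypothetical wiring:
#
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
#   reply = client.chat.completions.create(**kwargs)

kwargs = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",   # assumed model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one word."},
    ],
}
```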

v0.2.2

Published on: 2025-04-13T01:19:49Z

Main changes

  • Bring Your Own Provider (@leseb) - use out-of-tree provider code to execute the distribution server
  • OpenAI compatible inference API in progress (@bbrowning)
  • Provider verifications (@ehhuang)
  • Many updates and fixes to playground
  • Several llama4 related fixes

v0.2.1

Published on: 2025-04-05T23:13:00Z


v0.2.0

Published on: 2025-04-05T19:04:29Z

Llama 4 Support

Checkout more at https://www.llama.com


v0.1.9

Published on: 2025-03-29T00:52:23Z

Build and Test Agents

  • Agents: Entire document context with attachments
  • RAG: Documentation with a sqlite-vec vs. faiss comparison
  • Getting started: Fixes to getting started notebook.

Agent Evals and Model Customization

  • (New) Post-training: Add nemo customizer

Better Engineering

  • Moved sqlite-vec to non-blocking calls
  • Don't return a payload on file delete
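
The non-blocking pattern behind the sqlite-vec change can be illustrated generically: run the blocking sqlite call on a worker thread via asyncio.to_thread so the event loop stays free. A toy sketch, not the provider's actual code:

```python
import asyncio
import sqlite3

# Hedged sketch of the non-blocking-calls pattern: a blocking sqlite query is
# moved off the event loop onto a worker thread with asyncio.to_thread.

def blocking_query() -> int:
    conn = sqlite3.connect(":memory:")
    (n,) = conn.execute("SELECT 41 + 1").fetchone()
    conn.close()
    return n

async def main() -> int:
    # The event loop remains responsive while the query runs on a thread.
    return await asyncio.to_thread(blocking_query)

answer = asyncio.run(main())
```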

v0.1.8

Published on: 2025-03-24T01:28:50Z

v0.1.8 Release Notes

Build and Test Agents

  • Safety: Integrated NVIDIA as a safety provider.
  • VectorDB: Added Qdrant as an inline provider.
  • Agents: Added support for multiple tool groups in agents.
  • Agents: Simplified imports for Agents in client package

Agent Evals and Model Customization

  • Introduced DocVQA and IfEval benchmarks.

Deploying and Monitoring Agents

  • Introduced a Containerfile and image workflow for the Playground.
  • Implemented support for Bearer (API Key) authentication.
  • Added attribute-based access control for resources.
  • Docker deployment fixes: use --pull always and standardized the default port to 8321
  • Deprecated /v1/inspect/providers; use /v1/providers/ instead

Better Engineering

  • Consolidated scripts under the ./scripts directory.
  • Addressed mypy violations in various modules.
  • Added Dependabot scans for Python dependencies.
  • Implemented a scheduled workflow to update the changelog automatically.
  • Enforced concurrency to reduce CI loads.
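
CI concurrency enforcement usually takes this shape in a GitHub Actions workflow; this is a generic sketch, not this repo's exact configuration:

```yaml
# Cancel in-flight runs of the same workflow on the same branch,
# so only the latest push consumes CI capacity.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```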

New Contributors

Full Changelog: https://github.com/meta-llama/llama-stack/compare/v0.1.7...v0.1.8


v0.1.7

Published on: 2025-03-14T22:30:51Z

0.1.7 Release Notes

Build and Test Agents

  • Inference: ImageType is now refactored to LlamaStackImageType
  • Inference: Added tests to measure TTFT (time to first token)
  • Inference: Bring back usage metrics
  • Agents: Added endpoint for get agent, list agents and list sessions
  • Agents: Automated conversion of type hints in client tools to the LiteLLM format
  • Agents: Deprecated ToolResponseMessage in agent.resume API
  • Added Provider API for listing and inspecting provider info

Agent Evals and Model Customization

  • Eval: Added new eval benchmarks Math 500 and BFCL v3

Deploy and Monitoring of Agents

  • Telemetry: Fixed tracing to work across coroutines

Better Engineering

  • Display code coverage for unit tests
  • Updated call sites (inference, tool calls, agents) to use async non-blocking calls
  • Unit tests also run on Python 3.11, 3.12, and 3.13
  • Added ollama inference to Integration tests CI
  • Improved documentation across examples, testing, and the CLI; updated the providers table

v0.1.6

Published on: 2025-03-08T04:35:08Z

0.1.6 Release Notes

Build and Test Agents

  • Inference: Fixed support for inline vllm provider
  • (New) Agent: Build & Monitor Agent Workflows with Llama Stack + Anthropic's Best Practice Notebook
  • (New) Agent: Revamped agent documentation with more details and examples
  • Agent: Unify tools and Python SDK Agents API
  • Agent: AsyncAgent Python SDK wrapper supporting async client tool calls
  • Agent: Support Python functions without the @client_tool decorator as client tools
  • Agent: Deprecated the allow_resume_turn flag and removed the need to specify tool_prompt_format
  • VectorIO: MilvusDB support added
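
The decorator-free client tools noted above can be sketched as a plain typed function with a docstring, from which a tool schema can be inferred; the Agent wiring in the comment is hypothetical:

```python
# Hedged sketch: per the note above, a plain Python function (no @client_tool
# decorator) can serve as a client tool. The registration comment below uses
# assumed names; the function itself is runnable.

def get_weather(city: str) -> str:
    """Return a canned weather report for `city` (toy example)."""
    return f"The weather in {city} is sunny."

# Hypothetical registration with an agent:
#   agent = Agent(client, model="...", tools=[get_weather])
result = get_weather("Paris")
```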

Agent Evals and Model Customization

  • (New) Agent: Llama Stack RAG Lifecycle Notebook
  • Eval: Documentation for eval, scoring, adding new benchmarks
  • Eval: Distribution template to run benchmarks on llama & non-llama models
  • Eval: Ability to register new custom LLM-as-judge scoring functions
  • (New) Looking for contributors for open benchmarks. See documentation for details.

Deploy and Monitoring of Agents

  • Better support for different log levels across all components for better monitoring

Better Engineering

  • Enhance OpenAPI spec to include Error types across all APIs
  • Moved all tests to /tests and created unit tests to run on each PR
  • Removed all dependencies on llama-models repo

v0.1.5.1

Published on: 2025-02-28T22:37:44Z

0.1.5.1 Release Notes

Full Changelog: https://github.com/meta-llama/llama-stack/compare/v0.1.5...v0.1.5.1