llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-04 02:03:44 +00:00

ehhuang c5e2e269e2
feat(api): introduce /rerank (#2940)

# What does this PR do?

Context: https://github.com/meta-llama/llama-stack/issues/2937

The API design is inspired by existing offerings, but not exactly the same:

* `top_n` as the parameter to control number of results, instead of `top_k`, since `n` is conventional to control number
* `truncation` bool instead of `max_token_per_doc`, since we should just handle the truncation automatically depending on model capability, instead of user setting the context length manually.
* `data` field in the response, to be consistent with other OpenAI APIs (though they don't have a rerank API). Also, it is one less name to learn in the API.

## Test Plan

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>

2025-08-21 18:23:16 -07:00
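
The three design choices in the commit message above can be illustrated with a small, purely local sketch. It is not the llama-stack implementation and makes no network calls. The function name `toy_rerank`, the result fields `index` and `relevance_score`, the `max_words` limit, and the word-overlap scoring are all assumptions made for illustration. Only `top_n`, `truncation`, and the `data` response field come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class RerankResult:
    index: int              # hypothetical field: position of the item in the input list
    relevance_score: float  # hypothetical field: higher means more relevant


def toy_rerank(query, items, top_n=None, truncation=True, max_words=8):
    """Toy stand-in for a reranker: scores by word overlap, not by a real model."""
    query_words = set(query.lower().split())
    results = []
    for i, item in enumerate(items):
        words = item.lower().split()
        if len(words) > max_words:
            # `truncation` is a bool: the caller opts in, and the limit is
            # decided by the "model" (here, the assumed `max_words`).
            if not truncation:
                raise ValueError(f"item {i} exceeds the toy limit of {max_words} words")
            words = words[:max_words]
        overlap = len(query_words & set(words))
        score = overlap / max(len(query_words), 1)
        results.append(RerankResult(index=i, relevance_score=score))
    # Most relevant first; `top_n` caps how many results are returned.
    results.sort(key=lambda r: r.relevance_score, reverse=True)
    if top_n is not None:
        results = results[:top_n]
    # Results are wrapped in a `data` field, as described in the commit message.
    return {"data": results}


if __name__ == "__main__":
    docs = ["llamas eat grass", "the stack overflowed", "llama stack rerank api"]
    for result in toy_rerank("llama stack", docs, top_n=2)["data"]:
        print(result.index, result.relevance_score)
```

In this sketch the caller never passes a token or word budget. It only says whether truncation is acceptable, which mirrors the reasoning given for preferring a `truncation` bool over `max_token_per_doc`.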
css	fix: improve Mermaid diagram visibility in dark mode (#2092)	2025-05-02 13:09:45 -07:00
js	docs: Add documentation on how to contribute a Vector DB provider and update testing documentation (#3093)	2025-08-11 11:11:09 -07:00
providers/vector_io	docs: Document sqlite-vec faiss comparison (#1821)	2025-03-28 17:41:33 +01:00
llama-stack-logo.png	first version of readthedocs (#278)	2024-10-22 10:15:58 +05:30
llama-stack-spec.html	feat(api): introduce /rerank (#2940)	2025-08-21 18:23:16 -07:00
llama-stack-spec.yaml	feat(api): introduce /rerank (#2940)	2025-08-21 18:23:16 -07:00
llama-stack.png	Make a new llama stack image	2024-11-22 23:49:22 -08:00
remote_or_local.gif	[docs] update documentations (#356)	2024-11-04 16:52:38 -08:00
safety_system.webp	[Docs] Zero-to-Hero notebooks and quick start documentation (#368)	2024-11-08 17:16:44 -08:00