llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-06 12:37:33 +00:00

Author	SHA1	Message	Date
Xi Yan	07f9bf723f	fix broken --list-templates with adding build.yaml files for packaging (#327) * add build files to templates * fix templates * manifest * symlink * symlink * precommit * change everything to docker build.yaml * remove image_type in templates * fix build from templates CLI * fix readmes	2024-10-25 12:51:22 -07:00
Ashwin Bharambe	afae4e3d8e	Update docker build flow a little	2024-10-25 10:06:21 -07:00
Ashwin Bharambe	5bed6c276c	Move function around	2024-10-25 09:18:22 -07:00
Ashwin Bharambe	a387ca22e2	Update docker_base for meta-reference-gpu	2024-10-25 09:13:33 -07:00
Ashwin Bharambe	70d59b0f5d	Make vllm inference better. Tests still don't pass completely (some hang), so I think there are some potential threading issues.	2024-10-24 22:52:47 -07:00
Xi Yan	cb43caa2c3	start_container.sh prefix llamastack->distribution name	2024-10-24 21:29:17 -07:00
Sarthak Deshpande	df141b6ef3	Fix for get_agents_session (#300)	2024-10-24 18:36:27 -07:00
Justin Lee	b6d8246b82	added templates and enhanced readme (#307) Co-authored-by: Justin Lee <justinai@fb.com>	2024-10-24 17:07:06 -07:00
Dinesh Yeduguru	3e1c3fdb3f	completion() for tgi (#295)	2024-10-24 16:02:41 -07:00
Xi Yan	cb84034567	[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * add all rows scores to ScoringResult * bugfix * scoring function def rename	2024-10-24 14:52:30 -07:00
Xi Yan	e70420a06e	Update getting_started.md	2024-10-24 14:19:35 -07:00
Xi Yan	8615bc9e08	update manifest for build templates	2024-10-24 14:04:13 -07:00
Ashwin Bharambe	94728d6983	Handle both ipv6 and ipv4 interfaces together	2024-10-24 13:59:01 -07:00
Ashwin Bharambe	0538cc297e	Bump version to 0.0.45	2024-10-24 12:14:18 -07:00
Ashwin Bharambe	205bcfdd4e	Fix score threshold in faiss	2024-10-24 12:11:58 -07:00
Ashwin Bharambe	161aef0aae	Small updates to quantization config	2024-10-24 12:08:56 -07:00
Dalton Flanagan	8eceebec98	Update iOS inference instructions for new quantization	2024-10-24 14:47:27 -04:00
Ashwin Bharambe	8aa8847b4a	Bump version to 0.0.44	2024-10-24 08:41:39 -07:00
Ashwin Bharambe	7afe51c84d	New quantized models (#301)	2024-10-24 08:38:56 -07:00
Ashwin Bharambe	05a8d47b98	Add a meta-reference-quantized-gpu distribution	2024-10-23 21:45:50 -07:00
Xi Yan	f5dcc03742	use pytorch/pytorch as base	2024-10-23 20:22:00 -07:00
Xi Yan	0cec86453b	Fix issue w/ routing_table api getting added when router api is not specified (#298) * fix issue w/ enforcing api * cleanup * inference only yaml	2024-10-23 15:27:22 -07:00
Dinesh Yeduguru	21f2e9adf5	dont set num_predict for all providers (#294)	2024-10-23 11:44:04 -07:00
Ashwin Bharambe	ffb561070d	Support structured output for Together (#289)	2024-10-22 22:36:38 -07:00
Sarthak Deshpande	2e5e46d896	Added tests for persistence (#274)	2024-10-22 19:41:46 -07:00
Xi Yan	821810657f	[Evals API][2/n] datasets / datasetio meta-reference implementation (#288) * skeleton dataset / datasetio * dataset datasetio * config * address comments * delete dataset_utils * address comments * naming fix	2024-10-22 16:12:16 -07:00
Sarthak Deshpande	8a01b9e40c	Added implementations for get_agents_session, delete_agents_session and delete_agents (#267)	2024-10-22 13:50:43 -07:00
Suraj Subramanian	b81a3bd46a	Fix import conflict for SamplingParams (#285) Conflict between llama_models.llama3.api.datatypes.SamplingParams and vllm.sampling_params.SamplingParams results in errors while processing VLLM engine requests	2024-10-22 12:56:00 -07:00
Ashwin Bharambe	c06718fbd5	Add support for Structured Output / Guided decoding (#281) Added support for structured output in the API and added a reference implementation for meta-reference. A few notes: * Two formats are specified in the API: JSON schema and EBNF based grammar * Implementation only supports JSON for now We use lm-format-enforcer to provide the implementation right now but may change this especially because BNF grammars aren't supported by that library. Fireworks has support for structured output and Together has limited support for it too. Subsequent PRs will add these changes. We would like all our inference providers to provide structured output for llama models since it is an extremely important and highly sought-after need by the developers.	2024-10-22 12:53:34 -07:00
Anush	4c3d33e6f4	feat: Qdrant Vector index support (#221) This PR adds support for Qdrant - https://qdrant.tech/ to be used as a vector memory. I've unit-tested the methods to confirm that they work as intended. To run Qdrant ``` docker run -p 6333:6333 qdrant/qdrant ```	2024-10-22 12:50:19 -07:00
Suraj Subramanian	668a495aba	Add REST api example for chat_completion (#286)	2024-10-22 10:35:20 -07:00
Xi Yan	e45f121c77	[Evals API] [1/n] Initial API (#287) * type system api * datasets api * fix * datasetio api * kill reward scoring * scoring functions + evals * move jobs, fix errors	2024-10-22 09:31:19 -07:00
Xi Yan	b279d3bc58	Update README.md	2024-10-22 08:01:33 -07:00
Dinesh Yeduguru	1d241bf3fe	add completion() for ollama (#280)	2024-10-21 22:26:33 -07:00
raghotham	e2a5a2e10d	first version of readthedocs (#278)	2024-10-22 10:15:58 +05:30
Xi Yan	dbb5ce43fc	Bump version to 0.0.43	2024-10-21 19:10:01 -07:00
Xi Yan	a2ff74a686	telemetry WARNING->WARN fix	2024-10-21 18:52:48 -07:00
Xi Yan	b1451afbc8	Update README.md	2024-10-21 18:21:30 -07:00
Xi Yan	4d2bd2d39e	add more distro templates (#279) * verify dockers * together distro verified * readme * fireworks distro * fireworks compose up * fireworks verified	2024-10-21 18:15:08 -07:00
Xi Yan	cf27d19dd5	fix sse_generator async	2024-10-21 14:03:42 -07:00
Ashwin Bharambe	1944405dca	Update new_api_provider.md	2024-10-21 14:02:51 -07:00
Ashwin Bharambe	606c48309e	Small updates to encourage integration testing	2024-10-21 13:52:33 -07:00
Xi Yan	cb203b14b4	update README.md	2024-10-21 13:51:39 -07:00
Xi Yan	3a7884345a	Update new_api_provider.md	2024-10-21 13:41:56 -07:00
Xi Yan	25b37c9ff7	Update new_api_provider.md	2024-10-21 13:41:46 -07:00
Xi Yan	af75618348	remove distribution/templates	2024-10-21 13:23:58 -07:00
Xi Yan	23210e8679	llama stack distributions / templates / docker refactor (#266) * docker compose ollama * comment * update compose file * readme for distributions * readme * move distribution folders * move distribution/templates to distributions/ * rename * kill distribution/templates * readme * readme * build/developer cookbook/new api provider * developer cookbook * readme * readme * [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264) * fix case where memory bank is registered without provider_id * memory test * agents unit test * Add an option to not use elastic agents for meta-reference inference (#269) * Allow overriding checkpoint_dir via config * Small rename * Make all methods `async def` again; add completion() for meta-reference (#270) PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def". The rationale was that this allowed the user (within llama-stack) of this to use it as: ``` async for chunk in api.chat_completion(params) ``` However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like: ``` async for chunk in await api.chat_completion(params) ``` Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :) * Improve an important error message * update ollama for llama-guard3 * Add vLLM inference provider for OpenAI compatible vLLM server (#178) This PR adds vLLM inference provider for OpenAI compatible vLLM server. * Create .readthedocs.yaml Trying out readthedocs * Update event_logger.py (#275) spelling error * vllm * build templates * delete templates * tmp add back build to avoid merge conflicts * vllm * vllm --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin@meta.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: raghotham <rsm@meta.com> Co-authored-by: nehal-a2z <nehal@coderabbit.ai>	2024-10-21 11:17:53 -07:00
nehal-a2z	c995219731	Update event_logger.py (#275) spelling error	2024-10-21 10:46:53 -07:00
raghotham	cae5b0708b	Create .readthedocs.yaml Trying out readthedocs	2024-10-21 11:48:19 +05:30
Yuan Tang	a27a2cd2af	Add vLLM inference provider for OpenAI compatible vLLM server (#178) This PR adds vLLM inference provider for OpenAI compatible vLLM server.	2024-10-20 18:43:25 -07:00
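
The message of commit 23210e8679 in the log above contrasts two calling conventions for streaming methods: a plain `def` that is iterated directly, and an `async def` whose result must be awaited before iterating. The following is a minimal sketch of that difference only, not the llama-stack implementation. Every name in it (`_stream_chunks`, `chat_completion_plain_def`, `chat_completion_async_def`, `collect_both`) is hypothetical and chosen for illustration.

```python
import asyncio
from typing import AsyncIterator, List, Tuple


async def _stream_chunks(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming response: yields one word at a time.
    for word in prompt.split():
        await asyncio.sleep(0)  # give control back to the event loop
        yield word


def chat_completion_plain_def(prompt: str) -> AsyncIterator[str]:
    # Plain "def": calling it returns the async generator directly,
    # so the caller writes `async for chunk in f(...)`.
    return _stream_chunks(prompt)


async def chat_completion_async_def(prompt: str) -> AsyncIterator[str]:
    # "async def": calling it returns a coroutine; awaiting the coroutine
    # produces the async generator, so the caller writes
    # `async for chunk in await f(...)`.
    return _stream_chunks(prompt)


async def collect_both(prompt: str) -> Tuple[List[str], List[str]]:
    plain: List[str] = []
    async for chunk in chat_completion_plain_def(prompt):
        plain.append(chunk)

    awaited: List[str] = []
    async for chunk in await chat_completion_async_def(prompt):
        awaited.append(chunk)

    return plain, awaited


if __name__ == "__main__":
    print(asyncio.run(collect_both("hello from the stream")))
```

Both loops receive the same chunks; the only difference is whether the call itself has to be awaited, which is the point the commit message makes about keeping every method `async def`.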


1913 commits