A name conflict between llama_models.llama3.api.datatypes.SamplingParams and vllm.sampling_params.SamplingParams results in errors while processing vLLM engine requests.
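One way to avoid the collision (a sketch, not necessarily the fix that landed) is to alias the two classes on import and translate explicitly at the provider boundary; the field mapping below is an assumption for illustration:
```
from llama_models.llama3.api.datatypes import SamplingParams as LlamaSamplingParams
from vllm.sampling_params import SamplingParams as VLLMSamplingParams


def to_vllm_sampling_params(params: LlamaSamplingParams) -> VLLMSamplingParams:
    # Field names here are illustrative; the real mapping depends on the
    # current schemas of both libraries.
    return VLLMSamplingParams(
        temperature=params.temperature,
        top_p=params.top_p,
        max_tokens=params.max_tokens,
    )
```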
Added support for structured output in the API, along with a reference implementation for the meta-reference provider.
A few notes:
* Two formats are specified in the API: JSON schema and EBNF-based grammar
* The implementation only supports JSON schema for now
We use lm-format-enforcer to provide the implementation for now, but may change this later, especially because BNF grammars aren't supported by that library.
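For reference, here is a minimal sketch of how lm-format-enforcer can constrain a Hugging Face model to a JSON schema (illustrative only; the model ID, schema, and generation settings are assumptions, not the meta-reference provider's actual code path):
```
from transformers import AutoModelForCausalLM, AutoTokenizer
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import (
    build_transformers_prefix_allowed_tokens_fn,
)

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["name", "year"],
}

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# The parser walks the schema and masks out tokens that would violate it.
parser = JsonSchemaParser(schema)
prefix_fn = build_transformers_prefix_allowed_tokens_fn(tokenizer, parser)

inputs = tokenizer("Describe the first Llama release as JSON: ", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=128, prefix_allowed_tokens_fn=prefix_fn)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```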
Fireworks has support for structured output, and Together has limited support for it too. Subsequent PRs will add these changes. We would like all our inference providers to offer structured output for Llama models, since it is an extremely important and highly sought-after capability among developers.
PR #201 made several changes while trying to get the stream=False branches of the inference and agents APIs working. As part of this, it made a change that was slightly gratuitous: making chat_completion() and its brethren "def" instead of "async def".
The rationale was that this allowed the user (within llama-stack) to call it as:
```
async for chunk in api.chat_completion(params)
```
However, it caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, this choice was not ideal. Let's revert so the call now looks like:
```
async for chunk in await api.chat_completion(params)
```
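To make the difference concrete, here is a sketch of the two method shapes (hypothetical signatures, not the actual provider code):
```
from typing import AsyncGenerator

# Before the revert (PR #201 style): a plain `def` that returns an async
# generator, so callers wrote `async for chunk in api.chat_completion(params)`.
def chat_completion_old(params) -> AsyncGenerator:
    async def gen():
        yield "chunk"
    return gen()

# After the revert: an `async def` that returns the async generator, so
# callers first await the coroutine, then iterate:
# `async for chunk in await api.chat_completion(params)`.
async def chat_completion_new(params) -> AsyncGenerator:
    async def gen():
        yield "chunk"
    return gen()
```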
Bonus: Added a completion() implementation for the meta-reference provider. Technically this should have been a separate PR :)
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load from fp8" codepath.
YAML I tested with:
```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
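For context, on-the-fly bf16 -> fp8 conversion boils down to scaling weights into the fp8 range and casting; here is a rough PyTorch sketch (a hypothetical helper, not the meta-reference-quantized implementation):
```
import torch


def quantize_to_fp8(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Per-tensor scale so the largest magnitude maps near the fp8 max;
    # keep the scale around for dequantization at matmul time.
    finfo = torch.finfo(torch.float8_e4m3fn)
    scale = weight.abs().max().clamp(min=1e-12) / finfo.max
    fp8_weight = (weight / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    return fp8_weight, scale
```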