Xi Yan
202667f3db
delete templates
2024-10-21 11:03:34 -07:00
Xi Yan
3ca822f4cd
build templates
2024-10-21 11:02:32 -07:00
Xi Yan
ca2e7f52bd
vllm
2024-10-21 11:00:50 -07:00
nehal-a2z
8ef3d3d239
Update event_logger.py (#275)
spelling error
2024-10-21 10:48:50 -07:00
raghotham
af52c22c5e
Create .readthedocs.yaml
Trying out readthedocs
2024-10-21 10:46:47 -07:00
Yuan Tang
74e6356b51
Add vLLM inference provider for OpenAI compatible vLLM server (#178)
This PR adds a vLLM inference provider for an OpenAI-compatible vLLM server.
2024-10-21 10:46:45 -07:00
Ashwin Bharambe
391dedd1c0
update ollama for llama-guard3
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
89759a0ad3
Improve an important error message
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
5863f65874
Make all methods async def again; add completion() for meta-reference (#270)
PR #201 made several changes while trying to fix issues with the stream=False branches of the inference and agents APIs. As part of this, it made one change that was slightly gratuitous: turning chat_completion() and its brethren into "def" instead of "async def" methods.
The rationale was that this let callers (within llama-stack) use them as:
```
async for chunk in api.chat_completion(params)
```
However, this caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, the choice was not ideal. Let's revert, so the call now looks like:
```
async for chunk in await api.chat_completion(params)
```
Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)
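The two call shapes differ only in whether the method itself is a coroutine. A minimal sketch of the post-revert shape (the `Api` class and its body here are hypothetical stand-ins, not the actual llama-stack code):

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stand-in for the inference API; names are illustrative.
class Api:
    async def chat_completion(self, params: dict) -> AsyncIterator[str]:
        # Because this is "async def", calling it returns a coroutine
        # that must first be awaited; the awaited result is the stream.
        async def stream() -> AsyncIterator[str]:
            for token in ["hello", " ", "world"]:
                yield token
        return stream()

async def main() -> list[str]:
    api = Api()
    received = []
    # The post-revert call shape: await the coroutine, then iterate.
    async for chunk in await api.chat_completion({"stream": True}):
        received.append(chunk)
    return received

print(asyncio.run(main()))  # prints ['hello', ' ', 'world']
```

Had chat_completion() stayed a plain "def" returning the iterator directly, the inner `await` would be dropped, which is the call shape PR #201 had introduced.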
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
92aca57bfa
Small rename
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
6f4537b4c4
Allow overriding checkpoint_dir via config
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
a90ab5878b
Add an option to not use elastic agents for meta-reference inference (#269)
2024-10-21 10:46:40 -07:00
Xi Yan
2f5c410c73
[bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)
* fix case where memory bank is registered without provider_id
* memory test
* agents unit test
2024-10-21 10:46:40 -07:00
Xi Yan
29c8edb4f6
readme
2024-10-21 09:11:25 -07:00
Xi Yan
5ea36b0274
readme
2024-10-21 09:03:05 -07:00
Xi Yan
d4caab3c67
developer cookbook
2024-10-21 09:01:34 -07:00
Xi Yan
302fa5c4bb
build/developer cookbook/new api provider
2024-10-21 09:01:22 -07:00
Xi Yan
f58441cc21
readme
2024-10-18 18:55:29 -07:00
Xi Yan
100b5fecd4
readme
2024-10-18 18:53:49 -07:00
Xi Yan
955743ba7a
kill distribution/templates
2024-10-18 17:32:11 -07:00
Xi Yan
c830235936
rename
2024-10-18 17:28:26 -07:00
Xi Yan
cbb423a32f
move distribution/templates to distributions/
2024-10-18 17:21:50 -07:00
Xi Yan
b4aca0aeb6
move distribution folders
2024-10-18 17:05:41 -07:00
Xi Yan
fd90d2ae97
readme
2024-10-18 14:30:44 -07:00
Xi Yan
a3f748a875
readme for distributions
2024-10-18 14:21:44 -07:00
Xi Yan
dcac9e4874
update compose file
2024-10-18 11:12:27 -07:00
Xi Yan
542ffbee72
comment
2024-10-17 19:37:22 -07:00
Xi Yan
293d8f2895
docker compose ollama
2024-10-17 19:31:29 -07:00
Ashwin Bharambe
9fcf5d58e0
Allow overriding MODEL_IDS for inference test
2024-10-17 10:03:27 -07:00
Xi Yan
02be26098a
getting started
2024-10-16 23:56:21 -07:00
Xi Yan
cf9e5b76b2
Update getting_started.md
2024-10-16 23:52:29 -07:00
Xi Yan
7cc47da8f2
Update getting_started.md
2024-10-16 23:50:31 -07:00
Xi Yan
d787d1e84f
config templates restructure, docs (#262)
* wip
* config templates
* readmes
2024-10-16 23:25:10 -07:00
Tam
a07dfffbbf
initial changes (#261)
Update the parsing logic for comma-separated lists and the download function
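Parsing a comma-separated list of the kind mentioned here might look like the following sketch (function name and input format are assumptions, not the actual code from this commit):

```python
def parse_model_list(raw: str) -> list[str]:
    # Split on commas, strip whitespace, and drop empty entries so that
    # trailing or doubled commas do not produce blank names.
    return [part.strip() for part in raw.split(",") if part.strip()]

print(parse_model_list("Llama3.1-8B, Llama3.1-70B,"))
# prints ['Llama3.1-8B', 'Llama3.1-70B']
```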
2024-10-16 23:15:59 -07:00
ATH
319a6b5f83
Update getting_started.md (#260)
2024-10-16 18:05:36 -07:00
Xi Yan
c4d5d6bb91
Docker compose scripts for remote adapters (#241)
* tgi docker compose
* path
* wait for tgi server to start before starting server
* update provider-id
* move scripts to distribution/ folder
* add readme
* readme
2024-10-15 16:32:53 -07:00
Matthieu FRONTON
770647dede
Fix broken rendering in Google Colab (#247)
2024-10-15 15:41:49 -07:00
Ashwin Bharambe
09b793c4d6
Fix fp8 implementation, which had bit-rotted a bit
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load
from fp8" codepath.
YAML I tested with:
```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
2024-10-15 13:57:01 -07:00
Yuan Tang
80ada04f76
Remove request arg from chat completion response processing (#240)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-10-15 13:03:17 -07:00
Xi Yan
209cd3d35e
Bump version to 0.0.42
2024-10-14 11:13:04 -07:00
Yuan Tang
a2b87ed0cb
Switch to pre-commit/action (#239)
2024-10-11 11:09:11 -07:00
Yuan Tang
05282d1234
Enable pre-commit on main branch (#237)
2024-10-11 10:03:59 -07:00
Yuan Tang
2128e61da2
Fix incorrect completion() signature for Databricks provider (#236)
2024-10-11 08:47:57 -07:00
Dalton Flanagan
9fbe8852aa
Add Swift Package Index badge
2024-10-10 23:39:25 -04:00
Xi Yan
ca29980c6b
fix agents context retriever
2024-10-10 20:17:29 -07:00
Ashwin Bharambe
1ff0476002
Split off meta-reference-quantized provider
2024-10-10 16:03:19 -07:00
Xi Yan
7ff5800dea
generate openapi
2024-10-10 15:30:34 -07:00
Dalton Flanagan
a3e65d58a9
Add logo
2024-10-10 15:04:21 -04:00
Russell Bryant
eba9d1ea14
ci: Run pre-commit checks in CI (#176)
Run the pre-commit checks in a github workflow to validate that a PR
or a direct push to the repo does not introduce new errors.
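A workflow of the kind described might look like the following sketch (file name and action versions are assumptions; #239 later switched this repo to pre-commit/action):

```yaml
# .github/workflows/pre-commit.yml (hypothetical layout)
name: pre-commit
on: [push, pull_request]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      # Runs all hooks configured in .pre-commit-config.yaml
      - uses: pre-commit/action@v3.0.1
```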
2024-10-10 11:21:59 -07:00
Ashwin Bharambe
89d24a07f0
Bump version to 0.0.41
2024-10-10 10:27:03 -07:00