llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-08 19:10:56 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	b55034c0de	Another round of simplification and clarity for models/shields/memory_banks stuff	2024-10-09 19:19:26 -07:00
Ashwin Bharambe	f40cd62306	Test fixes	2024-10-08 17:23:42 -07:00
Ashwin Bharambe	640c5c54f7	rename augment_messages	2024-10-08 17:23:42 -07:00
Ashwin Bharambe	ed899a5dec	Convert TGI to work with openai_compat	2024-10-08 17:23:42 -07:00
Ashwin Bharambe	05e73d12b3	Introduce openai_compat with the completions (not chat-completions) API. This keeps the prompt encoding layer in our control (see the `chat_completion_request_to_prompt()` method).	2024-10-08 17:23:42 -07:00
Ashwin Bharambe	f8752ab8dc	weaviate fixes, test now passes	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	4ab6e1b81a	Add really basic testing for memory API. Weaviate does not work; the cluster URL seems malformed.	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	dba7caf1d0	Fix fireworks and update the test. Don't look for eom_id / eot_id, sadly, since providers don't return the last token.	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	bbd3a02615	Make Together inference work using the raw completions API	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	3ae2b712e8	Add inference test. Run it as: `PROVIDER_ID=test-remote PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml pytest -s llama_stack/providers/tests/inference/test_inference.py --tb=auto --disable-warnings`	2024-10-08 17:23:02 -07:00