llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-08 11:07:22 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	8a175129fc	fix weaviate, update run.yamls	2024-10-09 22:15:28 -07:00
Ashwin Bharambe	b55034c0de	Another round of simplification and clarity for models/shields/memory_banks stuff	2024-10-09 19:19:26 -07:00
Ashwin Bharambe	73a0a34e39	Kill non-llama guard shields	2024-10-08 17:47:03 -07:00
Ashwin Bharambe	24c61403b7	Fixes	2024-10-08 17:43:25 -07:00
Ashwin Bharambe	a86f3ae07d	Update run.yaml	2024-10-08 17:41:06 -07:00
Ashwin Bharambe	924b1fba09	minor	2024-10-08 17:26:26 -07:00
Ashwin Bharambe	8eee5b9adc	Fix server conditional awaiting on coroutines	2024-10-08 17:23:42 -07:00
Ashwin Bharambe	7f1160296c	Updates to server.py to clean up streaming vs non-streaming stuff. Also make sure agent turn create is correctly marked	2024-10-08 17:23:42 -07:00
Ashwin Bharambe	0c9eb3341c	Separate chat_completion stream and non-stream implementations. This is a pretty important requirement. The streaming response type is an AsyncGenerator while the non-stream one is a single object. So far this has worked _sometimes_ due to various pre-existing hacks (and in some cases, just failed.)	2024-10-08 17:23:40 -07:00
Ashwin Bharambe	4ab6e1b81a	Add really basic testing for memory API. weaviate does not work; the cluster URL seems malformed	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	3ae2b712e8	Add inference test. Run it as: `PROVIDER_ID=test-remote PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml pytest -s llama_stack/providers/tests/inference/test_inference.py --tb=auto --disable-warnings`	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	4fa467731e	Fix a bug in meta-reference inference when stream=False. Also introduce a gross hack (to cover grosser(?) hack) to ensure non-stream requests don't send back responses in SSE format. Not sure which of these hacks is grosser.	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	353c7dc82a	A few bug fixes for covering corner cases	2024-10-08 17:23:02 -07:00
Zain Hasan	118c0ef105	Partial cleanup of weaviate	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	3725e74906	memory bank registration fixes	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	91e0063593	Introduce model_store, shield_store, memory_bank_store	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	e45a417543	more fixes, plug shutdown handlers still, FastAPIs sigint handler is not calling ours	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	60dead6196	apis_to_serve -> apis	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	59302a86df	inference registry updates	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	4215cc9331	Push registration methods onto the backing providers	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	5a7b01d292	Significantly upgrade the interactive configuration experience	2024-10-08 17:23:02 -07:00
Ashwin Bharambe	f3923e3f0b	Redo the { models, shields, memory_banks } typeset	2024-10-08 17:23:02 -07:00
Xi Yan	27587f32bc	fix db path	2024-10-06 11:46:08 -07:00
Xi Yan	cfe3ad33b3	fix db path	2024-10-06 11:45:35 -07:00
Prithu Dasgupta	7abab7604b	add databricks provider (#83) * add databricks provider * update provider and test	2024-10-05 23:35:54 -07:00
Russell Bryant	f73e247ba1	Inline vLLM inference provider (#181) This is just like `local` using `meta-reference` for everything except it uses `vllm` for inference. Docker works, but so far, `conda` is a bit easier to use with the vllm provider. The default container base image does not include all the necessary libraries for all vllm features. More cuda dependencies are necessary. I started changing this base image used in this template, but it also required changes to the Dockerfile, so it was getting too involved to include in the first PR. Working so far: * `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream True` * `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False` Example: ``` $ python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False User>hello world, write me a 2 sentence poem about the moon Assistant> The moon glows bright in the midnight sky A beacon of light, ``` I have only tested these models: * `Llama3.1-8B-Instruct` - across 4 GPUs (tensor_parallel_size = 4) * `Llama3.2-1B-Instruct` - on a single GPU (tensor_parallel_size = 1)	2024-10-05 23:34:16 -07:00
Ashwin Bharambe	7f49315822	Kill a derpy import	2024-10-03 11:25:58 -07:00
Xi Yan	62d266f018	[CLI] avoid configure twice (#171) * avoid configure twice * cleanup tmp config * update output msg * address comment * update msg * script update	2024-10-03 11:20:54 -07:00
Ashwin Bharambe	210b71b0ba	fix prompt guard (#177) Several other fixes to configure. Add support for 1b/3b models in ollama.	2024-10-03 11:07:53 -07:00
Ashwin Bharambe	e9f6150588	A bit cleanup to avoid breakages	2024-10-02 21:31:09 -07:00
Xi Yan	703ab9385f	fix routing table key list	2024-10-02 18:23:31 -07:00
Ashwin Bharambe	8d049000e3	Add an introspection "Api.inspect" API	2024-10-02 15:41:14 -07:00
Ashwin Bharambe	fe4aabd690	provider_id => provider_type, adapter_id => adapter_type	2024-10-02 14:05:59 -07:00
Ashwin Bharambe	df68db644b	Refactoring distribution/distribution.py. This file was becoming too large and unclear what it housed. Split it into pieces.	2024-10-02 14:03:02 -07:00
Russell Bryant	204eb6d810	docker: Check for selinux before using `--security-opt` (#167) Before using `--security-opt label=disable`, check that SELinux is enabled. Otherwise, the option is not relevant. This fixes errors on Mac. Closes #166 Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-02 10:37:41 -07:00
Ashwin Bharambe	bf0d111c53	Fix build script	2024-10-02 10:04:23 -07:00
Ashwin Bharambe	eb2d8a31a5	Add a RoutableProvider protocol, support for multiple routing keys (#163) * Update configure.py to use multiple routing keys for safety * Refactor distribution/datatypes into a providers/datatypes * Cleanup	2024-09-30 17:30:21 -07:00
Xi Yan	d28c3dfe0f	[CLI] simplify docker run (#159) * bake run.yaml inside docker, simplify run * add docker template examples * delete generated Dockerfile * unique deps * clean up debug * default entrypoint * address comments, update output msg * update msg * build output msg * configure msg * unique special_deps * remove quotes in configure	2024-09-30 15:04:04 -07:00
Russell Bryant	8db49de961	docker: Install in editable mode for dev purposes (#160) While rebuilding a stack using the `docker` image type and having `LLAMA_STACK_DIR` set so it installs `llama_stack` from my local source, I noticed that once built, it just used the image build cache and didn't pull in changes to my source. 1. Install in editable mode (`pip install -e`) for dev purposes. 2. Mount the source into the container for `configure` and `run` so that the editable install works. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-30 11:56:31 -07:00
Russell Bryant	cb36be320f	Fix podman+selinux compatibility (#132) When I ran `llama stack configure` for my `docker` based stack on my system using podman + SELinux (CentOS Stream 9), the `podman run` command failed due to SELinux blocking access to the volume mount. As a simple fix, disable SELinux label checking. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-29 20:19:44 -07:00
Ashwin Bharambe	5bf679cab6	Pull (extract) provider data from the provider instead of pushing from the top (#148)	2024-09-29 20:00:51 -07:00
Xi Yan	6a8c2ae1df	[CLI] remove dependency on CONDA_PREFIX in CLI (#144) * remove dependency on CONDA_PREFIX in CLI * lint * typo * more robust	2024-09-28 16:46:47 -07:00
Xi Yan	4ae8c63a2b	pre-commit lint	2024-09-28 16:04:41 -07:00
Xi Yan	6236634d84	[bugfix] fix duplicate api endpoints (#139) * fix server api to serve * remove print	2024-09-27 15:32:50 -07:00
Xi Yan	208b861289	add env for LLAMA_STACK_CONFIG_DIR (#137)	2024-09-27 14:16:46 -07:00
Xi Yan	ca7602a642	fix #100	2024-09-25 15:11:56 -07:00
Lucain	615ed4bfbc	Make TGI adapter compatible with HF Inference API (#97)	2024-09-25 14:08:31 -07:00
Ashwin Bharambe	56aed59eb4	Support for Llama3.2 models and Swift SDK (#98)	2024-09-25 10:29:58 -07:00
poegej	95abbf576b	Bump version to 0.0.24 (#94) Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2024-09-25 09:31:12 -07:00
Yogish Baliga	b85d675c6f	Adding safety adapter for Together	2024-09-24 18:35:48 -07:00


62 commits
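
Commit 0c9eb3341c in the log above says a streaming chat completion returns an AsyncGenerator while the non-streaming one returns a single object. The sketch below shows why one caller cannot treat the two the same way. The function names and the echo behaviour are hypothetical stand-ins, not the llama-stack API.

```python
import asyncio
from typing import AsyncGenerator, List, Tuple


# Hypothetical stand-in for a non-streaming endpoint: a coroutine that
# is awaited once and resolves to a single response object.
async def chat_completion_non_stream(prompt: str) -> str:
    return f"echo: {prompt}"


# Hypothetical stand-in for a streaming endpoint: an async generator
# that must be iterated with `async for`; it cannot be awaited.
async def chat_completion_stream(prompt: str) -> AsyncGenerator[str, None]:
    for token in prompt.split():
        yield token


async def call_both(prompt: str) -> Tuple[str, List[str]]:
    # Single object: one await.
    full = await chat_completion_non_stream(prompt)
    # Stream: consume chunk by chunk.
    chunks = [chunk async for chunk in chat_completion_stream(prompt)]
    return full, chunks


full, chunks = asyncio.run(call_both("hello world"))
```

Because the two return types are consumed differently (`await` versus `async for`), keeping them as separate implementations avoids having to inspect the result at runtime to decide how to handle it.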