llama-stack-mirror/llama_stack/distribution
Russell Bryant f73e247ba1 2024-10-05 23:34:16 -07:00
Inline vLLM inference provider (#181)
This is just like the `local` distribution, which uses `meta-reference` for
everything, except that it uses `vllm` for inference.
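
The end-to-end flow is the usual build/configure/run cycle. Roughly (the
distribution name below is just a placeholder, and the exact prompts and
flags may differ):

```
# Rough sketch only -- pick vllm as the inference provider and
# meta-reference for the remaining APIs when prompted by the build.
llama stack build
llama stack configure <distribution-name>
llama stack run <distribution-name>
```

Once the server is up, the client commands below can talk to it.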

Docker works, but so far `conda` is a bit easier to use with the vllm
provider. The default container base image does not include all of the
libraries needed for every vllm feature, so additional CUDA dependencies
are required.
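
For the conda path, getting vllm into the environment is essentially just a
pip install. A minimal sketch (env name and Python version are arbitrary;
`llama stack build` normally creates and manages this environment itself):

```
# Sketch only -- the stack's build tooling normally sets up the conda env.
conda create -n llamastack-vllm python=3.10 -y
conda activate llamastack-vllm
pip install vllm   # the published wheels include CUDA-enabled kernels
```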

I started changing the base image used in this template, but that also
required changes to the Dockerfile, so it was getting too involved to
include in this first PR.

Working so far:

* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream True`
* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False`

Example:

```
$ python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False
User>hello world, write me a 2 sentence poem about the moon
Assistant>
The moon glows bright in the midnight sky
A beacon of light,
```

I have only tested these models:

* `Llama3.1-8B-Instruct` - across 4 GPUs (tensor_parallel_size = 4; see the standalone vLLM comparison after this list)
* `Llama3.2-1B-Instruct` - on a single GPU (tensor_parallel_size = 1)
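
The `tensor_parallel_size` values above map onto vLLM's own engine argument.
For comparison, the 4-GPU case with standalone vLLM would look something
like this (the HF-style model id is illustrative, not what the provider uses
internally):

```
# Standalone-vLLM equivalent of the 4-GPU setup above, for reference only;
# the model identifier is illustrative.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 4
```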

| Name | Last commit | Date |
| --- | --- | --- |
| `routers` | fix routing table key list | 2024-10-02 18:23:31 -07:00 |
| `server` | Add an introspection "Api.inspect" API | 2024-10-02 15:41:14 -07:00 |
| `templates` | Inline vLLM inference provider (#181) | 2024-10-05 23:34:16 -07:00 |
| `utils` | Add an introspection "Api.inspect" API | 2024-10-02 15:41:14 -07:00 |
| `__init__.py` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| `build.py` | Kill a derpy import | 2024-10-03 11:25:58 -07:00 |
| `build_conda_env.sh` | fix prompt guard (#177) | 2024-10-03 11:07:53 -07:00 |
| `build_container.sh` | [CLI] avoid configure twice (#171) | 2024-10-03 11:20:54 -07:00 |
| `common.sh` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| `configure.py` | fix prompt guard (#177) | 2024-10-03 11:07:53 -07:00 |
| `configure_container.sh` | docker: Check for selinux before using --security-opt (#167) | 2024-10-02 10:37:41 -07:00 |
| `datatypes.py` | fix prompt guard (#177) | 2024-10-03 11:07:53 -07:00 |
| `distribution.py` | A bit cleanup to avoid breakages | 2024-10-02 21:31:09 -07:00 |
| `inspect.py` | Add an introspection "Api.inspect" API | 2024-10-02 15:41:14 -07:00 |
| `request_headers.py` | provider_id => provider_type, adapter_id => adapter_type | 2024-10-02 14:05:59 -07:00 |
| `resolver.py` | Add an introspection "Api.inspect" API | 2024-10-02 15:41:14 -07:00 |
| `start_conda_env.sh` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| `start_container.sh` | docker: Check for selinux before using --security-opt (#167) | 2024-10-02 10:37:41 -07:00 |