Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-06-28 02:53:30 +00:00
Embedding models are tiny and can be pulled on demand. Let's do that so the user doesn't have to do "yet another thing" to get set up.

Thanks @hardikjshah for the suggestion.

Also fixed a missing build dependency (TODO: distro_codegen needs to actually check that the build template contains all providers mentioned in the run.yaml file).

## Test Plan

First run `ollama rm all-minilm:latest`.

Run `llama stack build --template ollama && llama stack run ollama --env INFERENCE_MODEL=llama3.2:3b-instruct-fp16`. Check that it prints "Pulling embedding model `all-minilm:latest`" and that the stack starts up correctly. Verify that `ollama list` shows the model was downloaded.

- bedrock
- cerebras
- dell
- experimental-post-training
- fireworks
- hf-endpoint
- hf-serverless
- meta-reference-gpu
- meta-reference-quantized-gpu
- nvidia
- ollama
- passthrough
- remote-vllm
- sambanova
- tgi
- together
- vllm-gpu
- __init__.py
- template.py
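
The on-demand pull described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from the commit: `ModelClient`, `InMemoryClient`, and `ensure_model_available` are hypothetical names, and the in-memory client stands in for a real Ollama connection so the example runs without a server.

```python
from __future__ import annotations

from typing import Protocol


class ModelClient(Protocol):
    """Hypothetical minimal interface; not the real Ollama client API."""

    def list_models(self) -> list[str]: ...

    def pull(self, model: str) -> None: ...


class InMemoryClient:
    """Stand-in client so the sketch runs without an Ollama server."""

    def __init__(self, models: list[str]) -> None:
        self.models = list(models)
        self.pulled: list[str] = []

    def list_models(self) -> list[str]:
        return list(self.models)

    def pull(self, model: str) -> None:
        # Record the pull and make the model show up in later listings.
        self.pulled.append(model)
        self.models.append(model)


def ensure_model_available(client: ModelClient, model: str) -> bool:
    """Pull `model` only if it is missing. Returns True when a pull happened."""
    if model in client.list_models():
        return False
    print(f"Pulling embedding model `{model}`")
    client.pull(model)
    return True


client = InMemoryClient(["llama3.2:3b-instruct-fp16"])
ensure_model_available(client, "all-minilm:latest")  # missing, so it is pulled
ensure_model_available(client, "all-minilm:latest")  # already present, no-op
```

Checking the local model list before pulling keeps startup idempotent: a second run finds the model and skips the download, which matches the test plan's expectation that the pull message appears only after `ollama rm all-minilm:latest`.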