Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-07-27 06:28:50 +00:00
ci: use ollama container image with loaded models
Instead of downloading the models on every run, CI now uses a single Ollama container image with the models already pulled and ready to use. This removes the flakiness caused by pulling models during CI.

Signed-off-by: Sébastien Han <seb@redhat.com>
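The image referenced in the diff below, docker.io/leseb/ollama-with-models, ships with the model weights already baked in. Its build recipe is not part of this commit; the following is only a sketch of one way such an image could be produced, assuming the stock docker.io/ollama/ollama base image and the two models the previous action pulled by default:

# Not part of this commit: a hypothetical recipe for baking models into an
# Ollama image. Assumes the model store lives in the plain container
# filesystem (no declared volume), so the pulled weights survive `docker commit`.
docker run -d --name ollama-build docker.io/ollama/ollama:latest
sleep 5  # give the server a moment to come up
docker exec ollama-build ollama pull llama3.2:3b-instruct-fp16
docker exec ollama-build ollama pull all-minilm:latest
docker stop ollama-build
docker commit ollama-build docker.io/leseb/ollama-with-models:latest
docker rm ollama-build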
parent 692709cd45
commit c8b5774ff3
4 changed files with 29 additions and 194 deletions
.github/actions/setup-ollama/action.yml (vendored): 23 changed lines
@@ -1,26 +1,9 @@
 name: Setup Ollama
-description: Start Ollama and cache model
-inputs:
-  models:
-    description: Comma-separated list of models to pull
-    default: "llama3.2:3b-instruct-fp16,all-minilm:latest"
+description: Start Ollama
 runs:
   using: "composite"
   steps:
-    - name: Install and start Ollama
+    - name: Start Ollama
       shell: bash
       run: |
-        # the ollama installer also starts the ollama service
-        curl -fsSL https://ollama.com/install.sh | sh
-
-    # Do NOT cache models - pulling the cache is actually slower than just pulling the model.
-    # It takes ~45 seconds to pull the models from the cache and unpack it, but only 30 seconds to
-    # pull them directly.
-    # Maybe this is because the cache is being pulled at the same time by all the matrix jobs?
-    - name: Pull requested models
-      if: inputs.models != ''
-      shell: bash
-      run: |
-        for model in $(echo "${{ inputs.models }}" | tr ',' ' '); do
-          ollama pull "$model"
-        done
+        docker run -d --name ollama -p 11434:11434 docker.io/leseb/ollama-with-models
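Once the action has started the container, the baked-in models should be immediately available with no pull step. A quick sanity check (not part of this change, assuming the container name "ollama" and port 11434 from the step above) could be:

# List the models served by the container started by the action.
curl -s http://localhost:11434/api/tags
docker exec ollama ollama list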