ci: add new action to install ollama, cache the model (#2054)

# What does this PR do?

This PR introduces a reusable GitHub Actions workflow for pulling and running an Ollama model, with caching to avoid repeated downloads.

Closes: #1949

## Test Plan

1. Trigger a workflow that uses the Ollama setup. Confirm that:
   - The model is pulled successfully.
   - It is placed in the correct directory, the official one at the moment (not `~ollama/.ollama/models` as the comment says, so this still needs to be confirmed).
2. Re-run the same workflow to validate that:
   - The model is restored from the cache.
   - Execution succeeds with the cached model.
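
For step 1 of the test plan, a workflow that exercises the action could look roughly like the sketch below. The workflow name, job name, trigger, and runner are assumptions made for illustration and are not part of this PR; only the action path and the `models` input come from the diff. The final step uses `ollama list` as one possible way to confirm that the pull succeeded.

```yaml
# Hypothetical smoke-test workflow; name, trigger, and job name are assumptions.
name: ollama-smoke-test
on: workflow_dispatch

jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      # A local composite action is resolved from the checked-out repository,
      # so the checkout step has to run first.
      - uses: actions/checkout@v4

      - name: Setup ollama
        uses: ./.github/actions/setup-ollama
        with:
          # Overrides the default comma-separated list with a single model
          # (one of the action's own defaults).
          models: "all-minilm:latest"

      # Lists the locally available models so the pull can be checked in the job log.
      - name: List pulled models
        run: ollama list
```

Omitting the `with:` block falls back to the action's default model list.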
2025-05-06 13:56:20 +01:00 · 2413447467
commit 2413447467
parent 3022f7b642
2 changed files with 28 additions and 13 deletions
--- a/.github/actions/setup-ollama/action.yml
+++ b/.github/actions/setup-ollama/action.yml
@@ -0,0 +1,26 @@
+name: Setup Ollama
+description: Install and start Ollama, then pull the requested models
+inputs:
+  models:
+    description: Comma-separated list of models to pull
+    default: "llama3.2:3b-instruct-fp16,all-minilm:latest"
+runs:
+  using: "composite"
+  steps:
+    - name: Install and start Ollama
+      shell: bash
+      run: |
+        # the ollama installer also starts the ollama service
+        curl -fsSL https://ollama.com/install.sh | sh
+
+    # Do NOT cache models - pulling the cache is actually slower than just pulling the model.
+    # It takes ~45 seconds to pull the models from the cache and unpack it, but only 30 seconds to
+    # pull them directly.
+    # Maybe this is because the cache is being pulled at the same time by all the matrix jobs?
+    - name: Pull requested models
+      if: inputs.models != ''
+      shell: bash
+      run: |
+        for model in $(echo "${{ inputs.models }}" | tr ',' ' '); do
+          ollama pull "$model"
+        done
--- a/.github/workflows/integration-tests.yml
+++ b/.github/workflows/integration-tests.yml
@@ -38,19 +38,8 @@ jobs:
          python-version: "3.10"
          activate-environment: true

-      - name: Install and start Ollama
-        run: |
-          # the ollama installer also starts the ollama service
-          curl -fsSL https://ollama.com/install.sh | sh
-
-      # Do NOT cache models - pulling the cache is actually slower than just pulling the model.
-      # It takes ~45 seconds to pull the models from the cache and unpack it, but only 30 seconds to
-      # pull them directly.
-      # Maybe this is because the cache is being pulled at the same time by all the matrix jobs?
-      - name: Pull Ollama models (instruct and embed)
-        run: |
-          ollama pull llama3.2:3b-instruct-fp16
-          ollama pull all-minilm:latest
+      - name: Setup ollama
+        uses: ./.github/actions/setup-ollama

      - name: Set Up Environment and Install Dependencies
        run: |