llama-stack-mirror/docs/source
Wen Zhou 4bca4af3e4
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 4s
Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 6s
Integration Tests / test-matrix (server, 3.12, post_training) (push) Failing after 6s
Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 6s
Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 5s
Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 5s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 17s
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 26s
Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 20s
Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 18s
Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 20s
Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 36s
Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.12, inference) (push) Failing after 22s
Integration Tests / test-matrix (server, 3.12, scoring) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s
Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 5s
Integration Tests / test-matrix (server, 3.12, datasets) (push) Failing after 32s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s
Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 7s
Integration Tests / test-matrix (server, 3.12, inspect) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 22s
Integration Tests / test-matrix (server, 3.12, agents) (push) Failing after 16s
Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 17s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 24s
Integration Tests / test-matrix (server, 3.12, providers) (push) Failing after 20s
Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 18s
Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 20s
Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 34s
Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 33s
Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 30s
Python Package Build Test / build (3.12) (push) Failing after 9s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Python Package Build Test / build (3.13) (push) Failing after 39s
Update ReadTheDocs / update-readthedocs (push) Failing after 41s
Unit Tests / unit-tests (3.12) (push) Failing after 46s
Pre-commit / pre-commit (push) Successful in 1m30s
refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- we are using `all-minilm:l6-v2` but the model we download from ollama
is `all-minilm:latest`
  latest: https://ollama.com/library/all-minilm:latest 1b226e2802db
  l6-v2: https://ollama.com/library/all-minilm:l6-v2 pin 1b226e2802db
- even currently they are exactly the same model but if
[all-minilm:l12-v2](https://ollama.com/library/all-minilm:l12-v2) is
updated, "latest" might not be the same for l6-v2.
- the only change in this PR is pin the model id in ollama
- also update detailed_tutorial with "starter" to replace deprecated
"ollama".

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
>INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
>llama stack build --run --template ollama --image-type venv
...
Build Successful!
You can find the newly-built template here: /home/wenzhou/zdtsw-forking/lls/llama-stack/llama_stack/templates/ollama/run.yaml
....
 - metadata:                                                                                                                                  
     embedding_dimension: 384                                                                                                                 
   model_id: all-MiniLM-L6-v2                                                                                                                 
   model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType                                                                 
   - embedding                                                                                                                                
   provider_id: ollama                                                                                                                        
   provider_model_id: all-minilm:l6-v2  
   ...
```
test
```
>llama-stack-client inference chat-completion --message "Write me a 2-sentence poem about the moon"
           INFO:httpx:HTTP Request: GET http://localhost:8321/v1/models "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/chat/completions "HTTP/1.1 200 OK"
OpenAIChatCompletion(
    id='chatcmpl-04f99071-3da2-44ba-a19f-03b5b7fc70b7',
    choices=[
        OpenAIChatCompletionChoice(
            finish_reason='stop',
            index=0,
            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
                role='assistant',
                content="Here is a 2-sentence poem about the moon:\n\nSilver crescent in the midnight sky,\nLuna's gentle face, a beauty to the eye.",
                name=None,
                tool_calls=None,
                refusal=None,
                annotations=None,
                audio=None,
                function_call=None
            ),
            logprobs=None
        )
    ],
    created=1751644429,
    model='llama3.2:3b-instruct-fp16',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_ollama',
    usage={'completion_tokens': 33, 'prompt_tokens': 36, 'total_tokens': 69, 'completion_tokens_details': None, 'prompt_tokens_details': None}
)
```

---------

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
2025-07-06 09:07:37 +05:30
..
building_applications feat: improve telemetry (#2590) 2025-07-04 17:29:09 +02:00
concepts docs: specify the ability to train non-Llama models (#2573) 2025-07-01 19:29:06 +05:30
contributing docs: revamp testing documentation (#2155) 2025-05-13 11:28:29 -07:00
distributions refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627) 2025-07-06 09:07:37 +05:30
getting_started refactor: set proper name for embedding all-minilm:l6-v2 and update to use "starter" in detailed_tutorial (#2627) 2025-07-06 09:07:37 +05:30
introduction docs: Remove mentions of focus on Llama models (#1690) 2025-03-19 00:17:22 -04:00
openai docs: Add OpenAI API compatibility page (#2316) 2025-06-04 06:51:52 -04:00
playground chore: simplify running the demo UI (#1907) 2025-04-09 11:22:29 -07:00
providers feat: improve telemetry (#2590) 2025-07-04 17:29:09 +02:00
references chore: remove last instances of code-interpreter provider (#2143) 2025-05-12 10:54:43 -07:00
conf.py fix: use pypi browser agent (#2260) 2025-05-24 23:26:30 -07:00
index.md docs: update full list of providers with matched APIs and dockerhub images (#2452) 2025-07-03 10:12:56 +02:00