ehhuang
07ff15d917
chore: distrogen enables telemetry by default ( #3828 )
...
# What does this PR do?
leftover from #3815
## Test Plan
CI
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com ). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3828 ).
* #3830
* __->__ #3828
2025-10-16 11:29:51 -07:00
Charlie Doern
f22aaef42f
chore!: remove telemetry API usage ( #3815 )
...
# What does this PR do?
Remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions as well as from the provider
registry, the router, etc.
Since `setup_logger` is tied pretty strictly to `Api.telemetry` being in
impls, we still need an "instantiated provider" in our implementations.
However, it should not be auto-routed or provided. So in
`validate_and_prepare_providers` (called from `resolve_impls`) I made it so
that if `run_config.telemetry.enabled`, we set up the meta-reference
"provider" internally so that `log_event` will work when called.
I think this is the neatest way to remove telemetry from the provider
configs without ripping apart the whole "telemetry is a provider" logic
just yet; we can do that internally later without disrupting users.
So telemetry is removed from the registry: if a user puts `telemetry:` as
an API in their build/run config it will error out, but it can still be
used by us internally as we go through this transition.
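A rough sketch of the internal-only wiring described above (class and helper names here are illustrative stand-ins, not the actual implementation in `validate_and_prepare_providers`):
```python
from dataclasses import dataclass
from enum import Enum


class Api(Enum):
    telemetry = "telemetry"


@dataclass
class TelemetryConfig:
    enabled: bool = False


class MetaReferenceTelemetryImpl:
    """Stand-in for the meta-reference telemetry provider (illustrative only)."""

    def __init__(self, config: TelemetryConfig):
        self.config = config

    def log_event(self, event: dict) -> None:
        print(f"telemetry event: {event}")


def prepare_telemetry(telemetry_config: TelemetryConfig, impls: dict) -> dict:
    # Telemetry is no longer routable or providable, so it never comes from the
    # provider registry. If the run config opts in, wire up the meta-reference
    # implementation internally so setup_logger / log_event keep working.
    if telemetry_config.enabled:
        impls[Api.telemetry] = MetaReferenceTelemetryImpl(telemetry_config)
    return impls


impls = prepare_telemetry(TelemetryConfig(enabled=True), {})
impls[Api.telemetry].log_event({"type": "structured_log", "message": "hello"})
```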
relates to #3806
Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-16 10:39:32 -07:00
ehhuang
6ba9db3929
chore!: BREAKING CHANGE: remove sqlite from telemetry config ( #3808 )
...
# What does this PR do?
- Removed sqlite sink from telemetry config.
- Removed related code.
- Updated the telemetry-related docs.
## Test Plan
CI
2025-10-15 14:24:45 -07:00
IAN MILLER
007efa6eb5
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack ( #3183 )
...
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this PR is to replace Llama Stack's default embedding
model with nomic-embed-text-v1.5.
These are the key reasons why the Llama Stack community decided to switch
from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:
1. The training data for
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data)
includes many datasets with various licensing terms, so it is tricky to
know when/whether it is appropriate to use this model for commercial
applications.
2. The model is not particularly competitive on major benchmarks. For
example, if you look at the [MTEB
Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click
on Miscellaneous/BEIR to see English information retrieval accuracy, you
see that the top of the leaderboard is dominated by enormous models, but
also that there are many, many models of relatively modest size with
much higher Retrieval scores. If you want to look closely at the data, I
recommend clicking "Download Table" because it is easier to browse that
way.
More discussion can be found
[here](https://github.com/llamastack/llama-stack/issues/2418)
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2418
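For reference, a minimal usage sketch, assuming a local stack serving the OpenAI-compatible API on port 8321 and that the new default is registered as `nomic-ai/nomic-embed-text-v1.5` (both are assumptions, not details from this PR):
```python
from openai import OpenAI

# Assumes a locally running Llama Stack server with the OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

# The model id is an assumption; check `client.models.list()` for the exact name.
response = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",
    input=["Llama Stack now defaults to nomic-embed-text-v1.5 for embeddings."],
)
print(len(response.data[0].embedding))
```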
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI workflow
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 10:44:20 -04:00
Francisco Arceo
e7d21e1ee3
feat: Add support for Conversations in Responses API ( #3743 )
...
# What does this PR do?
This PR adds support for Conversations in Responses.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
Unit tests
Integration tests
<Details>
<Summary>Manual testing with this script: (click to expand)</Summary>
```python
from openai import OpenAI

client = OpenAI()
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

def test_conversation_create():
    print("Testing conversation create...")
    conversation = client.conversations.create(
        metadata={"topic": "demo"},
        items=[
            {"type": "message", "role": "user", "content": "Hello!"}
        ],
    )
    print(f"Created: {conversation}")
    return conversation

def test_conversation_retrieve(conv_id):
    print(f"Testing conversation retrieve for {conv_id}...")
    retrieved = client.conversations.retrieve(conv_id)
    print(f"Retrieved: {retrieved}")
    return retrieved

def test_conversation_update(conv_id):
    print(f"Testing conversation update for {conv_id}...")
    updated = client.conversations.update(
        conv_id,
        metadata={"topic": "project-x"},
    )
    print(f"Updated: {updated}")
    return updated

def test_conversation_delete(conv_id):
    print(f"Testing conversation delete for {conv_id}...")
    deleted = client.conversations.delete(conv_id)
    print(f"Deleted: {deleted}")
    return deleted

def test_conversation_items_create(conv_id):
    print(f"Testing conversation items create for {conv_id}...")
    items = client.conversations.items.create(
        conv_id,
        items=[
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}],
            },
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "How are you?"}],
            },
        ],
    )
    print(f"Items created: {items}")
    return items

def test_conversation_items_list(conv_id):
    print(f"Testing conversation items list for {conv_id}...")
    items = client.conversations.items.list(conv_id, limit=10)
    print(f"Items list: {items}")
    return items

def test_conversation_item_retrieve(conv_id, item_id):
    print(f"Testing conversation item retrieve for {conv_id}/{item_id}...")
    item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id)
    print(f"Item retrieved: {item}")
    return item

def test_conversation_item_delete(conv_id, item_id):
    print(f"Testing conversation item delete for {conv_id}/{item_id}...")
    deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id)
    print(f"Item deleted: {deleted}")
    return deleted

def test_conversation_responses_create():
    print("\nTesting conversation create for a responses example...")
    conversation = client.conversations.create()
    print(f"Created: {conversation}")
    response = client.responses.create(
        model="gpt-4.1",
        input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
        conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")
    return response, conversation

def test_conversations_responses_create_followup(
    conversation,
    content="Repeat what you just said but add 'this is my second time saying this'",
):
    print(f"Using: {conversation.id}")
    response = client.responses.create(
        model="gpt-4.1",
        input=[{"role": "user", "content": content}],
        conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")
    conv_items = client.conversations.items.list(conversation.id)
    print(f"\nRetrieving list of items for conversation {conversation.id}:")
    print(conv_items.model_dump_json(indent=2))

def test_response_with_fake_conv_id():
    fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44"
    print(f"Using {fake_conv_id}")
    try:
        response = client.responses.create(
            model="gpt-4.1",
            input=[{"role": "user", "content": "say hello"}],
            conversation=fake_conv_id,
        )
        print(f"Created response: {response} for conversation {fake_conv_id}")
    except Exception as e:
        print(f"failed to create response for conversation {fake_conv_id} with error {e}")

def main():
    print("Testing OpenAI Conversations API...")

    # Create conversation
    conversation = test_conversation_create()
    conv_id = conversation.id

    # Retrieve conversation
    test_conversation_retrieve(conv_id)

    # Update conversation
    test_conversation_update(conv_id)

    # Create items
    items = test_conversation_items_create(conv_id)

    # List items
    items_list = test_conversation_items_list(conv_id)

    # Retrieve specific item
    if items_list.data:
        item_id = items_list.data[0].id
        test_conversation_item_retrieve(conv_id, item_id)

        # Delete item
        test_conversation_item_delete(conv_id, item_id)

    # Delete conversation
    test_conversation_delete(conv_id)

    response, conversation2 = test_conversation_responses_create()
    print('\ntesting response retrieval')
    test_conversation_retrieve(conversation2.id)

    print('\ntesting responses follow up')
    test_conversations_responses_create_followup(conversation2)

    print('\ntesting responses follow up x2!')
    test_conversations_responses_create_followup(
        conversation2,
        content="Repeat what you just said but add 'this is my third time saying this'",
    )

    test_response_with_fake_conv_id()
    print("All tests completed!")

if __name__ == "__main__":
    main()
```
</Details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 11:57:40 -07:00
ehhuang
a3f5072776
chore!: remove --env from llama stack run ( #3711 )
...
# What does this PR do?
Users can simply set env vars at the beginning of the command: `FOO=BAR
llama stack run ...`
## Test Plan
Run
`TELEMETRY_SINKS=console uv run --with llama-stack llama stack build
--distro=starter --image-type=venv --run`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com ). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711 ).
* #3714
* __->__ #3711
2025-10-07 20:58:15 -07:00
Chacksu
426dc54883
docs: Fix Dell distro documentation code snippets ( #3640 )
...
# What does this PR do?
* Updates code snippets for the Dell distribution, replacing a hard-coded
user home directory with `$HOME`, and updates the docker instructions to
use `docker` instead of `podman`.
## Test Plan
N.A.
Co-authored-by: Connor Hack <connorhack@fb.com>
2025-10-02 11:11:30 +02:00
Ashwin Bharambe
42414a1a1b
fix(logging): disable console telemetry sink by default ( #3623 )
...
The current span processing dumps so much junk on the console that it
makes actual understanding of what is going on in the server impossible.
I am killing the console sink as a default. If you want, you are always
free to change your run.yaml to add it.
Before:
<img width="1877" height="1107" alt="image"
src="https://github.com/user-attachments/assets/3a7ad261-e2ba-4d40-9820-fcc282c8df37 "
/>
After:
<img width="1919" height="470" alt="image"
src="https://github.com/user-attachments/assets/bc7cf763-fba9-4e95-a4b5-f65f6d1c5332 "
/>
2025-09-30 14:58:05 -07:00
Chacksu
fffdab4f5c
fix: Dell distribution missing kvstore ( #3113 )
...
# What does this PR do?
- Added kvstore config to ChromaDB provider config for Dell distribution
similar to [starter
config](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/starter/run.yaml#L110-L112 )
- Fixed
[error](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_generated/_async_client.py#L3424-L3425 )
getting endpoint information by adding `hf-inference` as the provider to
the `AsyncInferenceClient` (TGI client).
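A minimal sketch of the second fix, assuming a recent `huggingface_hub` release where `AsyncInferenceClient` accepts a `provider` argument (the endpoint URL and version are assumptions based on the test plan below, not taken from the diff):
```python
from huggingface_hub import AsyncInferenceClient

# Illustrative only: point the TGI client at the Dell Enterprise Hub endpoint
# and pin the provider so endpoint resolution does not fail.
client = AsyncInferenceClient(
    model="http://0.0.0.0:8181",  # DEH_URL / TGI endpoint (assumption)
    provider="hf-inference",
)
```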
## Test Plan
```
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=8000
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321
export HF_TOKEN=[redacted]
# TGI Server
docker run --rm -it \
--pull always \
--network host \
-v $HOME/.cache/huggingface:/data \
-e HF_TOKEN=$HF_TOKEN \
-e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
-p $INFERENCE_PORT:$INFERENCE_PORT \
--gpus all \
ghcr.io/huggingface/text-generation-inference:latest \
--dtype float16 \
--usage-stats off \
--sharded false \
--cuda-memory-fraction 0.8 \
--model-id meta-llama/Llama-3.2-3B-Instruct \
--port $INFERENCE_PORT \
--hostname 0.0.0.0
# Chroma DB
docker run --rm -it \
--name chromadb \
--net=host -p 8000:8000 \
-v ~/chroma:/chroma/chroma \
-e IS_PERSISTENT=TRUE \
-e ANONYMIZED_TELEMETRY=FALSE \
chromadb/chroma:latest
# Llama Stack
llama stack run dell \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL
```
---------
Co-authored-by: Connor Hack <connorhack@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-13 06:18:25 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions ( #3035 )
...
As the title says. Distributions is in, Templates is out.
`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.
Updated `server.py` to remove the "config_or_template" backward
compatibility, since that change shipped a couple of releases ago.
2025-08-04 11:34:17 -07:00