Commit graph

676 commits

Author SHA1 Message Date
Xi Yan
ce040ad111 precommit 2025-02-19 22:35:24 -08:00
Shrinit Goyal
b74f25035c
Added support for mongoDB KV store (#543)
Added the support for mongoDB as KV store
validated in mongodb, it is able to store agent data, session data and
turn data
<img width="1332" alt="image"
src="https://github.com/user-attachments/assets/867700a4-b9ee-4a3c-8278-f39074d39d56">
this is how run.yaml would look:
```
    config:
      persistence_store:
        type: mongodb
        namespace: null
        host: localhost
        port: 27017
        db: llamastack
        user: ""
        password: ""
        collection_name: llamastack_kvstore
```

---------

Co-authored-by: shrinitgoyal <shrinit.goyal@engati.com>
2025-02-19 22:30:50 -08:00
Yuan Tang
5966079770
fix: More robust handling of the arguments in tool call response in remote::vllm (#1169)
# What does this PR do?

This fixes the following issue on the server side when the tool call
response contains empty args. This happens when running
`examples.agents.e2e_loop_with_client_tools` but `get_ticker_data`
returns `[]`:

```
Traceback (most recent call last):
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
    async for item in event_gen:
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 169, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 189, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 258, in run
    async for res in self._run(
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 499, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/routers/routers.py", line 182, in <genexpr>
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 296, in _stream_chat_completion
    async for chunk in res:
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 162, in _process_vllm_chat_completion_stream_response
    arguments=json.loads(tool_call_buf.arguments),
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
## Test Plan

All existing tests in
`tests/client-sdk/inference/test_text_inference.py` passed.

[//]: # (## Documentation)

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-19 22:27:02 -08:00
Sébastien Han
69eebaf5bf
build: add missing dev dependencies for unit tests (#1004)
# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:

```
uv run pytest -v -k "ollama" \
     --inference-model=llama3.2:3b-instruct-fp16
     llama_stack/providers/tests/inference/test_model_registration.py
```

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py

```

You can observe that some tests pass while others fail, but the test
runs successfully.

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 22:26:11 -08:00
Botao Chen
2b995c22eb
feat: inference passthrough provider (#1166)
##  What does this PR do?
In this PR, we implement a passthrough inference provider that works for
any endpoints that respect llama stack inference API definition.

## Test Plan
config some endpoint that respect llama stack inference API definition
and got the inference results successfully

<img width="1268" alt="Screenshot 2025-02-19 at 8 52 51 PM"
src="https://github.com/user-attachments/assets/447816e4-ea7a-4365-b90c-386dc7dcf4a1"
/>
2025-02-19 21:47:00 -08:00
Botao Chen
b751f7003d
feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164)
as title, to let scoring function llm_as_judge_405b_simpleqa output
aggregated_results.

We can leverage categorical_count to calculate the % of correctness as
eval benchmark metrics
2025-02-19 19:42:04 -08:00
Ihar Hrachyshka
c1f7d7f005
fix: miscellaneous job management improvements in torchtune (#1136)
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**

# What does this PR do?

A failed job is now registered in API, and one can consult its status.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73                                                      
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```

[//]: # (## Documentation)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 19:09:37 -08:00
Francisco Arceo
7972daa72e
feat: Chunk sqlite-vec writes (#1094)
# What does this PR do?
1. This PR adds batch inserts into sqlite-vec as requested in
https://github.com/meta-llama/llama-stack/pull/1040
- Note: the inserts uses a uuid generated from the hash of the document
id and chunk content.
2. This PR also adds unit tests for sqlite-vec. In a follow up PR, I can
add similar tests to Faiss.

## Test Plan
1. Integration tests:
```python
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
3. Unit tests:
```python
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py  -v -s --tb=short --disable-warnings --asyncio-mode=auto
...
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
I also tested using the same example RAG script in
https://github.com/meta-llama/llama-stack/pull/1040 and received the
output.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-19 19:07:46 -08:00
Ashwin Bharambe
cdcbeb005b
chore: remove llama_models.llama3.api imports from providers (#1107)
There should be a choke-point for llama3.api imports -- this is the
prompt adapter. Creating a ChatFormat() object on demand is inexpensive.
The underlying Tokenizer is a singleton anyway.
2025-02-19 19:01:29 -08:00
Ben Browning
e9b8259cf9
fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks (#1123)
# What does this PR do?

Before this change, `distro_codegen.py` would only work if the user
manually installed multiple provider-specific dependencies (see #1122).
Now, users can run `distro_codegen.py` without any provider-specific
dependencies because we avoid importing the entire provider
implementations just to get the config needed to build the provider
template.

Concretely, this mostly means moving the
MODEL_ALIASES (and related variants) definitions to a new models.py
class within the provider implementation for those providers that
require additional dependencies. It also meant moving a couple of
imports from top-level imports to inside `get_adapter_impl` for some
providers, which follows the pattern used by multiple existing
providers.

To ensure we don't regress and accidentally add new imports that cause
distro_codegen.py to fail, the stubbed-in pre-commit hook for
distro_codegen.py was uncommented and slightly tweaked to run via `uv
run python ...` to ensure it runs with only the project's default
dependencies and to run automatically instead of manually.

Lastly, this updates distro_codegen.py itself to keep track of paths it
might have changed and to only `git diff` those specific paths when
checking for changed files instead of doing a diff on the entire working
tree. The latter was overly broad and would require a user have no other
unstaged changes in their working tree, even if those unstaged changes
were unrelated to generated code. Now it only flags uncommitted changes
for paths distro_codegen.py actually writes to.

Our generated code was also out-of-date, presumably because of these
issues, so this commit also has some updates to the generated code
purely because it was out of sync, and the pre-commit hook now enforces
things to be updated.

(Closes #1122)

## Test Plan

I manually tested distro_codegen.py and the pre-commit hook to verify
those work as expected, flagging any uncommited changes and catching any
imports that attempt to pull in provider-specific dependencies.

However, I do not have valid api keys to the impacted provider
implementations, and am unable to easily run the inference tests against
each changed provider. There are no functional changes to the provider
implementations here, but I'd appreciate a second set of eyes on the
changed import statements and moving of MODEL_ALIASES type code to a
separate models.py to ensure I didn't make any obvious errors.

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 18:39:20 -08:00
ehhuang
8de7cf103b
feat: support tool_choice = {required, none, <function>} (#1059)
Summary:

titled


Test Plan:

added tests and

LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-02-18 23:25:15 -05:00
Xi Yan
37cf60b732
style: remove prints in codebase (#1146)
# What does this PR do?
- replace prints in codebase with logger
- update print_table to use rich Table

## Test Plan
- library client script in
https://github.com/meta-llama/llama-stack/pull/1145

```
llama stack list-providers
```
<img width="1407" alt="image"
src="https://github.com/user-attachments/assets/906b4f54-9e42-4e55-8968-7e3aa45525b2"
/>


[//]: # (## Documentation)
2025-02-18 19:41:37 -08:00
Yuan Tang
743f434860
fix: Ensure a tool call can be converted before adding to buffer (#1119)
# What does this PR do?

This fixes an issue when running the e2e agent example:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py

```
    |   File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 175, in _process_vllm_chat_completion_stream_response
    |     tool_call = convert_tool_call(choice.delta.tool_calls[0])
    |   File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 441, in convert_tool_call
    |     return ToolCall(
    |   File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    |     validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    | pydantic_core._pydantic_core.ValidationError: 4 validation errors for ToolCall
    | call_id
    |   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    |     For further information visit https://errors.pydantic.dev/2.10/v/string_type
    | tool_name.enum[BuiltinTool]
    |   Input should be 'brave_search', 'wolfram_alpha', 'photogen' or 'code_interpreter' [type=enum, input_value=None, input_type=NoneType]
    |     For further information visit https://errors.pydantic.dev/2.10/v/enum
    | tool_name.str
    |   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    |     For further information visit https://errors.pydantic.dev/2.10/v/string_type
    | arguments
    |   Input should be a valid dictionary [type=dict_type, input_value=202, input_type=int]
    |     For further information visit https://errors.pydantic.dev/2.10/v/dict_type
```

This issue happened because not all arguments have been appended to the
tool call buffer yet. The current code assumes that we are ready to
convert the tool call whenever args can be converted to JSON
successfully. In this case, `json.loads("202")` would succeed but the
rest of the arguments have not been properly parsed yet.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

The e2e example worked successfully (although note that I ran the script
twice with each function call separately due to
https://github.com/meta-llama/llama-stack/issues/1120):
```
tool_execution> Tool:get_ticker_data Args:{'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}
tool_execution> Tool:get_ticker_data Response:"[{\"('Year', '')\":2023,\"('Close', 'GOOG')\":140.4254455566}]"

tool_execution> Tool:web_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:web_search Response:"{\"query\": \"42nd president of the United States\", \"top_k\": [{\"title\": \"William J. Clinton | whitehouse.gov\", \"url\": \"https://obamawhitehouse.archives.gov/1600/presidents/williamjclinton\", \"description\": \"<strong>Bill Clinton</strong> is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001). He took office at the end of the Cold War, and was the first baby-boomer generation President.\", \"type\": \"search_result\"}, {\"title\": \"Bill Clinton - Wikipedia\", \"url\": \"https://en.wikipedia.org/wiki/Bill_Clinton\", \"description\": \"<strong>William Jefferson Clinton</strong> (n\\u00e9 Blythe; born August 19, 1946) is an American politician and lawyer who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as the attorney general of Arkansas from 1977 to 1979 and as the ...\", \"type\": \"search_result\"}, [{\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=eR2z_1-v87Y\", \"title\": \"A Conversation with Bill Clinton, 42nd President of the United ...\", \"description\": \"William Jefferson Clinton, the first Democratic president in six decades to be elected twice, led the United States to the longest economic expansion in Amer...\"}, {\"type\": \"video_result\", \"url\": \"4484174096/\", \"title\": \"January 20, 1993, President Clinton was sworn in as the 42nd ...\", \"description\": \"WATCH: On January 20, 1993, President Bill Clinton was sworn in as the 42nd President of the United States. #InaugurationDay Video courtesy of the...\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared thoughts ...\", \"description\": \"AboutPressCopyrightContact usCreatorsAdvertiseDevelopersTermsPrivacyPolicy & SafetyHow YouTube worksTest new features \\u00b7 \\u00a9 2024 Google LLC\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/shorts/vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=PHihhihVth0\", \"title\": \"Bill & Hillary Clinton returning to Little Rock for 20th ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}]]}"
```

All text inference tests passed.

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-15 00:19:16 -05:00
ehhuang
ab2b46e528
feat: log start, complete time to Agent steps (#1116) 2025-02-14 17:48:06 -08:00
Ashwin Bharambe
314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
2025-02-14 09:10:59 -08:00
Sébastien Han
c0ee512980
build: configure ruff from pyproject.toml (#1100)
# What does this PR do?

- Remove hardcoded configurations from pre-commit.
- Allow configuration to be set via pyproject.toml.
- Merge .ruff.toml settings into pyproject.toml.
- Ensure the linter and formatter use the defined configuration instead
of being overridden by pre-commit.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:01:57 -08:00
Ben Browning
406465622e
fix: Update QdrantConfig to QdrantVectorIOConfig (#1104)
# What does this PR do?

This fixes an import introduced due to merging #1079 before #1039, and
thus the changes from #1039 needing to update `QdrantConfig` to
`QdrantVectorIOConfig`.


## Test Plan

I ran the remote vllm provider inference tests against the latest main:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```

That failed with:
```
  File "/home/bbrownin/src/llama-stack/llama_stack/providers/tests/vector_io/fixtures.py", line 20, in <module>
    from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig
ImportError: Error importing plugin "llama_stack.providers.tests.vector_io.fixtures": cannot import name 'QdrantConfig' from 'llama_stack.providers.remote.vector_io.qdrant' (/home/bbrownin/src/llama-stack/llama_stack/providers/remote/vector_io/qdrant/__init__.py)
```

After this change, the import no longer fails and the tests pass.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-14 06:31:00 -08:00
Xi Yan
8b655e3cd2
fix!: update eval-tasks -> benchmarks (#1032)
# What does this PR do?

- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove differentiation between `app` v.s. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and do not add any value. Backward compatibility is being
kept as the "type" is not being used anywhere.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- This change is backward compatible 
- Run notebook test with

```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```

<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>



[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 16:40:58 -08:00
Bill Murdock
32d1e50a6f
test: Add qdrant to provider tests (#1039)
# What does this PR do?

This is a follow on to #1022 . It includes the changes I needed to be
able to test the Qdrant support as requested by @terrytangyuan .

I uncovered a lot of bigger, more systemic issues with the vector DB
testing and I will open a new issue for those. For now, I am just
delivering the work I already did on that.

## Test Plan

As discussed on #1022:

```
podman pull qdrant/qdrant
mkdir qdrant-data
podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant
```


```
ollama pull all-minilm:l6-v2
curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}'
```

```
EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings
```

These show 3 tests passing and 15 deselected which is presumably working
as intended.

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-02-13 15:44:55 -08:00
Yuan Tang
5858777ff0
fix: Update VectorIO config classes in registry (#1079)
This was missed in https://github.com/meta-llama/llama-stack/pull/1023. 

```
Traceback (most recent call last):
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 488, in <module>
    main()
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 389, in main
    impls = asyncio.run(construct_stack(config))
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/stack.py", line 202, in construct_stack
    impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 230, in resolve_impls
    impl = await instantiate_provider(
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 312, in instantiate_provider
    config_type = instantiate_class_type(provider_spec.config_class)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/utils/dynamic.py", line 13, in instantiate_class_type
    return getattr(module, class_name)
AttributeError: module 'llama_stack.providers.inline.vector_io.faiss' has no attribute 'FaissImplConfig'

```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 15:39:13 -08:00
Yuan Tang
8ff27b58fa
chore: Consistent naming for VectorIO providers (#1023)
# What does this PR do?

This changes all VectorIO providers classes to follow the pattern
`<ProviderName>VectorIOConfig` and `<ProviderName>VectorIOAdapter`. All
API endpoints for VectorIOs are currently consistent with `/vector-io`.

Note that API endpoint for VectorDB stay unchanged as `/vector-dbs`. 

## Test Plan

I don't have a way to test all providers. This is a simple renaming so
things should work as expected.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 13:15:49 -05:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Yuan Tang
f9ca441974
chore: Link to Groq docs in the warning message for preview model (#1060)
This should be `llama-3.2-3b` instead of `llama-3.2-3b-instruct`.
2025-02-13 12:14:57 -05:00
Sébastien Han
418645696a
fix: improve signal handling and update dependencies (#1044)
# What does this PR do?
This commit enhances the signal handling mechanism in the server by
improving the `handle_signal` (previously handle_sigint) function. It
now properly retrieves the signal name, ensuring clearer logging when a
termination signal is received. Additionally, it cancels all running
tasks and waits for their completion before stopping the event loop,
allowing for a more graceful shutdown. Support for handling
SIGTERM has also been added alongside SIGINT.

Before the changes, handle_sigint used asyncio.run(run_shutdown()).
However, asyncio.run() is meant to start a new event loop, and calling
it inside an existing one (like when running Uvicorn) raises an error.
The fix replaces asyncio.run(run_shutdown()) with an async function
scheduled on the existing loop using loop.create_task(shutdown()). This
ensures that the shutdown coroutine runs within the current event loop
instead of trying to create a new one.

Furthermore, this commit updates the project dependencies. `fastapi` and
`uvicorn` have been added to the development dependencies in
`pyproject.toml` and `uv.lock`, ensuring that the necessary packages are
available for development and execution.

Closes: https://github.com/meta-llama/llama-stack/issues/1043
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run a server and send SIGINT:

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-inference => ollama
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-inference => sentence-transformers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  models => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inference => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-vector_io => faiss
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-safety => llama-guard
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  shields => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  safety => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  vector_dbs => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  vector_io => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => brave-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => tavily-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => code-interpreter
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => rag-runtime
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  tool_groups => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  tool_runtime => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  agents => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-datasetio => huggingface
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-datasetio => localfs
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  datasets => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  datasetio => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  telemetry => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => basic
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => llm-as-judge
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => braintrust
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  scoring_functions => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  scoring => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-eval => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  eval_tasks => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  eval => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inspect => __builtin__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216: 
INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`...
INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss.
INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss.
INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes.
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available.
INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2...
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106: 
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API safety
 POST /v1/safety/run-shield
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [65372]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
^CINFO:     Shutting down
INFO:     Finished server process [65372]
Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 08:07:59 -08:00
Ben Browning
dd1a366347
fix: logprobs support in remote-vllm provider (#1074)
# What does this PR do?

The remote-vllm provider was not passing logprobs options from
CompletionRequest or ChatCompletionRequests through to the OpenAI client
parameters. I manually verified this, as well as observed this provider
failing `TestInference::test_completion_logprobs`. This was filed as
issue #1073.

This fixes that by passing the `logprobs.top_k` value through to the
parameters we pass into the OpenAI client.

Additionally, this fixes a bug in `test_text_inference.py` where it
mistakenly assumed chunk.delta were of type `ContentDelta` for
completion requests. The deltas are of type `ContentDelta` for chat
completion requests, but for basic completion requests the deltas are of
type string. This test was likely failing for other providers that did
properly support logprobs because of this latter issue in the test,
which was hit while fixing the above issue with the remote-vllm
provider.

(Closes #1073)

## Test Plan

First, you need a vllm running. I ran one locally like this:
```
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json
```

Next, run test_text_inference.py against this vllm using the remote vllm
provider like this:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```

Before my change, the test failed with this error:
```
llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs
    assert 1 <= len(response.logprobs) <= 5
E   TypeError: object of type 'NoneType' has no len()
```

After my change, the test passes.

[//]: # (## Documentation)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-13 11:00:00 -05:00
Francisco Arceo
119fe8742a
feat: Adding sqlite-vec as a vectordb (#1040)
# What does this PR do?
This PR adds `sqlite_vec` as an additional inline vectordb.

Tested with `ollama` by adding the `vector_io` object in
`./llama_stack/templates/ollama/run.yaml` :

```yaml
  vector_io:
  - provider_id: sqlite_vec
    provider_type: inline::sqlite_vec
    config:
      kvstore:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db
      db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db
```
I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test
file with:
```python
INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"]
```
And parameterized the relevant tests. 

[//]: # (If resolving an issue, uncomment and update the line below)
# Closes 
https://github.com/meta-llama/llama-stack/issues/1005

## Test Plan
I ran the tests with:
```bash
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
```
Which outputs:
```python
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```

In addition, I ran the `rag_with_vector_db.py`
[example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py)
using the script below with `uv run rag_example.py`.
<details>
<summary>CLICK TO SHOW SCRIPT 👋  </summary>

```python
#!/usr/bin/env python3
import os
import uuid
from termcolor import cprint

# Set environment variables
os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16'
os.environ['LLAMA_STACK_CONFIG'] = 'ollama'

# Import libraries after setting environment variables
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.types import Document


def main():
    # Initialize the client
    client = LlamaStackAsLibraryClient("ollama")
    vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"

    _ = client.initialize()

    model_id = 'llama3.2:3b-instruct-fp16'

    # Define the list of document URLs and create Document objects
    urls = [
        "chat.rst",
        "llama3.rst",
        "memory_optimizations.rst",
        "lora_finetune.rst",
    ]
    documents = [
        Document(
            document_id=f"num-{i}",
            content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}",
            mime_type="text/plain",
            metadata={},
        )
        for i, url in enumerate(urls)
    ]
    # (Optional) Use the documents as needed with your client here

    client.vector_dbs.register(
        provider_id='sqlite_vec',
        vector_db_id=vector_db_id,
        embedding_model="all-MiniLM-L6-v2",
        embedding_dimension=384,
    )

    client.tool_runtime.rag_tool.insert(
        documents=documents,
        vector_db_id=vector_db_id,
        chunk_size_in_tokens=512,
    )
    # Create agent configuration
    agent_config = AgentConfig(
        model=model_id,
        instructions="You are a helpful assistant",
        enable_session_persistence=False,
        toolgroups=[
            {
                "name": "builtin::rag",
                "args": {
                    "vector_db_ids": [vector_db_id],
                }
            }
        ],
    )

    # Instantiate the Agent
    agent = Agent(client, agent_config)

    # List of user prompts
    user_prompts = [
        "What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.",
        "Was anything related to 'Llama3' discussed, if so what?",
        "Tell me how to use LoRA",
        "What about Quantization?",
    ]

    # Create a session for the agent
    session_id = agent.create_session("test-session")

    # Process each prompt and display the output
    for prompt in user_prompts:
        cprint(f"User> {prompt}", "green")
        response = agent.create_turn(
            messages=[
                {
                    "role": "user",
                    "content": prompt,
                }
            ],
            session_id=session_id,
        )
        # Log and print events from the response
        for log in EventLogger().log(response):
            log.print()


if __name__ == "__main__":
    main()
```
</details>

Which outputs a large summary of RAG generation.

# Documentation

Will handle documentation updates in follow-up PR.

# (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-12 10:50:03 -08:00
Yuan Tang
5e97dd9919
feat: Support tool calling for streaming chat completion in remote vLLM provider (#1063)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Closes https://github.com/meta-llama/llama-stack/issues/1046. 

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 14 items                                                                                                                                    

tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED                  [  7%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED                      [ 14%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 21%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 28%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED              [ 35%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 42%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 57%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 64%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 71%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 78%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED       [ 85%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] PASSED [ 92%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] PASSED [100%]

=============================================== 12 passed, 2 xfailed, 1 warning in 366.56s (0:06:06) ================================================

```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-12 06:17:21 -08:00
Sébastien Han
bf11cc0450
chore: update return type to Optional[str] (#982) 2025-02-11 22:10:28 -08:00
Xi Yan
66d7e15c93
perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041)
# What does this PR do?

**Problem**
- Using script:
https://gist.github.com/thoraxe/6163b2145ce7b1c24c6026b64cf90085

- This hits an issue on server with `code_interpreter` not found, as we
do not pass "builtin::code_interpreter" in AgentConfig's `toolgroups`.

This is a general issue where model always tries to output
`code_interpreter` in `ToolCall` even when we do not have
`code_interpreter` available for execution.

**Reproduce Deeper Problem in chat-completion**
- Use script:
https://gist.github.com/yanxi0830/163a9ad7b5db10556043fbfc7ecd7603

1. We currently always populate `code_interpreter` in `ToolCall` in
ChatCompletionResponse if the model's response begins with
`<|python_tag|>`. See
c5f5958498/models/llama3/api/chat_format.py (L200-L213)

<img width="913" alt="image"
src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6"
/>

2. This happens even if we do not pass the `code_interpreter` as a
`tools` in ChatCompletionRequest.

**This PR**

Explicitly make sure that the tools returned in
`ChatCompletionResponse.tool_calls` is always a tool requested by
`ChatCompletionRequest.tools`.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

**Before**
<img width="913" alt="image"
src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6"
/>
<img width="997" alt="image"
src="https://github.com/user-attachments/assets/d3e82b62-b142-4939-954c-62843bec7110"
/>


**After**
<img width="856" alt="image"
src="https://github.com/user-attachments/assets/2c70ce55-c8d0-45ea-b10f-f70adc50d3d9"
/>
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/b5e81826-c35b-4052-bf81-7afff93ce2ef"
/>



**Unit Test**
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/
```
<img width="1002" alt="image"
src="https://github.com/user-attachments/assets/04808517-eded-4122-97f5-7e5142de9779"
/>



**Streaming**
- Chat Completion
<img width="902" alt="image"
src="https://github.com/user-attachments/assets/f477bc86-bd38-4729-b49e-a0a6ed3f835a"
/>

- Agent
<img width="916" alt="image"
src="https://github.com/user-attachments/assets/f4cc3417-23cd-46b1-953d-3a2271e79bbb"
/>


[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
2025-02-11 18:31:35 -08:00
Yuan Tang
dd37e58868
feat: Support tool calling for non-streaming chat completion in remote vLLM provider (#1034)
# What does this PR do?


This PR adds support for tool calling for non-streaming chat completion.
Prior to this, tool calls were not passed to chat completion requests
and the tools object needs to be restructured properly to be compatible
with vLLM provider.

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py
================================================================= test session starts =================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 12 items                                                                                                                                    

tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED                  [  8%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED                      [ 16%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 25%]
tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 33%]
tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED              [ 41%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 50%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 58%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 66%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 75%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 83%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] FAILED [ 91%]
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED         [100%]

```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-11 21:08:29 -05:00
Ihar Hrachyshka
24385cfd03
fix: filter out remote::sample providers when listing (#1057)
# What does this PR do?

Before:

```
 llama stack list-providers agents
+------------------------+-----------------------------------------------------------------------+
| Provider Type          | PIP Package Dependencies                                              |
+------------------------+-----------------------------------------------------------------------+
| inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis |
+------------------------+-----------------------------------------------------------------------+
| remote::sample         |                                                                       |
+------------------------+-----------------------------------------------------------------------+
```

After:

```
 llama stack list-providers agents
+------------------------+-----------------------------------------------------------------------+
| Provider Type          | PIP Package Dependencies                                              |
+------------------------+-----------------------------------------------------------------------+
| inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis |
+------------------------+-----------------------------------------------------------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Manually.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-11 16:12:46 -08:00
Dinesh Yeduguru
d8a20e034b
feat: make telemetry attributes be dict[str,PrimitiveType] (#1055)
# What does this PR do?
Make attributes in telemetry be only primitive types and avoid arbitrary
nesting.

## Test Plan
```
 LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py  -k "test_builtin_tool_web_search"
# Verified that attributes still show up correclty in jaeger
```
2025-02-11 15:10:17 -08:00
ehhuang
96c88397da
fix: agent config validation (#1053)
Summary:

Fixes AgentConfig init bug introduced with ToolConfig.

Namely, the below doesn't work
```
    agent_config = AgentConfig(
        **common_params,
        tool_config=ToolConfig(
            tool_choice="required",
        ),
    )
```
bvecause tool_choice was defaulted to 'auto' leading to validation check
failing.

Test Plan:

added unittests

LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-02-11 14:48:42 -08:00
Yuan Tang
d954f2752e
fix: Added missing tool_config arg in SambaNova chat_completion() (#1042)
# What does this PR do?

`tool_config` is missing from the signature but is used in
`ChatCompletionRequest()`.


## Test Plan

This is a small fix. I don't have SambaNova to test the change but I
doubt that this is currently working.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 21:20:50 -08:00
Sébastien Han
b34c1dd8ad
test: replace blocked image URLs with GitHub-hosted (#1025)
# What does this PR do?

The previous image URLs were sometimes blocked by Cloudflare, causing
test failures for some users. This update replaces them with a
GitHub-hosted image (`dog.png`) from the `llama-stack` repository,
ensuring more reliable access during testing.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
$ ollama run llama3.2-vision:latest --keep-alive 2m &

$ uv run pytest -v -s -k "ollama" --inference-model=llama3.2-vision:latest llama_stack/providers/tests/inference/test_vision_inference.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 39 items / 36 deselected / 3 selected                                                              

llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image1-expected_strings1] 
PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-ollama] PASSED

========================== 3 passed, 36 deselected, 2 warnings in 62.23s (0:01:02) ==========================
```

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-10 22:38:11 -05:00
Bill Murdock
3856927ee8
fix: Update Qdrant support post-refactor (#1022)
# What does this PR do?

I tried running the Qdrant provider and found some bugs. See #1021 for
details. @terrytangyuan wrote there:

> Please feel free to submit your changes in a PR. I fixed similar
issues for pgvector provider. This might be an issue introduced from a
refactoring.

So I am submitting this PR.

Closes #1021

## Test Plan

Here are the highlights for what I did to test this:

References:
-
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html
-
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py
-
https://github.com/meta-llama/llama-stack/blob/main/docs/zero_to_hero_guide/README.md#build-configure-and-run-llama-stack

Install and run Qdrant server:

```
podman pull qdrant/qdrant
mkdir qdrant-data
podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant
```

Install and run Llama Stack from the venv-support PR (mainly because I
didn't want to install conda):

```
brew install cmake # Should just need this once

git clone https://github.com/meta-llama/llama-models.git
gh repo clone cdoern/llama-stack
cd llama-stack
gh pr checkout 1018 # This is the checkout that introduces venv support for build/run.  Otherwise you have to use conda.  Eventually this wil be part of main, hopefully.

uv sync --extra dev
uv pip install -e .
source .venv/bin/activate
uv pip install qdrant_client

LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack build --template ollama --image-type venv
```
```
edit llama_stack/templates/ollama/run.yaml
```

in that editor under:
```
  vector_io:
```
add:
```
  - provider_id: qdrant
    provider_type: remote::qdrant
    config: {}
```

see
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/vector_io/qdrant/config.py#L14
for config options (but I didn't need any)

```
LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack run ollama --image-type venv \
   --port $LLAMA_STACK_PORT \
   --env INFERENCE_MODEL=$INFERENCE_MODEL \
   --env SAFETY_MODEL=$SAFETY_MODEL \
   --env OLLAMA_URL=$OLLAMA_URL
```

Then I tested it out in a notebook.  Key highlights included:

```
qdrant_provider = None
for provider in client.providers.list():
    if provider.api == "vector_io" and provider.provider_id == "qdrant":
        qdrant_provider = provider
qdrant_provider
assert qdrant_provider is not None, "QDrant is not a provider.  You need to edit the run yaml file you use in your `llama stack run` call"

vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id=qdrant_provider.provider_id,
)
```

Other than that, I just followed what was in
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html

It would be good to have automated tests for this in the future, but
that would be a big undertaking.

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-02-10 18:08:33 -05:00
Ashwin Bharambe
10bda65b94 Nuke use_proxy from code execution 2025-02-07 09:55:55 -08:00
Sébastien Han
316c43fdaf
refactor(ollama): model availability check (#986)
# What does this PR do?

Moved model availability check logic into a dedicated
check_model_availability function. Eliminated redundant code by reusing
the helper function in both embedding and non-embedding model
registration.

Signed-off-by: Sébastien Han <seb@redhat.com>

## Test Plan

Run Ollama and serve 2 models to get most the unit test pass:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
ollama run llama3.1:8b  --keepalive 2m &
```

Run the unit test:

```
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 65 items / 60 deselected / 5 selected                                                              

llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED [ 20%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED [ 40%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] FAILED [ 60%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_initialize_model_during_registering[-ollama] FAILED [ 80%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED [100%]

================================================== FAILURES ==================================================
_______________________ TestModelRegistration.test_register_with_llama_model[-ollama] ________________________
llama_stack/providers/tests/inference/test_model_registration.py:54: in test_register_with_llama_model
    _ = await models_impl.register_model(
llama_stack/providers/utils/telemetry/trace_protocol.py:91: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/distribution/routers/routing_tables.py:245: in register_model
    registered_model = await self.register_object(model)
llama_stack/distribution/routers/routing_tables.py:192: in register_object
    registered_obj = await register_object_with_provider(obj, p)
llama_stack/distribution/routers/routing_tables.py:53: in register_object_with_provider
    return await p.register_model(obj)
llama_stack/providers/utils/telemetry/trace_protocol.py:91: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/providers/remote/inference/ollama/ollama.py:368: in register_model
    await check_model_availability(model.provider_resource_id)
llama_stack/providers/remote/inference/ollama/ollama.py:359: in check_model_availability
    raise ValueError(
E   ValueError: Model 'custom-model' is not available in Ollama. Available models: llama3.1:8b, llama3.2:3b-instruct-fp16
__________________ TestModelRegistration.test_initialize_model_during_registering[-ollama] ___________________
llama_stack/providers/tests/inference/test_model_registration.py:85: in test_initialize_model_during_registering
    mock_load_model.assert_called_once()
/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/unittest/mock.py:956: in assert_called_once
    raise AssertionError(msg)
E   AssertionError: Expected 'load_model' to have been called once. Called 0 times.
-------------------------------------------- Captured stderr call --------------------------------------------
W0207 11:55:26.777000 90854 .venv/lib/python3.13/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
========================================== short test summary info ===========================================
FAILED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] - ValueError: Model 'custom-model' is not available in Ollama. Available models: llama3.1:8b, llama3.2:3b-i...
FAILED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_initialize_model_during_registering[-ollama] - AssertionError: Expected 'load_model' to have been called once. Called 0 times.
=========================== 2 failed, 3 passed, 60 deselected, 2 warnings in 1.84s ===========================
``` 

We only "care" about the `test_register_nonexistent_model` for this
code.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-07 09:52:16 -08:00
Sébastien Han
0b7098493a
test: encode image data as base64 (#1003)
# What does this PR do?

Previously, the test was failing due to a pydantic validation error
caused by passing raw binary image data instead of a valid Unicode
string. This fix encodes the image data as base64, ensuring it is a
valid string format compatible with `ImageContentItem`.

Error:

```
______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________
llama_stack/providers/tests/inference/test_vision_inference.py:31: in <module>
    class TestVisionModelInference:
llama_stack/providers/tests/inference/test_vision_inference.py:37: in TestVisionModelInference
    ImageContentItem(image=dict(data=PASTA_IMAGE)),
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
E   image.data
E     Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes]
E       For further information visit
https://errors.pydantic.dev/2.10/v/string_unicode
```

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Execute the following:

```
ollama run llama3.2-vision --keepalive 2m &
uv run pytest -v -s -k "ollama" --inference-model=llama3.2-vision:latest llama_stack/providers/tests/inference/test_vision_inference.py

llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image1-expected_strings1] FAILED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-ollama] FAILED
```

The last two tests are failing because Cloudflare blocked me from
accessing
https://www.healthypawspetinsurance.com/Images/V3/DogAndPuppyInsurance/Dog_CTA_Desktop_HeroImage.jpg
but this has no impact on the current fix.


[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-07 09:44:16 -08:00
Sébastien Han
657f24b964
chore: add missing ToolConfig import in groq.py (#983)
# What does this PR do?

Imported `ToolConfig` from the `llama_stack.apis.inference` module to
resolve missing reference and ensure proper functionality within the
`groq.py` file.

Signed-off-by: Sébastien Han <seb@redhat.com>


## Test Plan

Without the change, pytest will run with the following error:

```
uv run pytest -v -s -k "ollama" llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 379 items / 1 error / 349 deselected / 30 selected                                                 

=================================================== ERRORS ===================================================
__________________ ERROR collecting llama_stack/providers/tests/inference/groq/test_init.py __________________
llama_stack/providers/tests/inference/groq/test_init.py:11: in <module>
    from llama_stack.providers.remote.inference.groq.groq import GroqInferenceAdapter
llama_stack/providers/remote/inference/groq/groq.py:72: in <module>
    class GroqInferenceAdapter(Inference, ModelRegistryHelper, NeedsRequestProviderData):
llama_stack/providers/remote/inference/groq/groq.py:102: in GroqInferenceAdapter
    tool_config: Optional[ToolConfig] = None,
E   NameError: name 'ToolConfig' is not defined
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/groq/test_init.py - NameError: name 'ToolConfig' is not defined
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================== 349 deselected, 22 warnings, 1 error in 0.28s ================================
```

With the change the test continues to run and fails with a different
error:

```
uv run pytest -v -s llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 342 items / 1 error                                                                                

=================================================== ERRORS ===================================================
______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________
llama_stack/providers/tests/inference/test_vision_inference.py:29: in <module>
    class TestVisionModelInference:
llama_stack/providers/tests/inference/test_vision_inference.py:35: in TestVisionModelInference
    ImageContentItem(image=dict(data=PASTA_IMAGE)),
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
E   image.data
E     Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes]
E       For further information visit https://errors.pydantic.dev/2.10/v/string_unicode
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/test_vision_inference.py - pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 22 warnings, 1 error in 0.25s ========================================
```

Which is fixed in https://github.com/meta-llama/llama-stack/pull/1003.

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-07 09:35:00 -08:00
Sébastien Han
840344975d
test: rm unused exception alias in pytest.raises (#991)
# What does this PR do?

Refactored tests by removing unused exception alias (as exc_info) in
pytest.raises, improving code clarity and reducing lint warnings.
exc_info was never used.

Signed-off-by: Sébastien Han <seb@redhat.com>

## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-07 08:04:25 -08:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation 
- Update docs 
- [minor] uv fix i came across 
- codegen for all templates 

Tested with 

```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=[http://$CHROMADB_HOST:$CHROMADB_PORT](about:blank)
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321

# build the stack template 
llama stack build --template=dell 

# start the TGI inference server 
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES [ghcr.io/huggingface/text-generation-inference](http://ghcr.io/huggingface/text-generation-inference) --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0

# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)

# build docker 
llama stack build --template=dell --image-type=container

# run llama stack server ( via docker )
podman run -it \
--network host \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
# NOTE: mount the llama-stack / llama-model directories if testing local changes 
-v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \ localhost/distribution-dell:dev \
--port $LLAMA_STACK_PORT  \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL

# test the server 
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py

```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Yuan Tang
0a0ee5ca96
Fix incorrect handling of chat completion endpoint in remote::vLLM (#951)
# What does this PR do?

Fixes https://github.com/meta-llama/llama-stack/issues/949.


## Test Plan

Verified that the correct chat completion endpoint is called after the
change.

Llama Stack server:
```
INFO:     ::1:32838 - "POST /v1/inference/chat-completion HTTP/1.1" 200 OK
18:36:28.187 [END] /v1/inference/chat-completion [StatusCode.OK] (1276.12ms)

```

vLLM server:
```
INFO:     ::1:36866 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```

```bash
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -s -v tests/client-sdk/inference/test_inference.py -k "test_image_chat_completion_base64 or test_image_chat_completion_non_streaming or test_image_chat_completion_streaming"
================================================================== test session starts ===================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 16 items / 12 deselected / 4 selected                                                                                                          

tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64[meta-llama/Llama-3.2-11B-Vision-Instruct-url] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64[meta-llama/Llama-3.2-11B-Vision-Instruct-data] PASSED
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-06 10:45:19 -08:00
Sébastien Han
403292fcf6
test: replace memory with vector_io fixture (#984)
# What does this PR do?

Replaced references to `memory` with `vector_io` in
`DEFAULT_PROVIDER_COMBINATIONS` and adjusted corresponding fixture
imports to ensure proper configuration for vector I/O during tests. This
change aligns with the new testing structure.

Followup of https://github.com/meta-llama/llama-stack/pull/830 when the
memory fixture was removed.

Signed-off-by: Sébastien Han <seb@redhat.com>

## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-06 10:12:59 -08:00
ehhuang
3922999118
sys_prompt support in Agent (#938)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.

- [ ] Addresses issue (#issue)


## Test Plan


LLAMA_STACK_CONFIG=together pytest
\-\-inference\-model=meta\-llama/Llama\-3\.3\-70B\-Instruct -s -v
tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-05 21:11:32 -08:00
Yuan Tang
a79a083e39
Fix broken pgvector provider and memory leaks (#947)
This PR fixes the broken pgvector provider as well as wraps all cursor
object creations with context manager to ensure that they get properly
closed to avoid potential memory leaks.

```
> pytest llama_stack/providers/tests/vector_io/test_vector_io.py   -m "pgvector" --env EMBEDDING_DIMENSION=384 --env PGVECTOR_PORT=7432 --env PGVECTOR_DB=db --env PGVECTOR_USER=user --env PGVECTOR_PASSWORD=pass   -v -s --tb=short --disable-warnings

llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_list[-pgvector] PASSED
llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_register[-pgvector] PASSED
llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_query_documents[-pgvector] The scores are: [0.8168284974053789, 0.8080469278964486, 0.8050996198466661]
PASSED
```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-05 09:32:05 -08:00
Ihar Hrachyshka
5c8e35a9e2
docs, tests: replace datasets.rst with memory_optimizations.rst (#968)
datasets.rst was removed from torchtune repo.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

# What does this PR do?

Replace a missing 404 document with another one that exists. (Removed it
from
the list when memory_optimizations.rst was already pulled.)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-05 11:25:56 -05:00
ehhuang
c9ab72fa82
Support sys_prompt behavior in inference (#937)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.

- [ ] Addresses issue (#issue)


## Test Plan

python -m unittest
llama_stack.providers.tests.inference.test_prompt_adapter


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937).
* #938
* __->__ #937
2025-02-03 23:35:16 -08:00
Ashwin Bharambe
f98efe68c9
Misc fixes (#944)
- Make sure torch + torchvision go together as deps, otherwise bad stuff
happens
- Add a pre-commit for requirements.txt
2025-02-03 14:08:47 -08:00
Yuan Tang
83a51c7bfb
Properly close PGVector DB connection during shutdown() (#931)
The connection to the DB was not closed during shutdown.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 21:23:13 -08:00