Commit graph

1267 commits

Author SHA1 Message Date
LESSuseLESS
3a31611486
feat: completing text /chat-completion and /completion tests (#1223)
# What does this PR do?

The goal is to have a fairly complete set of provider and e2e tests for
/chat-completion and /completion. This is the current list,
```
grep -oE "def test_[a-zA-Z_+]*" llama_stack/providers/tests/inference/test_text_inference.py | cut -d' ' -f2
```
- test_model_list
- test_text_completion_non_streaming
- test_text_completion_streaming
- test_text_completion_logprobs_non_streaming
- test_text_completion_logprobs_streaming
- test_text_completion_structured_output
- test_text_chat_completion_non_streaming
- test_text_chat_completion_structured_output
- test_text_chat_completion_streaming
- test_text_chat_completion_with_tool_calling
- test_text_chat_completion_with_tool_calling_streaming

```
grep -oE "def test_[a-zA-Z_+]*" tests/client-sdk/inference/test_text_inference.py | cut -d' ' -f2
```
- test_text_completion_non_streaming
- test_text_completion_streaming
- test_text_completion_log_probs_non_streaming
- test_text_completion_log_probs_streaming
- test_text_completion_structured_output
- test_text_chat_completion_non_streaming
- test_text_chat_completion_streaming
- test_text_chat_completion_with_tool_calling_and_non_streaming
- test_text_chat_completion_with_tool_calling_and_streaming
- test_text_chat_completion_with_tool_choice_required
- test_text_chat_completion_with_tool_choice_none
- test_text_chat_completion_structured_output
- test_text_chat_completion_tool_calling_tools_not_in_request

## Test plan

== Set up Ollama local server
```
OLLAMA_HOST=127.0.0.1:8321 with-proxy ollama serve
OLLAMA_HOST=127.0.0.1:8321 ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```

==  Run a provider test
```
conda activate stack
OLLAMA_URL="http://localhost:8321" \
pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
llama_stack/providers/tests/inference/test_text_inference.py::TestInference
```

== Run an e2e test
```
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```
```
conda activate stack
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
tests/client-sdk/inference/test_text_inference.py
```
2025-02-25 11:37:04 -08:00
Charlie Doern
9b130f96a7
fix: build_venv expects an extra argument (#1233)
# What does this PR do?


currently, build_venv.sh expects a `distribution_type` as the first
argument but the only things ever passed are:

1. image name
2. pip dependencies

so distribution_type is never passed in meaning the script errors when
calling something like:

`llama stack build --image-type venv --template ollama --image-name
test`

before output:

```
llama stack build --image-type venv --template ollama --image-name venv-test
Usage: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> <env_name> <pip_dependencies> [<special_pip_deps>]
Example: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> mybuild ./my-stack-build.yaml 'numpy pandas scipy'
Failed to build target venv-test with return code 1
Run config path is empty
```
after:

```
llama stack build --image-type venv --template ollama --image-name venv-test
Environment 'venv-test' already exists, re-using it.
Using virtual environment venv-test
Using CPython 3.13.0 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: venv-test
Activate with: source venv-test/bin/activate
Using Python 3.13.0 environment at: venv-test
Resolved 55 packages in 640ms
      Built fire==0.7.0
Prepared 54 packages in 1.14s
Installed 55 packages in 82ms
 + annotated-types==0.7.0
 ```

## Test Plan

ran locally with output above

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-25 11:08:50 -08:00
Sébastien Han
c223b1862b
fix: resolve type hint issues and import dependencies (#1176)
# What does this PR do?

- Fixed type hinting and missing imports across multiple modules.
- Improved compatibility by using `TYPE_CHECKING` for conditional
imports.
- Updated `pyproject.toml` to enforce stricter linting.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-25 11:06:47 -08:00
Yuan Tang
1a044ef894
fix: Raise exception when tool call result is None (#1253)
# What does this PR do?

When there are issues with the tool call function, an exception is
raised but the error message is not informative. This adds a clearer
message to tell users to check their functions.
```
Traceback (most recent call last):
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
    async for item in event_gen:
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 165, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 197, in create_and_execute_turn
    async for chunk in self.run(
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 389, in run
    async for res in self._run(
  File "/Users/phayes/projects/llama-stack/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 811, in _run
    content=tool_result.content,
AttributeError: 'NoneType' object has no attribute 'content'
```

## Test Plan

Ran the same script and exception is raised with clearer error message.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-25 13:10:50 -05:00
Jeff Tang
73a0c7a0e7
LocalInferenceImpl update for LS013 (#1242)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
2025-02-25 09:58:34 -08:00
ehhuang
dc3c881ffe
fix: include timezone in Agent steps' timestamps (#1247)
Summary:

kotlin SDK expects this format

Test Plan:

python prints the expected format
>>> str(datetime.now().astimezone())
'2025-02-24 22:02:58.729763-08:00'
2025-02-25 09:49:25 -08:00
Sébastien Han
1bd080c23d
build: hint on Python version for uv venv (#1172)
# What does this PR do?

Whenever uv is instantiated and creates a virtual environment, it will
use the minimal Python interpreter version supported by the project
which is 3.10.

Closes: https://github.com/meta-llama/llama-stack/issues/1170
Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-25 10:37:45 -05:00
Hardik Shah
30f79fafcb
fix: Update Llama_Stack_Benchmark_Evals.ipynb (#1246)
Update eval notebook to use `--image-name __system__`
2025-02-24 18:22:42 -08:00
Hardik Shah
a1fe3c30dd
fix: Update getting_started.ipynb (#1245)
update to install properly in system python in colab
2025-02-24 18:22:32 -08:00
Charlie Doern
de878e15a9
fix: pre-commit updates (#1243)
# What does this PR do?

PR #1139 caused pre-commit failures on main likely due to improper
rebase before merge. run pre-commit on main and commit the changes

see runs here:
3775148428

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-24 17:20:29 -08:00
Charlie Doern
4684fd3f8d
refactor: combine start scripts for each env (#1139)
# What does this PR do?

now that llama stack supports running in venv, conda, and container
modes and the 3 scripts overlap alot, combine these three into ons
`start_stack.sh` script

## Test Plan

tested this locally on venv, conda, and container

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-24 16:53:31 -08:00
github-actions[bot]
47f8c592b9 Bump version to 0.1.4 2025-02-24 15:59:26 -08:00
Ashwin Bharambe
9b0f783e54
test: add a ci-tests distro template for running e2e tests (#1237) 2025-02-24 14:43:21 -08:00
Hardik Shah
27a08b7266 test fix for sometimes tools get called more than once 2025-02-24 13:16:40 -08:00
ehhuang
e8f4efba44
test: fix test_tool_choice (#1234)
Summary:

Test Plan:
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1234).
* __->__ #1234
* #1214
2025-02-24 12:42:42 -08:00
ehhuang
14c38acf97
fix: set default tool_prompt_format in inference api (#1214)
Summary:
Currently we don't set the best tool_prompt_format according to model as
promisd.

Test Plan:
Added print around raw model input and inspected manually
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1214).
* #1234
* __->__ #1214
2025-02-24 12:38:37 -08:00
Sébastien Han
c4987bc349
fix: avoid failure when no special pip deps and better exit (#1228)
# What does this PR do?

When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:

1. Environment name
2. Pip dependencies

Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

This command shouldn't fail:

```
llama stack build --template ollama --image-type venv
```

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-24 13:18:52 -05:00
Ashwin Bharambe
d6356f822a fix: remove UV_SYSTEM_PYTHON from getting started notebook since llama stack build detects notebook environment 2025-02-24 10:05:02 -08:00
Ashwin Bharambe
e8e8fe7c93 fix: add LLAMA_STACK_CLIENT_DIR mount when installing in docker from source 2025-02-24 10:00:57 -08:00
Ashwin Bharambe
641549c631 Add llama stack client overrides also; necessary for correct docker building 2025-02-24 07:51:11 -08:00
Reid
1842eeb96f
docs: small fixes (#1224) 2025-02-24 07:59:58 -05:00
Ashwin Bharambe
0973d386e6 fix: update build_container.sh to ensure llama-models is installed first 2025-02-23 21:47:26 -08:00
Yuan Tang
17162b9978
docs: Add vLLM to the list of inference providers in concepts and providers pages (#1227)
This increases visibility of the vLLM provider.
2025-02-23 20:16:30 -08:00
Charlie Doern
34e3faa4e8
feat: add --run to llama stack build (#1156)
# What does this PR do?

--run runs the stack that was just build using the same arguments during
the build process (image-name, type, etc)

This simplifies the workflow a lot and makes the UX better for most
local users trying to get started rather than having to match the flags
of the two commands (build and then run)

Also, moved `ImageType` to distribution.utils since there were circular
import errors with its old location

## Test Plan

tested locally using the following command: 

`llama stack build --run --template ollama --image-type venv`

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-23 22:06:09 -05:00
Ashwin Bharambe
6227e1e3b9
fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225)
Make sure venv behaves like conda (no prefix is added to image_name) and
`--image-type venv` inside a notebook "just works" without any fiddling
2025-02-23 16:57:11 -08:00
Francisco Arceo
19ae4b35d9
docs: Adding Provider sections to docs (#1195)
# What does this PR do?
Adding Provider sections to docs (some of these will be empty and need
updating).


This PR is still a draft while I seek feedback from other contributors.
I opened it to make the structure visible in the linked GitHub Issue.

# Closes https://github.com/meta-llama/llama-stack/issues/1189

- Providers Overview Page
![Screenshot 2025-02-21 at 12 15
09 PM](https://github.com/user-attachments/assets/e83e5a17-0d96-4de0-8251-68161799a054)

- SQLite-Vec specific page
![Screenshot 2025-02-21 at 12 15
34 PM](https://github.com/user-attachments/assets/14773900-fc8f-49e9-832a-b060b7ca010a)

## Test Plan
N/A

[//]: # (## Documentation)

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-22 11:59:34 -08:00
Ashwin Bharambe
b890d7a611 Test be not having prints yo 2025-02-21 16:43:00 -08:00
ehhuang
c9e08cc0a8
test: do not overwrite agent_config (#1216)
Summary:

Test Plan:
2025-02-21 16:38:56 -08:00
Reid
187524d4ae
feat: add substring search for model list (#1099)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

`llama model list` or `llama model list --show-all` will list more or
all for the models, so add the `search` option to simplify the output.
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [-s SEARCH]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  -s SEARCH, --search SEARCH
                        Search for the input string as a substring in the model descriptor(ID)

$ llama model list -s 70b
+-----------------------+-----------------------------------+----------------+
| Model Descriptor(ID)  | Hugging Face Repo                 | Context Length |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B          | meta-llama/Llama-3.1-70B          | 128K           |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K           |
+-----------------------+-----------------------------------+----------------+
| Llama3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 128K           |
+-----------------------+-----------------------------------+----------------+

$ llama model list -s 3.1-8b
+----------------------+----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo                | Context Length |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B          | meta-llama/Llama-3.1-8B          | 128K           |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K           |
+----------------------+----------------------------------+----------------+

$ llama model list --show-all -s pro
+----------------------+-----------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo           | Context Length |
+----------------------+-----------------------------+----------------+
| Prompt-Guard-86M     | meta-llama/Prompt-Guard-86M | 2K             |
+----------------------+-----------------------------+----------------+

$ llama model list -s k
Not found for search.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 16:38:10 -08:00
Ashwin Bharambe
5be628f637 Add test jsons to MANIFEST for now 2025-02-21 16:25:51 -08:00
Ashwin Bharambe
45ffe87d7c Kill noise from test output 2025-02-21 15:37:23 -08:00
ehhuang
bf38d0aba0
test: fix test_rag_agent test (#1215)
Summary:

Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/client-sdk/agents/test_agents.py::test_rag_agent --safety-shield
meta-llama/Llama-Guard-3-8B
2025-02-21 15:24:28 -08:00
Ashwin Bharambe
e7d261ef4a Fix test infra, sentence embeddings mixin 2025-02-21 15:11:46 -08:00
Ashwin Bharambe
182608d4bf better test naming 2025-02-21 14:27:08 -08:00
Ashwin Bharambe
ab54b8cd58
feat(providers): support non-llama models for inference providers (#1200)
This PR begins the process of supporting non-llama models within Llama
Stack. We start simple by adding support for this functionality within a
few existing providers: fireworks, together and ollama.

## Test Plan

```bash
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/inference/test_text_inference.py \
  --inference-model accounts/fireworks/models/phi-3-vision-128k-instruct
```

^ this passes most of the tests but as expected fails the tool calling
related tests since they are very specific to Llama models

```
inference/test_text_inference.py::test_text_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_completion_log_probs_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_completion_log_probs_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_text_completion_structured_output[accounts/fireworks/models/phi-3-vision-128k-instruct-completion-01] PASSED
inference/test_text_inference.py::test_text_chat_completion_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-Which planet do humans live on?-Earth] PASSED
inference/test_text_inference.py::test_text_chat_completion_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-Which planet has rings around it with a name starting w
ith letter S?-Saturn] PASSED
inference/test_text_inference.py::test_text_chat_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-What's the name of the Sun in latin?-Sol] PASSED
inference/test_text_inference.py::test_text_chat_completion_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct-What is the name of the US captial?-Washington] PASSED
inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_required[accounts/fireworks/models/phi-3-vision-128k-instruct] FAILED
inference/test_text_inference.py::test_text_chat_completion_with_tool_choice_none[accounts/fireworks/models/phi-3-vision-128k-instruct] PASSED
inference/test_text_inference.py::test_text_chat_completion_structured_output[accounts/fireworks/models/phi-3-vision-128k-instruct] ERROR
inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[accounts/fireworks/models/phi-3-vision-128k-instruct-True] PASSED
inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[accounts/fireworks/models/phi-3-vision-128k-instruct-False] PASSED
```
2025-02-21 13:21:28 -08:00
Sébastien Han
9bbe34694d
ci: add mypy for static type checking (#1101)
# What does this PR do?

- Enable mypy to run in the CI on a subset of the repository
- Fix a few mypy errors
- Run mypy from pre-commit

Signed-off-by: Sébastien Han <seb@redhat.com>
 
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-21 13:15:40 -08:00
ehhuang
25fddccfd8
feat: tool outputs metadata (#1155)
Summary:

Allows tools to output metadata. This is useful for evaluating tool
outputs, e.g. RAG tool will output document IDs, which can be used to
score recall.

Will need to make a similar change on the client side to support
ClientTool outputting metadata.

Test Plan:

LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/client-sdk/agents/test_agents.py
2025-02-21 13:15:31 -08:00
Ashwin Bharambe
36162c8c82 fix(ollama): register model with the helper first so it gets normalized 2025-02-21 12:51:38 -08:00
Xi Yan
0fe071764f
feat(1/n): api: unify agents for handling server & client tools (#1178)
# Problem

Our current Agent framework has discrepancies in definition on how we
handle server side and client side tools.

1. Server Tools: a single Turn is returned including `ToolExecutionStep`
in agenst
2. Client Tools: `create_agent_turn` is called in loop with client agent
lib yielding the agent chunk

ad6ffc63df/src/llama_stack_client/lib/agents/agent.py (L186-L211)

This makes it inconsistent to work with server & client tools. It also
complicates the logs to telemetry to get information about agents turn /
history for observability.

#### Principle
The same `turn_id` should be used to represent the steps required to
complete a user message including client tools.

## Solution

1. `AgentTurnResponseEventType.turn_awaiting_input` status to indicate
that the current turn is not completed, and awaiting tool input
2. `continue_agent_turn` endpoint to update agent turn with client's
tool response.


# What does this PR do?
- Skeleton API as example

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

- Just API update, no functionality change
```
llama stack run + client-sdk test
```

<img width="842" alt="image"
src="https://github.com/user-attachments/assets/7ac56b5f-f424-4632-9476-7e0f57555bc3"
/>


[//]: # (## Documentation)
2025-02-21 11:48:27 -08:00
Ashwin Bharambe
992f865b2e
chore: move embedding deps to RAG tool where they are needed (#1210)
`EMBEDDING_DEPS` were wrongly associated with `vector_io` providers.
They are needed by
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142
and related code and is used by the RAG tool and as such should only be
needed by the `inline::rag-runtime` provider.
2025-02-21 11:33:41 -08:00
Ashwin Bharambe
11697f85c5
fix: pull ollama embedding model if necessary (#1209)
Embedding models are tiny and can be pulled on-demand. Let's do that so
the user doesn't have to do "yet another thing" to get themselves set
up.

Thanks @hardikjshah for the suggestion.

Also fixed a build dependency miss (TODO: distro_codegen needs to
actually check that the build template contains all providers mentioned
for the run.yaml file)

## Test Plan 

First run `ollama rm all-minilm:latest`. 

Run `llama stack build --template ollama && llama stack run ollama --env
INFERENCE_MODEL=llama3.2:3b-instruct-fp16`. See that it outputs a
"Pulling embedding model `all-minilm:latest`" output and the stack
starts up correctly. Verify that `ollama list` shows the model is
correctly downloaded.
2025-02-21 10:35:56 -08:00
Jamie Land
840fae2259
fix: Updating images so that they are able to run without root access (#1208)
# What does this PR do?
Addresses issues where the container is unable to run as root. Gives
write access to required folders.

[//]: # (If resolving an issue, uncomment and update the line below)
(Closes #[1207])

## Test Plan
I built locally and ran `llama stack build --template remote-vllm
--image-type container` and validated I could see my changes in the
output:

```
#11 1.186 Installed 11 packages in 61ms
#11 1.186  + llama-models==0.1.3
#11 1.186  + llama-stack==0.1.3
#11 1.186  + llama-stack-client==0.1.3
#11 1.186  + markdown-it-py==3.0.0
#11 1.186  + mdurl==0.1.2
#11 1.186  + prompt-toolkit==3.0.50
#11 1.186  + pyaml==25.1.0
#11 1.186  + pygments==2.19.1
#11 1.186  + rich==13.9.4
#11 1.186  + tiktoken==0.9.0
#11 1.186  + wcwidth==0.2.13
#11 DONE 1.6s

#12 [ 9/10] RUN mkdir -p /.llama /.cache
#12 DONE 0.3s

#13 [10/10] RUN chmod -R g+rw /app /.llama /.cache
#13 DONE 0.3s

#14 exporting to image
#14 exporting layers
#14 exporting layers 3.7s done
#14 writing image sha256:11cc8bd954db6d036037bcaf471b173ddd5261ac4b1e72074cccf85d18aefb96 done
#14 naming to docker.io/library/distribution-remote-vllm:0.1.3 done
#14 DONE 3.7s
+ set +x
Success!
```
This is what the resulting image looks like:


![image](https://github.com/user-attachments/assets/070b9c05-b40f-4e7e-aa24-fef260c395e3)

Also tagged the image as `0.1.3-test` and [pushed to
quay](https://quay.io/repository/jland/distribution-remote-vllm?tab=tags)
(note there are a bunch of critical vulnerabilities we may want to look
into)

And for good measure I deployed the resulting image on my Openshift
environment using the default Security Context and validated that there
were no issue with it coming up.

My validation was all done with the `vllm-remote` distribution, but if I
am understanding everything correctly the other distributions are just
different run.yaml configs.


[//]: # (## Documentation)


Please let me know if there is anything else I need to do.

Co-authored-by: Jamie Land <hokie10@gmail.com>
2025-02-21 11:32:56 -05:00
Yuan Tang
6634864b19
docs: Add missing uv command and clarify website rebuild (#1199)
# What does this PR do?

This fixes the following error:

```
$ make html
/bin/sh: line 1: sphinx-build: command not found
make: *** [Makefile:20: html] Error 127
```

Also clarifies that this command only rebuilds the website without
watching/refreshes.

## Test Plan

New command works.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-21 11:29:32 -05:00
Reid
9898589f12
fix: convert back to model descriptor for model in list --downloaded (#1201)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Currently , `model` in `--downloaded` just use the directory(already
replace `:`), so covert back to descriptor keep the same with ` llama
model list`, and remove command also use `descriptor`.
```
before:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model                               | Size     | Modified Time       |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct-int4-qlora-eo8 | 1.53 GB  | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+

after:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model                               | Size     | Modified Time       |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct:int4-qlora-eo8 | 1.53 GB  | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:10:34 -08:00
Rashmi Pawar
da9f0b7869
test(client-sdk): Update embedding test types to use latest imports (#1203)
# What does this PR do?
- Updates ImageContentItemImageURL import
- fixes `embedding_dimensions` metadata param

## Test Plan
- Ran pytest locally, verified embedding tests pass with new types

![Screenshot 2025-02-21 at 6 54
27 PM](https://github.com/user-attachments/assets/f80e3785-04c3-415e-9276-88aa8136bf00)

cc: @dglogo @sumitb
2025-02-21 08:09:17 -08:00
Matthew Farrellee
46da187c07
fix: remove list of list tests, no longer relevant after #1161 (#1205)
# What does this PR do?

#1161 updated the embedding signature making the nested list tests
irrelevant
2025-02-21 08:07:35 -08:00
Reid
d2701b0d6a
chore: remove configure subcommand (#1202)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

When tried to use `configure`, and found it `DEPRECATED`, and found pr
https://github.com/meta-llama/llama-stack/pull/371 to remove it, not
sure why not remove the `configure.py`?
```
$ llama stack configure /tmp/test.yaml
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config
llama stack configure: error:
    DEPRECATED! llama stack configure has been deprecated.
    Please use llama stack run <path/to/run.yaml> instead.
    Please see example run.yaml in /distributions folder.
```

It would better better to tell when user check it how to use with
`--help` first:

```
before:
$ llama stack configure --help
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config

Configure a llama stack distribution

positional arguments:

after:
$ llama stack configure --help
usage: llama stack configure [-h] [--output-dir OUTPUT_DIR] config

Configure a llama stack distribution

    DEPRECATED! llama stack configure has been deprecated.
    Please use llama stack run <path/to/run.yaml> instead.
    Please see example run.yaml in /distributions folder.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:06:25 -08:00
Reid
c9c4a3c921
feat: model remove cmd (#1128)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

add a subcommand, help to clean the unneeded model:
```
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

$ llama model remove --help
usage: llama model remove [-h] -m MODEL [-f]

Remove the downloaded llama model

options:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Specify the llama downloaded model name
  -f, --force           Used to forcefully remove the llama model from the storage without further confirmation

$ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8
Are you sure you want to remove Llama3.2-1B-Instruct:int4-qlora-eo8? (y/n): n
Removal aborted.

$ llama model remove -mLlama3.2-1B-Instruct:int4-qlora-eo8-f
Llama3.2-1B-Instruct:int4-qlora-eo8 removed.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:05:12 -08:00
Matthew Farrellee
3099c5243f
fix: update URL import, URL -> ImageContentItemImageURL (#1204)
# What does this PR do?

fixes test to use new name for URL import

## Test Plan

`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`
2025-02-21 08:02:21 -08:00
Ashwin Bharambe
34226d6c93 Another test_case related breakage fix 2025-02-20 23:10:33 -08:00