Commit graph

362 commits

Author SHA1 Message Date
Sébastien Han
c4987bc349
fix: avoid failure when no special pip deps and better exit (#1228)
# What does this PR do?

When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:

1. Environment name
2. Pip dependencies

Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

This command shouldn't fail:

```
llama stack build --template ollama --image-type venv
```

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-24 13:18:52 -05:00
Ashwin Bharambe
e8e8fe7c93 fix: add LLAMA_STACK_CLIENT_DIR mount when installing in docker from source 2025-02-24 10:00:57 -08:00
Ashwin Bharambe
641549c631 Add llama stack client overrides also; necessary for correct docker building 2025-02-24 07:51:11 -08:00
Ashwin Bharambe
0973d386e6 fix: update build_container.sh to ensure llama-models is installed first 2025-02-23 21:47:26 -08:00
Charlie Doern
34e3faa4e8
feat: add --run to llama stack build (#1156)
# What does this PR do?

--run runs the stack that was just build using the same arguments during
the build process (image-name, type, etc)

This simplifies the workflow a lot and makes the UX better for most
local users trying to get started rather than having to match the flags
of the two commands (build and then run)

Also, moved `ImageType` to distribution.utils since there were circular
import errors with its old location

## Test Plan

tested locally using the following command: 

`llama stack build --run --template ollama --image-type venv`

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-23 22:06:09 -05:00
Ashwin Bharambe
6227e1e3b9
fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225)
Make sure venv behaves like conda (no prefix is added to image_name) and
`--image-type venv` inside a notebook "just works" without any fiddling
2025-02-23 16:57:11 -08:00
Ashwin Bharambe
e7d261ef4a Fix test infra, sentence embeddings mixin 2025-02-21 15:11:46 -08:00
Jamie Land
840fae2259
fix: Updating images so that they are able to run without root access (#1208)
# What does this PR do?
Addresses issues where the container is unable to run as root. Gives
write access to required folders.

[//]: # (If resolving an issue, uncomment and update the line below)
(Closes #[1207])

## Test Plan
I built locally and ran `llama stack build --template remote-vllm
--image-type container` and validated I could see my changes in the
output:

```
#11 1.186 Installed 11 packages in 61ms
#11 1.186  + llama-models==0.1.3
#11 1.186  + llama-stack==0.1.3
#11 1.186  + llama-stack-client==0.1.3
#11 1.186  + markdown-it-py==3.0.0
#11 1.186  + mdurl==0.1.2
#11 1.186  + prompt-toolkit==3.0.50
#11 1.186  + pyaml==25.1.0
#11 1.186  + pygments==2.19.1
#11 1.186  + rich==13.9.4
#11 1.186  + tiktoken==0.9.0
#11 1.186  + wcwidth==0.2.13
#11 DONE 1.6s

#12 [ 9/10] RUN mkdir -p /.llama /.cache
#12 DONE 0.3s

#13 [10/10] RUN chmod -R g+rw /app /.llama /.cache
#13 DONE 0.3s

#14 exporting to image
#14 exporting layers
#14 exporting layers 3.7s done
#14 writing image sha256:11cc8bd954db6d036037bcaf471b173ddd5261ac4b1e72074cccf85d18aefb96 done
#14 naming to docker.io/library/distribution-remote-vllm:0.1.3 done
#14 DONE 3.7s
+ set +x
Success!
```
This is what the resulting image looks like:


![image](https://github.com/user-attachments/assets/070b9c05-b40f-4e7e-aa24-fef260c395e3)

Also tagged the image as `0.1.3-test` and [pushed to
quay](https://quay.io/repository/jland/distribution-remote-vllm?tab=tags)
(note there are a bunch of critical vulnerabilities we may want to look
into)

And for good measure I deployed the resulting image on my Openshift
environment using the default Security Context and validated that there
were no issue with it coming up.

My validation was all done with the `vllm-remote` distribution, but if I
am understanding everything correctly the other distributions are just
different run.yaml configs.


[//]: # (## Documentation)


Please let me know if there is anything else I need to do.

Co-authored-by: Jamie Land <hokie10@gmail.com>
2025-02-21 11:32:56 -05:00
Ashwin Bharambe
81ce39a607
feat(api): Add options for supporting various embedding models (#1192)
We need to support:
- asymmetric embedding models (#934)
- truncation policies (#933)
- varying dimensional output (#932) 

## Test Plan

```bash
$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
   --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$  pytest -s -v -k together test_embeddings.py \
   --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
   --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```
2025-02-20 22:27:12 -08:00
Ashwin Bharambe
6f9d622340
fix(api): update embeddings signature so inputs and outputs list align (#1161)
See Issue #922 

The change is slightly backwards incompatible but no callsite (in our
client codebases or stack-apps) every passes a depth-2
`List[List[InterleavedContentItem]]` (which is now disallowed.)

## Test Plan

```bash
$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
   --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$  pytest -s -v -k together test_embeddings.py \
   --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
   --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```

Also ran `tests/client-sdk/inference/test_embeddings.py`
2025-02-20 21:43:13 -08:00
ehhuang
1166afdf76
fix: some telemetry APIs don't currently work (#1188)
Summary:

This bug is surfaced by using the http LS client. The issue is that
non-scalar values in 'GET' method are `body` params in fastAPI, but our
spec generation script doesn't respect that. We fix by just making them
POST method instead.

Test Plan:
Test API call with newly sync'd client
(https://github.com/meta-llama/llama-stack-client-python/pull/149)

<img width="1114" alt="image"
src="https://github.com/user-attachments/assets/7710aca5-d163-4e00-a465-14e6fcaac2b2"
/>
2025-02-20 14:09:25 -08:00
Xi Yan
ea1faae50e
chore!: deprecate eval/tasks (#1186)
# What does this PR do?
- Fully deprecate eval/tasks

[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1088 

NOTE: this will be a breaking change. We have introduced the new API in
0.1.3 .

Notebook has been updated to use the new endpoints.

## Test Plan
```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb 
```
<img width="611" alt="image"
src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f"
/>



cc @SLR722  for awareness

[//]: # (## Documentation)
2025-02-20 14:06:21 -08:00
Vladimir Ivić
f7161611c6
feat: adding endpoints for files and uploads (#1070)
Summary:
Adds spec definitions for file uploads operations.

This API focuses around two high level operations:
* Initiating and managing upload session
* Accessing uploaded file information

Usage examples:

To start a file upload session:
```
curl -X POST https://localhost:8321/v1/files \
-d '{
   "key": "image123.jpg',
   "bucket": "images",
   "mime_type": "image/jpg",
   "size": 12345
}'

# Returns
{
  “id”: <session_id>
  “url”: “https://localhost:8321/v1/files/session:<session_id>”,
  "offset": 0,
  "size": 12345
}

```

To upload file content to an existing session
```
curl -i -X POST "https://localhost:8321/v1/files/session:<session_id> \
  --data-binary @<path_to_local_file>

# Returns
{
  "key": "image123.jpg",
  "bucket": "images",
  "mime_type": "image/jpg",
  "bytes": 12345,
  "created_at": 1737492240
}

# Implementing on server side (Flask example for simplicity):
@app.route('/uploads/{upload_id}', methods=['POST'])
def upload_content_to_session(upload_id):
    try:
        # Get the binary file data from the request body
        file_data = request.data

        # Save the file to disk
        save_path = f"./uploads/{upload_id}"
        with open(save_path, 'wb') as f:
            f.write(file_data)
        return {__uploaded_file_json__}, 200
    except Exception as e:
        return 500

```

To read information about an existing upload session
```
curl -i -X GET "https://localhost:8321/v1/files/session:<session_id>

# Returns
{
  “id”: <session_id>
  “url”: “https://localhost:8321/v1/files/session:<session_id>”,
  "offset": 1024,
  "size": 12345
}
```

To list buckets
```
GET /files

# Returns
{
  "data": [
     {"name": "bucket1"},
     {"name": "bucket2"},
   ]
}
```

To list all files in a bucket
```
GET /files/{bucket}

# Returns
{
  "data": [
    {
      "key": "shiba.jpg",
      "bucket": "dogs",
      "mime_type": "image/jpg",
      "bytes": 82334,
      "created_at": 1737492240,
    },
    {
      "key": "persian_cat.jpg",
      "mime_type": "image/jpg",
      "bucket": "cats",
      "bytes": 39924,
      "created_at": 1727493440,
    },
  ]
}
```

To get specific file info
```
GET /files/{bucket}/{key}

{
  "key": "shiba.jpg",
  "bucket": "dogs",
  "mime_type": "image/jpg",
  "bytes": 82334,
  "created_at": 1737492240,
}

```

To delete specific file
```
DELETE /files/{bucket}/{key}

{
  "key": "shiba.jpg",
  "bucket": "dogs",
  "mime_type": "image/jpg",
  "bytes": 82334,
  "created_at": 1737492240,
}

```
2025-02-20 13:09:00 -08:00
Xi Yan
ca687d3e86 style: env var in build_venv 2025-02-19 22:32:59 -08:00
Xi Yan
61f43b8677
fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment (#1163)
# What does this PR do?
- resolves issue: #1159 
- Root cause: https://github.com/meta-llama/llama-stack/pull/980 forces
`build_venv.sh` to install in a venv environment, which do not work on
Colab notebook environment

<img width="1004" alt="image"
src="https://github.com/user-attachments/assets/1f9be409-5313-4926-b078-74e141cf29eb"
/>

## This PR
Use `UV_SYSTEM_PYTHON` to make sure dependencies are installed in
current system environment. Which will be used in the Colab environment.
```
UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv
```

## Test Plan
- Works in Colab environment
<img width="621" alt="image"
src="https://github.com/user-attachments/assets/ae93bc3d-e05a-44b9-bb21-fb88f29969b8"
/>
2025-02-19 22:21:16 -08:00
Francisco Arceo
2b752df79a
fix: Fixing some small issues with the build scripts (#1132)
# What does this PR do?
I was encountering build issues when building my `ollama` environment
using `llama stack build`

```bash
llama stack build --template ollama --image-type venv
Traceback (most recent call last):
  File "/Users/farceo/dev/llama-stack/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/build.py", line 77, in _run_stack_build_command
    return run_stack_build_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 180, in run_stack_build_command
    _run_stack_build_command_from_build_config(
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 272, in _run_stack_build_command_from_build_config
    return_code = build_image(
                  ^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/build.py", line 137, in build_image
    return_code = run_with_pty(args)
                  ^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 22, in run_with_pty
    return _run_with_pty_unix(command)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 53, in _run_with_pty_unix
    process = subprocess.Popen(
              ^^^^^^^^^^^^^^^^^
  File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/farceo/dev/llama-stack/llama_stack/distribution/build_venv.sh'
make: *** [build-ollama] Error 1
```

I also had to adjust the script when testing the `common.sh` file
because it returned:

```bash
> source llama_stack/distribution/common.sh
llama_stack/distribution/common.sh:6: command not found: ^M
llama_stack/distribution/common.sh:50: parse error near `\n'
```
On my branch, I ran:
```bash
sed -i '' 's/\r$//' llama_stack/distribution/common.sh
```
And then I was able to successfully build the environment.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
N/A

[//]: # (## Documentation)
N/A

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-19 22:20:49 -08:00
ehhuang
8de7cf103b
feat: support tool_choice = {required, none, <function>} (#1059)
Summary:

titled


Test Plan:

added tests and

LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-02-18 23:25:15 -05:00
Xi Yan
37cf60b732
style: remove prints in codebase (#1146)
# What does this PR do?
- replace prints in codebase with logger
- update print_table to use rich Table

## Test Plan
- library client script in
https://github.com/meta-llama/llama-stack/pull/1145

```
llama stack list-providers
```
<img width="1407" alt="image"
src="https://github.com/user-attachments/assets/906b4f54-9e42-4e55-8968-7e3aa45525b2"
/>


[//]: # (## Documentation)
2025-02-18 19:41:37 -08:00
Xi Yan
e8cb9e0adb
fix: direct client pydantic type casting (#1145)
# What does this PR do?
- Closes #1142 
- Root cause is due to having `Union[str, AgenToolGroupWithArgs]`

## Test Plan
- Test with script described in issue. 

- Print out final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>


[//]: # (## Documentation)
2025-02-18 16:07:54 -08:00
Sébastien Han
369cc513cb
fix: improve stack build on venv (#980)
# What does this PR do?

Added a pre_run_checks function to ensure a smooth environment setup by
verifying prerequisites. It checks for an existing virtual environment,
ensures uv is installed, and deactivates any active environment if
necessary.

Run the full build inside a venv created by 'uv'.

Improved string handling in printf statements and added shellcheck
suppressions for expected word splitting in pip commands.

These enhancements improve robustness, prevent
conflicts, and ensure a seamless setup process.

Signed-off-by: Sébastien Han <seb@redhat.com>

- [ ] Addresses issue (#issue)


## Test Plan

Run the following command on either Linux or MacOS:

```
llama stack build --template ollama --image-type venv --image-name foo
+ build_name=foo
+ env_name=llamastack-foo
+ pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ RED='\033[0;31m'
+ NC='\033[0m'
+ ENVNAME=
+++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
+ SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution
+ source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh
+ pre_run_checks llamastack-foo
+ local env_name=llamastack-foo
+ is_command_available uv
+ command -v uv
+ '[' -d llamastack-foo ']'
+ run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ local env_name=llamastack-foo
+ local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ echo 'Creating new virtual environment llamastack-foo'
Creating new virtual environment llamastack-foo
+ uv venv llamastack-foo
Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: llamastack-foo
Activate with: source llamastack-foo/bin/activate
+ source llamastack-foo/bin/activate
++ '[' -n x ']'
++ SCRIPT_PATH=llamastack-foo/bin/activate
++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ '[' darwin24 = cygwin ']'
++ '[' darwin24 = msys ']'
++ export VIRTUAL_ENV
++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ export PATH
++ '[' x '!=' x ']'
+++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ VIRTUAL_ENV_PROMPT='(llamastack-foo) '
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(llamastack-foo) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ '[' -n '' ']'
+ '[' -n '' ']'
+ uv pip install --no-cache-dir llama-stack
Using Python 3.13.1 environment at: llamastack-foo
Resolved 50 packages in 1.25s
   Built fire==0.7.0
Prepared 50 packages in 1.22s
Installed 50 packages in 126ms
 + annotated-types==0.7.0
 + anyio==4.8.0
 + blobfile==3.0.0
 + certifi==2025.1.31
 + charset-normalizer==3.4.1
 + click==8.1.8
 + distro==1.9.0
 + filelock==3.17.0
 + fire==0.7.0
 + fsspec==2025.2.0
 + h11==0.14.0
 + httpcore==1.0.7
 + httpx==0.28.1
 + huggingface-hub==0.28.1
 + idna==3.10
 + jinja2==3.1.5
 + llama-models==0.1.2
 + llama-stack==0.1.2
 + llama-stack-client==0.1.2
 + lxml==5.3.1
 + markdown-it-py==3.0.0
 + markupsafe==3.0.2
 + mdurl==0.1.2
 + numpy==2.2.2
 + packaging==24.2
 + pandas==2.2.3
 + pillow==11.1.0
 + prompt-toolkit==3.0.50
 + pyaml==25.1.0
 + pycryptodomex==3.21.0
 + pydantic==2.10.6
 + pydantic-core==2.27.2
 + pygments==2.19.1
 + python-dateutil==2.9.0.post0
 + python-dotenv==1.0.1
 + pytz==2025.1
 + pyyaml==6.0.2
 + regex==2024.11.6
 + requests==2.32.3
 + rich==13.9.4
 + setuptools==75.8.0
 + six==1.17.0
 + sniffio==1.3.1
 + termcolor==2.5.0
 + tiktoken==0.8.0
 + tqdm==4.67.1
 + typing-extensions==4.12.2
 + tzdata==2025.1
 + urllib3==2.3.0
 + wcwidth==0.2.13
+ '[' -n '' ']'
+ printf 'Installing pip dependencies\n'
Installing pip dependencies
+ uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn
Using Python 3.13.1 environment at: llamastack-foo
Resolved 105 packages in 37ms
Uninstalled 2 packages in 65ms
Installed 72 packages in 195ms
 + aiohappyeyeballs==2.4.6
 + aiohttp==3.11.12
 + aiosignal==1.3.2
 + aiosqlite==0.21.0
 + attrs==25.1.0
 + autoevals==0.0.119
 + backoff==2.2.1
 + braintrust-core==0.0.58
 + chardet==5.2.0
 + chevron==0.14.0
 + chromadb-client==0.6.3
 + contourpy==1.3.1
 + cycler==0.12.1
 + datasets==3.2.0
 + deprecated==1.2.18
 + dill==0.3.8
 + faiss-cpu==1.10.0
 + fastapi==0.115.8
 + fonttools==4.56.0
 + frozenlist==1.5.0
 - fsspec==2025.2.0
 + fsspec==2024.9.0
 + googleapis-common-protos==1.66.0
 + grpcio==1.70.0
 + importlib-metadata==8.5.0
 + jiter==0.8.2
 + joblib==1.4.2
 + jsonschema==4.23.0
 + jsonschema-specifications==2024.10.1
 + kiwisolver==1.4.8
 + levenshtein==0.26.1
 + matplotlib==3.10.0
 + monotonic==1.6
 + multidict==6.1.0
 + multiprocess==0.70.16
 + nltk==3.9.1
 - numpy==2.2.2
 + numpy==1.26.4
 + ollama==0.4.7
 + openai==1.61.1
 + opentelemetry-api==1.30.0
 + opentelemetry-exporter-otlp-proto-common==1.30.0
 + opentelemetry-exporter-otlp-proto-grpc==1.30.0
 + opentelemetry-exporter-otlp-proto-http==1.30.0
 + opentelemetry-proto==1.30.0
 + opentelemetry-sdk==1.30.0
 + opentelemetry-semantic-conventions==0.51b0
 + orjson==3.10.15
 + overrides==7.7.0
 + posthog==3.12.0
 + propcache==0.2.1
 + protobuf==5.29.3
 + psycopg2-binary==2.9.10
 + pyarrow==19.0.0
 + pyparsing==3.2.1
 + pypdf==5.3.0
 + rapidfuzz==3.12.1
 + redis==5.2.1
 + referencing==0.36.2
 + rpds-py==0.22.3
 + safetensors==0.5.2
 + scikit-learn==1.6.1
 + scipy==1.15.1
 + sentencepiece==0.2.0
 + starlette==0.45.3
 + tenacity==9.0.0
 + threadpoolctl==3.5.0
 + tokenizers==0.21.0
 + transformers==4.48.3
 + uvicorn==0.34.0
 + wrapt==1.17.2
 + xxhash==3.5.0
 + yarl==1.18.3
 + zipp==3.21.0
+ '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' ']'
+ IFS='#'
+ read -ra parts
+ for part in '"${parts[@]}"'
+ echo 'sentence-transformers --no-deps'
sentence-transformers --no-deps
+ uv pip install sentence-transformers --no-deps
Using Python 3.13.1 environment at: llamastack-foo
Resolved 1 package in 141ms
Installed 1 package in 6ms
 + sentence-transformers==3.4.1
+ for part in '"${parts[@]}"'
+ echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu'
torch torchvision --index-url https://download.pytorch.org/whl/cpu
+ uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Using Python 3.13.1 environment at: llamastack-foo
Resolved 13 packages in 2.15s
Installed 5 packages in 324ms
 + mpmath==1.3.0
 + networkx==3.3
 + sympy==1.13.1
 + torch==2.6.0
 + torchvision==0.21.0
Build Successful!
```

Run:

```
$ source llamastack-foo/bin/activate
$ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001 
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
modules.json: 100%|███████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 485kB/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████| 116/116 [00:00<00:00, 498kB/s]
README.md: 100%|█████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 20.5MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 583kB/s]
config.json: 100%|███████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 4.63MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████| 90.9M/90.9M [00:02<00:00, 36.6MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████| 350/350 [00:00<00:00, 4.27MB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.90MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.23MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████| 112/112 [00:00<00:00, 1.47MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████| 190/190 [00:00<00:00, 841kB/s]
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API safety
 POST /v1/safety/run-shield

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [39145]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:22:03 -08:00
Ashwin Bharambe
314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
2025-02-14 09:10:59 -08:00
Hardik Shah
b0b696cb4f
fix: regex pattern matching to support :path suffix in the routes (#1089)
This PR fixes client sdk test failure --
3720312204

by updating the regex matching pattern to also consider `:path` in the
routes
2025-02-13 18:18:23 -08:00
Xi Yan
8b655e3cd2
fix!: update eval-tasks -> benchmarks (#1032)
# What does this PR do?

- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove differentiation between `app` v.s. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and do not add any value. Backward compatibility is being
kept as the "type" is not being used anywhere.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- This change is backward compatible 
- Run notebook test with

```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```

<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>



[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 16:40:58 -08:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Sébastien Han
418645696a
fix: improve signal handling and update dependencies (#1044)
# What does this PR do?
This commit enhances the signal handling mechanism in the server by
improving the `handle_signal` (previously handle_sigint) function. It
now properly retrieves the signal name, ensuring clearer logging when a
termination signal is received. Additionally, it cancels all running
tasks and waits for their completion before stopping the event loop,
allowing for a more graceful shutdown. Support for handling
SIGTERM has also been added alongside SIGINT.

Before the changes, handle_sigint used asyncio.run(run_shutdown()).
However, asyncio.run() is meant to start a new event loop, and calling
it inside an existing one (like when running Uvicorn) raises an error.
The fix replaces asyncio.run(run_shutdown()) with an async function
scheduled on the existing loop using loop.create_task(shutdown()). This
ensures that the shutdown coroutine runs within the current event loop
instead of trying to create a new one.

Furthermore, this commit updates the project dependencies. `fastapi` and
`uvicorn` have been added to the development dependencies in
`pyproject.toml` and `uv.lock`, ensuring that the necessary packages are
available for development and execution.

Closes: https://github.com/meta-llama/llama-stack/issues/1043
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run a server and send SIGINT:

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-inference => ollama
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-inference => sentence-transformers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  models => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inference => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-vector_io => faiss
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-safety => llama-guard
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  shields => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  safety => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  vector_dbs => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  vector_io => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => brave-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => tavily-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => code-interpreter
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => rag-runtime
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  tool_groups => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  tool_runtime => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  agents => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-datasetio => huggingface
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-datasetio => localfs
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  datasets => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  datasetio => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  telemetry => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => basic
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => llm-as-judge
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => braintrust
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  scoring_functions => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  scoring => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-eval => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  eval_tasks => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  eval => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inspect => __builtin__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216: 
INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`...
INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss.
INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss.
INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes.
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available.
INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2...
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106: 
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API safety
 POST /v1/safety/run-shield
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [65372]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
^CINFO:     Shutting down
INFO:     Finished server process [65372]
Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 08:07:59 -08:00
Charlie Doern
025f615868
feat: add support for running in a venv (#1018)
# What does this PR do?

add --image-type to `llama stack run`. Which takes conda, container or
venv also add start_venv.sh which start the stack using a venv

resolves #1007

## Test Plan

running locally:

`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
...
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
Run configuration:
apis:
- agents
- datasetio
...
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-12 11:13:04 -05:00
Hardik Shah
c335ed8765 raise when client initialize fails 2025-02-07 12:24:07 -08:00
Charlie Doern
2a4a612373
fix: Ensure a better error stack trace when llama-stack is not built (#950)
# What does this PR do?

currently this is the output when you run a distribution locally without
running `llama stack build`:

```
Traceback (most recent call last):
  File "/Users/charliedoern/Documents/llama-sdk.py", line 25, in <module>
    models = client.models.list()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 107, in list
    raise exc
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 95, in list
    return self._get(
           ^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/_base_client.py", line 1212, in get
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 168, in request
    return asyncio.run(self.async_client.request(*args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 258, in request
    if not self.endpoint_impls:
           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncLlamaStackAsLibraryClient' object has no attribute 'endpoint_impls'
```

the intended exception is never raised, add an except for an
AttributeError so users can catch when they call things like
`models.list()` and so that a more useful error telling them that the
client is not properly initialized is printed.

## Test Plan

Please describe:
- I ran the script found here:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#run-inference-with-python-sdk
locally with the changes in this PR and the exception was caught
successfully.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-07 09:47:02 -08:00
Ashwin Bharambe
f8f2f7f9bb
feat: Add HTTPS serving option (#1000)
# What does this PR do?

Enables HTTPS option for Llama Stack. 

While doing so, introduces a `ServerConfig` sub-structure to house all
server related configuration (port, ssl, etc.)

Also simplified the `start_container.sh` entrypoint to simply be
`python` instead of a complex bash command line.

## Test Plan

Conda: 

Run:
```bash
$ llama stack build --template together
$ llama stack run --port 8322        # ensure server starts 

$ llama-stack-client configure --endpoint http://localhost:8322
$ llama-stack-client models list
```

Create a self-signed SSL key / cert pair. Then, using a local checkout
of `llama-stack-client-python`, change
https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/_base_client.py#L759
to add `kwargs.setdefault("verify", False)` so SSL verification is
disabled. Then:

```bash
$ llama stack run --port 8322 --tls-keyfile <KEYFILE> --tls-certfile <CERTFILE>
$ llama-stack-client configure --endpoint https://localhost:8322  # notice the `https`
$ llama-stack-client models list
```

Also tested with containers (but of course one needs to make sure the
cert and key files are appropriately provided to the container.)
2025-02-07 09:39:08 -08:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation 
- Update docs 
- [minor] uv fix i came across 
- codegen for all templates 

Tested with 

```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=[http://$CHROMADB_HOST:$CHROMADB_PORT](about:blank)
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321

# build the stack template 
llama stack build --template=dell 

# start the TGI inference server 
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES [ghcr.io/huggingface/text-generation-inference](http://ghcr.io/huggingface/text-generation-inference) --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0

# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)

# build docker 
llama stack build --template=dell --image-type=container

# run llama stack server ( via docker )
podman run -it \
--network host \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
# NOTE: mount the llama-stack / llama-model directories if testing local changes 
-v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \ localhost/distribution-dell:dev \
--port $LLAMA_STACK_PORT  \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL

# test the server 
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py

```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Charlie Doern
26aef50bc5
if client.initialize fails, the example should exit (#954)
# What does this PR do?

the example script can gracefully exit if the boolean returned from
initialize is used properly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-04 13:54:21 -08:00
ehhuang
c9ab72fa82
Support sys_prompt behavior in inference (#937)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.

- [ ] Addresses issue (#issue)


## Test Plan

python -m unittest
llama_stack.providers.tests.inference.test_prompt_adapter


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937).
* #938
* __->__ #937
2025-02-03 23:35:16 -08:00
Yuan Tang
7558678b8c
Fix uv pip install timeout issue for PyTorch (#929)
This fixes the following timeout issue when installing PyTorch via uv.
Also see reference: https://github.com/astral-sh/uv/pull/1694,
https://github.com/astral-sh/uv/issues/1549

```
Installing pip dependencies
Using Python 3.10.16 environment at: /home/yutang/.conda/envs/distribution-myenv
  × Failed to download and build `antlr4-python3-runtime==4.9.3`
  ├─▶ Failed to extract archive
  ├─▶ failed to unpack
  │   `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
  ├─▶ failed to unpack
  │   `antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py` into
  │   `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
  ├─▶ error decoding response body
  ├─▶ request or response body error
  ╰─▶ operation timed out
  help: `antlr4-python3-runtime` (v4.9.3) was included because `torchtune`
        (v0.5.0) depends on `omegaconf` (v2.3.0) which depends on
        `antlr4-python3-runtime>=4.9.dev0, <4.10.dev0`
Failed to build target distribution-myenv with return code 1
```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-03 06:39:35 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Yuan Tang
4773092dd1
Fix UBI9 image build when installing Python packages via uv (#926)
This was missed in https://github.com/meta-llama/llama-stack/pull/921. 

cc @ashwinb @hardikjshah

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-01 19:14:29 -08:00
Ashwin Bharambe
b03e093e80 Add a COPY option for copying source files into docker 2025-02-01 15:35:38 -08:00
Ashwin Bharambe
942e8b96ac Fix uv pip uninstall 2025-02-01 11:42:28 -08:00
Ashwin Bharambe
5b1e69e58e
Use uv pip install instead of pip install (#921)
## What does this PR do? 

See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` by `uv pip install` and ensuring `uv`
exists in the environment.

## Test Plan 

First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.

Docker: `llama stack build --template together --image-type container`
works!
2025-01-31 22:29:41 -08:00
Ashwin Bharambe
0d96070af9
Update OpenAPI generator to add param and field documentation (#896)
We desperately need to document our APIs. This is the basic requirement
of having a Spec :)

This PR updates the OpenAPI generator so documentation for request
parameters and object fields can be properly added to the OpenAPI specs.
From there, this should get picked by Stainless, etc.

## Test Plan:

Updated client-sdk (See
https://github.com/meta-llama/llama-stack-client-python/pull/104) and
then ran:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py
```
2025-01-29 10:04:30 -08:00
Ashwin Bharambe
aee6237685 Small refactor for run_with_pty 2025-01-28 09:32:33 -08:00
Vladislav Bronzov
09299e908e
Add windows support for build execution (#889)
# What does this PR do?

This PR implements windows platform support for build_container.sh
execution from terminal. Additionally, it resolves "no support for
Terminos and PTY for Window PC" issues.

- [x] Addresses issue (#issue)
Releates issues: https://github.com/meta-llama/llama-stack/issues/826,
https://github.com/meta-llama/llama-stack/issues/726

## Test Plan

Changes were tested manually by executing standard scripts from LLama
guide:
- llama stack build --template ollama --image-type container
- llama stack build --list-templates
- llama stack build

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-28 07:41:41 -08:00
Ashwin Bharambe
891bf704eb
Ensure llama stack build --config <> --image-type <> works (#879)
Fix the issues brought up in
https://github.com/meta-llama/llama-stack/issues/870

Test all combinations of (conda, container) vs. (template, config)
combos.
2025-01-25 11:13:36 -08:00
Dinesh Yeduguru
cb11336886
remove logger handler only in notebook (#868)
remove logger handler only in notebook
2025-01-23 16:58:17 -08:00
snova-edwardm
22dc684da6
Sambanova inference provider (#555)
# What does this PR do?

This PR adds SambaNova as one of the Provider

- Add SambaNova as a provider

## Test Plan
Test the functional command
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```

Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY

# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```

## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [Y ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-23 12:20:28 -08:00
Ashwin Bharambe
35c71d5bbe
Update OpenAPI generator to output discriminator (#848)
oneOf should have discriminators so Stainless can generate better code

## Test Plan

Going to generate the SDK now and check.
2025-01-22 22:15:23 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
a8345f5f76 Fix llama stack build docker creation to have correct entrypoint 2025-01-22 16:53:54 -08:00
Ashwin Bharambe
a63a43c646
[memory refactor][6/n] Update naming and routes (#839)
Making a few small naming changes as per feedback:

- RAGToolRuntime methods are called `insert` and `query` to keep them
more general
- The tool names are changed to non-namespaced forms
`insert_into_memory` and `query_from_memory`
- The REST endpoints are more REST-ful
2025-01-22 10:39:13 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. the code looks much nicer and simpler (expectedly.)
- there are a number of hacks in the server resolver implementation
still, we will live with some and fix some

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.

## Test Plan

Our RAG test is sad (doesn't actually test for actual RAG output) but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00