Mirror of https://github.com/meta-llama/llama-stack.git
Synced 2025-06-28 02:53:30 +00:00 · 1160 commits

e8cb9e0adb
fix: direct client pydantic type casting (#1145)
# What does this PR do?
- Closes #1142
- The root cause is the `Union[str, AgentToolGroupWithArgs]` union type

## Test Plan
- Test with the script described in the issue.
- Print out the final converted pydantic object

<img width="1470" alt="image" src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6" />

[//]: # (## Documentation)
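For context, a minimal sketch of how pydantic v2 can cast raw toolgroup entries into that union shape; the field names on `AgentToolGroupWithArgs` are assumed here purely for illustration:

```python
# Hypothetical sketch: validating loose input against Union[str, AgentToolGroupWithArgs].
from typing import Any, Dict, Union

from pydantic import BaseModel, TypeAdapter


class AgentToolGroupWithArgs(BaseModel):  # field names assumed for illustration
    name: str
    args: Dict[str, Any]


ToolGroup = Union[str, AgentToolGroupWithArgs]
adapter = TypeAdapter(ToolGroup)

raw = ["builtin::websearch", {"name": "builtin::rag", "args": {"vector_db_ids": ["my_db"]}}]
toolgroups = [adapter.validate_python(item) for item in raw]
print(toolgroups)  # plain strings stay strings, dicts become AgentToolGroupWithArgs instances
```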

8585b95a28
rename

4e76d312fa
fix: modify the model id title for model list (#1095)
# What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Re-check and based on the doc, the download model id, actually is model descriptor(also without `meta-llama/`). https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html ``` $ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s] LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s] $ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/ usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<--- $ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found $ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8 Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last): $ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com> |

d9f5beb15a
style: update download help text (#1135)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Based on the code:

92aefec191
style: update verify-download help text (#1134)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Based on the code

89d37687dd
chore: remove --no-list-templates option (#1121)
# What does this PR do?
Looking at the code and the usage, there is no case that actually needs `--no-list-templates`, and the option only makes the help text confusing, so remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):

$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):

before:
$ llama stack build --help
  --list-templates, --no-list-templates
                        Show the available templates for building a Llama Stack distribution (default: False)

after:
  --list-templates      Show the available templates for building a Llama Stack distribution
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
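For illustration, a minimal argparse sketch of the resulting flag (simplified, standalone parser; the real option lives in the `llama stack build` subcommand). A `store_true` action advertises only `--list-templates`, whereas `argparse.BooleanOptionalAction` is what generates the extra `--no-list-templates` variant shown in the "before" help text:

```python
import argparse

parser = argparse.ArgumentParser(prog="llama stack build")
parser.add_argument(
    "--list-templates",
    action="store_true",  # BooleanOptionalAction would also create --no-list-templates
    help="Show the available templates for building a Llama Stack distribution",
)

args = parser.parse_args(["--list-templates"])
print(args.list_templates)  # True
```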

6b1773d530
docs: Fix incorrect link and command for generating API reference (#1124)

743f434860
fix: Ensure a tool call can be converted before adding to buffer (#1119)
# What does this PR do?
This fixes an issue when running the e2e agent example:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py
```
| File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 175, in _process_vllm_chat_completion_stream_response
| tool_call = convert_tool_call(choice.delta.tool_calls[0])
| File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 441, in convert_tool_call
| return ToolCall(
| File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
| validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
| pydantic_core._pydantic_core.ValidationError: 4 validation errors for ToolCall
| call_id
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| tool_name.enum[BuiltinTool]
| Input should be 'brave_search', 'wolfram_alpha', 'photogen' or 'code_interpreter' [type=enum, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/enum
| tool_name.str
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| arguments
| Input should be a valid dictionary [type=dict_type, input_value=202, input_type=int]
| For further information visit https://errors.pydantic.dev/2.10/v/dict_type
```
This issue happened because not all arguments have been appended to the
tool call buffer yet. The current code assumes that we are ready to
convert the tool call whenever args can be converted to JSON
successfully. In this case, `json.loads("202")` would succeed but the
rest of the arguments have not been properly parsed yet.
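A minimal sketch (not the provider's actual code) of the stricter check: the buffered tool call is only converted once the accumulated arguments parse into a complete dictionary, so a partial stream like `"202"` is not converted prematurely:

```python
import json


def try_convert_tool_call(call_id, tool_name, args_buffer):
    """Return (tool_name, arguments) once the buffered call is complete, else None."""
    if not call_id or not tool_name:
        # the stream has not delivered the id/name chunks yet
        return None
    try:
        arguments = json.loads(args_buffer)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(arguments, dict):
        # json.loads("202") succeeds but is not a full argument object yet
        return None
    return tool_name, arguments
```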
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
The e2e example worked successfully (although note that I ran the script
twice with each function call separately due to
https://github.com/meta-llama/llama-stack/issues/1120):
```
tool_execution> Tool:get_ticker_data Args:{'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}
tool_execution> Tool:get_ticker_data Response:"[{\"('Year', '')\":2023,\"('Close', 'GOOG')\":140.4254455566}]"
tool_execution> Tool:web_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:web_search Response:"{\"query\": \"42nd president of the United States\", \"top_k\": [{\"title\": \"William J. Clinton | whitehouse.gov\", \"url\": \"https://obamawhitehouse.archives.gov/1600/presidents/williamjclinton\", \"description\": \"<strong>Bill Clinton</strong> is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001). He took office at the end of the Cold War, and was the first baby-boomer generation President.\", \"type\": \"search_result\"}, {\"title\": \"Bill Clinton - Wikipedia\", \"url\": \"https://en.wikipedia.org/wiki/Bill_Clinton\", \"description\": \"<strong>William Jefferson Clinton</strong> (n\\u00e9 Blythe; born August 19, 1946) is an American politician and lawyer who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as the attorney general of Arkansas from 1977 to 1979 and as the ...\", \"type\": \"search_result\"}, [{\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=eR2z_1-v87Y\", \"title\": \"A Conversation with Bill Clinton, 42nd President of the United ...\", \"description\": \"William Jefferson Clinton, the first Democratic president in six decades to be elected twice, led the United States to the longest economic expansion in Amer...\"}, {\"type\": \"video_result\", \"url\": \"

ab2b46e528
feat: log start, complete time to Agent steps (#1116)

8dc1cac333
style: fix the capitalization issue (#1117)
# What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] ``` before: $ llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. After: $ llama stack run --help usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE] [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}] config Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution. ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com> |

ab210ec59e
Update README.md

df864ee575
Update index.md to refer to v0.1.3

00613d9014
build: resync uv and deps on 0.1.3 (#1108)
# What does this PR do? The bot just updated the project to 0.1.3 in https://github.com/meta-llama/llama-stack/commits?author=github-actions%5Bbot%5D but the deps need to be synced. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) Signed-off-by: Sébastien Han <seb@redhat.com> |

9b2fe6beb1
Bump version to 0.1.3

3d88b81ccf
fix: remove the empty line (#1097)
# What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Remove the empty line from help ``` before: $ llama model download --help --max-parallel MAX_PARALLEL Maximum number of concurrent downloads --ignore-patterns IGNORE_PATTERNS <<<<<<<<<empty line>>>>>>>>>> For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring safetensors files to avoid downloading duplicate weights. after: $ llama model download --help --max-parallel MAX_PARALLEL Maximum number of concurrent downloads --ignore-patterns IGNORE_PATTERNS For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring safetensors files to avoid downloading duplicate weights. ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com> |

369cc513cb
fix: improve stack build on venv (#980)
# What does this PR do? Added a pre_run_checks function to ensure a smooth environment setup by verifying prerequisites. It checks for an existing virtual environment, ensures uv is installed, and deactivates any active environment if necessary. Run the full build inside a venv created by 'uv'. Improved string handling in printf statements and added shellcheck suppressions for expected word splitting in pip commands. These enhancements improve robustness, prevent conflicts, and ensure a seamless setup process. Signed-off-by: Sébastien Han <seb@redhat.com> - [ ] Addresses issue (#issue) ## Test Plan Run the following command on either Linux or MacOS: ``` llama stack build --template ollama --image-type venv --image-name foo + build_name=foo + env_name=llamastack-foo + pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' + RED='\033[0;31m' + NC='\033[0m' + ENVNAME= +++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh + SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution + source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh + pre_run_checks llamastack-foo + local env_name=llamastack-foo + is_command_available uv + command -v uv + '[' -d llamastack-foo ']' + run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' + local env_name=llamastack-foo + local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' + local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' + echo 'Creating new virtual environment llamastack-foo' Creating new virtual environment llamastack-foo + uv venv llamastack-foo Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13 Creating virtual environment at: llamastack-foo Activate with: source llamastack-foo/bin/activate + source llamastack-foo/bin/activate ++ '[' -n x ']' ++ SCRIPT_PATH=llamastack-foo/bin/activate ++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']' ++ deactivate nondestructive ++ unset -f pydoc ++ '[' -z '' ']' ++ '[' -z '' ']' ++ hash -r ++ '[' -z '' ']' ++ unset VIRTUAL_ENV ++ unset VIRTUAL_ENV_PROMPT ++ '[' '!' 
nondestructive = nondestructive ']' ++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo ++ '[' darwin24 = cygwin ']' ++ '[' darwin24 = msys ']' ++ export VIRTUAL_ENV ++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand' ++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand' ++ export PATH ++ '[' x '!=' x ']' +++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo ++ VIRTUAL_ENV_PROMPT='(llamastack-foo) ' ++ export VIRTUAL_ENV_PROMPT ++ '[' -z '' ']' ++ '[' -z '' ']' ++ _OLD_VIRTUAL_PS1= ++ PS1='(llamastack-foo) ' ++ export PS1 ++ alias pydoc ++ true ++ hash -r + '[' -n '' ']' + '[' -n '' ']' + uv pip install --no-cache-dir llama-stack Using Python 3.13.1 environment at: llamastack-foo Resolved 50 packages in 1.25s Built fire==0.7.0 Prepared 50 packages in 1.22s Installed 50 packages in 126ms + annotated-types==0.7.0 + anyio==4.8.0 + blobfile==3.0.0 + certifi==2025.1.31 + charset-normalizer==3.4.1 + click==8.1.8 + distro==1.9.0 + filelock==3.17.0 + fire==0.7.0 + fsspec==2025.2.0 + h11==0.14.0 + httpcore==1.0.7 + httpx==0.28.1 + huggingface-hub==0.28.1 + idna==3.10 + jinja2==3.1.5 + llama-models==0.1.2 + llama-stack==0.1.2 + llama-stack-client==0.1.2 + lxml==5.3.1 + markdown-it-py==3.0.0 + markupsafe==3.0.2 + mdurl==0.1.2 + numpy==2.2.2 + packaging==24.2 + pandas==2.2.3 + pillow==11.1.0 + prompt-toolkit==3.0.50 + pyaml==25.1.0 + pycryptodomex==3.21.0 + pydantic==2.10.6 + pydantic-core==2.27.2 + pygments==2.19.1 + python-dateutil==2.9.0.post0 + python-dotenv==1.0.1 + pytz==2025.1 + pyyaml==6.0.2 + regex==2024.11.6 + requests==2.32.3 + rich==13.9.4 + setuptools==75.8.0 + six==1.17.0 + sniffio==1.3.1 + termcolor==2.5.0 + tiktoken==0.8.0 + tqdm==4.67.1 + typing-extensions==4.12.2 + tzdata==2025.1 + urllib3==2.3.0 + wcwidth==0.2.13 + '[' -n '' ']' + printf 'Installing pip dependencies\n' Installing pip dependencies + uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece 
opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn Using Python 3.13.1 environment at: llamastack-foo Resolved 105 packages in 37ms Uninstalled 2 packages in 65ms Installed 72 packages in 195ms + aiohappyeyeballs==2.4.6 + aiohttp==3.11.12 + aiosignal==1.3.2 + aiosqlite==0.21.0 + attrs==25.1.0 + autoevals==0.0.119 + backoff==2.2.1 + braintrust-core==0.0.58 + chardet==5.2.0 + chevron==0.14.0 + chromadb-client==0.6.3 + contourpy==1.3.1 + cycler==0.12.1 + datasets==3.2.0 + deprecated==1.2.18 + dill==0.3.8 + faiss-cpu==1.10.0 + fastapi==0.115.8 + fonttools==4.56.0 + frozenlist==1.5.0 - fsspec==2025.2.0 + fsspec==2024.9.0 + googleapis-common-protos==1.66.0 + grpcio==1.70.0 + importlib-metadata==8.5.0 + jiter==0.8.2 + joblib==1.4.2 + jsonschema==4.23.0 + jsonschema-specifications==2024.10.1 + kiwisolver==1.4.8 + levenshtein==0.26.1 + matplotlib==3.10.0 + monotonic==1.6 + multidict==6.1.0 + multiprocess==0.70.16 + nltk==3.9.1 - numpy==2.2.2 + numpy==1.26.4 + ollama==0.4.7 + openai==1.61.1 + opentelemetry-api==1.30.0 + opentelemetry-exporter-otlp-proto-common==1.30.0 + opentelemetry-exporter-otlp-proto-grpc==1.30.0 + opentelemetry-exporter-otlp-proto-http==1.30.0 + opentelemetry-proto==1.30.0 + opentelemetry-sdk==1.30.0 + opentelemetry-semantic-conventions==0.51b0 + orjson==3.10.15 + overrides==7.7.0 + posthog==3.12.0 + propcache==0.2.1 + protobuf==5.29.3 + psycopg2-binary==2.9.10 + pyarrow==19.0.0 + pyparsing==3.2.1 + pypdf==5.3.0 + rapidfuzz==3.12.1 + redis==5.2.1 + referencing==0.36.2 + rpds-py==0.22.3 + safetensors==0.5.2 + scikit-learn==1.6.1 + scipy==1.15.1 + sentencepiece==0.2.0 + starlette==0.45.3 + tenacity==9.0.0 + threadpoolctl==3.5.0 + tokenizers==0.21.0 + transformers==4.48.3 + uvicorn==0.34.0 + wrapt==1.17.2 + xxhash==3.5.0 + yarl==1.18.3 + zipp==3.21.0 + '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' ']' + IFS='#' + read -ra parts + for part in '"${parts[@]}"' + echo 'sentence-transformers --no-deps' sentence-transformers --no-deps + uv pip install sentence-transformers --no-deps Using Python 3.13.1 environment at: llamastack-foo Resolved 1 package in 141ms Installed 1 package in 6ms + sentence-transformers==3.4.1 + for part in '"${parts[@]}"' + echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu' torch torchvision --index-url https://download.pytorch.org/whl/cpu + uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu Using Python 3.13.1 environment at: llamastack-foo Resolved 13 packages in 2.15s Installed 5 packages in 324ms + mpmath==1.3.0 + networkx==3.3 + sympy==1.13.1 + torch==2.6.0 + torchvision==0.21.0 Build Successful! 
``` Run: ``` $ source llamastack-foo/bin/activate $ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001 Using config file: llama_stack/templates/ollama/run.yaml Run configuration: apis: - agents - datasetio - eval - inference - safety - scoring - telemetry - tool_runtime - vector_io container_image: null datasets: [] eval_tasks: [] image_name: ollama metadata_store: db_path: /Users/leseb/.llama/distributions/ollama/registry.db namespace: null type: sqlite models: - metadata: {} model_id: meta-llama/Llama-3.2-3B-Instruct model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - llm provider_id: ollama provider_model_id: null - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - embedding provider_id: sentence-transformers provider_model_id: null providers: agents: - config: persistence_store: db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db namespace: null type: sqlite provider_id: meta-reference provider_type: inline::meta-reference datasetio: - config: {} provider_id: huggingface provider_type: remote::huggingface - config: {} provider_id: localfs provider_type: inline::localfs eval: - config: {} provider_id: meta-reference provider_type: inline::meta-reference inference: - config: url: http://localhost:11434 provider_id: ollama provider_type: remote::ollama - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers safety: - config: {} provider_id: llama-guard provider_type: inline::llama-guard scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '********' provider_id: braintrust provider_type: inline::braintrust telemetry: - config: service_name: llama-stack sinks: console,sqlite sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db provider_id: meta-reference provider_type: inline::meta-reference tool_runtime: - config: api_key: '********' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '********' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: code-interpreter provider_type: inline::code-interpreter - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime vector_io: - config: kvstore: db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db namespace: null type: sqlite provider_id: faiss provider_type: inline::faiss scoring_fns: [] server: port: 8321 tls_certfile: null tls_keyfile: null shields: [] tool_groups: - args: null mcp_endpoint: null provider_id: tavily-search toolgroup_id: builtin::websearch - args: null mcp_endpoint: null provider_id: rag-runtime toolgroup_id: builtin::rag - args: null mcp_endpoint: null provider_id: code-interpreter toolgroup_id: builtin::code_interpreter vector_dbs: [] version: '2' Warning: `bwrap` is not available. Code interpreter tool will not work correctly. 
modules.json: 100%|███████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 485kB/s] config_sentence_transformers.json: 100%|██████████████████████████████████████| 116/116 [00:00<00:00, 498kB/s] README.md: 100%|█████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 20.5MB/s] sentence_bert_config.json: 100%|████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 583kB/s] config.json: 100%|███████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 4.63MB/s] model.safetensors: 100%|█████████████████████████████████████████████████| 90.9M/90.9M [00:02<00:00, 36.6MB/s] tokenizer_config.json: 100%|█████████████████████████████████████████████████| 350/350 [00:00<00:00, 4.27MB/s] vocab.txt: 100%|███████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.90MB/s] tokenizer.json: 100%|██████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.23MB/s] special_tokens_map.json: 100%|███████████████████████████████████████████████| 112/112 [00:00<00:00, 1.47MB/s] 1_Pooling/config.json: 100%|██████████████████████████████████████████████████| 190/190 [00:00<00:00, 841kB/s] Serving API tool_groups GET /v1/tools/{tool_name} GET /v1/toolgroups/{toolgroup_id} GET /v1/toolgroups GET /v1/tools POST /v1/toolgroups DELETE /v1/toolgroups/{toolgroup_id} Serving API tool_runtime POST /v1/tool-runtime/invoke GET /v1/tool-runtime/list-tools POST /v1/tool-runtime/rag-tool/insert POST /v1/tool-runtime/rag-tool/query Serving API vector_io POST /v1/vector-io/insert POST /v1/vector-io/query Serving API telemetry GET /v1/telemetry/traces/{trace_id}/spans/{span_id} GET /v1/telemetry/spans/{span_id}/tree GET /v1/telemetry/traces/{trace_id} POST /v1/telemetry/events GET /v1/telemetry/spans GET /v1/telemetry/traces POST /v1/telemetry/spans/export Serving API models GET /v1/models/{model_id} GET /v1/models POST /v1/models DELETE /v1/models/{model_id} Serving API eval POST /v1/eval/tasks/{task_id}/evaluations DELETE /v1/eval/tasks/{task_id}/jobs/{job_id} GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result GET /v1/eval/tasks/{task_id}/jobs/{job_id} POST /v1/eval/tasks/{task_id}/jobs Serving API datasets GET /v1/datasets/{dataset_id} GET /v1/datasets POST /v1/datasets DELETE /v1/datasets/{dataset_id} Serving API scoring_functions GET /v1/scoring-functions/{scoring_fn_id} GET /v1/scoring-functions POST /v1/scoring-functions Serving API inspect GET /v1/health GET /v1/inspect/providers GET /v1/inspect/routes GET /v1/version Serving API scoring POST /v1/scoring/score POST /v1/scoring/score-batch Serving API shields GET /v1/shields/{identifier} GET /v1/shields POST /v1/shields Serving API vector_dbs GET /v1/vector-dbs/{vector_db_id} GET /v1/vector-dbs POST /v1/vector-dbs DELETE /v1/vector-dbs/{vector_db_id} Serving API eval_tasks GET /v1/eval-tasks/{eval_task_id} GET /v1/eval-tasks POST /v1/eval-tasks Serving API agents POST /v1/agents POST /v1/agents/{agent_id}/session POST /v1/agents/{agent_id}/session/{session_id}/turn DELETE /v1/agents/{agent_id} DELETE /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id} Serving API inference POST /v1/inference/chat-completion POST /v1/inference/completion POST /v1/inference/embeddings Serving API datasetio POST /v1/datasetio/rows GET /v1/datasetio/rows Serving API safety POST 
/v1/safety/run-shield Listening on ['::', '0.0.0.0']:5001 INFO: Started server process [39145] INFO: Waiting for application startup. INFO: ASGI 'lifespan' protocol appears unsupported. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit) ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Sébastien Han <seb@redhat.com> |
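The actual checks are implemented in the `build_venv.sh` shell script referenced above; the following is only a rough Python rendition of the described behavior, with function and variable names assumed:

```python
# Rough sketch of the pre-flight checks, not the shell implementation itself.
import os
import shutil
import sys


def pre_run_checks(env_name: str) -> None:
    # uv is required to create and populate the virtual environment
    if shutil.which("uv") is None:
        sys.exit("Error: 'uv' is required but was not found on PATH")
    # reuse an existing environment directory instead of failing
    if os.path.isdir(env_name):
        print(f"Virtual environment '{env_name}' already exists, reusing it")
    # an already-activated environment should not leak into the build
    if os.environ.get("VIRTUAL_ENV"):
        print("Note: deactivating the currently active virtual environment for the build")
```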

64328bfe62
fix: enable_session_persistence in AgentConfig should be optional (#1012)
# What does this PR do?
This issue was discovered in https://github.com/meta-llama/llama-stack/pull/1009#discussion_r1947036518.

## Test Plan
This field is no longer required after the change.

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
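An illustrative sketch of the kind of change involved, with the `AgentConfig` field list heavily abbreviated and the default value assumed: giving the field a default makes it optional for callers.

```python
from pydantic import BaseModel


class AgentConfig(BaseModel):  # field list abbreviated for illustration
    model: str
    instructions: str
    enable_session_persistence: bool = False  # previously required, now defaults to False


cfg = AgentConfig(model="meta-llama/Llama-3.2-3B-Instruct", instructions="You are a helpful assistant")
print(cfg.enable_session_persistence)  # False
```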

314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose should be didactic -- show the simplest implementation of the llama models and document the prompt formats, etc. This PR is the complement to https://github.com/meta-llama/llama-models/pull/279 ## Test Plan Ensure all `llama` CLI `model` sub-commands work: ```bash llama model list llama model download --model-id ... llama model prompt-format -m ... ``` Ran tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/ LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/ LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/ ``` Create a fresh venv `uv venv && source .venv/bin/activate` and run `llama stack build --template fireworks --image-type venv` followed by `llama stack run together --image-type venv` <-- the server runs Also checked that the OpenAPI generator can run and there is no change in the generated files as a result. ```bash cd docs/openapi_generator sh run_openapi_generator.sh ``` |

c0ee512980
build: configure ruff from pyproject.toml (#1100)
# What does this PR do? - Remove hardcoded configurations from pre-commit. - Allow configuration to be set via pyproject.toml. - Merge .ruff.toml settings into pyproject.toml. - Ensure the linter and formatter use the defined configuration instead of being overridden by pre-commit. Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) Signed-off-by: Sébastien Han <seb@redhat.com> |

a3cb039e83
docs: Add region parameter to Bedrock provider (#1103)
# What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) |

406465622e
fix: Update QdrantConfig to QdrantVectorIOConfig (#1104)
# What does this PR do? This fixes an import introduced due to merging #1079 before #1039, and thus the changes from #1039 needing to update `QdrantConfig` to `QdrantVectorIOConfig`. ## Test Plan I ran the remote vllm provider inference tests against the latest main: ``` VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote" ``` That failed with: ``` File "/home/bbrownin/src/llama-stack/llama_stack/providers/tests/vector_io/fixtures.py", line 20, in <module> from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig ImportError: Error importing plugin "llama_stack.providers.tests.vector_io.fixtures": cannot import name 'QdrantConfig' from 'llama_stack.providers.remote.vector_io.qdrant' (/home/bbrownin/src/llama-stack/llama_stack/providers/remote/vector_io/qdrant/__init__.py) ``` After this change, the import no longer fails and the tests pass. Signed-off-by: Ben Browning <bbrownin@redhat.com> |

2f7268b790
fix: add the missed help description info (#1096)

b27c41fe39
fix: disable sqlite-vec test (#1090)
# What does this PR do?
- sqlite_vec is not added to all templates yet; disable its test for now to unblock the release cut

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
<img width="846" alt="image" src="https://github.com/user-attachments/assets/fa896497-f37c-4cdf-bc62-21893afbd392" />

[//]: # (## Documentation)

b0b696cb4f
fix: regex pattern matching to support :path suffix in the routes (#1089)
This PR fixes client sdk test failure --
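As a hypothetical illustration of the idea (not the stack's actual router code): a route template whose parameter carries a `:path` suffix has to be matched with a pattern that is allowed to span `/` characters, unlike ordinary parameters.

```python
# Hedged sketch: compiling route templates such as "/v1/models/{model_id:path}".
import re


def route_to_regex(template: str) -> re.Pattern:
    def repl(match: re.Match) -> str:
        name, suffix = match.group(1), match.group(2)
        # ":path" parameters may contain "/" characters; plain ones may not
        return f"(?P<{name}>.+)" if suffix == ":path" else f"(?P<{name}>[^/]+)"

    pattern = re.sub(r"{(\w+)(:path)?}", repl, template)
    return re.compile(f"^{pattern}$")


match = route_to_regex("/v1/models/{model_id:path}").match("/v1/models/meta-llama/Llama-3.1-8B")
print(match.group("model_id"))  # meta-llama/Llama-3.1-8B
```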

da53dc3f5f
fix: openapi for eval-task (#1085)
# What does this PR do?
- As the title says.

## Test Plan
- The deprecated endpoint needs to behave the same way it did before.

[//]: # (## Documentation)

2a8e199e10
fix notebook

8b655e3cd2
fix!: update eval-tasks -> benchmarks (#1032)
# What does this PR do? - Update `/eval-tasks` to `/benchmarks` - ⚠️ Remove differentiation between `app` v.s. `benchmark` eval task config. Now we only have `BenchmarkConfig`. The overloaded `benchmark` is confusing and do not add any value. Backward compatibility is being kept as the "type" is not being used anywhere. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - This change is backward compatible - Run notebook test with ``` pytest -v -s --nbval-lax ./docs/getting_started.ipynb pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` <img width="846" alt="image" src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d" /> [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Co-authored-by: Ben Browning <ben324@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: reidliu <reid201711@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> |

225dd38e5c
test: add test for Agent.create_turn non-streaming response (#1078)
Summary: This tests the fix to the SDK in https://github.com/meta-llama/llama-stack-client-python/pull/141

Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B
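A rough sketch of the kind of assertion such a test makes (names and fixtures are illustrative, not the suite's actual code): with `stream=False`, `create_turn` should hand back a completed turn object rather than a stream of events.

```python
# Illustrative only; assumes `agent` and `session_id` fixtures from the client SDK tests.
def test_create_turn_non_streaming(agent, session_id):
    turn = agent.create_turn(
        messages=[{"role": "user", "content": "Give me a sentence that contains the word: hello"}],
        session_id=session_id,
        stream=False,
    )
    # a non-streaming call should return a finished Turn, not a generator of events
    assert turn.turn_id is not None
    assert turn.output_message.content
```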

32d1e50a6f
test: Add qdrant to provider tests (#1039)
# What does this PR do? This is a follow on to #1022 . It includes the changes I needed to be able to test the Qdrant support as requested by @terrytangyuan . I uncovered a lot of bigger, more systemic issues with the vector DB testing and I will open a new issue for those. For now, I am just delivering the work I already did on that. ## Test Plan As discussed on #1022: ``` podman pull qdrant/qdrant mkdir qdrant-data podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant ``` ``` ollama pull all-minilm:l6-v2 curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}' ``` ``` EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings ``` These show 3 tests passing and 15 deselected which is presumably working as intended. --------- Signed-off-by: Bill Murdock <bmurdock@redhat.com> |

5858777ff0
fix: Update VectorIO config classes in registry (#1079)
This was missed in https://github.com/meta-llama/llama-stack/pull/1023. ``` Traceback (most recent call last): File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 488, in <module> main() File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 389, in main impls = asyncio.run(construct_stack(config)) File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/runners.py", line 44, in run return loop.run_until_complete(main) File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete return future.result() File "/home/yutang/repos/llama-stack/llama_stack/distribution/stack.py", line 202, in construct_stack impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry) File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 230, in resolve_impls impl = await instantiate_provider( File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 312, in instantiate_provider config_type = instantiate_class_type(provider_spec.config_class) File "/home/yutang/repos/llama-stack/llama_stack/distribution/utils/dynamic.py", line 13, in instantiate_class_type return getattr(module, class_name) AttributeError: module 'llama_stack.providers.inline.vector_io.faiss' has no attribute 'FaissImplConfig' ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> |
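The failure comes from the registry resolving provider config classes by name at startup; below is a simplified sketch of that dynamic lookup (the real helper is `instantiate_class_type` in `llama_stack/distribution/utils/dynamic.py`, shown here with an assumed signature), which is why a stale `FaissImplConfig` entry fails with exactly the AttributeError in the traceback.

```python
# Simplified sketch, not the project's exact implementation.
import importlib


def instantiate_class_type(fully_qualified_name: str):
    module_name, class_name = fully_qualified_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)  # raises AttributeError if the class was renamed
```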

aebd130b08
docs: Fix url to the llama-stack-spec yaml/html files (#1081)
# What does this PR do?
Fixes URLs in the RFC doc (RFC-0001-llama-stack.md). Also fixes minor markdown linting issues.

Signed-off-by: Anil Vishnoi <vishnoianil@gmail.com>

efdd60014d
test: Enable logprobs top_k tests for remote::vllm (#1080)
top_k supported was added in https://github.com/meta-llama/llama-stack/pull/1074. The tests should be enabled as well. Verified that tests pass for remote::vllm: ``` LLAMA_STACK_BASE_URL=http://localhost:5003 pytest -v tests/client-sdk/inference/test_text_inference.py -k " test_completion_log_probs_non_streaming or test_completion_log_probs_streaming" ================================================================ test session starts ================================================================ platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 14 items / 12 deselected / 2 selected tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [100%] =================================================== 2 passed, 12 deselected, 1 warning in 10.03s ==================================================== ``` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> |

8ff27b58fa
chore: Consistent naming for VectorIO providers (#1023)
# What does this PR do?
This changes all VectorIO provider classes to follow the pattern `<ProviderName>VectorIOConfig` and `<ProviderName>VectorIOAdapter`. All API endpoints for VectorIO are currently consistent with `/vector-io`. Note that the API endpoint for VectorDB stays unchanged as `/vector-dbs`.

## Test Plan
I don't have a way to test all providers. This is a simple renaming so things should work as expected.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com> |

1527c30107
fix: remove :path in agents (#1077)
# What does this PR do?
Remove `:path` in agents; a `:path` suffix is only allowed on the last parameter of an endpoint.

## Test Plan
[Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*]
```
llama stack run
```

[//]: # (## Documentation)

f9ca441974
chore: Link to Groq docs in the warning message for preview model (#1060)
This should be `llama-3.2-3b` instead of `llama-3.2-3b-instruct`.

2fa9e3c941
fix: make backslash work in GET /models/{model_id:path} (#1068)

47fccf0d03
style: update model id in model list title (#1072)
# What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Since the subcommands used `MODEL_ID`, it would be better to use it in `model list` and make it easy to find it. ``` $ llama model verify-download --help usage: llama model verify-download [-h] --model-id MODEL_ID << $ llama model describe --help usage: llama model describe [-h] -m MODEL_ID << $ llama download --help --model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models before: $ llama model list +-----------------------------------------+-----------------------------------------------------+----------------+ | Model Descriptor | Hugging Face Repo | Context Length | +-----------------------------------------+-----------------------------------------------------+----------------+ after: $ llama model list +-----------------------------------------+-----------------------------------------------------+----------------+ | Model Descriptor | Model ID | Context Length | +-----------------------------------------+-----------------------------------------------------+----------------+ | Llama3.1-8B | meta-llama/Llama-3.1-8B | 128K | +-----------------------------------------+-----------------------------------------------------+----------------+ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com> |

418645696a
fix: improve signal handling and update dependencies (#1044)
# What does this PR do? This commit enhances the signal handling mechanism in the server by improving the `handle_signal` (previously handle_sigint) function. It now properly retrieves the signal name, ensuring clearer logging when a termination signal is received. Additionally, it cancels all running tasks and waits for their completion before stopping the event loop, allowing for a more graceful shutdown. Support for handling SIGTERM has also been added alongside SIGINT. Before the changes, handle_sigint used asyncio.run(run_shutdown()). However, asyncio.run() is meant to start a new event loop, and calling it inside an existing one (like when running Uvicorn) raises an error. The fix replaces asyncio.run(run_shutdown()) with an async function scheduled on the existing loop using loop.create_task(shutdown()). This ensures that the shutdown coroutine runs within the current event loop instead of trying to create a new one. Furthermore, this commit updates the project dependencies. `fastapi` and `uvicorn` have been added to the development dependencies in `pyproject.toml` and `uv.lock`, ensuring that the necessary packages are available for development and execution. Closes: https://github.com/meta-llama/llama-stack/issues/1043 Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run a server and send SIGINT: ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml Using config file: llama_stack/templates/ollama/run.yaml Run configuration: apis: - agents - datasetio - eval - inference - safety - scoring - telemetry - tool_runtime - vector_io container_image: null datasets: [] eval_tasks: [] image_name: ollama metadata_store: db_path: /Users/leseb/.llama/distributions/ollama/registry.db namespace: null type: sqlite models: - metadata: {} model_id: meta-llama/Llama-3.2-3B-Instruct model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - llm provider_id: ollama provider_model_id: null - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - embedding provider_id: sentence-transformers provider_model_id: null providers: agents: - config: persistence_store: db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db namespace: null type: sqlite provider_id: meta-reference provider_type: inline::meta-reference datasetio: - config: {} provider_id: huggingface provider_type: remote::huggingface - config: {} provider_id: localfs provider_type: inline::localfs eval: - config: {} provider_id: meta-reference provider_type: inline::meta-reference inference: - config: url: http://localhost:11434 provider_id: ollama provider_type: remote::ollama - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers safety: - config: {} provider_id: llama-guard provider_type: inline::llama-guard scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '********' provider_id: braintrust provider_type: inline::braintrust telemetry: - config: service_name: llama-stack sinks: console,sqlite sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db provider_id: meta-reference provider_type: inline::meta-reference tool_runtime: - config: api_key: 
'********' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '********' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: code-interpreter provider_type: inline::code-interpreter - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime vector_io: - config: kvstore: db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db namespace: null type: sqlite provider_id: faiss provider_type: inline::faiss scoring_fns: [] server: port: 8321 tls_certfile: null tls_keyfile: null shields: [] tool_groups: - args: null mcp_endpoint: null provider_id: tavily-search toolgroup_id: builtin::websearch - args: null mcp_endpoint: null provider_id: rag-runtime toolgroup_id: builtin::rag - args: null mcp_endpoint: null provider_id: code-interpreter toolgroup_id: builtin::code_interpreter vector_dbs: [] version: '2' INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => ollama INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => sentence-transformers INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: models => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inference => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-vector_io => faiss INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-safety => llama-guard INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: shields => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: safety => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_io => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => code-interpreter INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => rag-runtime INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_groups => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_runtime => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: agents => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => huggingface INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => localfs INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasets => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasetio => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: telemetry => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => basic INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => llm-as-judge INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => braintrust INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring_functions => 
__routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-eval => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval_tasks => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inspect => __builtin__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216: INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`... INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK" INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss. INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss. INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes. Warning: `bwrap` is not available. Code interpreter tool will not work correctly. INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available. INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK" INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2... INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2 INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust INFO 2025-02-12 10:21:09,501 
llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106: Serving API eval POST /v1/eval/tasks/{task_id}/evaluations DELETE /v1/eval/tasks/{task_id}/jobs/{job_id} GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result GET /v1/eval/tasks/{task_id}/jobs/{job_id} POST /v1/eval/tasks/{task_id}/jobs Serving API agents POST /v1/agents POST /v1/agents/{agent_id}/session POST /v1/agents/{agent_id}/session/{session_id}/turn DELETE /v1/agents/{agent_id} DELETE /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id} Serving API scoring_functions GET /v1/scoring-functions/{scoring_fn_id} GET /v1/scoring-functions POST /v1/scoring-functions Serving API safety POST /v1/safety/run-shield Serving API inspect GET /v1/health GET /v1/inspect/providers GET /v1/inspect/routes GET /v1/version Serving API tool_runtime POST /v1/tool-runtime/invoke GET /v1/tool-runtime/list-tools POST /v1/tool-runtime/rag-tool/insert POST /v1/tool-runtime/rag-tool/query Serving API datasetio POST /v1/datasetio/rows GET /v1/datasetio/rows Serving API shields GET /v1/shields/{identifier} GET /v1/shields POST /v1/shields Serving API eval_tasks GET /v1/eval-tasks/{eval_task_id} GET /v1/eval-tasks POST /v1/eval-tasks Serving API models GET /v1/models/{model_id} GET /v1/models POST /v1/models DELETE /v1/models/{model_id} Serving API datasets GET /v1/datasets/{dataset_id} GET /v1/datasets POST /v1/datasets DELETE /v1/datasets/{dataset_id} Serving API vector_io POST /v1/vector-io/insert POST /v1/vector-io/query Serving API inference POST /v1/inference/chat-completion POST /v1/inference/completion POST /v1/inference/embeddings Serving API tool_groups GET /v1/tools/{tool_name} GET /v1/toolgroups/{toolgroup_id} GET /v1/toolgroups GET /v1/tools POST /v1/toolgroups DELETE /v1/toolgroups/{toolgroup_id} Serving API vector_dbs GET /v1/vector-dbs/{vector_db_id} GET /v1/vector-dbs POST /v1/vector-dbs DELETE /v1/vector-dbs/{vector_db_id} Serving API scoring POST /v1/scoring/score POST /v1/scoring/score-batch Serving API telemetry GET /v1/telemetry/traces/{trace_id}/spans/{span_id} GET /v1/telemetry/spans/{span_id}/tree GET /v1/telemetry/traces/{trace_id} POST /v1/telemetry/events GET /v1/telemetry/spans GET /v1/telemetry/traces POST /v1/telemetry/spans/export Listening on ['::', '0.0.0.0']:5001 INFO: Started server process [65372] INFO: Waiting for application startup. INFO: ASGI 'lifespan' protocol appears unsupported. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit) ^CINFO: Shutting down INFO: Finished server process [65372] Received signal SIGINT (2). Exiting gracefully... 
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl ``` [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
dd1a366347
|
fix: logprobs support in remote-vllm provider (#1074)
# What does this PR do? The remote-vllm provider was not passing logprobs options from CompletionRequest or ChatCompletionRequests through to the OpenAI client parameters. I manually verified this, as well as observed this provider failing `TestInference::test_completion_logprobs`. This was filed as issue #1073. This fixes that by passing the `logprobs.top_k` value through to the parameters we pass into the OpenAI client. Additionally, this fixes a bug in `test_text_inference.py` where it mistakenly assumed chunk.delta were of type `ContentDelta` for completion requests. The deltas are of type `ContentDelta` for chat completion requests, but for basic completion requests the deltas are of type string. This test was likely failing for other providers that did properly support logprobs because of this latter issue in the test, which was hit while fixing the above issue with the remote-vllm provider. (Closes #1073) ## Test Plan First, you need a vllm running. I ran one locally like this: ``` vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json ``` Next, run test_text_inference.py against this vllm using the remote vllm provider like this: ``` VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote" ``` Before my change, the test failed with this error: ``` llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs assert 1 <= len(response.logprobs) <= 5 E TypeError: object of type 'NoneType' has no len() ``` After my change, the test passes. [//]: # (## Documentation) Signed-off-by: Ben Browning <bbrownin@redhat.com> |
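A minimal sketch of the fix's idea, using hypothetical helper and field names (the actual provider code may differ): forward the request's `logprobs.top_k` into the parameters handed to the OpenAI client instead of dropping it.

```python
# Hypothetical sketch: pass logprobs.top_k through to the OpenAI client params.
# Names here are illustrative, not the provider's actual code.
from typing import Any, Optional


class LogProbConfig:
    """Simplified stand-in for the logprobs options on a CompletionRequest."""

    def __init__(self, top_k: Optional[int] = None) -> None:
        self.top_k = top_k


def build_openai_params(prompt: str, logprobs: Optional[LogProbConfig]) -> dict[str, Any]:
    params: dict[str, Any] = {"prompt": prompt}
    if logprobs is not None and logprobs.top_k is not None:
        params["logprobs"] = logprobs.top_k  # the value previously left out
    return params


print(build_openai_params("Hello", LogProbConfig(top_k=3)))
# -> {'prompt': 'Hello', 'logprobs': 3}
```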
||
|
8c01b7f05a
|
docs: Mention conventional commits format in CONTRIBUTING.md (#1075)
# What does this PR do? This adds a note to ensure pull requests follow the conventional commits format, along with a link to that format, in CONTRIBUTING.md. One of the pull-request checks enforces PR titles that match this format, so it's good to be upfront about this expectation before a new developer opens a PR. Signed-off-by: Ben Browning <bbrownin@redhat.com> |
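For reference, conventional-commit PR titles use a `type: description` prefix; the titles throughout this log already follow the pattern, for example:

```
feat: add a new provider option
fix: handle a missing configuration value
docs: update CONTRIBUTING.md
chore: bump a dependency version
```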
||
|
cc700b2f68
|
feat: support listing all for llama stack list-providers (#1056)
# What does this PR do? Support listing all for `llama stack list-providers`. For ease of reading, sort the output rows by type. Before the change. ``` llama stack list-providers usage: llama stack list-providers [-h] {inference,safety,agents,vector_io,datasetio,scoring,eval,post_training,tool_runtime,telemetry} llama stack list-providers: error: the following arguments are required: api ``` After the change. ``` +---------------+----------------------------------+----------------------------------------------------------------------------------+ | API Type | Provider Type | PIP Package Dependencies | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | agents | inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | datasetio | inline::localfs | pandas | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | datasetio | remote::huggingface | datasets | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | eval | inline::meta-reference | | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | inline::meta-reference | accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format- | | | | enforcer,sentence-transformers | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | inline::meta-reference-quantized | accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format- | | | | enforcer,sentence-transformers,fbgemm-gpu,torchao==0.5.0 | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | inline::sentence-transformers | sentence-transformers | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | inline::vllm | vllm | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::bedrock | boto3 | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::cerebras | cerebras_cloud_sdk | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::databricks | openai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::fireworks | fireworks-ai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::groq | groq | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::hf::endpoint | huggingface_hub,aiohttp | 
+---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::hf::serverless | huggingface_hub,aiohttp | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::nvidia | openai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::ollama | ollama,aiohttp | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::runpod | openai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::sambanova | openai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::tgi | huggingface_hub,aiohttp | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::together | together | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | inference | remote::vllm | openai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | post_training | inline::torchtune | torch,torchtune==0.5.0,torchao==0.8.0,numpy | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | safety | inline::code-scanner | codeshield | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | safety | inline::llama-guard | | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | safety | inline::meta-reference | transformers,torch --index-url https://download.pytorch.org/whl/cpu | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | safety | inline::prompt-guard | transformers,torch --index-url https://download.pytorch.org/whl/cpu | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | safety | remote::bedrock | boto3 | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | scoring | inline::basic | | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | scoring | inline::braintrust | autoevals,openai | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | scoring | inline::llm-as-judge | | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | telemetry | inline::meta-reference | opentelemetry-sdk,opentelemetry-exporter-otlp-proto-http | 
+---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | inline::code-interpreter | | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | inline::rag-runtime | | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | remote::bing-search | requests | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | remote::brave-search | requests | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | remote::model-context-protocol | mcp | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | remote::tavily-search | requests | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | tool_runtime | remote::wolfram-alpha | requests | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | inline::chromadb | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | inline::faiss | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | inline::meta-reference | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | remote::chromadb | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb- | | | | client | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | remote::pgvector | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | https://download.pytorch.org/whl/cpu,sentence-transformers --no- | | | | deps,psycopg2-binary | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | remote::qdrant | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | 
https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,qdrant- | | | | client | +---------------+----------------------------------+----------------------------------------------------------------------------------+ | vector_io | remote::weaviate | blobfile,chardet,pypdf,tqdm,numpy,scikit- | | | | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url | | | | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,weaviate- | | | | client | +---------------+----------------------------------+----------------------------------------------------------------------------------+ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Manually. [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> |
||
|
119fe8742a
|
feat: Adding sqlite-vec as a vectordb (#1040)
# What does this PR do? This PR adds `sqlite_vec` as an additional inline vectordb. Tested with `ollama` by adding the `vector_io` object in `./llama_stack/templates/ollama/run.yaml` : ```yaml vector_io: - provider_id: sqlite_vec provider_type: inline::sqlite_vec config: kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db ``` I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test file with: ```python INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"] ``` And parameterized the relevant tests. [//]: # (If resolving an issue, uncomment and update the line below) # Closes https://github.com/meta-llama/llama-stack/issues/1005 ## Test Plan I ran the tests with: ```bash INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py ``` Which outputs: ```python ... PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED ``` In addition, I ran the `rag_with_vector_db.py` [example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py) using the script below with `uv run rag_example.py`. <details> <summary>CLICK TO SHOW SCRIPT 👋 </summary> ```python #!/usr/bin/env python3 import os import uuid from termcolor import cprint # Set environment variables os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16' os.environ['LLAMA_STACK_CONFIG'] = 'ollama' # Import libraries after setting environment variables from llama_stack.distribution.library_client import LlamaStackAsLibraryClient from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types.agent_create_params import AgentConfig from llama_stack_client.types import Document def main(): # Initialize the client client = LlamaStackAsLibraryClient("ollama") vector_db_id = f"test-vector-db-{uuid.uuid4().hex}" _ = client.initialize() model_id = 'llama3.2:3b-instruct-fp16' # Define the list of document URLs and create Document objects urls = [ "chat.rst", "llama3.rst", "memory_optimizations.rst", "lora_finetune.rst", ] documents = [ Document( document_id=f"num-{i}", content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", mime_type="text/plain", metadata={}, ) for i, url in enumerate(urls) ] # (Optional) Use the documents as needed with your client here client.vector_dbs.register( provider_id='sqlite_vec', vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, ) client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) # Create agent configuration agent_config = AgentConfig( model=model_id, instructions="You are a helpful assistant", enable_session_persistence=False, toolgroups=[ { "name": "builtin::rag", "args": { "vector_db_ids": [vector_db_id], } } ], ) # Instantiate the 
Agent agent = Agent(client, agent_config) # List of user prompts user_prompts = [ "What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.", "Was anything related to 'Llama3' discussed, if so what?", "Tell me how to use LoRA", "What about Quantization?", ] # Create a session for the agent session_id = agent.create_session("test-session") # Process each prompt and display the output for prompt in user_prompts: cprint(f"User> {prompt}", "green") response = agent.create_turn( messages=[ { "role": "user", "content": prompt, } ], session_id=session_id, ) # Log and print events from the response for log in EventLogger().log(response): log.print() if __name__ == "__main__": main() ``` </details> Which outputs a large summary of RAG generation. # Documentation Will handle documentation updates in follow-up PR. # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
025f615868
|
feat: add support for running in a venv (#1018)
# What does this PR do? Add `--image-type` to `llama stack run`, which accepts conda, container, or venv. Also add start_venv.sh, which starts the stack using a venv. Resolves #1007 ## Test Plan running locally: `llama stack build --template ollama --image-type venv` `llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml` ... ``` llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml + python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321 Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml Run configuration: apis: - agents - datasetio ... ``` Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
5f88ff0b6a
|
fix: show proper help text (#1065)
# What does this PR do? When executing a sub-command like `llama model`, the wrong help text, sub-commands, and flags are displayed. Each command group needs to call `.set_defaults` to display this info properly. before: ``` llama model usage: llama [-h] {model,stack,download,verify-download} ... Welcome to the Llama CLI options: -h, --help show this help message and exit subcommands: {model,stack,download,verify-download} ``` after: ``` llama model usage: llama model [-h] {download,list,prompt-format,describe,verify-download} ... Work with llama models options: -h, --help show this help message and exit model_subcommands: {download,list,prompt-format,describe,verify-download} ``` Signed-off-by: Charlie Doern <cdoern@redhat.com> |
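A self-contained argparse sketch (not the actual CLI code) of how calling `.set_defaults` on a parent sub-command parser lets the dispatcher print that group's own help rather than the top-level usage:

```python
# Illustrative sketch: a parent sub-command prints its own help via set_defaults.
import argparse

parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
subparsers = parser.add_subparsers(title="subcommands")

model_parser = subparsers.add_parser("model", help="Work with llama models")
# Without this, running `llama model` falls through and shows the top-level help.
model_parser.set_defaults(func=lambda args: model_parser.print_help())

model_sub = model_parser.add_subparsers(title="model_subcommands")
list_parser = model_sub.add_parser("list", help="List models")
list_parser.set_defaults(func=lambda args: print("listing models..."))

args = parser.parse_args(["model"])
if hasattr(args, "func"):
    args.func(args)  # prints the `llama model` help text
```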
||
|
5e97dd9919
|
feat: Support tool calling for streaming chat completion in remote vLLM provider (#1063)
# What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Closes https://github.com/meta-llama/llama-stack/issues/1046. ## Test Plan [Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*] ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py ================================================================= test session starts ================================================================= platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 14 items tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 7%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 14%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 21%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 28%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 35%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 42%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 57%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 64%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 71%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 78%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 85%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] PASSED [ 92%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] PASSED [100%] =============================================== 12 passed, 2 xfailed, 1 warning in 366.56s (0:06:06) ================================================ ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> |
||
|
bf11cc0450
|
chore: update return type to Optional[str] (#982) | ||
|
66d7e15c93
|
perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041)
# What does this PR do?
**Problem**
- Using script:
https://gist.github.com/thoraxe/6163b2145ce7b1c24c6026b64cf90085
- This hits an issue on the server with `code_interpreter` not found, as we
do not pass "builtin::code_interpreter" in AgentConfig's `toolgroups`.
This is a general issue where the model always tries to output
`code_interpreter` in `ToolCall` even when we do not have
`code_interpreter` available for execution.
**Reproduce Deeper Problem in chat-completion**
- Use script:
https://gist.github.com/yanxi0830/163a9ad7b5db10556043fbfc7ecd7603
1. We currently always populate `code_interpreter` in `ToolCall` in
ChatCompletionResponse if the model's response begins with
`<|python_tag|>`. See
|
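A rough sketch of the idea behind the fix, with hypothetical names only (not the actual implementation): after decoding the model output, any parsed tool call whose name is not among the tools supplied in the request is dropped rather than surfaced as a `ToolCall`, so the response stays a subset of `ChatCompletionRequest.tools`.

```python
# Hypothetical sketch of constraining parsed tool calls to the request's tools.
from dataclasses import dataclass


@dataclass
class ParsedToolCall:
    tool_name: str
    arguments: dict


def filter_tool_calls(parsed: list[ParsedToolCall], requested_tools: set[str]) -> list[ParsedToolCall]:
    # Drop calls to tools (e.g. code_interpreter) that were never offered
    # in the ChatCompletionRequest.
    return [call for call in parsed if call.tool_name in requested_tools]


calls = [
    ParsedToolCall("code_interpreter", {"code": "print(1)"}),
    ParsedToolCall("get_weather", {"city": "Paris"}),
]
print(filter_tool_calls(calls, {"get_weather"}))
# -> only the get_weather call survives
```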
||
|
dd37e58868
|
feat: Support tool calling for non-streaming chat completion in remote vLLM provider (#1034)
# What does this PR do? This PR adds support for tool calling for non-streaming chat completion. Prior to this, tool calls were not passed to chat completion requests and the tools object needs to be restructured properly to be compatible with vLLM provider. ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py ================================================================= test session starts ================================================================= platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 12 items tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 8%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 16%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 25%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 33%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 41%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 58%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 66%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 75%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 83%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] FAILED [ 91%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [100%] ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> |
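A hedged sketch of the kind of restructuring described above (illustrative field names only): turning a simplified tool definition into the OpenAI-style `tools` entry that vLLM's OpenAI-compatible endpoint expects.

```python
# Illustrative conversion of a simplified tool definition into an
# OpenAI-style "tools" entry. Field names on the input side are made up.
def convert_tool(name: str, description: str, params: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    p: {"type": spec.get("type", "string"), "description": spec.get("description", "")}
                    for p, spec in params.items()
                },
                "required": [p for p, spec in params.items() if spec.get("required", False)],
            },
        },
    }


print(convert_tool(
    "get_weather",
    "Get the current weather for a city",
    {"city": {"type": "string", "description": "City name", "required": True}},
))
```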
||
|
24385cfd03
|
fix: filter out remote::sample providers when listing (#1057)
# What does this PR do? Before: ``` llama stack list-providers agents +------------------------+-----------------------------------------------------------------------+ | Provider Type | PIP Package Dependencies | +------------------------+-----------------------------------------------------------------------+ | inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis | +------------------------+-----------------------------------------------------------------------+ | remote::sample | | +------------------------+-----------------------------------------------------------------------+ ``` After: ``` llama stack list-providers agents +------------------------+-----------------------------------------------------------------------+ | Provider Type | PIP Package Dependencies | +------------------------+-----------------------------------------------------------------------+ | inline::meta-reference | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis | +------------------------+-----------------------------------------------------------------------+ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Manually. [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> |