Ashwin Bharambe
46b0a404e8
chore: remove straggler references to llama-models (#1345)
...
Straggler references cleanup
2025-03-01 14:26:03 -08:00
Charlie Doern
9b130f96a7
fix: build_venv expects an extra argument (#1233)
...
# What does this PR do?
Currently, `build_venv.sh` expects a `distribution_type` as the first
argument, but the only things ever passed are:
1. image name
2. pip dependencies
so `distribution_type` is never supplied, meaning the script errors when
calling something like:
`llama stack build --image-type venv --template ollama --image-name test`
before output:
```
llama stack build --image-type venv --template ollama --image-name venv-test
Usage: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> <env_name> <pip_dependencies> [<special_pip_deps>]
Example: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> mybuild ./my-stack-build.yaml 'numpy pandas scipy'
Failed to build target venv-test with return code 1
Run config path is empty
```
after:
```
llama stack build --image-type venv --template ollama --image-name venv-test
Environment 'venv-test' already exists, re-using it.
Using virtual environment venv-test
Using CPython 3.13.0 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: venv-test
Activate with: source venv-test/bin/activate
Using Python 3.13.0 environment at: venv-test
Resolved 55 packages in 640ms
Built fire==0.7.0
Prepared 54 packages in 1.14s
Installed 55 packages in 82ms
+ annotated-types==0.7.0
```
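The fix drops the stale `<distribution_type>` slot so the positional arguments line up with what the CLI actually passes; a minimal sketch (illustrative variable names, not the exact script):
```bash
# Corrected positional arguments for build_venv.sh:
env_name="$1"              # the image name, e.g. "venv-test"
pip_dependencies="$2"      # space-separated package list
special_pip_deps="${3:-}"  # optional
```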
## Test Plan
ran locally with output above
Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-25 11:08:50 -08:00
Sébastien Han
c4987bc349
fix: avoid failure when no special pip deps and better exit (#1228)
...
# What does this PR do?
When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:
1. Environment name
2. Pip dependencies
Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.
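A minimal sketch of the relaxed check, assuming the two mandatory arguments above (messages are illustrative):
```bash
# Fail only when the two mandatory arguments are absent; the special
# pip deps slot may legitimately be empty (e.g. for ollama).
if [ "$#" -lt 2 ]; then
  echo "Usage: $0 <env_name> <pip_dependencies> [<special_pip_deps>]" >&2
  exit 1
fi
special_pip_deps="${3:-}"
```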
## Test Plan
This command shouldn't fail:
```
llama stack build --template ollama --image-type venv
```
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-24 13:18:52 -05:00
Ashwin Bharambe
6227e1e3b9
fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225)
...
Make sure venv behaves like conda (no prefix is added to image_name) and
`--image-type venv` inside a notebook "just works" without any fiddling
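Conceptually the change is tiny; a sketch with hypothetical variable names:
```bash
# Before: venv builds prepended a prefix to the image name.
# env_name="llamastack-$image_name"
# After: the image name is used verbatim, matching conda behavior.
env_name="$image_name"
```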
2025-02-23 16:57:11 -08:00
Xi Yan
ca687d3e86
style: env var in build_venv
2025-02-19 22:32:59 -08:00
Xi Yan
61f43b8677
fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment (#1163)
...
# What does this PR do?
- resolves issue: #1159
- Root cause: https://github.com/meta-llama/llama-stack/pull/980 forces
`build_venv.sh` to install in a venv environment, which does not work in
the Colab notebook environment:
![image](https://github.com/user-attachments/assets/1f9be409-5313-4926-b078-74e141cf29eb)
## This PR
Use `UV_SYSTEM_PYTHON` to make sure dependencies are installed in the
current system environment, which is what the Colab environment uses.
```
UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv
```
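This works because `uv` honors `UV_SYSTEM_PYTHON` and targets the interpreter that is already active; a sketch of how the build script can branch on it (illustrative, not the exact logic):
```bash
# If UV_SYSTEM_PYTHON is set, install straight into the current
# (system) Python, as on Colab; otherwise build inside a fresh venv.
if [ -n "${UV_SYSTEM_PYTHON:-}" ]; then
  echo "Installing dependencies into the system Python environment"
else
  uv venv "$env_name"
  source "$env_name/bin/activate"
fi
```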
## Test Plan
- Works in Colab environment
<img width="621" alt="image"
src="https://github.com/user-attachments/assets/ae93bc3d-e05a-44b9-bb21-fb88f29969b8 "
/>
2025-02-19 22:21:16 -08:00
Sébastien Han
369cc513cb
fix: improve stack build on venv (#980)
...
# What does this PR do?
Added a pre_run_checks function to ensure a smooth environment setup by
verifying prerequisites. It checks for an existing virtual environment,
ensures uv is installed, and deactivates any active environment if
necessary.
Run the full build inside a venv created by `uv`.
Improved string handling in printf statements and added shellcheck
suppressions for expected word splitting in pip commands.
These enhancements improve robustness, prevent
conflicts, and ensure a seamless setup process.
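A condensed sketch of the flow described above, using the helper names visible in the trace below (details may differ from the actual script):
```bash
pre_run_checks() {
  local env_name="$1"

  # uv is a hard prerequisite for the build.
  if ! is_command_available uv; then
    printf "uv is not installed, please install it first\n" >&2
    exit 1
  fi

  # Re-use an existing venv directory rather than failing on it.
  if [ -d "$env_name" ]; then
    printf "Environment '%s' already exists, re-using it\n" "$env_name"
  fi
}
```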
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Run the following command on either Linux or macOS:
```
llama stack build --template ollama --image-type venv --image-name foo
+ build_name=foo
+ env_name=llamastack-foo
+ pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ RED='\033[0;31m'
+ NC='\033[0m'
+ ENVNAME=
+++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
+ SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution
+ source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh
+ pre_run_checks llamastack-foo
+ local env_name=llamastack-foo
+ is_command_available uv
+ command -v uv
+ '[' -d llamastack-foo ']'
+ run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu '
+ local env_name=llamastack-foo
+ local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu '
+ echo 'Creating new virtual environment llamastack-foo'
Creating new virtual environment llamastack-foo
+ uv venv llamastack-foo
Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: llamastack-foo
Activate with: source llamastack-foo/bin/activate
+ source llamastack-foo/bin/activate
++ '[' -n x ']'
++ SCRIPT_PATH=llamastack-foo/bin/activate
++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ '[' darwin24 = cygwin ']'
++ '[' darwin24 = msys ']'
++ export VIRTUAL_ENV
++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ export PATH
++ '[' x '!=' x ']'
+++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ VIRTUAL_ENV_PROMPT='(llamastack-foo) '
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(llamastack-foo) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ '[' -n '' ']'
+ '[' -n '' ']'
+ uv pip install --no-cache-dir llama-stack
Using Python 3.13.1 environment at: llamastack-foo
Resolved 50 packages in 1.25s
Built fire==0.7.0
Prepared 50 packages in 1.22s
Installed 50 packages in 126ms
+ annotated-types==0.7.0
+ anyio==4.8.0
+ blobfile==3.0.0
+ certifi==2025.1.31
+ charset-normalizer==3.4.1
+ click==8.1.8
+ distro==1.9.0
+ filelock==3.17.0
+ fire==0.7.0
+ fsspec==2025.2.0
+ h11==0.14.0
+ httpcore==1.0.7
+ httpx==0.28.1
+ huggingface-hub==0.28.1
+ idna==3.10
+ jinja2==3.1.5
+ llama-models==0.1.2
+ llama-stack==0.1.2
+ llama-stack-client==0.1.2
+ lxml==5.3.1
+ markdown-it-py==3.0.0
+ markupsafe==3.0.2
+ mdurl==0.1.2
+ numpy==2.2.2
+ packaging==24.2
+ pandas==2.2.3
+ pillow==11.1.0
+ prompt-toolkit==3.0.50
+ pyaml==25.1.0
+ pycryptodomex==3.21.0
+ pydantic==2.10.6
+ pydantic-core==2.27.2
+ pygments==2.19.1
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.0.1
+ pytz==2025.1
+ pyyaml==6.0.2
+ regex==2024.11.6
+ requests==2.32.3
+ rich==13.9.4
+ setuptools==75.8.0
+ six==1.17.0
+ sniffio==1.3.1
+ termcolor==2.5.0
+ tiktoken==0.8.0
+ tqdm==4.67.1
+ typing-extensions==4.12.2
+ tzdata==2025.1
+ urllib3==2.3.0
+ wcwidth==0.2.13
+ '[' -n '' ']'
+ printf 'Installing pip dependencies\n'
Installing pip dependencies
+ uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn
Using Python 3.13.1 environment at: llamastack-foo
Resolved 105 packages in 37ms
Uninstalled 2 packages in 65ms
Installed 72 packages in 195ms
+ aiohappyeyeballs==2.4.6
+ aiohttp==3.11.12
+ aiosignal==1.3.2
+ aiosqlite==0.21.0
+ attrs==25.1.0
+ autoevals==0.0.119
+ backoff==2.2.1
+ braintrust-core==0.0.58
+ chardet==5.2.0
+ chevron==0.14.0
+ chromadb-client==0.6.3
+ contourpy==1.3.1
+ cycler==0.12.1
+ datasets==3.2.0
+ deprecated==1.2.18
+ dill==0.3.8
+ faiss-cpu==1.10.0
+ fastapi==0.115.8
+ fonttools==4.56.0
+ frozenlist==1.5.0
- fsspec==2025.2.0
+ fsspec==2024.9.0
+ googleapis-common-protos==1.66.0
+ grpcio==1.70.0
+ importlib-metadata==8.5.0
+ jiter==0.8.2
+ joblib==1.4.2
+ jsonschema==4.23.0
+ jsonschema-specifications==2024.10.1
+ kiwisolver==1.4.8
+ levenshtein==0.26.1
+ matplotlib==3.10.0
+ monotonic==1.6
+ multidict==6.1.0
+ multiprocess==0.70.16
+ nltk==3.9.1
- numpy==2.2.2
+ numpy==1.26.4
+ ollama==0.4.7
+ openai==1.61.1
+ opentelemetry-api==1.30.0
+ opentelemetry-exporter-otlp-proto-common==1.30.0
+ opentelemetry-exporter-otlp-proto-grpc==1.30.0
+ opentelemetry-exporter-otlp-proto-http==1.30.0
+ opentelemetry-proto==1.30.0
+ opentelemetry-sdk==1.30.0
+ opentelemetry-semantic-conventions==0.51b0
+ orjson==3.10.15
+ overrides==7.7.0
+ posthog==3.12.0
+ propcache==0.2.1
+ protobuf==5.29.3
+ psycopg2-binary==2.9.10
+ pyarrow==19.0.0
+ pyparsing==3.2.1
+ pypdf==5.3.0
+ rapidfuzz==3.12.1
+ redis==5.2.1
+ referencing==0.36.2
+ rpds-py==0.22.3
+ safetensors==0.5.2
+ scikit-learn==1.6.1
+ scipy==1.15.1
+ sentencepiece==0.2.0
+ starlette==0.45.3
+ tenacity==9.0.0
+ threadpoolctl==3.5.0
+ tokenizers==0.21.0
+ transformers==4.48.3
+ uvicorn==0.34.0
+ wrapt==1.17.2
+ xxhash==3.5.0
+ yarl==1.18.3
+ zipp==3.21.0
+ '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu ' ']'
+ IFS='#'
+ read -ra parts
+ for part in '"${parts[@]}"'
+ echo 'sentence-transformers --no-deps'
sentence-transformers --no-deps
+ uv pip install sentence-transformers --no-deps
Using Python 3.13.1 environment at: llamastack-foo
Resolved 1 package in 141ms
Installed 1 package in 6ms
+ sentence-transformers==3.4.1
+ for part in '"${parts[@]}"'
+ echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu '
torch torchvision --index-url https://download.pytorch.org/whl/cpu
+ uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Using Python 3.13.1 environment at: llamastack-foo
Resolved 13 packages in 2.15s
Installed 5 packages in 324ms
+ mpmath==1.3.0
+ networkx==3.3
+ sympy==1.13.1
+ torch==2.6.0
+ torchvision==0.21.0
Build Successful!
```
Run:
```
$ source llamastack-foo/bin/activate
$ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
db_path: /Users/leseb/.llama/distributions/ollama/registry.db
namespace: null
type: sqlite
models:
- metadata: {}
model_id: meta-llama/Llama-3.2-3B-Instruct
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- llm
provider_id: ollama
provider_model_id: null
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- embedding
provider_id: sentence-transformers
provider_model_id: null
providers:
agents:
- config:
persistence_store:
db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
namespace: null
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
datasetio:
- config: {}
provider_id: huggingface
provider_type: remote::huggingface
- config: {}
provider_id: localfs
provider_type: inline::localfs
eval:
- config: {}
provider_id: meta-reference
provider_type: inline::meta-reference
inference:
- config:
url: http://localhost:11434
provider_id: ollama
provider_type: remote::ollama
- config: {}
provider_id: sentence-transformers
provider_type: inline::sentence-transformers
safety:
- config: {}
provider_id: llama-guard
provider_type: inline::llama-guard
scoring:
- config: {}
provider_id: basic
provider_type: inline::basic
- config: {}
provider_id: llm-as-judge
provider_type: inline::llm-as-judge
- config:
openai_api_key: '********'
provider_id: braintrust
provider_type: inline::braintrust
telemetry:
- config:
service_name: llama-stack
sinks: console,sqlite
sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
provider_id: meta-reference
provider_type: inline::meta-reference
tool_runtime:
- config:
api_key: '********'
max_results: 3
provider_id: brave-search
provider_type: remote::brave-search
- config:
api_key: '********'
max_results: 3
provider_id: tavily-search
provider_type: remote::tavily-search
- config: {}
provider_id: code-interpreter
provider_type: inline::code-interpreter
- config: {}
provider_id: rag-runtime
provider_type: inline::rag-runtime
vector_io:
- config:
kvstore:
db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
namespace: null
type: sqlite
provider_id: faiss
provider_type: inline::faiss
scoring_fns: []
server:
port: 8321
tls_certfile: null
tls_keyfile: null
shields: []
tool_groups:
- args: null
mcp_endpoint: null
provider_id: tavily-search
toolgroup_id: builtin::websearch
- args: null
mcp_endpoint: null
provider_id: rag-runtime
toolgroup_id: builtin::rag
- args: null
mcp_endpoint: null
provider_id: code-interpreter
toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
modules.json: 100%|███████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 485kB/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████| 116/116 [00:00<00:00, 498kB/s]
README.md: 100%|█████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 20.5MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 583kB/s]
config.json: 100%|███████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 4.63MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████| 90.9M/90.9M [00:02<00:00, 36.6MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████| 350/350 [00:00<00:00, 4.27MB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.90MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.23MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████| 112/112 [00:00<00:00, 1.47MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████| 190/190 [00:00<00:00, 841kB/s]
Serving API tool_groups
GET /v1/tools/{tool_name}
GET /v1/toolgroups/{toolgroup_id}
GET /v1/toolgroups
GET /v1/tools
POST /v1/toolgroups
DELETE /v1/toolgroups/{toolgroup_id}
Serving API tool_runtime
POST /v1/tool-runtime/invoke
GET /v1/tool-runtime/list-tools
POST /v1/tool-runtime/rag-tool/insert
POST /v1/tool-runtime/rag-tool/query
Serving API vector_io
POST /v1/vector-io/insert
POST /v1/vector-io/query
Serving API telemetry
GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
GET /v1/telemetry/spans/{span_id}/tree
GET /v1/telemetry/traces/{trace_id}
POST /v1/telemetry/events
GET /v1/telemetry/spans
GET /v1/telemetry/traces
POST /v1/telemetry/spans/export
Serving API models
GET /v1/models/{model_id}
GET /v1/models
POST /v1/models
DELETE /v1/models/{model_id}
Serving API eval
POST /v1/eval/tasks/{task_id}/evaluations
DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
GET /v1/eval/tasks/{task_id}/jobs/{job_id}
POST /v1/eval/tasks/{task_id}/jobs
Serving API datasets
GET /v1/datasets/{dataset_id}
GET /v1/datasets
POST /v1/datasets
DELETE /v1/datasets/{dataset_id}
Serving API scoring_functions
GET /v1/scoring-functions/{scoring_fn_id}
GET /v1/scoring-functions
POST /v1/scoring-functions
Serving API inspect
GET /v1/health
GET /v1/inspect/providers
GET /v1/inspect/routes
GET /v1/version
Serving API scoring
POST /v1/scoring/score
POST /v1/scoring/score-batch
Serving API shields
GET /v1/shields/{identifier}
GET /v1/shields
POST /v1/shields
Serving API vector_dbs
GET /v1/vector-dbs/{vector_db_id}
GET /v1/vector-dbs
POST /v1/vector-dbs
DELETE /v1/vector-dbs/{vector_db_id}
Serving API eval_tasks
GET /v1/eval-tasks/{eval_task_id}
GET /v1/eval-tasks
POST /v1/eval-tasks
Serving API agents
POST /v1/agents
POST /v1/agents/{agent_id}/session
POST /v1/agents/{agent_id}/session/{session_id}/turn
DELETE /v1/agents/{agent_id}
DELETE /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API inference
POST /v1/inference/chat-completion
POST /v1/inference/completion
POST /v1/inference/embeddings
Serving API datasetio
POST /v1/datasetio/rows
GET /v1/datasetio/rows
Serving API safety
POST /v1/safety/run-shield
Listening on ['::', '0.0.0.0']:5001
INFO: Started server process [39145]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
```
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:22:03 -08:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
...
- Added new template `dell` and its documentation
- Update docs
- [minor] uv fix i came across
- codegen for all templates
Tested with
```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321
# build the stack template
llama stack build --template=dell
# start the TGI inference server
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES ghcr.io/huggingface/text-generation-inference --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0
# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)
# build docker
llama stack build --template=dell --image-type=container
# run llama stack server ( via docker )
podman run -it \
--network host \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
# NOTE: mount the llama-stack / llama-model directories if testing local changes
-v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \
localhost/distribution-dell:dev \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL
# test the server
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py
```
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Yuan Tang
7558678b8c
Fix uv pip install timeout issue for PyTorch (#929)
...
This fixes the following timeout issue when installing PyTorch via uv.
Also see references: https://github.com/astral-sh/uv/pull/1694 and
https://github.com/astral-sh/uv/issues/1549
```
Installing pip dependencies
Using Python 3.10.16 environment at: /home/yutang/.conda/envs/distribution-myenv
× Failed to download and build `antlr4-python3-runtime==4.9.3`
├─▶ Failed to extract archive
├─▶ failed to unpack
│ `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
├─▶ failed to unpack
│ `antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py` into
│ `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
├─▶ error decoding response body
├─▶ request or response body error
╰─▶ operation timed out
help: `antlr4-python3-runtime` (v4.9.3) was included because `torchtune`
(v0.5.0) depends on `omegaconf` (v2.3.0) which depends on
`antlr4-python3-runtime>=4.9.dev0, <4.10.dev0`
Failed to build target distribution-myenv with return code 1
```
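The linked uv changes added an HTTP timeout knob; a hedged sketch of the kind of workaround this enables (the 500-second value is illustrative):
```bash
# Give uv more time to download large wheels such as PyTorch
# before it aborts with "operation timed out".
export UV_HTTP_TIMEOUT=500
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```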
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-03 06:39:35 -08:00
Ashwin Bharambe
5b1e69e58e
Use `uv pip install` instead of `pip install` (#921)
...
## What does this PR do?
See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` by `uv pip install` and ensuring `uv`
exists in the environment.
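A minimal sketch of what the bare-minimum swap looks like (assuming the behavior described above; not the exact code):
```bash
# Ensure uv exists in the environment, then use it in place of pip.
if ! command -v uv >/dev/null 2>&1; then
  pip install uv
fi
uv pip install --no-cache-dir llama-stack
```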
## Test Plan
First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.
Docker: `llama stack build --template together --image-type container`
works!
2025-01-31 22:29:41 -08:00
Ashwin Bharambe
e951852848
Miscellaneous fixes around telemetry, library client and run yaml autogen
...
Also add a `venv` image-type for llama stack build
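For example, the new image type is selected the same way as the existing ones:
```bash
llama stack build --template together --image-type venv
```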
2024-12-08 20:40:22 -08:00