Commit graph

44 commits

Author SHA1 Message Date
Reid
89d37687dd
chore: remove --no-list-templates option (#1121)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

From the code and the usage, seems cannot see that need to use
`--no-list-templates` to handle, and also make the user confused from
the help text, so try to remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):

$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):

before:
$ llama stack build --help
  --list-templates, --no-list-templates
                        Show the available templates for building a Llama Stack distribution (default: False)

after:
  --list-templates      Show the available templates for building a Llama Stack distribution
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:13:46 -08:00
Sébastien Han
369cc513cb
fix: improve stack build on venv (#980)
# What does this PR do?

Added a pre_run_checks function to ensure a smooth environment setup by
verifying prerequisites. It checks for an existing virtual environment,
ensures uv is installed, and deactivates any active environment if
necessary.

Run the full build inside a venv created by 'uv'.

Improved string handling in printf statements and added shellcheck
suppressions for expected word splitting in pip commands.

These enhancements improve robustness, prevent
conflicts, and ensure a seamless setup process.

Signed-off-by: Sébastien Han <seb@redhat.com>

- [ ] Addresses issue (#issue)


## Test Plan

Run the following command on either Linux or MacOS:

```
llama stack build --template ollama --image-type venv --image-name foo
+ build_name=foo
+ env_name=llamastack-foo
+ pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ RED='\033[0;31m'
+ NC='\033[0m'
+ ENVNAME=
+++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
+ SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution
+ source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh
+ pre_run_checks llamastack-foo
+ local env_name=llamastack-foo
+ is_command_available uv
+ command -v uv
+ '[' -d llamastack-foo ']'
+ run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ local env_name=llamastack-foo
+ local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ echo 'Creating new virtual environment llamastack-foo'
Creating new virtual environment llamastack-foo
+ uv venv llamastack-foo
Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: llamastack-foo
Activate with: source llamastack-foo/bin/activate
+ source llamastack-foo/bin/activate
++ '[' -n x ']'
++ SCRIPT_PATH=llamastack-foo/bin/activate
++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ '[' darwin24 = cygwin ']'
++ '[' darwin24 = msys ']'
++ export VIRTUAL_ENV
++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ export PATH
++ '[' x '!=' x ']'
+++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ VIRTUAL_ENV_PROMPT='(llamastack-foo) '
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(llamastack-foo) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ '[' -n '' ']'
+ '[' -n '' ']'
+ uv pip install --no-cache-dir llama-stack
Using Python 3.13.1 environment at: llamastack-foo
Resolved 50 packages in 1.25s
   Built fire==0.7.0
Prepared 50 packages in 1.22s
Installed 50 packages in 126ms
 + annotated-types==0.7.0
 + anyio==4.8.0
 + blobfile==3.0.0
 + certifi==2025.1.31
 + charset-normalizer==3.4.1
 + click==8.1.8
 + distro==1.9.0
 + filelock==3.17.0
 + fire==0.7.0
 + fsspec==2025.2.0
 + h11==0.14.0
 + httpcore==1.0.7
 + httpx==0.28.1
 + huggingface-hub==0.28.1
 + idna==3.10
 + jinja2==3.1.5
 + llama-models==0.1.2
 + llama-stack==0.1.2
 + llama-stack-client==0.1.2
 + lxml==5.3.1
 + markdown-it-py==3.0.0
 + markupsafe==3.0.2
 + mdurl==0.1.2
 + numpy==2.2.2
 + packaging==24.2
 + pandas==2.2.3
 + pillow==11.1.0
 + prompt-toolkit==3.0.50
 + pyaml==25.1.0
 + pycryptodomex==3.21.0
 + pydantic==2.10.6
 + pydantic-core==2.27.2
 + pygments==2.19.1
 + python-dateutil==2.9.0.post0
 + python-dotenv==1.0.1
 + pytz==2025.1
 + pyyaml==6.0.2
 + regex==2024.11.6
 + requests==2.32.3
 + rich==13.9.4
 + setuptools==75.8.0
 + six==1.17.0
 + sniffio==1.3.1
 + termcolor==2.5.0
 + tiktoken==0.8.0
 + tqdm==4.67.1
 + typing-extensions==4.12.2
 + tzdata==2025.1
 + urllib3==2.3.0
 + wcwidth==0.2.13
+ '[' -n '' ']'
+ printf 'Installing pip dependencies\n'
Installing pip dependencies
+ uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn
Using Python 3.13.1 environment at: llamastack-foo
Resolved 105 packages in 37ms
Uninstalled 2 packages in 65ms
Installed 72 packages in 195ms
 + aiohappyeyeballs==2.4.6
 + aiohttp==3.11.12
 + aiosignal==1.3.2
 + aiosqlite==0.21.0
 + attrs==25.1.0
 + autoevals==0.0.119
 + backoff==2.2.1
 + braintrust-core==0.0.58
 + chardet==5.2.0
 + chevron==0.14.0
 + chromadb-client==0.6.3
 + contourpy==1.3.1
 + cycler==0.12.1
 + datasets==3.2.0
 + deprecated==1.2.18
 + dill==0.3.8
 + faiss-cpu==1.10.0
 + fastapi==0.115.8
 + fonttools==4.56.0
 + frozenlist==1.5.0
 - fsspec==2025.2.0
 + fsspec==2024.9.0
 + googleapis-common-protos==1.66.0
 + grpcio==1.70.0
 + importlib-metadata==8.5.0
 + jiter==0.8.2
 + joblib==1.4.2
 + jsonschema==4.23.0
 + jsonschema-specifications==2024.10.1
 + kiwisolver==1.4.8
 + levenshtein==0.26.1
 + matplotlib==3.10.0
 + monotonic==1.6
 + multidict==6.1.0
 + multiprocess==0.70.16
 + nltk==3.9.1
 - numpy==2.2.2
 + numpy==1.26.4
 + ollama==0.4.7
 + openai==1.61.1
 + opentelemetry-api==1.30.0
 + opentelemetry-exporter-otlp-proto-common==1.30.0
 + opentelemetry-exporter-otlp-proto-grpc==1.30.0
 + opentelemetry-exporter-otlp-proto-http==1.30.0
 + opentelemetry-proto==1.30.0
 + opentelemetry-sdk==1.30.0
 + opentelemetry-semantic-conventions==0.51b0
 + orjson==3.10.15
 + overrides==7.7.0
 + posthog==3.12.0
 + propcache==0.2.1
 + protobuf==5.29.3
 + psycopg2-binary==2.9.10
 + pyarrow==19.0.0
 + pyparsing==3.2.1
 + pypdf==5.3.0
 + rapidfuzz==3.12.1
 + redis==5.2.1
 + referencing==0.36.2
 + rpds-py==0.22.3
 + safetensors==0.5.2
 + scikit-learn==1.6.1
 + scipy==1.15.1
 + sentencepiece==0.2.0
 + starlette==0.45.3
 + tenacity==9.0.0
 + threadpoolctl==3.5.0
 + tokenizers==0.21.0
 + transformers==4.48.3
 + uvicorn==0.34.0
 + wrapt==1.17.2
 + xxhash==3.5.0
 + yarl==1.18.3
 + zipp==3.21.0
+ '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' ']'
+ IFS='#'
+ read -ra parts
+ for part in '"${parts[@]}"'
+ echo 'sentence-transformers --no-deps'
sentence-transformers --no-deps
+ uv pip install sentence-transformers --no-deps
Using Python 3.13.1 environment at: llamastack-foo
Resolved 1 package in 141ms
Installed 1 package in 6ms
 + sentence-transformers==3.4.1
+ for part in '"${parts[@]}"'
+ echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu'
torch torchvision --index-url https://download.pytorch.org/whl/cpu
+ uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Using Python 3.13.1 environment at: llamastack-foo
Resolved 13 packages in 2.15s
Installed 5 packages in 324ms
 + mpmath==1.3.0
 + networkx==3.3
 + sympy==1.13.1
 + torch==2.6.0
 + torchvision==0.21.0
Build Successful!
```

Run:

```
$ source llamastack-foo/bin/activate
$ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001 
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
modules.json: 100%|███████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 485kB/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████| 116/116 [00:00<00:00, 498kB/s]
README.md: 100%|█████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 20.5MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 583kB/s]
config.json: 100%|███████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 4.63MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████| 90.9M/90.9M [00:02<00:00, 36.6MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████| 350/350 [00:00<00:00, 4.27MB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.90MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.23MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████| 112/112 [00:00<00:00, 1.47MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████| 190/190 [00:00<00:00, 841kB/s]
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API safety
 POST /v1/safety/run-shield

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [39145]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:22:03 -08:00
Charlie Doern
f5e4bf2edf
chore: remove unused argument (#987)
# What does this PR do?

very small fix I noticed some unused arguments, but this seems like the
easiest one to remove since its passed in explicitly.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-06 10:05:35 -08:00
Ashwin Bharambe
216cde5ee8 Add --print-deps-only for computing dependencies 2025-01-31 14:33:51 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term and applicable to alternatives of Docker, such
as Podman or other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Ashwin Bharambe
cee3816609
Make llama stack build not create a new conda by default (#788)
## What does this PR do?

So far `llama stack build` has always created a separate conda
environment for packaging the dependencies of a distribution. The main
reason to do so is isolation -- distributions are composed of providers
which can have a variety of potentially conflicting dependencies. That
said, this has created significant annoyance for new users since it is
not at all transparent. The fact that `llama stack run` is actually
running the code in some other conda is very surprising.

This PR tries to make things better. 

- Both `llama stack build` and `llama stack run` now accept an
`--image-name` argument which represents the (conda, docker, virtualenv)
image you want to operate upon.
- For the default (conda) mode, the script checks if a current conda
environment exists. If one exists, it uses it.
- If `--image-name` is provided, that option is used. In this case, an
environment is created if needed.
- There is no automatic `llamastack-` prefixing of the environment names
done anymore.


## Test Plan

Start in a conda environment, run `llama stack build --template
fireworks`; verify that it successfully built into the current
environment and stored the build file at
`$CONDA_PREFIX/llamastack-build.yaml`. Run `llama stack run fireworks`
which started correctly in the current environment.

Ran the same build command outside of conda. It failed asking for
`--image-name`. Ran it with `llama stack build --template fireworks
--image-name foo`. This successfully created a conda environment called
`foo` and installed deps. Ran `llama stack run fireworks` outside conda
which failed. Activated a different conda, ran again, it failed saying
it did not find the `llamastack-build.yaml` file. Then used
`--image-name foo` option and it ran successfully.
2025-01-16 13:44:53 -08:00
Xi Yan
32d3abe964
[CICD] Github workflow for publishing Docker images (#764)
# What does this PR do?

- Add Github workflow for publishing docker images. 
- Manual Inputs
- We can use a (1) TestPyPi version / (2) build via released PyPi
version

**Notes**
- Keep this workflow manually triggered as we don't want to publish
nightly docker images

**Additional Changes**
- Resolve issue with running llama stack build in non-terminal device
```
  File "/home/runner/.local/lib/python3.12/site-packages/llama_stack/distribution/utils/exec.py", line 25, in run_with_pty
    old_settings = termios.tcgetattr(sys.stdin)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
termios.error: (25, 'Inappropriate ioctl for device')
```
- Modified build_container.sh to work in non-terminal environment


## Test Plan

- Triggered workflow:
3562217878
<img width="1076" alt="image"
src="https://github.com/user-attachments/assets/f1b5cef6-05ab-49c7-b405-53abc9264734"
/>


- Tested published docker image
<img width="702" alt="image"
src="https://github.com/user-attachments/assets/e7135189-65c8-45d8-86f9-9f3be70e380b"
/>


- /tools API endpoints are served so that docker is correctly using the
TestPyPi package
<img width="296" alt="image"
src="https://github.com/user-attachments/assets/bbcaa7fe-c0a4-4d22-b600-90e3c254bbfd"
/>

- Published tagged images:
https://hub.docker.com/repositories/llamastack
<img width="947" alt="image"
src="https://github.com/user-attachments/assets/2a0a0494-4d45-4643-bc29-72154ecc54a5"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-15 09:01:33 -08:00
Dinesh Yeduguru
a174938fbd
Fix telemetry to work on reinstantiating new lib cli (#761)
# What does this PR do?

Since we maintain global state in our telemetry pipeline,
reinstantiating lib cli will cause us to add duplicate span processors
causing sqlite to lock out because of constraint violations since we now
have two span processor writing to sqlite. This PR changes the telemetry
adapter for otel to only instantiate the provider once and add the span
processsors only once.

Also fixes an issue llama stack build


## Test Plan

tested with notebook at
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d#scrollTo=9496f75c
2025-01-14 11:31:50 -08:00
Yuan Tang
9ec54dcbe7
Switch to use importlib instead of deprecated pkg_resources (#678)
`pkg_resources` has been deprecated. This PR switches to use
`importlib.resources`.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-13 20:20:02 -08:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as title, cleaning up `import *`'s
- upgrade tests to make them more robust to bad model outputs
- remove import *'s in llama_stack/apis/* (skip __init__ modules)
<img width="465" alt="image"
src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2"
/>

- run `sh run_openapi_generator.sh`, no types gets affected

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-27 15:45:44 -08:00
Yuan Tang
987e651755
Add missing venv option in --image-type (#677)
"venv" option is supported but not mentioned in the prompt.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-21 21:10:13 -08:00
Aidan Do
76eb558bde
doc: llama-stack build --config help text references old directory (#596)
# What does this PR do?

- llama-stack build --config help text references example_configs which
no longer exists
- Update to refer new directory format to avoid confusion

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
2024-12-10 17:42:02 -08:00
Ashwin Bharambe
fa68ded07c Remove the unnecessary message after llama stack build 2024-12-10 09:46:56 -08:00
Ashwin Bharambe
e951852848 Miscellaneous fixes around telemetry, library client and run yaml autogen
Also add a `venv` image-type for llama stack build
2024-12-08 20:40:22 -08:00
Ashwin Bharambe
358db3c5b6 No need to use os.path.relpath() when Path() knows everything anyway 2024-11-23 11:45:47 -08:00
Dalton Flanagan
b007b062f3
Fix llama stack build in 0.0.54 (#505)
# What does this PR do?

Safety provider `inline::meta-reference` is now deprecated. However, we 

* aren't checking / printing the deprecation message in `llama stack
build`
* make the deprecated (unusable) provider

So I (1) added checking and (2) made `inline::llama-guard` the default

## Test Plan

Before

```
Traceback (most recent call last):
  File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command
    self._run_stack_build_command_from_build_config(build_config)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 305, in _run_stack_build_command_from_build_config
    self._generate_run_config(build_config, build_dir)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 226, in _generate_run_config
    config_type = instantiate_class_type(
  File "/home/dalton/all/llama-stack/llama_stack/distribution/utils/dynamic.py", line 12, in instantiate_class_type
    module = importlib.import_module(module_name)
  File "/home/dalton/.conda/envs/nov22/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'llama_stack.providers.inline.safety.meta_reference'
```

After

```
Traceback (most recent call last):
  File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command
    self._run_stack_build_command_from_build_config(build_config)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 309, in _run_stack_build_command_from_build_config
    self._generate_run_config(build_config, build_dir)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 228, in _generate_run_config
    raise InvalidProviderError(p.deprecation_error)
llama_stack.distribution.resolver.InvalidProviderError: 
Provider `inline::meta-reference` for API `safety` does not work with the latest Llama Stack.
- if you are using Llama Guard v3, please use the `inline::llama-guard` provider instead.
- if you are using Prompt Guard, please use the `inline::prompt-guard` provider instead.
- if you are using Code Scanner, please use the `inline::code-scanner` provider instead.
```

<img width="469" alt="Screenshot 2024-11-22 at 4 10 24 PM"
src="https://github.com/user-attachments/assets/8c2e09fe-379a-4504-b246-7925f80a6ed6">

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-22 16:23:44 -05:00
Ashwin Bharambe
068ac00a3b
Don't depend on templates.py when print llama stack build messages (#496) 2024-11-20 15:44:49 -08:00
Xi Yan
6765fd76ff
fix llama stack build for together & llama stack build from templates (#479)
# What does this PR do?

- Fix issue w/ llama stack build using together template
<img width="669" alt="image"
src="https://github.com/user-attachments/assets/1cbef052-d902-40b9-98f8-37efb494d117">

- For builds from templates, copy over the
`templates/<template-name>/run.yaml` file to the
`~/.llama/distributions/<name>/<name>-run.yaml` instead of re-building
run config.


## Test Plan

```
$ llama stack build --template together --image-type conda
..
Build spec configuration saved at /opt/anaconda3/envs/llamastack-together/together-build.yaml
Build Successful! Next steps:
   1. Set the environment variables: LLAMASTACK_PORT, TOGETHER_API_KEY
   2. `llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml`
```

```
$ llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml
```

```
$ llama-stack-client models list
$ pytest -v -s -m remote agents/test_agents.py --env REMOTE_STACK_URL=http://localhost:5000 --inference-model meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
```
<img width="764" alt="image"
src="https://github.com/user-attachments/assets/b805b6c5-a316-4561-8fe3-24fc3b1f8b80">


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-18 22:29:16 -08:00
Ashwin Bharambe
2a31163178
Auto-generate distro yamls + docs (#468)
# What does this PR do?

Automatically generates
- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of markdown docs

for the distributions.

## Test Plan

At this point, this only updates the YAMLs and the docs. Some testing
(especially with ollama and vllm) has been performed but needs to be
much more tested.
2024-11-18 14:57:06 -08:00
Xi Yan
748606195b
Kill llama stack configure (#371)
* remove configure

* build msg

* wip

* build->run

* delete prints

* docs

* fix docs, kill configure

* precommit

* update fireworks build

* docs

* clean up build

* comments

* fix

* test

* remove baking build.yaml into docker

* fix msg, urls

* configure msg
2024-11-06 13:32:10 -08:00
Ashwin Bharambe
4aa1bf6a60
Kill --name from llama stack build (#340) 2024-10-28 23:07:32 -07:00
Xi Yan
07f9bf723f
fix broken --list-templates with adding build.yaml files for packaging (#327)
* add build files to templates

* fix templates

* manifest

* symlink

* symlink

* precommit

* change everything to docker build.yaml

* remove image_type in templates

* fix build from templates CLI

* fix readmes
2024-10-25 12:51:22 -07:00
Ashwin Bharambe
afae4e3d8e Update docker build flow a little 2024-10-25 10:06:21 -07:00
Ashwin Bharambe
5bed6c276c Move function around 2024-10-25 09:18:22 -07:00
Xi Yan
23210e8679
llama stack distributions / templates / docker refactor (#266)
* docker compose ollama

* comment

* update compose file

* readme for distributions

* readme

* move distribution folders

* move distribution/templates to distributions/

* rename

* kill distribution/templates

* readme

* readme

* build/developer cookbook/new api provider

* developer cookbook

* readme

* readme

* [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)

* fix case where memory bank is registered without provider_id

* memory test

* agents unit test

* Add an option to not use elastic agents for meta-reference inference (#269)

* Allow overridding checkpoint_dir via config

* Small rename

* Make all methods `async def` again; add completion() for meta-reference (#270)

PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)

* Improve an important error message

* update ollama for llama-guard3

* Add vLLM inference provider for OpenAI compatible vLLM server (#178)

This PR adds vLLM inference provider for OpenAI compatible vLLM server.

* Create .readthedocs.yaml

Trying out readthedocs

* Update event_logger.py (#275)

spelling error

* vllm

* build templates

* delete templates

* tmp add back build to avoid merge conflicts

* vllm

* vllm

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: raghotham <rsm@meta.com>
Co-authored-by: nehal-a2z <nehal@coderabbit.ai>
2024-10-21 11:17:53 -07:00
Ashwin Bharambe
1ff0476002 Split off meta-reference-quantized provider 2024-10-10 16:03:19 -07:00
Ashwin Bharambe
6bb57e72a7
Remove "routing_table" and "routing_key" concepts for the user (#201)
This PR makes several core changes to the developer experience surrounding Llama Stack.

Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider. So you can get model A to be inferenced locally while model B, C can be inference remotely (e.g.)

However, this had a few drawbacks:

you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
the terms "routing_table" and "routing_key" were exposed directly to the user. in my view, this is way too much overhead for a new user (which almost everyone is.) people come to the stack wanting to do ML and encounter a completely unexpected term.
What this PR does: This PR structures the run config with only a single prominent key:

- providers
Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.

providers:
  inference:
  - provider_id: foo
    provider_type: remote::tgi
    config: { ... }
  - provider_id: bar
    provider_type: remote::tgi
    config: { ... }
Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error.

When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)

The above examples shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.

Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods

register_model
list_models
The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.)

There are many other cleanups included some of which are detailed in a follow-up comment.
2024-10-10 10:24:13 -07:00
Xi Yan
62d266f018
[CLI] avoid configure twice (#171)
* avoid configure twice

* cleanup tmp config

* update output msg

* address comment

* update msg

* script update
2024-10-03 11:20:54 -07:00
Ashwin Bharambe
e9f6150588 A bit cleanup to avoid breakages 2024-10-02 21:31:09 -07:00
Ashwin Bharambe
988a9cada3 Don't ask for Api.inspect in stack build 2024-10-02 21:10:56 -07:00
Ashwin Bharambe
df68db644b Refactoring distribution/distribution.py
This file was becoming too large and unclear what it housed. Split it
into pieces.
2024-10-02 14:03:02 -07:00
Xi Yan
73decb3781 re-build from name 2024-09-30 16:22:52 -07:00
Xi Yan
4897bf2f85 allow --name to re-build from config 2024-09-30 16:18:12 -07:00
Xi Yan
f6a6598d1a
[bugfix] fix #146 (#147)
* more robust image type

* lint
2024-09-28 17:47:00 -07:00
Xi Yan
6a8c2ae1df
[CLI] remove dependency on CONDA_PREFIX in CLI (#144)
* remove dependency on CONDA_PREFIX in CLI

* lint

* typo

* more robust
2024-09-28 16:46:47 -07:00
Russell Bryant
fb9e6371ec
Validate name in llama stack build (#128)
The first time I ran `llama stack build`, I quickly hit enter at the
first prompt asking for a name, assuming it would use the default
given in the help text. This caused a failure later on that wasn't
very obvious. I was using the `docker` format and a blank name caused
an invalid tag format that failed the image build.

This change adds validation for the `name` parameter to ensure it's
not empty before proceeding.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 13:30:55 -07:00
Xi Yan
ca7602a642 fix #100 2024-09-25 15:11:56 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.
2024-09-23 14:22:22 -07:00
Xi Yan
6302a1ee90
fix prompt with name args (#80) 2024-09-18 23:48:31 -07:00
Ashwin Bharambe
c63d6cbd08 list(...keys()) so dict_keys does not show up 2024-09-18 23:24:07 -07:00
Ashwin Bharambe
9ab27e852b Bug fixes for memory 2024-09-18 21:54:02 -07:00
Xi Yan
1128f69674
CLI: add build templates support, move imports (#77)
* list templates implementation

* relative path

* finalize templates

* remove imports

* remove templates from name, name templates

* fix docker

* fix docker
2024-09-18 14:25:53 -07:00
Xi Yan
6b21523c28
CLI - add back build wizard, configure with name instead of build.yaml (#74)
* add back wizard for build

* conda build path move

* polish message

* run with name only

* prompt for build

* improve comments

* update msgs

* add new lines

* move build.yaml

* address comments

* validator for providers

* move imports

* Please enter -> enter

* comments, get started guide

* nits

* fix cprint import

* fix imports
2024-09-18 11:41:56 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Renamed from llama_toolchain/cli/stack/build.py (Browse further)