Commit graph

236 commits

Author SHA1 Message Date
Hardik Shah
c335ed8765 raise when client initialize fails 2025-02-07 12:24:07 -08:00
Charlie Doern
2a4a612373
fix: Ensure a better error stack trace when llama-stack is not built (#950)
# What does this PR do?

currently this is the output when you run a distribution locally without
running `llama stack build`:

```
Traceback (most recent call last):
  File "/Users/charliedoern/Documents/llama-sdk.py", line 25, in <module>
    models = client.models.list()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 107, in list
    raise exc
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 95, in list
    return self._get(
           ^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/_base_client.py", line 1212, in get
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 168, in request
    return asyncio.run(self.async_client.request(*args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 258, in request
    if not self.endpoint_impls:
           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncLlamaStackAsLibraryClient' object has no attribute 'endpoint_impls'
```

the intended exception is never raised, add an except for an
AttributeError so users can catch when they call things like
`models.list()` and so that a more useful error telling them that the
client is not properly initialized is printed.

## Test Plan

Please describe:
- I ran the script found here:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#run-inference-with-python-sdk
locally with the changes in this PR and the exception was caught
successfully.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-07 09:47:02 -08:00
Ashwin Bharambe
f8f2f7f9bb
feat: Add HTTPS serving option (#1000)
# What does this PR do?

Enables HTTPS option for Llama Stack. 

While doing so, introduces a `ServerConfig` sub-structure to house all
server related configuration (port, ssl, etc.)

Also simplified the `start_container.sh` entrypoint to simply be
`python` instead of a complex bash command line.

## Test Plan

Conda: 

Run:
```bash
$ llama stack build --template together
$ llama stack run --port 8322        # ensure server starts 

$ llama-stack-client configure --endpoint http://localhost:8322
$ llama-stack-client models list
```

Create a self-signed SSL key / cert pair. Then, using a local checkout
of `llama-stack-client-python`, change
https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/_base_client.py#L759
to add `kwargs.setdefault("verify", False)` so SSL verification is
disabled. Then:

```bash
$ llama stack run --port 8322 --tls-keyfile <KEYFILE> --tls-certfile <CERTFILE>
$ llama-stack-client configure --endpoint https://localhost:8322  # notice the `https`
$ llama-stack-client models list
```

Also tested with containers (but of course one needs to make sure the
cert and key files are appropriately provided to the container.)
2025-02-07 09:39:08 -08:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation 
- Update docs 
- [minor] uv fix i came across 
- codegen for all templates 

Tested with 

```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=[http://$CHROMADB_HOST:$CHROMADB_PORT](about:blank)
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321

# build the stack template 
llama stack build --template=dell 

# start the TGI inference server 
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES [ghcr.io/huggingface/text-generation-inference](http://ghcr.io/huggingface/text-generation-inference) --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0

# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)

# build docker 
llama stack build --template=dell --image-type=container

# run llama stack server ( via docker )
podman run -it \
--network host \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
# NOTE: mount the llama-stack / llama-model directories if testing local changes 
-v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \ localhost/distribution-dell:dev \
--port $LLAMA_STACK_PORT  \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL

# test the server 
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py

```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Charlie Doern
26aef50bc5
if client.initialize fails, the example should exit (#954)
# What does this PR do?

the example script can gracefully exit if the boolean returned from
initialize is used properly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-04 13:54:21 -08:00
ehhuang
c9ab72fa82
Support sys_prompt behavior in inference (#937)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.

- [ ] Addresses issue (#issue)


## Test Plan

python -m unittest
llama_stack.providers.tests.inference.test_prompt_adapter


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937).
* #938
* __->__ #937
2025-02-03 23:35:16 -08:00
Yuan Tang
7558678b8c
Fix uv pip install timeout issue for PyTorch (#929)
This fixes the following timeout issue when installing PyTorch via uv.
Also see reference: https://github.com/astral-sh/uv/pull/1694,
https://github.com/astral-sh/uv/issues/1549

```
Installing pip dependencies
Using Python 3.10.16 environment at: /home/yutang/.conda/envs/distribution-myenv
  × Failed to download and build `antlr4-python3-runtime==4.9.3`
  ├─▶ Failed to extract archive
  ├─▶ failed to unpack
  │   `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
  ├─▶ failed to unpack
  │   `antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py` into
  │   `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
  ├─▶ error decoding response body
  ├─▶ request or response body error
  ╰─▶ operation timed out
  help: `antlr4-python3-runtime` (v4.9.3) was included because `torchtune`
        (v0.5.0) depends on `omegaconf` (v2.3.0) which depends on
        `antlr4-python3-runtime>=4.9.dev0, <4.10.dev0`
Failed to build target distribution-myenv with return code 1
```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-03 06:39:35 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Yuan Tang
4773092dd1
Fix UBI9 image build when installing Python packages via uv (#926)
This was missed in https://github.com/meta-llama/llama-stack/pull/921. 

cc @ashwinb @hardikjshah

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-01 19:14:29 -08:00
Ashwin Bharambe
b03e093e80 Add a COPY option for copying source files into docker 2025-02-01 15:35:38 -08:00
Ashwin Bharambe
942e8b96ac Fix uv pip uninstall 2025-02-01 11:42:28 -08:00
Ashwin Bharambe
5b1e69e58e
Use uv pip install instead of pip install (#921)
## What does this PR do? 

See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` by `uv pip install` and ensuring `uv`
exists in the environment.

## Test Plan 

First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.

Docker: `llama stack build --template together --image-type container`
works!
2025-01-31 22:29:41 -08:00
Ashwin Bharambe
0d96070af9
Update OpenAPI generator to add param and field documentation (#896)
We desperately need to document our APIs. This is the basic requirement
of having a Spec :)

This PR updates the OpenAPI generator so documentation for request
parameters and object fields can be properly added to the OpenAPI specs.
From there, this should get picked by Stainless, etc.

## Test Plan:

Updated client-sdk (See
https://github.com/meta-llama/llama-stack-client-python/pull/104) and
then ran:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py
```
2025-01-29 10:04:30 -08:00
Ashwin Bharambe
aee6237685 Small refactor for run_with_pty 2025-01-28 09:32:33 -08:00
Vladislav Bronzov
09299e908e
Add windows support for build execution (#889)
# What does this PR do?

This PR implements windows platform support for build_container.sh
execution from terminal. Additionally, it resolves "no support for
Terminos and PTY for Window PC" issues.

- [x] Addresses issue (#issue)
Releates issues: https://github.com/meta-llama/llama-stack/issues/826,
https://github.com/meta-llama/llama-stack/issues/726

## Test Plan

Changes were tested manually by executing standard scripts from LLama
guide:
- llama stack build --template ollama --image-type container
- llama stack build --list-templates
- llama stack build

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-28 07:41:41 -08:00
Ashwin Bharambe
891bf704eb
Ensure llama stack build --config <> --image-type <> works (#879)
Fix the issues brought up in
https://github.com/meta-llama/llama-stack/issues/870

Test all combinations of (conda, container) vs. (template, config)
combos.
2025-01-25 11:13:36 -08:00
Dinesh Yeduguru
cb11336886
remove logger handler only in notebook (#868)
remove logger handler only in notebook
2025-01-23 16:58:17 -08:00
snova-edwardm
22dc684da6
Sambanova inference provider (#555)
# What does this PR do?

This PR adds SambaNova as one of the Provider

- Add SambaNova as a provider

## Test Plan
Test the functional command
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```

Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY

# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```

## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [Y ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-23 12:20:28 -08:00
Ashwin Bharambe
35c71d5bbe
Update OpenAPI generator to output discriminator (#848)
oneOf should have discriminators so Stainless can generate better code

## Test Plan

Going to generate the SDK now and check.
2025-01-22 22:15:23 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
a8345f5f76 Fix llama stack build docker creation to have correct entrypoint 2025-01-22 16:53:54 -08:00
Ashwin Bharambe
a63a43c646
[memory refactor][6/n] Update naming and routes (#839)
Making a few small naming changes as per feedback:

- RAGToolRuntime methods are called `insert` and `query` to keep them
more general
- The tool names are changed to non-namespaced forms
`insert_into_memory` and `query_from_memory`
- The REST endpoints are more REST-ful
2025-01-22 10:39:13 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. the code looks much nicer and simpler (expectedly.)
- there are a number of hacks in the server resolver implementation
still, we will live with some and fix some

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.

## Test Plan

Our RAG test is sad (doesn't actually test for actual RAG output) but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00
Ashwin Bharambe
78a481bb22
[memory refactor][2/n] Update faiss and make it pass tests (#830)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Second part:

- updates routing table / router code 
- updates the faiss implementation


## Test Plan

```
pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384
```
2025-01-22 10:02:15 -08:00
Ashwin Bharambe
3ae8585b65
[memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This is the first part:

- delete other kinds of memory banks (keyvalue, keyword, graph) for now;
we will introduce a keyvalue store API as part of this design but not
use it in the RAG tool yet.
- renaming of the APIs
2025-01-22 09:59:30 -08:00
Xi Yan
74f6af8bbe
[CICD] add simple test step for docker build workflow, fix prefix bug (#821)
# What does this PR do?

**Main Thing**
- Add a simple test step before publishing docker image in workflow

**Side Fix**
- Docker push action fails recently due to extra prefix introduced. E.g.
see:
https://github.com/meta-llama/llama-stack/pull/802#issuecomment-2599507062

cc @terrytangyuan 

## Test Plan

1. Release a TestPyPi version on this code: 0.0.63.dev51206766


3581203331

```
# 1. build docker image
TEST_PYPI_VERSION=0.0.63.dev51206766 llama stack build --template fireworks

# 2. test the docker image
cd distributions/fireworks && docker compose up
```

4. Test the full build + test docker flow using TestPyPi from (1):
1284218494

<img width="1049" alt="image"
src="https://github.com/user-attachments/assets/c025893d-5ce2-48ff-aa90-de00e105ee09"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-18 15:16:05 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term and applicable to alternatives of Docker, such
as Podman or other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Ashwin Bharambe
eb60f04f86
optional api dependencies (#793)
Co-authored-by: Dinesh Yeduguru <yvdinesh@gmail.com>
2025-01-17 15:26:53 -08:00
Xi Yan
9d574f4aee
fix playground for v1 (#799)
# What does this PR do?

- update playground callsites for v1 api changes

## Test Plan

```
cd llama_stack/distribution/ui
streamlit run app.py
```


https://github.com/user-attachments/assets/eace11c6-600a-42dc-b4e7-6948a706509f




## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 19:32:07 -08:00
Ashwin Bharambe
9f14382d82
meta reference inference fixes (#797)
Miscellaneous fixes for meta reference inference

Tests for log probs dont pass because meta reference does not support
top_k > 1
2025-01-16 18:17:46 -08:00
Ashwin Bharambe
cb41848a2a disable version check optionally 2025-01-16 18:14:48 -08:00
Ashwin Bharambe
03ac84a829 Update default port from 5000 -> 8321 2025-01-16 15:26:48 -08:00
Dinesh Yeduguru
12c994b5b2
REST API fixes (#789)
# What does this PR do?

Client SDK fixes

## Test Plan


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/memory/test_memory.py
2025-01-16 13:47:08 -08:00
Ashwin Bharambe
cee3816609
Make llama stack build not create a new conda by default (#788)
## What does this PR do?

So far `llama stack build` has always created a separate conda
environment for packaging the dependencies of a distribution. The main
reason to do so is isolation -- distributions are composed of providers
which can have a variety of potentially conflicting dependencies. That
said, this has created significant annoyance for new users since it is
not at all transparent. The fact that `llama stack run` is actually
running the code in some other conda is very surprising.

This PR tries to make things better. 

- Both `llama stack build` and `llama stack run` now accept an
`--image-name` argument which represents the (conda, docker, virtualenv)
image you want to operate upon.
- For the default (conda) mode, the script checks if a current conda
environment exists. If one exists, it uses it.
- If `--image-name` is provided, that option is used. In this case, an
environment is created if needed.
- There is no automatic `llamastack-` prefixing of the environment names
done anymore.


## Test Plan

Start in a conda environment, run `llama stack build --template
fireworks`; verify that it successfully built into the current
environment and stored the build file at
`$CONDA_PREFIX/llamastack-build.yaml`. Run `llama stack run fireworks`
which started correctly in the current environment.

Ran the same build command outside of conda. It failed asking for
`--image-name`. Ran it with `llama stack build --template fireworks
--image-name foo`. This successfully created a conda environment called
`foo` and installed deps. Ran `llama stack run fireworks` outside conda
which failed. Activated a different conda, ran again, it failed saying
it did not find the `llamastack-build.yaml` file. Then used
`--image-name foo` option and it ran successfully.
2025-01-16 13:44:53 -08:00
Sixian Yi
c79b087552
[test automation] support run tests on config file (#730)
# Context
For test automation, the end goal is to run a single pytest command from
root test directory (llama_stack/providers/tests/.) such that we execute
push-blocking tests

The work plan: 
1) trigger pytest from llama_stack/providers/tests/.
2) use config file to determine what tests and parametrization we want
to run

# What does this PR do?
1) consolidates the "inference-models" / "embedding-model" /
"judge-model" ... options in root conftest.py. Without this change, we
will hit into error when trying to run `pytest
/Users/sxyi/llama-stack/llama_stack/providers/tests/.` because of
duplicated `addoptions` definitions across child conftest files.

2) Add a `config` option to specify test config in YAML. (see
[`ci_test_config.yaml`](https://gist.github.com/sixianyi0721/5b37fbce4069139445c2f06f6e42f87e)
for example config file)

For provider_fixtures, we allow users to use either a default fixture
combination or define their own {api:provider} combinations.

```

memory:
....
  fixtures:
    provider_fixtures:
    - default_fixture_param_id: ollama // use default fixture combination with param_id="ollama" in [providers/tests/memory/conftest.py](https://fburl.com/mtjzwsmk)
    - inference: sentence_transformers
      memory: faiss
    - default_fixture_param_id: chroma

```
3) generate tests according to the config. Logic lives in two places: 
a) in `{api}/conftest.py::pytest_generate_tests`, we read from config to
do parametrization.
b) after test collection, in `pytest_collection_modifyitems`, we filter
the tests to include only functions listed in config.

## Test Plan
1) `pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--collect-only --config=ci_test_config.yaml`

Using `--collect-only` tag to print the pytests listed in the config
file (`ci_test_config.yaml`).

output:
[gist](https://gist.github.com/sixianyi0721/05145e60d4d085c17cfb304beeb1e60e)


2) sanity check on `--inference-model` option

```
pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 12:05:49 -08:00
Dinesh Yeduguru
678ab29129
Idiomatic REST API: Inspect (#779)
# What does this PR do?

Since provider list returns a map grouping providers by API, we should
not be using data. This PR fixes the types to just be the plain dict,
basically reverting back to previous behavior



## Test Plan

llama-stack on  fix-provider-list [$] 🅒 stack❯
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py
2025-01-16 10:39:42 -08:00
Dinesh Yeduguru
05f6b44da7
Fix telemetry (#787)
# What does this PR do?

PR fixes couple of issues with telemetry:
1) The REST refactor changed the method from get_span_tree to
query_span_tree, which is causing the server side to return empty spans
2) Library client has introduced a new event loop, which required
changing the location of where start and end trace are called


## Test Plan

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py -k
"test_builtin_tool_web_search"


And querying for spans from the agent run using the library client.
2025-01-16 10:36:13 -08:00
Dinesh Yeduguru
8fd9bcb8cd
fix routing in library client (#776)
# What does this PR do?

Library client needs to match the impl based on both the path and
method. Since path is no longer static, this PR uses the inefficient way
of using regexes computed based on the annotated route path to match
against the incoming request path. The variables now also can come to
the impl from both path or the body, which is also handled cleanly by
finding all the regex matches.



## Test Plan


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py
2025-01-15 15:59:45 -08:00
Dinesh Yeduguru
7fb2c1c48d
More idiomatic REST API (#765)
# What does this PR do?

This PR changes our API to follow more idiomatic REST API approaches of
having paths being resources and methods indicating the action being
performed.

Changes made to generator:
1) removed the prefix check of "get" as its not required and is actually
needed for other method types too
2) removed _ check on path since variables can have "_"



## Test Plan

LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/agents/test_agents.py
2025-01-15 13:20:09 -08:00
Xi Yan
32d3abe964
[CICD] Github workflow for publishing Docker images (#764)
# What does this PR do?

- Add Github workflow for publishing docker images. 
- Manual Inputs
- We can use a (1) TestPyPi version / (2) build via released PyPi
version

**Notes**
- Keep this workflow manually triggered as we don't want to publish
nightly docker images

**Additional Changes**
- Resolve issue with running llama stack build in non-terminal device
```
  File "/home/runner/.local/lib/python3.12/site-packages/llama_stack/distribution/utils/exec.py", line 25, in run_with_pty
    old_settings = termios.tcgetattr(sys.stdin)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
termios.error: (25, 'Inappropriate ioctl for device')
```
- Modified build_container.sh to work in non-terminal environment


## Test Plan

- Triggered workflow:
3562217878
<img width="1076" alt="image"
src="https://github.com/user-attachments/assets/f1b5cef6-05ab-49c7-b405-53abc9264734"
/>


- Tested published docker image
<img width="702" alt="image"
src="https://github.com/user-attachments/assets/e7135189-65c8-45d8-86f9-9f3be70e380b"
/>


- /tools API endpoints are served so that docker is correctly using the
TestPyPi package
<img width="296" alt="image"
src="https://github.com/user-attachments/assets/bbcaa7fe-c0a4-4d22-b600-90e3c254bbfd"
/>

- Published tagged images:
https://hub.docker.com/repositories/llamastack
<img width="947" alt="image"
src="https://github.com/user-attachments/assets/2a0a0494-4d45-4643-bc29-72154ecc54a5"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-15 09:01:33 -08:00
Ashwin Bharambe
b78e6675ea llama-stack version alpha -> v1 2025-01-15 05:58:09 -08:00
Hardik Shah
a51c8b4efc
Convert SamplingParams.strategy to a union (#767)
# What does this PR do?

Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier, 
```
class SamplingParams: 
    strategy: enum ()
    top_p, temperature, top_k and other params
```
However, the `strategy` field was not being used in any providers making
it confusing to know the exact sampling behavior purely based on the
params since you could pass temperature, top_p, top_k and how the
provider would interpret those would not be clear.

Hence we introduced -- a union where the strategy and relevant params
are all clubbed together to avoid this confusion.

Have updated all providers, tests, notebooks, readme and otehr places
where sampling params was being used to use the new format.
   

## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together 
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks 
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-15 05:38:51 -08:00
Hardik Shah
b2b82d4a90 removing unused script file 2025-01-14 17:54:22 -08:00
Vladimir Ivić
472feea8d4
Fix broken tests in test_registry (#707)
Summary:
Tests were broken after registry.get return type was changed from
`List[RoutableObjectWithProvider]` to
`Optional[RoutableObjectWithProvider]` in
efe791bab7 (diff-5de152bae521b7baef01048a4c0142484f8f1c978a04f3b55f4e4dabc52835beL29)

Test Plan:
Run tests
```
pytest llama_stack/distribution/store/tests/test_registry.py -v

collected 6 items

llama_stack/distribution/store/tests/test_registry.py::test_registry_initialization PASSED                                                                  [ 16%]
llama_stack/distribution/store/tests/test_registry.py::test_basic_registration PASSED                                                                       [ 33%]
llama_stack/distribution/store/tests/test_registry.py::test_cached_registry_initialization PASSED                                                           [ 50%]
llama_stack/distribution/store/tests/test_registry.py::test_cached_registry_updates PASSED                                                                  [ 66%]
llama_stack/distribution/store/tests/test_registry.py::test_duplicate_provider_registration PASSED                                                          [ 83%]
llama_stack/distribution/store/tests/test_registry.py::test_get_all_objects PASSED                                                                          [100%]
```
2025-01-14 14:33:15 -08:00
Jeff Tang
91907b714e
added support of PYPI_VERSION in stack build (#762)
# What does this PR do?

To build a conda env for specific Llama Stack version, e.g.

`PYPI_VERSION=0.0.58 llama stack build --template together --image-type
conda`
will install these in the llamastack-together env:
```
llama_models                             0.0.58
llama_stack                              0.0.58
llama_stack_client                       0.0.58
```
Without `PYPI_VERSION=`, `llama stack build --template together
--image-type conda` installs the latest all.


In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-14 13:45:42 -08:00
Xi Yan
194d12b304
[bugfix] fix streaming GeneratorExit exception with LlamaStackAsLibraryClient (#760)
# What does this PR do?

#### Issue
- Using Jupyter notebook with LlamaStackAsLibraryClient + streaming
gives exception
```
Exception ignored in: <async_generator object HTTP11ConnectionByteStream.__aiter__ at 0x32a95a740>
Traceback (most recent call last):
  File "/opt/anaconda3/envs/fresh/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 404, in _aiter_
    yield part
RuntimeError: async generator ignored GeneratorExit
```

- Reproduce w/
https://github.com/meta-llama/llama-stack/blob/notebook-streaming-debug/inline.ipynb

#### Fix
- Issue likely comes from stream_across_asyncio_run_boundary closing
connection too soon when interacting in jupyter environment
- This uses an alternative way to convert AsyncStream to SyncStream
return type by sync version of LlamaStackAsLibraryClient, which calls
AsyncLlamaStackAsLibraryClient calling async impls under the hood

#### Additional changes
- Moved tracing logic into AsyncLlamaStackAsLibraryClient.request s.t.
streaming / non-streaming request for LlamaStackAsLibraryClient shares
same code

## Test Plan

- Test w/ together & fireworks & ollama with streaming and non-streaming
using notebook in:
https://github.com/meta-llama/llama-stack/blob/notebook-streaming-debug/inline.ipynb
- Note: need to restart kernel and run pip install -e . in jupyter
interpreter for local code change to take effect

<img width="826" alt="image"
src="https://github.com/user-attachments/assets/5f90985d-1aee-452c-a599-2157f5654fea"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-14 10:58:46 -08:00
Ashwin Bharambe
2c2969f331 Fixes; make inference tests pass with newer tool call types 2025-01-13 23:16:53 -08:00
Yuan Tang
9ec54dcbe7
Switch to use importlib instead of deprecated pkg_resources (#678)
`pkg_resources` has been deprecated. This PR switches to use
`importlib.resources`.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-13 20:20:02 -08:00
Yuan Tang
9173e35bd5
Fix incorrect Python binary path for UBI9 image (#757)
This was missed during a rebase in
https://github.com/meta-llama/llama-stack/pull/676.

Fixed the following error:
```
Error: crun: executable file `python` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
++ error_handler 88
++ echo 'Error occurred in script at line: 88'
Error occurred in script at line: 88
```

cc @hardikjshah

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-13 20:17:21 -08:00