Commit graph

97 commits

Author SHA1 Message Date
Sébastien Han
0d0b8d2be1
ci: use ollama container image with loaded models (#2410)
Some checks failed
Integration Tests / test-matrix (library, 3.10, agents) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.10, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.10, datasets) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 8s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 16s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Pre-commit / pre-commit (push) Successful in 1m3s
# What does this PR do?

Instead of downloading the models each time we now have a single Ollama
container that is baked with the models pulled and ready to use.

This will remove the CI flakiness on model pulling.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-06 12:08:20 +02:00
github-actions[bot]
692709cd45 build: Bump version to 0.2.10
Some checks failed
Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 9s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 7s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.11) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Test Llama Stack Build / build-single-provider (push) Failing after 27s
Test Llama Stack Build / build (push) Failing after 7s
Pre-commit / pre-commit (push) Failing after 1m16s
2025-06-05 22:56:39 +00:00
Hardik Shah
04592b9590
fix: update pyproject to include recursive LS deps (#2404)
trying to run `llama` cli after installing wheel fails with this error 
```
Traceback (most recent call last):
  File "/tmp/tmp.wdZath9U6j/.venv/bin/llama", line 4, in <module>
    from llama_stack.cli.llama import main
  File "/tmp/tmp.wdZath9U6j/.venv/lib/python3.10/site-packages/llama_stack/__init__.py", line 7, in <module>
    from llama_stack.distribution.library_client import (  # noqa: F401
ModuleNotFoundError: No module named 'llama_stack.distribution.library_client'
``` 

This PR fixes it by ensurring that all sub-directories of `llama_stack`
are also included.

Also, fixes the missing `fastapi` dependency issue.
2025-06-05 11:46:48 -07:00
ehhuang
3c9a10d2fe
feat: reference implementation for files API (#2330)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests / test-matrix (http, post_training) (push) Failing after 9s
Integration Tests / test-matrix (http, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, providers) (push) Failing after 8s
Integration Tests / test-matrix (http, inference) (push) Failing after 11s
Integration Tests / test-matrix (http, inspect) (push) Failing after 10s
Integration Tests / test-matrix (http, datasets) (push) Failing after 11s
Integration Tests / test-matrix (library, datasets) (push) Failing after 8s
Integration Tests / test-matrix (http, scoring) (push) Failing after 10s
Integration Tests / test-matrix (library, inference) (push) Failing after 8s
Integration Tests / test-matrix (library, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 11s
Integration Tests / test-matrix (library, inspect) (push) Failing after 8s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Integration Tests / test-matrix (library, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, scoring) (push) Failing after 8s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (library, providers) (push) Failing after 9s
Unit Tests / unit-tests (3.11) (push) Failing after 7s
Unit Tests / unit-tests (3.10) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Pre-commit / pre-commit (push) Successful in 53s
# What does this PR do?
TSIA
Added Files provider to the fireworks template. Might want to add to all
templates as a follow-up.

## Test Plan
llama-stack pytest tests/unit/files/test_files.py

llama-stack llama stack build --template fireworks --image-type conda
--run
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s -v
tests/integration/files/
2025-06-02 21:54:24 -07:00
Francisco Arceo
31ce208bda
fix: Fix requirements from broken github-actions[bot] (#2323)
Some checks failed
Integration Tests / test-matrix (http, agents) (push) Failing after 10s
Integration Tests / test-matrix (http, providers) (push) Failing after 10s
Integration Tests / test-matrix (library, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, inspect) (push) Failing after 10s
Integration Tests / test-matrix (library, post_training) (push) Failing after 7s
Integration Tests / test-matrix (library, providers) (push) Failing after 8s
Integration Tests / test-matrix (library, scoring) (push) Failing after 11s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 40s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (http, inference) (push) Failing after 46s
Test External Providers / test-external-providers (venv) (push) Failing after 6s
Integration Tests / test-matrix (http, datasets) (push) Failing after 47s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 45s
Integration Tests / test-matrix (http, scoring) (push) Failing after 45s
Integration Tests / test-matrix (http, post_training) (push) Failing after 47s
Integration Tests / test-matrix (library, datasets) (push) Failing after 46s
Integration Tests / test-matrix (http, inspect) (push) Failing after 49s
Integration Tests / test-matrix (library, agents) (push) Failing after 48s
Unit Tests / unit-tests (3.10) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Pre-commit / pre-commit (push) Successful in 1m33s
2025-05-30 19:05:47 -07:00
github-actions[bot]
ad15276da1 build: Bump version to 0.2.9
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Integration Tests / test-matrix (http, inspect) (push) Failing after 9s
Integration Tests / test-matrix (http, providers) (push) Failing after 9s
Integration Tests / test-matrix (http, agents) (push) Failing after 10s
Integration Tests / test-matrix (library, agents) (push) Failing after 8s
Integration Tests / test-matrix (http, scoring) (push) Failing after 9s
Integration Tests / test-matrix (http, datasets) (push) Failing after 10s
Integration Tests / test-matrix (http, post_training) (push) Failing after 10s
Integration Tests / test-matrix (http, inference) (push) Failing after 11s
Integration Tests / test-matrix (library, inference) (push) Failing after 8s
Integration Tests / test-matrix (http, tool_runtime) (push) Failing after 10s
Integration Tests / test-matrix (library, datasets) (push) Failing after 10s
Test External Providers / test-external-providers (venv) (push) Failing after 5s
Integration Tests / test-matrix (library, inspect) (push) Failing after 7s
Integration Tests / test-matrix (library, post_training) (push) Failing after 8s
Unit Tests / unit-tests (3.10) (push) Failing after 7s
Integration Tests / test-matrix (library, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, scoring) (push) Failing after 9s
Unit Tests / unit-tests (3.11) (push) Failing after 8s
Integration Tests / test-matrix (library, tool_runtime) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Pre-commit / pre-commit (push) Failing after 1m34s
2025-05-30 19:43:09 +00:00
Sébastien Han
63a9f08c9e
chore: use starlette built-in Route class (#2267)
# What does this PR do?

Use a more common pattern and known terminology from the ecosystem,
where Route is more approved than Endpoint.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-28 09:53:33 -07:00
Sébastien Han
4f3f28f718
chore: use dependency-groups for dev (#2287)
# What does this PR do?

The previous `[project.optional-dependencies]` was misrepresenting what
the packages were. They were NOT optional dependencies to the project
but development dependencies. Unlike optional dependencies, development
dependencies are local-only and will not be included in the project
requirements when published to PyPI or other indexes. As such,
development dependencies are not included in the [project] table.
Additionally, the dev group is synced by default.

Source:

https://docs.astral.sh/uv/concepts/projects/dependencies/#development-dependencies

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 23:00:17 +02:00
github-actions[bot]
7105a25b0f build: Bump version to 0.2.8 2025-05-27 20:28:29 +00:00
Sébastien Han
448f00903d
chore: mark blobpath as optional (#2271)
# What does this PR do?

This is not a core dependency of the distro server. It's only necessary
when using `inline::rag-runtime` or `inline::meta-reference` providers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-27 10:55:24 +02:00
Yuan Tang
055f48b6a2
fix(security): Upgrade setuptools to v80.8.0. Fixes CVE-2025-47273 (#2242)
# What does this PR do?

This fixes a high vulnerable CVE in `setuptools`:
https://github.com/advisories/GHSA-5rjg-fvgr-3xxf

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-05-24 06:57:24 -07:00
Sébastien Han
c25acedbcd
chore: remove k8s auth in favor of k8s jwks endpoint (#2216)
# What does this PR do?

Kubernetes since 1.20 exposes a JWKS endpoint that we can use with our
recent oauth2 recent implementation.
The CI test has been kept intact for validation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 16:23:54 +02:00
Ashwin Bharambe
c7015d3d60
feat: introduce OAuth2TokenAuthProvider and notion of "principal" (#2185)
This PR adds a notion of `principal` (aka some kind of persistent
identity) to the authentication infrastructure of the Stack. Until now
we only used access attributes ("claims" in the more standard OAuth /
OIDC setup) but we need the notion of a User fundamentally as well.
(Thanks @rhuss for bringing this up.)

This value is not yet _used_ anywhere downstream but will be used to
segregate access to resources.

In addition, the PR introduces a built-in JWT token validator so the
Stack does not need to contact an authentication provider to validating
the authorization and merely check the signed token for the represented
claims. Public keys are refreshed via the configured JWKS server. This
Auth Provider should overwhelmingly be considered the default given the
seamless integration it offers with OAuth setups.
2025-05-18 17:54:19 -07:00
Charlie Doern
f02f7b28c1
feat: add huggingface post_training impl (#2132)
# What does this PR do?


adds an inline HF SFTTrainer provider. Alongside touchtune -- this is a
super popular option for running training jobs. The config allows a user
to specify some key fields such as a model, chat_template, device, etc

the provider comes with one recipe `finetune_single_device` which works
both with and without LoRA.

any model that is a valid HF identifier can be given and the model will
be pulled.

this has been tested so far with CPU and MPS device types, but should be
compatible with CUDA out of the box

The provider processes the given dataset into the proper format,
establishes the various steps per epoch, steps per save, steps per eval,
sets a sane SFTConfig, and runs n_epochs of training

if checkpoint_dir is none, no model is saved. If there is a checkpoint
dir, a model is saved every `save_steps` and at the end of training.


## Test Plan

re-enabled post_training integration test suite with a singular test
that loads the simpleqa dataset:
https://huggingface.co/datasets/llamastack/simpleqa and a tiny granite
model: https://huggingface.co/ibm-granite/granite-3.3-2b-instruct. The
test now uses the llama stack client and the proper post_training API

runs one step with a batch_size of 1. This test runs on CPU on the
Ubuntu runner so it needs to be a small batch and a single step.

[//]: # (## Documentation)

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-16 14:41:28 -07:00
github-actions[bot]
65cf076f13 build: Bump version to 0.2.7 2025-05-16 20:32:06 +00:00
Sébastien Han
a5d14749a5
chore: rehydrate requirements.txt (#2146)
# What does this PR do?

Hiccup with 0.2.6 bot release?

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-12 12:45:35 -07:00
github-actions[bot]
23d9f3b1fb build: Bump version to 0.2.6 2025-05-12 18:02:05 +00:00
Ashwin Bharambe
d27a0f276c fix: pytest.mark.skip, not pytest.skip 2025-05-04 13:22:06 -07:00
github-actions[bot]
6b4c218788 build: Bump version to 0.2.5 2025-05-03 21:31:01 +00:00
Ashwin Bharambe
799286fe52 fix: Bump version to 0.2.4 2025-04-29 10:34:17 -07:00
Sébastien Han
79851d93aa
feat: Add Kubernetes authentication (#1778)
# What does this PR do?

This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:

- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage

The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation
- Custom authentication endpoints

## Test Plan

Setup a Kube cluster using Kind or Minikube.

Run a server with:

```
server:
  port: 8321
  auth:
    provider_type: kubernetes
    config:
      api_server_url: http://url
      ca_cert_path: path/to/cert (optional)
```

Run:

```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```

Or replace "my-user" with your service account.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-28 22:24:58 +02:00
Yuan Tang
28687b0e85
fix: Bump h11 to 0.16.0 to fix cve-2025-43859 (#2041)
This resolves a new critical severity on h11. See
https://access.redhat.com/security/cve/cve-2025-43859. We should
consider releasing a new patch with this fix.

This was updated via:

```
uv add "h11>=0.16.0"
uv export --frozen --no-hashes --no-emit-project --output-file=requirements.txt
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-27 11:45:35 -07:00
Ashwin Bharambe
ff14773fa7 fix: update llama stack client dependency 2025-04-12 18:14:33 -07:00
Ben Browning
2b2db5fbda
feat: OpenAI-Compatible models, completions, chat/completions (#1894)
# What does this PR do?

This stubs in some OpenAI server-side compatibility with three new
endpoints:

/v1/openai/v1/models
/v1/openai/v1/completions
/v1/openai/v1/chat/completions

This gives common inference apps using OpenAI clients the ability to
talk to Llama Stack using an endpoint like
http://localhost:8321/v1/openai/v1 .

The two "v1" instances in there isn't awesome, but the thinking is that
Llama Stack's API is v1 and then our OpenAI compatibility layer is
compatible with OpenAI V1. And, some OpenAI clients implicitly assume
the URL ends with "v1", so this gives maximum compatibility.

The openai models endpoint is implemented in the routing layer, and just
returns all the models Llama Stack knows about.

The following providers should be working with the new OpenAI
completions and chat/completions API:
* remote::anthropic (untested)
* remote::cerebras-openai-compat (untested)
* remote::fireworks (tested)
* remote::fireworks-openai-compat (untested)
* remote::gemini (untested)
* remote::groq-openai-compat (untested)
* remote::nvidia (tested)
* remote::ollama (tested)
* remote::openai (untested)
* remote::passthrough (untested)
* remote::sambanova-openai-compat (untested)
* remote::together (tested)
* remote::together-openai-compat (untested)
* remote::vllm (tested)

The goal to support this for every inference provider - proxying
directly to the provider's OpenAI endpoint for OpenAI-compatible
providers. For providers that don't have an OpenAI-compatible API, we'll
add a mixin to translate incoming OpenAI requests to Llama Stack
inference requests and translate the Llama Stack inference responses to
OpenAI responses.

This is related to #1817 but is a bit larger in scope than just chat
completions, as I have real use-cases that need the older completions
API as well.

## Test Plan

### vLLM

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run

LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct"
```

### ollama
```
INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run

LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0"
```



## Documentation

Run a Llama Stack distribution that uses one of the providers mentioned
in the list above. Then, use your favorite OpenAI client to send
completion or chat completion requests with the base_url set to
http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the
host and port of your Llama Stack server, if different.

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-11 13:14:17 -07:00
Ashwin Bharambe
5a31e66a91 fix: update llama-stack-client dependency to fix integration tests 2025-04-06 19:11:05 -07:00
Francisco Arceo
9b478f3756
docs: Adding darkmode to documentation (#1843)
# What does this PR do?
docs: Adding darkmode to documentation


## Test Plan
Tested locally. 

Here's the look:
![Screenshot 2025-03-31 at 9 43
05 AM](https://github.com/user-attachments/assets/5989dbc8-ba03-4710-ad8d-6d4b9ac79786)


## Issues

Related to https://github.com/meta-llama/llama-stack/issues/1815 

Closes https://github.com/meta-llama/llama-stack/issues/1844

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-31 08:31:53 -07:00
github-actions[bot]
b7ab1a9710 build: Bump version to 0.1.19 2025-03-29 00:18:38 +00:00
Ashwin Bharambe
8c351fe432 build: Bump version to 0.1.8 2025-03-23 16:01:10 -07:00
Ashwin Bharambe
93cfade8c9 ci: Bump version to 0.1.7 2025-03-14 15:21:26 -07:00
yyymeta
a626b7bce3
feat: [new open benchmark] BFCL_v3 (#1578)
# What does this PR do?
create a new dataset BFCL_v3 from
https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html

overall each question asks the model to perform a task described in
natural language, and additionally a set of available functions and
their schema are given for the model to choose from. the model is
required to write the function call form including function name and
parameters , to achieve the stated purpose. the results are validated
against provided ground truth, to make sure that the generated function
call and the ground truth function call are syntactically and
semantically equivalent, by checking their AST .



## Test Plan

start server by 

```
llama stack run ./llama_stack/templates/ollama/run.yaml
```

then send traffic
```
 llama-stack-client eval run-benchmark "bfcl"  --model-id   meta-llama/Llama-3.2-3B-Instruct    --output-dir /tmp/gpqa    --num-examples   2
```




[//]: # (## Documentation)
2025-03-14 12:50:49 -07:00
Ashwin Bharambe
bc8daf7fea
fix: include jinja2 as a core llama-stack dependency (#1529)
We removed `llama-models` as a dep which was pulling this in for us
previously. This did not get caught in the release process because the
distros we use for testing (fireworks / together) pull that in via
sentence transformers which we don't use in all distros (notably
ollama.)

See #1511 

## Test Plan

Ran `llama-stack-ops/actions/test-and-cut/main.sh` with
`ONLY_TEST_DONT_CUT=1 COMMIT_ID=origin/fix_jinja2` and by making it
build the ollama docker. Ran the docker to ensure it does not error out
with jinja2 dependency error. (Unfortunately there is another error with
sqlite_vec there.)
2025-03-10 14:59:11 -07:00
Ashwin Bharambe
0db3a2f511 fix: run pre-commit due to release script bumps 2025-03-07 16:31:42 -08:00
ehhuang
1257288361
build: add 'tiktoken' to deps (#1483)
Summary:

Test Plan:
2025-03-07 12:36:02 -08:00
Sébastien Han
ffa32af930
build: bump llama-stack-client version (#1469)
## What does this PR do?

Use 0.1.5.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-07 11:42:38 -08:00
Ashwin Bharambe
8bbd52bb9f
chore: remove dependency on llama_models completely (#1344) 2025-03-01 12:48:08 -08:00
Charlie Doern
de878e15a9
fix: pre-commit updates (#1243)
# What does this PR do?

PR #1139 caused pre-commit failures on main likely due to improper
rebase before merge. run pre-commit on main and commit the changes

see runs here:
3775148428

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-24 17:20:29 -08:00
Sébastien Han
9bbe34694d
ci: add mypy for static type checking (#1101)
# What does this PR do?

- Enable mypy to run in the CI on a subset of the repository
- Fix a few mypy errors
- Run mypy from pre-commit

Signed-off-by: Sébastien Han <seb@redhat.com>
 
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-21 13:15:40 -08:00
Sébastien Han
69eebaf5bf
build: add missing dev dependencies for unit tests (#1004)
# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:

```
uv run pytest -v -k "ollama" \
     --inference-model=llama3.2:3b-instruct-fp16
     llama_stack/providers/tests/inference/test_model_registration.py
```

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py

```

You can observe that some tests pass while others fail, but the test
runs successfully.

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 22:26:11 -08:00
Sébastien Han
00613d9014
build: resync uv and deps on 0.1.3 (#1108)
# What does this PR do?

The bot just updated the project to 0.1.3 in

https://github.com/meta-llama/llama-stack/commits?author=github-actions%5Bbot%5D
but the deps need to be synced.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 12:26:04 -08:00
Ashwin Bharambe
314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
2025-02-14 09:10:59 -08:00
Sarthak Deshpande
80ba9deab1
chore: Updated requirements.txt (#1017)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Updated requirements.txt

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
2025-02-08 11:50:35 -08:00
Ashwin Bharambe
f98efe68c9
Misc fixes (#944)
- Make sure torch + torchvision go together as deps, otherwise bad stuff
happens
- Add a pre-commit for requirements.txt
2025-02-03 14:08:47 -08:00
Ashwin Bharambe
6344b2429b Kill requirements.txt 2025-01-31 22:38:58 -08:00
Ashwin Bharambe
05d73dd4fd Bump version to 0.1.0 2025-01-24 09:50:07 -08:00
Ashwin Bharambe
d6fcdefec7 Bump version to 0.0.63 2024-12-17 23:15:27 -08:00
Ashwin Bharambe
eea478618d Bump version to 0.0.62 2024-12-17 18:19:47 -08:00
Ashwin Bharambe
02b43be9d7 Bump version to 0.0.61 2024-12-10 10:18:44 -08:00
Ashwin Bharambe
1ad691bb04 Bump version to 0.0.60 2024-12-09 22:19:51 -08:00
Ashwin Bharambe
baae4f7b51 Bump version to 0.0.59 2024-12-09 21:22:20 -08:00
Ashwin Bharambe
2c5c73f7ca Bump version to 0.0.58 2024-12-06 08:36:00 -08:00