Commit graph

10 commits

Author SHA1 Message Date
Sajikumar JS
6fe8b292b1 Merge branch 'main' into add-watsonx-inference-adapter 2025-04-25 10:57:45 +05:30
Jash Gulabrai
cc77f79f55
feat: Add NVIDIA Eval integration (#1890)
# What does this PR do?
This PR adds support for NVIDIA's NeMo Evaluator API to the Llama Stack
eval module. The integration enables users to evaluate models via the
Llama Stack interface.

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
1. Added unit tests and successfully ran from root of project:
`./scripts/unit-tests.sh tests/unit/providers/nvidia/test_eval.py`
```
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_cancel PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_result PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_status PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_register_benchmark PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_run_eval PASSED
```
2. Verified I could build the Llama Stack image: `LLAMA_STACK_DIR=$(pwd)
llama stack build --template nvidia --image-type venv`

Documentation added to
`llama_stack/providers/remote/eval/nvidia/README.md`

---------

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-04-24 17:12:42 -07:00
Sajikumar JS
efe5b124f3 pre-commit issues fix 2025-04-17 23:45:27 +05:30
Sajikumar JS
ebf994475d Merge branch 'main' into add-watsonx-inference-adapter 2025-04-15 11:47:56 +05:30
Sébastien Han
edd9aaac3b
fix: use torchao 0.8.0 for inference (#1925)
# What does this PR do?

While building the "experimental-post-training" distribution, we
encountered a version conflict between torchao with inference requiring
version 0.5.0 and training currently depending on version 0.8.0.

Resolves this error:

```
  × No solution found when resolving dependencies:
  ╰─▶ Because you require torchao==0.5.0 and torchao==0.8.0, we can conclude that your requirements are unsatisfiable.
ERROR    2025-04-10 10:41:22,597 llama_stack.distribution.build:128 uncategorized: Failed to build target test with
         return code 1
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-10 13:39:20 -07:00
Sajikumar JS
077dd51e0f Merge branch 'main' into add-watsonx-inference-adapter 2025-04-08 14:01:20 +05:30
ehhuang
7b4eb0967e
test: verification on provider's OAI endpoints (#1893)
# What does this PR do?


## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL
2025-04-07 23:06:28 -07:00
Ashwin Bharambe
530d4bdfe1
refactor: move all llama code to models/llama out of meta reference (#1887)
# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
   --env INFERENCE_CHECKPOINT_DIR=<DIR> \
   --env MODEL_PARALLEL_SIZE=4 \
   --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to
it using:

```
pytest -s -v  tests/integration/inference/test_text_inference.py \
   --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-04-07 15:03:58 -07:00
Sajikumar JS
8cf8bd35f8 Merge branch 'main' into add-watsonx-inference-adapter 2025-04-06 16:28:39 +05:30
Sébastien Han
e3578b1c1b
chore: remove distributions dir (#1809)
# What does this PR do?

Followup on https://github.com/meta-llama/llama-stack/pull/1801. Move
the deps files to llama_stack/templates.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-27 09:03:39 -04:00
Renamed from distributions/dependencies.json (Browse further)