Commit graph

4 commits

Author SHA1 Message Date
Sébastien Han
edd9aaac3b
fix: use torchao 0.8.0 for inference (#1925)
# What does this PR do?

While building the "experimental-post-training" distribution, we
encountered a version conflict between torchao with inference requiring
version 0.5.0 and training currently depending on version 0.8.0.

Resolves this error:

```
  × No solution found when resolving dependencies:
  ╰─▶ Because you require torchao==0.5.0 and torchao==0.8.0, we can conclude that your requirements are unsatisfiable.
ERROR    2025-04-10 10:41:22,597 llama_stack.distribution.build:128 uncategorized: Failed to build target test with
         return code 1
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-10 13:39:20 -07:00
ehhuang
7b4eb0967e
test: verification on provider's OAI endpoints (#1893)
# What does this PR do?


## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL
2025-04-07 23:06:28 -07:00
Ashwin Bharambe
530d4bdfe1
refactor: move all llama code to models/llama out of meta reference (#1887)
# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
   --env INFERENCE_CHECKPOINT_DIR=<DIR> \
   --env MODEL_PARALLEL_SIZE=4 \
   --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to
it using:

```
pytest -s -v  tests/integration/inference/test_text_inference.py \
   --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-04-07 15:03:58 -07:00
Sébastien Han
e3578b1c1b
chore: remove distributions dir (#1809)
# What does this PR do?

Followup on https://github.com/meta-llama/llama-stack/pull/1801. Move
the deps files to llama_stack/templates.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-27 09:03:39 -04:00
Renamed from distributions/dependencies.json (Browse further)