Commit graph

676 commits

Author SHA1 Message Date
Jeff Tang
587753da2f
LocalInferenceImpl update for LS 0.1 (#911)
# What does this PR do?

This updates LocalInferenceImpl to work with the updated iOSCalendarAssistantWithLocalInf
[here](https://github.com/meta-llama/llama-stack-apps/compare/ios_local).

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-02 09:49:40 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
The lint check on the main branch is failing. This fixes it after the
move to ruff in https://github.com/meta-llama/llama-stack/pull/921: we
need to move to a `ruff.toml` file as well as fix and ignore some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Matthew Farrellee
e21c8b6d80
add image support to NVIDIA inference provider (#907)
# What does this PR do?

Add support for image inputs to the NVIDIA inference provider.
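
For illustration, a minimal client-side call exercising an image input; the content-item shape and the `completion_message` accessor are assumptions based on the client SDK tests referenced in the test plan, not part of this PR:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # image content item pointing at a URL; base64 data is also accepted
                {"type": "image", "image": {"url": {"uri": "https://example.com/dog.png"}}},
                {"type": "text", "text": "Describe what you see."},
            ],
        }
    ],
)
print(response.completion_message.content)
```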


## Test Plan

1. Run local [Llama 3.2 11b vision
instruct](https://build.nvidia.com/meta/llama-3.2-11b-vision-instruct?snippet_tab=Docker)
NIM
2. Start a stack, e.g. `llama stack run
llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000`
3. Run image tests, e.g. `LLAMA_STACK_BASE_URL=http://localhost:8321
pytest -v tests/client-sdk/inference/test_inference.py
--vision-inference-model meta-llama/Llama-3.2-11B-Vision-Instruct -k
image`


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2025-02-01 09:02:27 -08:00
Ashwin Bharambe
95786d5bdc Update client-sdk test config option handling
Fix test
2025-01-31 15:37:25 -08:00
Hardik Shah
a7b929f17e
Sec fixes as raised by bandit (#917)
minor fixes to hashlib and jinja
2025-01-31 13:44:26 -08:00
Dmitry Rogozhkin
7ea14ae62e
feat: enable xpu support for meta-reference stack (#558)
This commit adds support for XPU and CPU devices to the meta-reference
stack for text models. On creation, the stack automatically identifies
which device to use by checking available accelerator capabilities in
the following order: CUDA, then XPU, and finally CPU. This behaviour can
be overridden with the `DEVICE` environment variable, in which case the
explicitly specified device is used.
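
A minimal sketch of that selection order (assuming PyTorch with an optional XPU-enabled build; only the `DEVICE` variable name comes from the description above, the rest is illustrative):

```python
import os

import torch


def resolve_device() -> str:
    """Pick a device: explicit DEVICE env var wins, else CUDA, then XPU, then CPU."""
    explicit = os.environ.get("DEVICE")
    if explicit:
        return explicit
    if torch.cuda.is_available():
        return "cuda"
    # torch.xpu is only present in XPU-enabled PyTorch builds
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"


device = torch.device(resolve_device())
```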

Tested with:
```
torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```

Results:
* Tested on: a system with a single CUDA device, a system with a single
XPU device, and a CPU-only system
* Results: all tests pass except `test_completion_logprobs`
* `test_completion_logprobs` fails in the same way as on the baseline,
i.e. it is unrelated to this change: `AssertionError: Unexpected top_k=3`

Requires: https://github.com/meta-llama/llama-models/pull/233

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-01-31 12:11:49 -08:00
Hardik Shah
97eb3eecea
Fix Agents to support code and rag simultaneously (#908)
# What does this PR do?

Fixes a bug where agents were not working when both rag and
code-interpreter were added as tools.


## Test Plan

Added a new client_sdk test which tests for this scenario 
```
LLAMA_STACK_CONFIG=together pytest -s -v  tests/client-sdk -k 'test_rag_and_code_agent'
```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-30 17:09:34 -08:00
Xi Yan
94051cfe9e
fix ImageContentItem to take base64 string as image.data (#909)
# What does this PR do?

- Discussion in
https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819

- `image.data` should accept a base64-encoded string as input instead of
binary bytes; prompt_adapter is changed to account for that.
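
A short sketch of what this implies for callers, assuming a local image file; the exact nested content-item field names follow the linked discussion and may differ slightly:

```python
import base64
from pathlib import Path

# Encode the image to a base64 *string* (not raw bytes) before placing it in image.data
image_b64 = base64.b64encode(Path("example.jpg").read_bytes()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "image", "image": {"data": image_b64}},
        {"type": "text", "text": "Describe this image."},
    ],
}
```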

## Test Plan

```
pytest -v tests/client-sdk/inference/test_inference.py
```

with test in https://github.com/meta-llama/llama-stack/pull/906


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-30 15:58:23 -08:00
snova-edwardm
7fe2592795
SambaNova supports Llama 3.3 (#905)
# What does this PR do?

- Fix typo
- Support Llama 3.3 70B

## Test Plan

Run the following scripts; the test results are shown below.

Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY}
```

Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED

=========================================== 1 passed, 1 warning in 1.26s ============================================
```

Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY}
```

Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED

=========================================== 1 passed, 1 warning in 0.52s ============================================
```


## Before submitting

- [N] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [N] Wrote necessary unit or integration tests.
2025-01-30 09:24:46 -08:00
Sixian Yi
836f47a82d
log probs - mark pytests as xfail for unsupported providers + add support for together (#883)
# What does this PR do?

1) As per @mattf's suggestion, we want to mark tests as xfail for
providers that do not support the functionality. In this diff, we xfail
the logProbs inference tests for providers that do not support log probs
(log probs are only supported by together, fireworks, and vllm).

2) Added logProbs support for together according to their developer
[doc](https://docs.together.ai/docs/logprobs).
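
A rough sketch of the xfail pattern (fixture and parameter names here are illustrative, not the exact test code):

```python
import pytest

PROVIDERS_WITH_LOG_PROBS = {"remote::together", "remote::fireworks", "remote::vllm"}


def test_completion_log_probs(inference_provider_type, client, text_model_id):
    if inference_provider_type not in PROVIDERS_WITH_LOG_PROBS:
        pytest.xfail(f"{inference_provider_type} doesn't support log probs yet")
    response = client.inference.completion(
        model_id=text_model_id,
        content="Complete the sentence: the capital of France is",
        logprobs={"top_k": 1},
    )
    assert response.logprobs
```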

## Test Plan
1) Together & Fireworks
```
export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/together/run.yaml  
/opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py
```
```
tests/client-sdk/inference/test_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of planets in our solar system?-Earth] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of the planets that have rings around them?-Saturn] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED

========================================================================================== 15 passed, 2 warnings in 19.46s ===========================================================================================
```

```
export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/fireworks/run.yaml   
/opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py
```
All tests passed 

2) Ollama - LogProbs tests are marked as xfailed. 
```
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet)
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet)
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-29 23:41:25 -08:00
Aidan Do
39c34dd25f
[#432] Groq Provider tool call tweaks (#811)
# What does this PR do?

Follow up for @ashwinb's comments in
https://github.com/meta-llama/llama-stack/pull/630

- [x] Contributes to issue (#432)


## Test Plan
<details>
<summary>Environment</summary>

```shell
export GROQ_API_KEY=<api-key>

# Create environment if not already
conda create --name llamastack-groq python=3.10
conda activate llamastack-groq

wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/meta-llama/llama-stack/918172c7fa92522c9ebc586bdb4f386b1d9ea224/run.yaml

# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda

# Activate built environment
conda activate llamastack-groq

# Test deps
pip install pytest pytest_html pytest_asyncio
```
</details>



<details>
<summary>Unit tests</summary>

```shell
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s

# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .......................

========================================= 23 passed, 11 warnings in 0.06s =========================================
```
</details>

<details>
<summary>Integration tests</summary>

```shell
# Tests
 pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s

# Results
___________________________ TestInference.test_chat_completion_with_tool_calling[-groq] ___________________________
llama_stack/providers/tests/inference/test_text_inference.py:403: in test_chat_completion_with_tool_calling
    assert len(message.tool_calls) > 0
E   assert 0 > 0
E    +  where 0 = len([])
E    +    where [] = CompletionMessage(role='assistant', content='<function=get_weather>{"location": "San Francisco, CA"}', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]).tool_calls
============================================= short test summary info =============================================
FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-groq] - assert 0 > 0
======================== 1 failed, 3 passed, 5 skipped, 99 deselected, 7 warnings in 2.13s ========================
```

(One failure as expected from 3.2 3B - re:
https://github.com/meta-llama/llama-stack/pull/630#discussion_r1914056503)
</details>


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-29 12:02:12 -08:00
Matthew Farrellee
1a5c17a92f
align with CompletionResponseStreamChunk.delta as str (instead of TextDelta) (#900)
# What does this PR do?

Fix a type mismatch in /v1/inference/completion.
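
For reference, a minimal streaming consumer under the aligned type, where each chunk's `delta` is a plain string (mirroring the client usage shown elsewhere in this log; the base URL and model are assumptions):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    content="Write a haiku about type safety.",
    stream=True,
)

# With the fix, chunk.delta is a str, so it can be printed directly
for chunk in response:
    print(chunk.delta, end="", flush=True)
```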

## Test Plan

`llama stack run ./llama_stack/templates/nvidia/run.yaml`

`LLAMA_STACK_BASE_URL="http://localhost:8321" pytest -v
tests/client-sdk/inference/test_inference.py`

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-29 09:25:50 -08:00
Ashwin Bharambe
f2feb7d15c
Fix Chroma adapter (#893)
Chroma method had the wrong signature.

## Test Plan

Start Chroma: `chroma run --path /tmp/foo/chroma2 --host localhost
--port 6001`

Modify run.yaml to include Chroma server pointing to localhost:6001 and
run `llama stack run`

Then:

```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v agents/test_agents.py -k rag
```

passes
2025-01-28 13:19:47 -08:00
Zhonglin Han
229f0d5f7c
Agent response format (#660)
# What does this PR do?

Add a response format for agents' structured output.

- Uses structured output for agents (the interior_design app as an example):
https://github.com/meta-llama/llama-stack-apps/issues/122
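
A purely hypothetical sketch of what requesting structured output from an agent might look like; the `response_format` field on the agent config is an assumption based on this description, not a confirmed API:

```python
from pydantic import BaseModel


class RoomPlan(BaseModel):
    style: str
    furniture: list[str]


# Hypothetical: pass a JSON-schema response format when configuring the agent
agent_config = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "instructions": "You are an interior design assistant.",
    "response_format": {
        "type": "json_schema",
        "json_schema": RoomPlan.model_json_schema(),
    },
}
```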


## Test Plan
E2E test plan with the llama-stack-apps interior_design app.

Start your distro:
`llama stack run llama_stack/templates/fireworks/run.yaml --env FIREWORKS_API_KEY=<API_KEY>`

Run the API test:
```
PYTHONPATH=. python examples/interior_design_assistant/api.py localhost 5000 examples/interior_design_assistant/resources/documents/ examples/interior_design_assistant/resources/images/fireplaces
```


## Sources
Results: 
https://github.com/meta-llama/llama-stack-client-python/pull/72

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-28 05:05:38 -08:00
snova-edwardm
aa65610e75
Sambanova - LlamaGuard (#886)
# What does this PR do?

- Fix a SambaNovaImpl loading issue
- Add LlamaGuard model support for inference

## Test Plan

Run the following unit test scripts; the results are included below.

### Embedding
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_batch_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)

=================================================================================================================== 2 skipped, 1 warning in 0.32s ===================================================================================================================
```

### Vision
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_vision_inference.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```

```
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image1-expected_strings1] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-sambanova] PASSED

=================================================================================================================== 3 passed, 1 warning in 2.68s ====================================================================================================================
```

### Text
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```

```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED

=================================================================================================================== 1 passed, 1 warning in 0.46s ====================================================================================================================
```

```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```

```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED

=================================================================================================================== 1 passed, 1 warning in 0.48s ====================================================================================================================
```




## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [Y] Wrote necessary unit or integration tests.
2025-01-27 15:46:30 -08:00
Dinesh Yeduguru
3c1a2c3d66
Fix telemetry init (#885)
# What does this PR do?

When re-initializing the library client in a notebook, we were seeing
this error:
```
Getting traces for session_id=5c8d1969-0957-49d2-b852-32cbb8ef8caf
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-11-d74bb6cdd3ab>](https://localhost:8080/#) in <cell line: 0>()
      7 agent_logs = []
      8 
----> 9 for span in client.telemetry.query_spans(
     10     attribute_filters=[
     11         {"key": "session_id", "op": "eq", "value": session_id},

10 frames
[/usr/local/lib/python3.11/dist-packages/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py](https://localhost:8080/#) in query_traces(self, attribute_filters, limit, offset, order_by)
    246     ) -> QueryTracesResponse:
    247         return QueryTracesResponse(
--> 248             data=await self.trace_store.query_traces(
    249                 attribute_filters=attribute_filters,
    250                 limit=limit,

AttributeError: 'TelemetryAdapter' object has no attribute 'trace_store'
```

This was happening because we were skipping some required object-state
setup steps as part of the global `_TRACE_PROVIDER` check. This PR moves
the initialization of the object state out of the `_TRACE_PROVIDER`
init.
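
The shape of the fix, in illustrative form only (the placeholder values are not the real telemetry code): per-instance state is always initialized, and only the process-wide provider creation is guarded by the global check.

```python
_TRACE_PROVIDER = None  # process-wide singleton


class TelemetryAdapter:
    def __init__(self, db_path: str):
        global _TRACE_PROVIDER
        # Per-instance state: always set, even when the adapter is re-created
        # (skipping this is what produced the missing trace_store attribute)
        self.trace_store = {"db_path": db_path}  # placeholder for the real store
        # Process-wide state: created only once per process
        if _TRACE_PROVIDER is None:
            _TRACE_PROVIDER = object()  # placeholder for the real tracer provider
```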
2025-01-27 11:20:28 -08:00
Ashwin Bharambe
087a83f673 Bump key for faiss 2025-01-24 12:08:36 -08:00
Ashwin Bharambe
d78027f3b5 Move runpod provider to the correct directory
Also clean up the test code to avoid skipping tests. Let failures be
known and public.
2025-01-23 12:25:12 -08:00
snova-edwardm
22dc684da6
Sambanova inference provider (#555)
# What does this PR do?

This PR adds SambaNova as an inference provider.

## Test Plan
Run the functional test command:
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```

Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY

# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```

## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [Y] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-23 12:20:28 -08:00
Marut Pandya
e2b5456e48
Add Runpod Provider + Distribution (#362)
Add Runpod as an inference provider for OpenAI-compatible managed
endpoints.

Testing
- Configured llama stack from scratch and set `remote::runpod` as an inference provider.
- Added the Runpod endpoint URL and API key.
- Started the llama-stack server: `llama stack run my-local-stack --port 3000`
```
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
	"model": "Llama3.1-8B-Instruct",
	"messages": [
		{"role": "system", "content": "You are a helpful assistant."},
		{"role": "user", "content": "Write me a 2 sentence poem about the moon"}
	],
	"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
```

---------

Signed-off-by: pandyamarut <pandyamarut@gmail.com>
2025-01-23 12:19:02 -08:00
Sixian Yi
bfbd773b54 remove test report 2025-01-23 01:06:39 -08:00
Aidan Do
910717c1fd
Add vLLM raw completions API (#823)
# What does this PR do?

Adds raw completions API to vLLM

## Test Plan

<details>
<summary>Setup</summary>

```bash
# Run vllm server
conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm

# Run llamastack
conda create --name llamastack-vllm python=3.10
conda activate llamastack-vllm

export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct && \
pip install -e . && \
pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7 && \
llama stack build --template remote-vllm --image-type conda && \
llama stack run ./distributions/remote-vllm/run.yaml \
  --port 5000 \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env VLLM_URL=http://localhost:8000/v1 | tee -a llama-stack.log
```
</details>

<details>
<summary>Integration</summary>

```bash
# Run
conda activate llamastack-vllm
export VLLM_URL=http://localhost:8000/v1
pip install pytest pytest_html pytest_asyncio aiosqlite
pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k vllm

# Results
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[-vllm_remote] PASSED            [ 11%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-vllm_remote] PASSED            [ 22%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-vllm_remote] SKIPPED  [ 33%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-vllm_remote] SKIPPED [ 44%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-vllm_remote] PASSED [ 55%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[-vllm_remote] PASSED     [ 66%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-vllm_remote] PASSED [ 77%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-vllm_remote] PASSED [ 88%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[-vllm_remote] PASSED [100%]

====================================== 7 passed, 2 skipped, 99 deselected, 1 warning in 9.80s ======================================
```
</details>

<details>
<summary>Manual</summary>

```bash
# Install
pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7
```

Apply this diff
```diff
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index 8dbb193..95173e2 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -250,7 +250,7 @@ class ClientVersionMiddleware:
                     server_version_parts = tuple(
                         map(int, self.server_version.split(".")[:2])
                     )
-                    if client_version_parts != server_version_parts:
+                    if False and client_version_parts != server_version_parts:
 
                         async def send_version_error(send):
                             await send(
diff --git a/llama_stack/templates/remote-vllm/run.yaml b/llama_stack/templates/remote-vllm/run.yaml
index 4eac4da..32eb50e 100644
--- a/llama_stack/templates/remote-vllm/run.yaml
+++ b/llama_stack/templates/remote-vllm/run.yaml
@@ -94,7 +94,8 @@ metadata_store:
   type: sqlite
   db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/registry.db
 models:
-- metadata: {}
+- metadata: 
+    llama_model: meta-llama/Llama-3.2-3B-Instruct
   model_id: ${env.INFERENCE_MODEL}
   provider_id: vllm-inference
   model_type: llm
```

Test 1:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5000",
)

response = client.inference.completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    content="Hello, world client!",
)

print(response)
```

Test 2

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5000",
)

response = client.inference.completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    content="Hello, world client!",
    stream=True,
)

for chunk in response:
    print(chunk.delta, end="", flush=True)
```

```
 I'm excited to introduce you to our latest project, a comprehensive guide to the best coffee shops in [City]. As a coffee connoisseur, you're in luck because we've scoured the city to bring you the top picks for the perfect cup of joe.

In this guide, we'll take you on a journey through the city's most iconic coffee shops, highlighting their unique features, must-try drinks, and insider tips from the baristas themselves. From cozy cafes to trendy cafes, we've got you covered.

**Top 5 Coffee Shops in [City]**

1. **The Daily Grind**: This beloved institution has been serving up expertly crafted pour-overs and lattes for over 10 years. Their expert baristas are always happy to guide you through their menu, which features a rotating selection of single-origin beans from around the world...
```

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 22:58:27 -08:00
Ashwin Bharambe
6c205e1d5a Fix tool tests 2025-01-22 20:31:18 -08:00
Ashwin Bharambe
0bff6e1658 Move tool_runtime.memory -> tool_runtime.rag 2025-01-22 20:25:02 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
23f1980f9c Fix meta-reference GPU implementation for inference 2025-01-22 18:31:59 -08:00
Ashwin Bharambe
08dcb9e31e Accept "query_config" params for the RAG tool 2025-01-22 16:42:36 -08:00
Ashwin Bharambe
07b87365ab
[inference api] modify content types so they follow a more standard structure (#841)
Some small updates to the inference types to make them more standard.

Specifically:
- image data is now located in an "image" subkey
- similarly, tool call data is located in a "tool_call" subkey

The pattern followed is `dict(type="foo", foo=<...>)`.
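
Illustrative payloads following that pattern (the nested field names are best-effort examples, not verbatim from the schema):

```python
image_item = {
    "type": "image",
    "image": {"url": {"uri": "https://example.com/cat.png"}},
}

tool_call_item = {
    "type": "tool_call",
    "tool_call": {
        "call_id": "call_0",
        "tool_name": "get_weather",
        "arguments": {"location": "San Francisco, CA"},
    },
}
```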
2025-01-22 12:16:18 -08:00
Hardik Shah
caa8387dd2
Fix fireworks client sdk chat completion with images (#840)
Enable downloads before sending requests to Fireworks.

Test using -- 
`LLAMA_STACK_CONFIG=./llama_stack/templates/fireworks/run.yaml pytest -s
-v -k 'test_image_chat_completion_streaming' tests/client-sdk`
2025-01-22 11:25:10 -08:00
Ashwin Bharambe
a63a43c646
[memory refactor][6/n] Update naming and routes (#839)
Making a few small naming changes as per feedback:

- RAGToolRuntime methods are called `insert` and `query` to keep them
more general
- The tool names are changed to non-namespaced forms
`insert_into_memory` and `query_from_memory`
- The REST endpoints are more REST-ful
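
A client-side sketch of the renamed surface, assuming the standard `llama_stack_client` and a pre-registered vector DB; the document dict fields are illustrative:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# insert: chunk and store documents into a vector DB
client.tool_runtime.rag_tool.insert(
    documents=[
        {
            "document_id": "doc-1",
            "content": "Llama Stack standardizes the building blocks for AI apps.",
            "mime_type": "text/plain",
            "metadata": {},
        }
    ],
    vector_db_id="my_docs",
    chunk_size_in_tokens=512,
)

# query: retrieve relevant chunks for a prompt
result = client.tool_runtime.rag_tool.query(
    content="What does Llama Stack standardize?",
    vector_db_ids=["my_docs"],
)
print(result)
```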
2025-01-22 10:39:13 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. The code looks much nicer and simpler (expectedly).
- there are still a number of hacks in the server resolver
implementation; we will live with some and fix others

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.

## Test Plan

Our RAG test is weak (it doesn't actually check the RAG output), but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00
Ashwin Bharambe
78a481bb22
[memory refactor][2/n] Update faiss and make it pass tests (#830)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Second part:

- updates routing table / router code 
- updates the faiss implementation


## Test Plan

```
pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384
```
2025-01-22 10:02:15 -08:00
Ashwin Bharambe
3ae8585b65
[memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This is the first part:

- delete other kinds of memory banks (keyvalue, keyword, graph) for now;
we will introduce a keyvalue store API as part of this design but not
use it in the RAG tool yet.
- renaming of the APIs
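
A short sketch of the renamed client surface (the embedding model and dimension values are illustrative):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Memory banks become vector DBs after the rename
client.vector_dbs.register(
    vector_db_id="my_docs",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

print(client.vector_dbs.list())
```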
2025-01-22 09:59:30 -08:00
Sixian Yi
e41873f268
[ez] structured output for /completion ollama & enable tests (#822)
# What does this PR do?

1) Enabled structured output for the ollama /completion API. It seems we
missed this one.
2) Fixed the ollama structured output test in the client sdk - ollama
does not support the list format for structured output.
3) Enabled the structured output unit test, as the result was stable on
Llama-3.1-8B-Instruct with ollama, fireworks, and together.
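
A sketch of the call being exercised, assuming the client SDK's json_schema-style `response_format`; the schema and prompt are illustrative:

```python
from pydantic import BaseModel

from llama_stack_client import LlamaStackClient


class AnswerFormat(BaseModel):
    name: str
    year_born: str
    year_retired: str


client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    content="Michael Jordan was born in 1963 and retired in 2003. Return his details.",
    response_format={
        "type": "json_schema",
        "json_schema": AnswerFormat.model_json_schema(),
    },
)

# The structured result should parse cleanly against the schema
answer = AnswerFormat.model_validate_json(response.content)
print(answer)
```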


## Test Plan
1) Run `test_completion_structured_output` on /completion API with 3
providers: ollama, fireworks, together.
pytest -v -s -k "together"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output


```
(base) sxyi@sxyi-mbp llama-stack % pytest -s -v llama_stack/providers/tests/inference --config=ci_test_config.yaml
/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
================================================================================================ test session starts =================================================================================================
platform darwin -- Python 3.13.0, pytest-8.3.4, pluggy-1.5.0 -- /Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13
cachedir: .pytest_cache
metadata: {'Python': '3.13.0', 'Platform': 'macOS-15.1.1-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'html': '4.1.1', 'metadata': '3.1.1', 'md': '0.2.0', 'dependency': '0.6.0', 'md-report': '0.6.3', 'anyio': '4.6.2.post1'}}
rootdir: /Users/sxyi/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.24.0, html-4.1.1, metadata-3.1.1, md-0.2.0, dependency-0.6.0, md-report-0.6.3, anyio-4.6.2.post1
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 85 items / 82 deselected / 3 selected                                                                                                                                                                      

llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-ollama] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-fireworks]
PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-together] PASSED

==================================================================================== 3 passed, 82 deselected, 8 warnings in 5.67s ====================================================================================
```
2) `LLAMA_STACK_CONFIG="./llama_stack/templates/ollama/run.yaml" /opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/inference`

Before: 
```
________________________________________________________________________________________ test_completion_structured_output __________________________________________________________________________________________
tests/client-sdk/inference/test_inference.py:174: in test_completion_structured_output
    answer = AnswerFormat.model_validate_json(response.content)
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for AnswerFormat
E     Invalid JSON: expected value at line 1 column 2 [type=json_invalid, input_value=' The year he retired, he...5\n\nThe best answer is', input_type=str]
E       For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
```

After: the test consistently passes.



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-21 21:10:24 -08:00
Ashwin Bharambe
75a2694daa Refactor the API enum to an independent file into llama_stack/apis/ 2025-01-19 12:22:40 -08:00
Sixian Yi
55067fa81d
test report for v0.1 (#814)
# What does this PR do?

MD file for the test results of provider <> inference tests 


## Test Plan

1) Install the plugin: `pip install pytest-md-report`
2) Run the inference tests with these options added to the command:
`--md-report --md-report-verbose=1 --md-report-output=tgi.md`

Test text model: meta-llama/Llama-3.1-8B-Instruct
Test vision model: meta-llama/Llama-3.2-11B-Vision-Instruct



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Xi Yan <xiyan@meta.com>
2025-01-18 07:50:45 -08:00
Xi Yan
3a9468ce9b
fix again vllm for non base64 (#818)
# What does this PR do?

- the previous fix introduced a regression for non-base64 images
- add back the download path and the base64 check


## Test Plan

<img width="835" alt="image"
src="https://github.com/user-attachments/assets/b70bf725-035a-4b42-b492-53daaf71458a"
/>



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-17 18:33:40 -08:00
Xi Yan
3e7496e835
fix vllm base64 image inference (#815)
# What does this PR do?

- fix base64-based image URLs for vllm
- add a test case for a base64-based image_url
- fixes issue https://github.com/meta-llama/llama-stack/issues/571

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v ./tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url
```

<img width="991" alt="image"
src="https://github.com/user-attachments/assets/d56381ba-6777-4d23-9da9-81f73ce93566"
/>


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-17 17:07:28 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term, applicable to alternatives to Docker such as
Podman and other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Ashwin Bharambe
eb60f04f86
optional api dependencies (#793)
Co-authored-by: Dinesh Yeduguru <yvdinesh@gmail.com>
2025-01-17 15:26:53 -08:00
Aidan Do
1f60c0286d
cannot import name 'GreedySamplingStrategy' (#806)
# What does this PR do?

Fixes an error when running a provider that uses openai_compat.py

```python
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/server/server.py", line 426, in <module>
    main()
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/server/server.py", line 349, in main
    impls = asyncio.run(construct_stack(config))
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/stack.py", line 207, in construct_stack
    impls = await resolve_impls(
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/resolver.py", line 239, in resolve_impls
    impl = await instantiate_provider(
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/resolver.py", line 330, in instantiate_provider
    impl = await fn(*args)
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/providers/remote/inference/vllm/__init__.py", line 11, in get_adapter_impl
    from .vllm import VLLMInferenceAdapter
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 39, in <module>
    from llama_stack.providers.utils.inference.openai_compat import (
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 11, in <module>
    from llama_models.llama3.api.datatypes import (
ImportError: cannot import name 'GreedySamplingStrategy' from 'llama_models.llama3.api.datatypes' (/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py)
++ error_handler 61
++ echo 'Error occurred in script at line: 61'
Error occurred in script at line: 61
++ exit 1
```

## Test Plan

```bash
conda create --name llamastack-vllm python=3.10
conda activate llamastack-vllm

# To sync with the current llama-models repo
pip install -e git+https://github.com/meta-llama/llama-models.git#egg=llama-models

export INFERENCE_MODEL=unsloth/Llama-3.3-70B-Instruct-bnb-4bit && \
pip install -e . && \
llama stack build --template remote-vllm --image-type conda && \
llama stack run ./distributions/remote-vllm/run.yaml \
  --port 5000 \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env VLLM_URL=http://localhost:8000
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-17 14:34:29 -08:00
Hardik Shah
b2ac29b9da
fix provider model list test (#800)
Fixes provider tests

```
pytest -v -s -k "together or fireworks or ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py 
```
```
...
.... 
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-together] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-together] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[-together] PASSED

================ 21 passed, 6 skipped, 81 deselected, 5 warnings in 32.11s =================
```

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-16 19:27:29 -08:00
Ashwin Bharambe
9f14382d82
meta reference inference fixes (#797)
Miscellaneous fixes for meta-reference inference.

Tests for log probs don't pass because meta reference does not support
top_k > 1.
2025-01-16 18:17:46 -08:00
Xi Yan
0fefd4390a
Fix tgi adapter (#796)
# What does this PR do?

- Fix TGI adapter

## Test Plan

<img width="851" alt="image"
src="https://github.com/user-attachments/assets/0084cbc6-6713-4079-b87b-0befd9aca0b0"
/>

- most inference tests are working
- the agent test failure is due to model outputs


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 17:44:12 -08:00
Botao Chen
35bf6ea75a
Pin torchtune pkg version (#791)
## Context
This is a follow-up to
https://github.com/meta-llama/llama-stack/pull/674. Since torchtune is
still in alpha and its APIs are not guaranteed to be backward
compatible, pin the torchtune and torchao package versions so that the
latest torchtune release does not break llama stack post-training.

We will bump the version numbers manually after some testing with each
new package release.

## Test
Pinned an old torchtune package version (0.4.0) and verified that 0.4.0 was installed:
<img width="1016" alt="Screenshot 2025-01-16 at 3 06 47 PM"
src="https://github.com/user-attachments/assets/630b05d0-8d0d-4e2f-8b48-22e578a62659"
/>
2025-01-16 16:31:13 -08:00
Sixian Yi
48b12b9777
[Test automation] generate custom test report (#739)
# What does this PR do?

Generate a test report in MD that contains two main pieces of information:
1) a custom report on inference provider -> API / functionalities
2) [TO BE ADDED] a test log for easy debugging

## Test Plan
For local testing, run the test script from the command line. A test
report is generated at tests/report.html.

`pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--config=ci_test_config.yaml`

See
[gist](https://gist.github.com/sixianyi0721/a421fd3bc450b74354a1c2c7da483fa5)
for output MD file

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 15:33:50 -08:00
Ashwin Bharambe
cee3816609
Make llama stack build not create a new conda by default (#788)
## What does this PR do?

So far `llama stack build` has always created a separate conda
environment for packaging the dependencies of a distribution. The main
reason to do so is isolation -- distributions are composed of providers
which can have a variety of potentially conflicting dependencies. That
said, this has created significant annoyance for new users since it is
not at all transparent. The fact that `llama stack run` is actually
running the code in some other conda is very surprising.

This PR tries to make things better. 

- Both `llama stack build` and `llama stack run` now accept an
`--image-name` argument which represents the (conda, docker, virtualenv)
image you want to operate upon.
- For the default (conda) mode, the script checks if a current conda
environment exists. If one exists, it uses it.
- If `--image-name` is provided, that option is used. In this case, an
environment is created if needed.
- There is no automatic `llamastack-` prefixing of the environment names
done anymore.


## Test Plan

Start in a conda environment, run `llama stack build --template
fireworks`; verify that it successfully built into the current
environment and stored the build file at
`$CONDA_PREFIX/llamastack-build.yaml`. Run `llama stack run fireworks`
which started correctly in the current environment.

Ran the same build command outside of conda. It failed asking for
`--image-name`. Ran it with `llama stack build --template fireworks
--image-name foo`. This successfully created a conda environment called
`foo` and installed deps. Ran `llama stack run fireworks` outside conda
which failed. Activated a different conda, ran again, it failed saying
it did not find the `llamastack-build.yaml` file. Then used
`--image-name foo` option and it ran successfully.
2025-01-16 13:44:53 -08:00
Dinesh Yeduguru
59eeaf7f81
Idiomatic REST API: Telemetry (#786)
# What does this PR do?

Changes the Telemetry API to follow more idiomatic REST conventions.



## Test Plan

TBD, once I get approval for the REST endpoints.
2025-01-16 12:08:46 -08:00
Sixian Yi
c79b087552
[test automation] support run tests on config file (#730)
# Context
For test automation, the end goal is to run a single pytest command from
the root test directory (llama_stack/providers/tests/.) such that we
execute the push-blocking tests.

The work plan:
1) trigger pytest from llama_stack/providers/tests/.
2) use a config file to determine which tests and parametrizations we
want to run

# What does this PR do?
1) Consolidates the "inference-models" / "embedding-model" /
"judge-model" ... options in the root conftest.py. Without this change,
we hit an error when trying to run `pytest
/Users/sxyi/llama-stack/llama_stack/providers/tests/.` because of
duplicated `addoption` definitions across child conftest files.

2) Adds a `config` option to specify the test config in YAML (see
[`ci_test_config.yaml`](https://gist.github.com/sixianyi0721/5b37fbce4069139445c2f06f6e42f87e)
for an example config file).

For provider_fixtures, we allow users to use either a default fixture
combination or define their own {api:provider} combinations.

```

memory:
....
  fixtures:
    provider_fixtures:
    - default_fixture_param_id: ollama // use default fixture combination with param_id="ollama" in [providers/tests/memory/conftest.py](https://fburl.com/mtjzwsmk)
    - inference: sentence_transformers
      memory: faiss
    - default_fixture_param_id: chroma

```
3) Generates tests according to the config. The logic lives in two places:
a) in `{api}/conftest.py::pytest_generate_tests`, we read from the config
to do parametrization.
b) after test collection, in `pytest_collection_modifyitems`, we filter
the tests to include only the functions listed in the config (see the
sketch below).
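
A simplified sketch of the filtering step in (b), assuming a YAML config with a flat list of test function names (the real config schema is richer than this):

```python
# conftest.py (simplified sketch)
import yaml


def pytest_collection_modifyitems(config, items):
    cfg_path = config.getoption("--config", default=None)
    if not cfg_path:
        return
    with open(cfg_path) as f:
        allowed = set(yaml.safe_load(f).get("functions", []))
    # keep only the test functions listed in the config
    items[:] = [item for item in items if item.originalname in allowed]
```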

## Test Plan
1) `pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--collect-only --config=ci_test_config.yaml`

Using the `--collect-only` flag to print the tests selected by the config
file (`ci_test_config.yaml`).

output:
[gist](https://gist.github.com/sixianyi0721/05145e60d4d085c17cfb304beeb1e60e)


2) sanity check on `--inference-model` option

```
pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 12:05:49 -08:00