Commit graph

906 commits

Author SHA1 Message Date
Ashwin Bharambe
a63a43c646
[memory refactor][6/n] Update naming and routes (#839)
Making a few small naming changes as per feedback:

- RAGToolRuntime methods are called `insert` and `query` to keep them
more general
- The tool names are changed to non-namespaced forms
`insert_into_memory` and `query_from_memory`
- The REST endpoints are more REST-ful
2025-01-22 10:39:13 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Ashwin Bharambe
63f37f9b7c
[memory refactor][4/n] Update the client-sdk test for RAG (#834)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Update client-sdk tests
2025-01-22 10:15:19 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. the code looks much nicer and simpler (expectedly.)
- there are a number of hacks in the server resolver implementation
still, we will live with some and fix some

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.
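
As a rough illustration of the typed sub-protocol idea (not the code from this PR), the sketch below uses `typing.Protocol`. Only the method names `insert_documents` and `query_context` and the `rag_tool` attribute come from the description above; the parameter names, return types, and the `InMemoryRAGTool` class are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Protocol


class RAGToolRuntime(Protocol):
    """Typed surface for RAG operations (signatures are assumed)."""

    def insert_documents(self, documents: list[str], vector_db_id: str) -> None: ...

    def query_context(self, query: str, vector_db_ids: list[str]) -> list[str]: ...


@dataclass
class InMemoryRAGTool:
    """Hypothetical toy implementation: substring match instead of vector search."""

    store: dict[str, list[str]] = field(default_factory=dict)

    def insert_documents(self, documents: list[str], vector_db_id: str) -> None:
        self.store.setdefault(vector_db_id, []).extend(documents)

    def query_context(self, query: str, vector_db_ids: list[str]) -> list[str]:
        needle = query.lower()
        return [
            doc
            for db_id in vector_db_ids
            for doc in self.store.get(db_id, [])
            if needle in doc.lower()
        ]


@dataclass
class ToolRuntime:
    """Parent resource exposing the sub-resource as a typed attribute."""

    rag_tool: RAGToolRuntime


tool_runtime = ToolRuntime(rag_tool=InMemoryRAGTool())
tool_runtime.rag_tool.insert_documents(["Llamas live in the Andes."], vector_db_id="docs")
hits = tool_runtime.rag_tool.query_context("andes", vector_db_ids=["docs"])
```

Because callers go through `tool_runtime.rag_tool`, a type checker can verify argument and return types, which an untyped "invoke_tool"-style call cannot offer.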

## Test Plan

Our RAG test is sad (doesn't actually test for actual RAG output) but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00
Ashwin Bharambe
78a481bb22
[memory refactor][2/n] Update faiss and make it pass tests (#830)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Second part:

- updates routing table / router code 
- updates the faiss implementation


## Test Plan

```
pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384
```
2025-01-22 10:02:15 -08:00
Ashwin Bharambe
3ae8585b65
[memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This is the first part:

- delete other kinds of memory banks (keyvalue, keyword, graph) for now;
we will introduce a keyvalue store API as part of this design but not
use it in the RAG tool yet.
- renaming of the APIs
2025-01-22 09:59:30 -08:00
Sixian Yi
35a00d004a
bug fix for distro report generation (#836)
# What does this PR do?

Minor bug fix and simplify code


## Test Plan


See the updated `llama_stack/templates/fireworks/report.md`

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-21 21:44:06 -08:00
Sixian Yi
edf56884a7
add pytest option to generate a functional report for distribution (#833)
# What does this PR do?

add pytest option (`--report`) to support generating a functional report
for llama stack distribution
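
A flag like this is typically registered through pytest's `pytest_addoption` hook in a `conftest.py`. The sketch below shows the general shape only; the help text and the `report_requested` helper are assumptions, and the PR's actual implementation may differ.

```python
def pytest_addoption(parser):
    # parser.addoption accepts argparse-style arguments.
    parser.addoption(
        "--report",
        action="store_true",
        default=False,
        help="Generate a functional report for the distribution under test",
    )


def report_requested(config) -> bool:
    # Hypothetical helper: config.getoption reads a registered option.
    return bool(config.getoption("--report"))
```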

## Test Plan
```
export LLAMA_STACK_CONFIG=./llama_stack/templates/fireworks/run.yaml
/opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/  --report
```

Verify that a report file was generated under
`./llama_stack/templates/fireworks/report.md`


2025-01-21 21:18:23 -08:00
Sixian Yi
e41873f268
[ez] structured output for /completion ollama & enable tests (#822)
# What does this PR do?

1) Enable structured output for the ollama /completion API, which was
missed earlier.
2) Fix the ollama structured output test in the client SDK: ollama does
not support the list format for structured output.
3) Enable the structured output unit test, since results were stable on
Llama-3.1-8B-Instruct with ollama, fireworks, and together.


## Test Plan
1) Run `test_completion_structured_output` on /completion API with 3
providers: ollama, fireworks, together.
pytest -v -s -k "together"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output


```
(base) sxyi@sxyi-mbp llama-stack % pytest -s -v llama_stack/providers/tests/inference --config=ci_test_config.yaml
/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
================================================================================================ test session starts =================================================================================================
platform darwin -- Python 3.13.0, pytest-8.3.4, pluggy-1.5.0 -- /Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13
cachedir: .pytest_cache
metadata: {'Python': '3.13.0', 'Platform': 'macOS-15.1.1-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'html': '4.1.1', 'metadata': '3.1.1', 'md': '0.2.0', 'dependency': '0.6.0', 'md-report': '0.6.3', 'anyio': '4.6.2.post1'}}
rootdir: /Users/sxyi/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.24.0, html-4.1.1, metadata-3.1.1, md-0.2.0, dependency-0.6.0, md-report-0.6.3, anyio-4.6.2.post1
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 85 items / 82 deselected / 3 selected                                                                                                                                                                      

llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-ollama] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-fireworks] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-together] PASSED

==================================================================================== 3 passed, 82 deselected, 8 warnings in 5.67s ====================================================================================
```
2)  
` LLAMA_STACK_CONFIG="./llama_stack/templates/ollama/run.yaml"
/opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/inference`

Before: 
```
________________________________________________________________________________________ test_completion_structured_output __________________________________________________________________________________________
tests/client-sdk/inference/test_inference.py:174: in test_completion_structured_output
    answer = AnswerFormat.model_validate_json(response.content)
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for AnswerFormat
E     Invalid JSON: expected value at line 1 column 2 [type=json_invalid, input_value=' The year he retired, he...5\n\nThe best answer is', input_type=str]
E       For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
```

After: 
test consistently passes
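
The "Before" failure is a plain JSON parse error: without structured output the model returns free-form text. A minimal stdlib sketch of the same kind of check follows. The field names in `AnswerFormat` and the sample strings are invented for illustration; the real test validates against a pydantic model whose schema is not shown in this log.

```python
import json
from dataclasses import dataclass


@dataclass
class AnswerFormat:
    # Hypothetical fields; the real schema may differ.
    name: str
    year_retired: str


def parse_answer(content: str) -> AnswerFormat:
    """Parse model output as JSON and require exactly the expected keys."""
    data = json.loads(content)  # raises json.JSONDecodeError on free-form text
    return AnswerFormat(**data)  # raises TypeError on missing or extra keys


def is_valid(content: str) -> bool:
    try:
        parse_answer(content)
        return True
    except (json.JSONDecodeError, TypeError):
        return False


structured = '{"name": "Example Player", "year_retired": "2003"}'
free_form = " The year he retired, he... The best answer is"
```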


2025-01-21 21:10:24 -08:00
Dinesh Yeduguru
7a4b382ae9
add section for mcp tool usage in notebook (#831)
# What does this PR do?

Adds a section to the notebook on how to use tools hosted in an MCP server.


![Screenshot 2025-01-21 at 11 05 39 AM](https://github.com/user-attachments/assets/23e900f1-e2a7-4a46-be9b-13642753dca1)
Notebook:
https://colab.research.google.com/drive/1hBKX01NlG6p2BUrBU0ynwIlWjXQRxc3k?usp=sharing

Rendered notebook on this branch:
https://github.com/meta-llama/llama-stack/blob/mcp-notebook/docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb
2025-01-21 13:10:42 -08:00
Ashwin Bharambe
75a2694daa Refactor the API enum to an independent file into llama_stack/apis/ 2025-01-19 12:22:40 -08:00
Xi Yan
74f6af8bbe
[CICD] add simple test step for docker build workflow, fix prefix bug (#821)
# What does this PR do?

**Main Thing**
- Add a simple test step before publishing docker image in workflow

**Side Fix**
- The Docker push action has been failing recently because of an extra
prefix that was introduced. E.g. see:
https://github.com/meta-llama/llama-stack/pull/802#issuecomment-2599507062

cc @terrytangyuan 

## Test Plan

1. Release a TestPyPi version on this code: 0.0.63.dev51206766


3581203331

```
# 1. build docker image
TEST_PYPI_VERSION=0.0.63.dev51206766 llama stack build --template fireworks

# 2. test the docker image
cd distributions/fireworks && docker compose up
```

4. Test the full build + test docker flow using TestPyPi from (1):
1284218494

<img width="1049" alt="image"
src="https://github.com/user-attachments/assets/c025893d-5ce2-48ff-aa90-de00e105ee09"
/>


2025-01-18 15:16:05 -08:00
Sixian Yi
55067fa81d
test report for v0.1 (#814)
# What does this PR do?

MD file for the test results of provider <> inference tests 


## Test Plan

1) Install the plugin: `pip install pytest-md-report`
2) Run inference tests with the additions to the commands
`--md-report --md-report-verbose=1 --md-report-output=tgi.md`

Test text model: meta-llama/Llama-3.1-8B-Instruct
Test vision model: meta-llama/Llama-3.2-11B-Vision-Instruct



---------

Co-authored-by: Xi Yan <xiyan@meta.com>
2025-01-18 07:50:45 -08:00
Yuan Tang
5379eca9fd
Fix incorrect image type in publish-to-docker workflow (#819) 2025-01-17 21:33:03 -08:00
Yuan Tang
5a63d0ff1d
Fix incorrect RunConfigSettings due to the removal of conda_env (#801) 2025-01-17 21:30:57 -08:00
Xi Yan
3a9468ce9b
fix again vllm for non base64 (#818)
# What does this PR do?

- The previous fix introduced a regression for non-base64 images
- Add back the download path and the base64 check
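
The two paths can be sketched roughly as follows. This is an illustration under assumptions, not the adapter's code: the function names are hypothetical, and the downloader is passed in as a callable so the example stays free of network access.

```python
import base64
from typing import Callable


def is_base64_data_url(url: str) -> bool:
    """True for URLs of the form data:<mime>;base64,<payload>."""
    header, sep, _ = url.partition(",")
    return bool(sep) and header.startswith("data:") and header.endswith(";base64")


def load_image_bytes(url: str, download: Callable[[str], bytes]) -> bytes:
    """Decode inline base64 images; fall back to downloading anything else."""
    if is_base64_data_url(url):
        return base64.b64decode(url.partition(",")[2])
    return download(url)


# Usage with an inline image and a stub downloader.
inline_url = "data:image/png;base64," + base64.b64encode(b"abc").decode("ascii")
inline_bytes = load_image_bytes(inline_url, download=lambda u: b"")
```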


## Test Plan

<img width="835" alt="image"
src="https://github.com/user-attachments/assets/b70bf725-035a-4b42-b492-53daaf71458a"
/>


2025-01-17 18:33:40 -08:00
Xi Yan
3e7496e835
fix vllm base64 image inference (#815)
# What does this PR do?

- fix base64 based image url for vllm
- add a test case for base64 based image_url
- fixes issue: https://github.com/meta-llama/llama-stack/issues/571

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v ./tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url
```

<img width="991" alt="image"
src="https://github.com/user-attachments/assets/d56381ba-6777-4d23-9da9-81f73ce93566"
/>

2025-01-17 17:07:28 -08:00
Dinesh Yeduguru
3d4c53dfec
add mcp runtime as default to all providers (#816)
# What does this PR do?

This is needed to have the notebook work with MCP
2025-01-17 16:40:58 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term and applicable to alternatives of Docker, such
as Podman or other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Xi Yan
9d005154d7
fix vllm template (#813)
# What does this PR do?

- Fix vLLM template to resolve
https://github.com/meta-llama/llama-stack/issues/805
- Fix agents test with shields

## Test Plan

```
vllm serve meta-llama/Llama-3.1-8B-Instruct
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack run ./llama_stack/templates/remote-vllm/run.yaml
```

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v ./tests/client-sdk/
```

<img width="1245" alt="image"
src="https://github.com/user-attachments/assets/9af27684-5a9c-4187-b338-cbfc5211bd99"
/>


- custom tool flaky due to model outputs
- /completions API not implemented

**Vision Model**
- 11B-Vision-Instruct
<img width="1240" alt="image"
src="https://github.com/user-attachments/assets/1d3b3b17-fa09-43a7-b56c-3f77263825c5"
/>


2025-01-17 15:34:29 -08:00
Ashwin Bharambe
eb60f04f86
optional api dependencies (#793)
Co-authored-by: Dinesh Yeduguru <yvdinesh@gmail.com>
2025-01-17 15:26:53 -08:00
Aidan Do
1f60c0286d
cannot import name 'GreedySamplingStrategy' (#806)
# What does this PR do?

Fixes an error when running a provider that uses openai_compat.py

```python
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/server/server.py", line 426, in <module>
    main()
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/server/server.py", line 349, in main
    impls = asyncio.run(construct_stack(config))
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/stack.py", line 207, in construct_stack
    impls = await resolve_impls(
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/resolver.py", line 239, in resolve_impls
    impl = await instantiate_provider(
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/distribution/resolver.py", line 330, in instantiate_provider
    impl = await fn(*args)
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/providers/remote/inference/vllm/__init__.py", line 11, in get_adapter_impl
    from .vllm import VLLMInferenceAdapter
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 39, in <module>
    from llama_stack.providers.utils.inference.openai_compat import (
  File "/home/ubuntu/us-south-2/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 11, in <module>
    from llama_models.llama3.api.datatypes import (
ImportError: cannot import name 'GreedySamplingStrategy' from 'llama_models.llama3.api.datatypes' (/home/ubuntu/miniconda3/envs/llamastack-vllm/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py)
++ error_handler 61
++ echo 'Error occurred in script at line: 61'
Error occurred in script at line: 61
++ exit 1
```

## Test Plan

```bash
conda create --name llamastack-vllm python=3.10
conda activate llamastack-vllm

# To sync with the current llama-models repo
pip install -e git+https://github.com/meta-llama/llama-models.git#egg=llama-models

export INFERENCE_MODEL=unsloth/Llama-3.3-70B-Instruct-bnb-4bit && \
pip install -e . && \
llama stack build --template remote-vllm --image-type conda && \
llama stack run ./distributions/remote-vllm/run.yaml \
  --port 5000 \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env VLLM_URL=http://localhost:8000
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-17 14:34:29 -08:00
Paul McCarthy
e1decaec9d
Fixing small typo in quick start guide (#807)
# What does this PR do?

Fixing small typo in the quick start guide

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
2025-01-17 11:15:55 -08:00
Dinesh Yeduguru
53b5f6b24a
add json_schema_type to ParamType deps (#808)
# What does this PR do?

Add missing json_schema_type annotation to ParamType deps
2025-01-17 11:02:25 -08:00
Xi Yan
c2a072911d
fix eval notebook & add test to workflow (#803) 2025-01-16 23:11:21 -08:00
Xi Yan
9d574f4aee
fix playground for v1 (#799)
# What does this PR do?

- update playground callsites for v1 api changes

## Test Plan

```
cd llama_stack/distribution/ui
streamlit run app.py
```


https://github.com/user-attachments/assets/eace11c6-600a-42dc-b4e7-6948a706509f




2025-01-16 19:32:07 -08:00
Hardik Shah
b2ac29b9da
fix provider model list test (#800)
Fixes provider tests

```
pytest -v -s -k "together or fireworks or ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py 
```
```
...
.... 
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-together] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-together] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[-together] PASSED

================ 21 passed, 6 skipped, 81 deselected, 5 warnings in 32.11s =================
```

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-16 19:27:29 -08:00
Ashwin Bharambe
9f14382d82
meta reference inference fixes (#797)
Miscellaneous fixes for meta reference inference

Tests for log probs don't pass because meta reference does not support
top_k > 1
2025-01-16 18:17:46 -08:00
Ashwin Bharambe
cb41848a2a disable version check optionally 2025-01-16 18:14:48 -08:00
Xi Yan
38009631bc
Remove llama-guard in Cerebras template & improve agent test (#798)
# What does this PR do?

- fix cerebras template
- fix agent test case without shields

## Test Plan

<img width="1261" alt="image"
src="https://github.com/user-attachments/assets/04381f85-9192-4fc6-984b-c9bec99bdb82"
/>

```
llama stack run ./llama_stack/templates/cerebras/run.yaml 

LLAMA_STACK_BASE_URL="http://localhost:8321" pytest -v tests/client-sdk/ --html=report.html --self-contained-html
```

2025-01-16 18:11:35 -08:00
Xi Yan
0fefd4390a
Fix tgi adapter (#796)
# What does this PR do?

- Fix TGI adapter

## Test Plan

<img width="851" alt="image"
src="https://github.com/user-attachments/assets/0084cbc6-6713-4079-b87b-0befd9aca0b0"
/>

- most inference working
- agent test failure due to model outputs

2025-01-16 17:44:12 -08:00
Dinesh Yeduguru
73215460ba
add default toolgroups to all providers (#795)
# What does this PR do?

Add toolgroup defs to all the distribution templates
2025-01-16 16:54:59 -08:00
Dinesh Yeduguru
e88faa91e2
fix the code execution test in sdk tests (#794)
# What does this PR do?

remove hardcoded model id for the code execution tests


Tests:

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py -k
"test_code_execution"
2025-01-16 16:42:25 -08:00
Botao Chen
35bf6ea75a
Pin torchtune pkg version (#791)
## context
This is a follow-up to
https://github.com/meta-llama/llama-stack/pull/674. torchtune is still
in alpha and its APIs are not guaranteed to be backward compatible, so
this PR pins the torchtune and torchao package versions to keep a new
torchtune release from breaking llama stack post training.

We will bump the version numbers manually after testing each new package
release.

## test 
Pinned an old torchtune pkg version (0.4.0) and verified that 0.4.0 was installed
<img width="1016" alt="Screenshot 2025-01-16 at 3 06 47 PM"
src="https://github.com/user-attachments/assets/630b05d0-8d0d-4e2f-8b48-22e578a62659"
/>
2025-01-16 16:31:13 -08:00
Xi Yan
d1f3b032c9
cerebras template update for memory (#792)
# What does this PR do?

- we no longer have meta-reference as memory provider, update cerebras
template


## Test Plan

```
python llama_stack/scripts/distro_codegen.py
```

2025-01-16 16:07:53 -08:00
Sixian Yi
48b12b9777
[Test automation] generate custom test report (#739)
# What does this PR do?

Generates a test report in Markdown with two main parts:
1) a custom report mapping inference providers to APIs / functionality
2) [TO BE ADDED] a test log for easy debugging

## Test Plan
For local testing, run the test script from the command line and check
that a test report is generated at tests/report.html

`pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--config=ci_test_config.yaml`

See
[gist](https://gist.github.com/sixianyi0721/a421fd3bc450b74354a1c2c7da483fa5)
for output MD file
## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 15:33:50 -08:00
Ashwin Bharambe
03ac84a829 Update default port from 5000 -> 8321 2025-01-16 15:26:48 -08:00
Hardik Shah
f1faa9c924 pop fix 2025-01-16 14:09:59 -08:00
Dinesh Yeduguru
fcd1a57429 update notebook 2025-01-16 14:00:48 -08:00
Xi Yan
a6b9f2cec7
fix cerebras template (#790)
# What does this PR do?

- fix cerebras template

## Test Plan

```
llama stack build --template cerebras --image-type conda
llama stack run cerebras
LLAMA_STACK_BASE_URL="http://localhost:5000" pytest -v tests/client-sdk/ --html=report.html --self-contained-html
```

2025-01-16 13:53:06 -08:00
Dinesh Yeduguru
12c994b5b2
REST API fixes (#789)
# What does this PR do?

Client SDK fixes

## Test Plan


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/memory/test_memory.py
2025-01-16 13:47:08 -08:00
Ashwin Bharambe
cee3816609
Make llama stack build not create a new conda by default (#788)
## What does this PR do?

So far `llama stack build` has always created a separate conda
environment for packaging the dependencies of a distribution. The main
reason to do so is isolation -- distributions are composed of providers
which can have a variety of potentially conflicting dependencies. That
said, this has created significant annoyance for new users since it is
not at all transparent. The fact that `llama stack run` is actually
running the code in some other conda is very surprising.

This PR tries to make things better. 

- Both `llama stack build` and `llama stack run` now accept an
`--image-name` argument which represents the (conda, docker, virtualenv)
image you want to operate upon.
- For the default (conda) mode, the script checks if a current conda
environment exists. If one exists, it uses it.
- If `--image-name` is provided, that option is used. In this case, an
environment is created if needed.
- There is no automatic `llamastack-` prefixing of the environment names
done anymore.


## Test Plan

Start in a conda environment, run `llama stack build --template
fireworks`; verify that it successfully built into the current
environment and stored the build file at
`$CONDA_PREFIX/llamastack-build.yaml`. Run `llama stack run fireworks`
which started correctly in the current environment.

Ran the same build command outside of conda. It failed asking for
`--image-name`. Ran it with `llama stack build --template fireworks
--image-name foo`. This successfully created a conda environment called
`foo` and installed deps. Ran `llama stack run fireworks` outside conda
which failed. Activated a different conda, ran again, it failed saying
it did not find the `llamastack-build.yaml` file. Then used
`--image-name foo` option and it ran successfully.
2025-01-16 13:44:53 -08:00
Dinesh Yeduguru
59eeaf7f81
Idiomatic REST API: Telemetry (#786)
# What does this PR do?

Changes Telemetry API to follow more idiomatic REST




## Test Plan

TBD, once I get approval for the REST endpoints
2025-01-16 12:08:46 -08:00
Sixian Yi
c79b087552
[test automation] support run tests on config file (#730)
# Context
For test automation, the end goal is to run a single pytest command from
root test directory (llama_stack/providers/tests/.) such that we execute
push-blocking tests

The work plan: 
1) trigger pytest from llama_stack/providers/tests/.
2) use config file to determine what tests and parametrization we want
to run

# What does this PR do?
1) Consolidates the "inference-models" / "embedding-model" /
"judge-model" ... options in the root conftest.py. Without this change,
running `pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.`
fails because of duplicated `addoption` definitions across the child
conftest files.

2) Add a `config` option to specify test config in YAML. (see
[`ci_test_config.yaml`](https://gist.github.com/sixianyi0721/5b37fbce4069139445c2f06f6e42f87e)
for example config file)

For provider_fixtures, we allow users to use either a default fixture
combination or define their own {api:provider} combinations.

```

memory:
....
  fixtures:
    provider_fixtures:
    - default_fixture_param_id: ollama  # use default fixture combination with param_id="ollama" in [providers/tests/memory/conftest.py](https://fburl.com/mtjzwsmk)
    - inference: sentence_transformers
      memory: faiss
    - default_fixture_param_id: chroma

```
3) generate tests according to the config. Logic lives in two places: 
a) in `{api}/conftest.py::pytest_generate_tests`, we read from config to
do parametrization.
b) after test collection, in `pytest_collection_modifyitems`, we filter
the tests to include only functions listed in config.
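
Step (b) can be sketched as below. This is an illustration under assumptions: `keep_configured_tests` is a hypothetical helper, and the stand-in objects only mimic the `name` and `originalname` attributes that pytest function items carry. In a real `conftest.py` the helper would be called from `pytest_collection_modifyitems(config, items)` with the names read from the YAML config.

```python
from types import SimpleNamespace


def keep_configured_tests(items, allowed_names):
    """Filter collected items in place, keeping only configured test functions."""
    items[:] = [
        item
        for item in items
        # originalname is the function name without the parametrization suffix.
        if (getattr(item, "originalname", None) or item.name) in allowed_names
    ]


# Stand-ins for collected pytest items.
collected = [
    SimpleNamespace(name="test_chat_completion[ollama]", originalname="test_chat_completion"),
    SimpleNamespace(name="test_embeddings[ollama]", originalname="test_embeddings"),
]
keep_configured_tests(collected, {"test_chat_completion"})
```

Mutating the list in place matters because pytest keeps using the same `items` list object after the hook returns.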

## Test Plan
1) `pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--collect-only --config=ci_test_config.yaml`

Use the `--collect-only` flag to print the tests listed in the config
file (`ci_test_config.yaml`).

output:
[gist](https://gist.github.com/sixianyi0721/05145e60d4d085c17cfb304beeb1e60e)


2) sanity check on `--inference-model` option

```
pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 12:05:49 -08:00
Hardik Shah
74e4d520ac un-skip telemetry cells in notebook 2025-01-16 11:54:25 -08:00
Hardik Shah
821ac674ab
Add notebook testing to nightly build job (#785)
# What does this PR do?

Adds testing of the notebook to the nightly build job 

## Test Plan

Here is a sample run -- 

1281588919

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-16 11:24:50 -08:00
Dinesh Yeduguru
8d30ecb91a
Idiomatic REST API: Evals (#782)
# What does this PR do?

Changes Evals API to follow more idiomatic REST


## Test Plan

TBD, once I get approval for the REST endpoints
2025-01-16 11:02:42 -08:00
Dinesh Yeduguru
678ab29129
Idiomatic REST API: Inspect (#779)
# What does this PR do?

Since provider list returns a map grouping providers by API, we should
not be using data. This PR fixes the types to just be the plain dict,
basically reverting back to previous behavior



## Test Plan

llama-stack on  fix-provider-list [$] 🅒 stack❯
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py
2025-01-16 10:39:42 -08:00
Xi Yan
e239280932
fireworks add completion logprobs adapter (#778)
# What does this PR do?

- add completion log probs for fireworks

## Test Plan

<img width="849" alt="image"
src="https://github.com/user-attachments/assets/5aa1f27f-02a6-422c-8478-94dd1e345342"
/>


2025-01-16 10:37:07 -08:00
Dinesh Yeduguru
05f6b44da7
Fix telemetry (#787)
# What does this PR do?

This PR fixes a couple of issues with telemetry:
1) The REST refactor renamed the method from get_span_tree to
query_span_tree, which caused the server side to return empty spans
2) The library client introduced a new event loop, which required
moving where start and end trace are called


## Test Plan

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py -k
"test_builtin_tool_web_search"


And querying for spans from the agent run using the library client.
2025-01-16 10:36:13 -08:00