Commit graph

1079 commits

Author SHA1 Message Date
Sébastien Han
657f24b964
chore: add missing ToolConfig import in groq.py (#983)
# What does this PR do?

Imported `ToolConfig` from the `llama_stack.apis.inference` module to
resolve missing reference and ensure proper functionality within the
`groq.py` file.
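
For context, a minimal sketch of the fix (the exact import list in `groq.py` may differ):

```python
# Sketch only: bring ToolConfig into scope from the inference API module,
# alongside names the adapter already uses (e.g. Inference, per the traceback below).
from llama_stack.apis.inference import Inference, ToolConfig
```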

Signed-off-by: Sébastien Han <seb@redhat.com>


## Test Plan

Without the change, pytest fails during collection with the following error:

```
uv run pytest -v -s -k "ollama" llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 379 items / 1 error / 349 deselected / 30 selected                                                 

=================================================== ERRORS ===================================================
__________________ ERROR collecting llama_stack/providers/tests/inference/groq/test_init.py __________________
llama_stack/providers/tests/inference/groq/test_init.py:11: in <module>
    from llama_stack.providers.remote.inference.groq.groq import GroqInferenceAdapter
llama_stack/providers/remote/inference/groq/groq.py:72: in <module>
    class GroqInferenceAdapter(Inference, ModelRegistryHelper, NeedsRequestProviderData):
llama_stack/providers/remote/inference/groq/groq.py:102: in GroqInferenceAdapter
    tool_config: Optional[ToolConfig] = None,
E   NameError: name 'ToolConfig' is not defined
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/groq/test_init.py - NameError: name 'ToolConfig' is not defined
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================== 349 deselected, 22 warnings, 1 error in 0.28s ================================
```

With the change, collection proceeds further and fails with a different
error:

```
uv run pytest -v -s llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 342 items / 1 error                                                                                

=================================================== ERRORS ===================================================
______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________
llama_stack/providers/tests/inference/test_vision_inference.py:29: in <module>
    class TestVisionModelInference:
llama_stack/providers/tests/inference/test_vision_inference.py:35: in TestVisionModelInference
    ImageContentItem(image=dict(data=PASTA_IMAGE)),
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
E   image.data
E     Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes]
E       For further information visit https://errors.pydantic.dev/2.10/v/string_unicode
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/test_vision_inference.py - pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 22 warnings, 1 error in 0.25s ========================================
```

Which is fixed in https://github.com/meta-llama/llama-stack/pull/1003.
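
For reference, the validation error above comes from handing raw bytes to a string-typed field; a hedged sketch of the likely remedy (the actual fix in #1003 may differ):

```python
import base64

# ImageContentItem's image.data field validates as str, so raw JPEG bytes
# are rejected. Base64-encoding them first yields a valid unicode string.
pasta_image_bytes = b"\xff\xd8\xff\xe0"  # placeholder for the real JPEG payload
encoded = base64.b64encode(pasta_image_bytes).decode("utf-8")
# ImageContentItem(image=dict(data=encoded))  # now passes the string validator
```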

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-07 09:35:00 -08:00
Ashwin Bharambe
e6c9f2a485 Delete CHANGELOG.md
We use weekly releases as a way to communicate important improvements.
Keeping this information in sync in both places is more overhead than we
have bandwidth for right now. We may change this process over time.
2025-02-07 09:03:35 -08:00
Yuan Tang
3f9764d50c
fix: List providers command prints out non-existing APIs from registry. Fixes #966 (#969)
Fixes #966.

Verified that:
1. The correct list of APIs is printed when running `llama stack
list-providers`.
2. `llama stack list-providers <api>` works as expected.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-07 09:02:15 -08:00
Sébastien Han
840344975d
test: rm unused exception alias in pytest.raises (#991)
# What does this PR do?

Refactored tests by removing the unused exception alias (`as exc_info`)
in `pytest.raises`, improving code clarity and reducing lint warnings;
the alias was never actually used.
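
A minimal illustration of the pattern being cleaned up (test bodies are hypothetical):

```python
import pytest

def test_old_style():
    # Before: the alias is bound but never inspected, which lint flags as unused.
    with pytest.raises(ValueError) as exc_info:
        int("not a number")

def test_new_style():
    # After: drop the alias when the captured exception is not examined.
    with pytest.raises(ValueError):
        int("not a number")
```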

Signed-off-by: Sébastien Han <seb@redhat.com>

## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-07 08:04:25 -08:00
ehhuang
d0d568c5ba
test: fix flaky agent test (#1002)
Summary:

Test Plan:

LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8

All tests passed.
2025-02-06 20:19:38 -08:00
ehhuang
af15426ad7
doc: getting started notebook (#996)
# What does this PR do?

Fix link

## Test Plan

<!--
Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.
-->

<!--
## Sources

Please link relevant resources if necessary.

-->

<!--
## Documentation

- [ ] Added a
[Changelog](https://github.com/meta-llama/llama-stack/blob/main/CHANGELOG.md)
entry if the change is significant (new feature, breaking change etc.).

-->
2025-02-06 17:30:21 -08:00
Ashwin Bharambe
7ec79c0297 Add Terry to CODEOWNERS 2025-02-06 16:23:23 -08:00
Hardik Shah
28a0fe57cc
fix: Update rag examples to use fresh faiss index every time (#998)
# What does this PR do?
In several examples we use the same faiss index, which means running them
multiple times fills the index with duplicates. This eventually degrades
RAG performance, since multiple copies of the same irrelevant chunks can
be retrieved repeatedly.

The fix is to ensure we create a new index each time.

Resolves issue in this discussion -
https://github.com/meta-llama/llama-stack/discussions/995
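
A hedged sketch of the pattern (identifiers are illustrative, not the exact notebook code): give each run a fresh, uniquely named vector store instead of reusing a fixed one.

```python
import uuid

# A unique id per run means repeated executions never append duplicate
# chunks to a previously populated index.
vector_db_id = f"rag-demo-{uuid.uuid4().hex}"
# client.vector_dbs.register(
#     vector_db_id=vector_db_id,
#     embedding_model="all-MiniLM-L6-v2",  # hypothetical embedding model
#     embedding_dimension=384,             # must match the model
# )
```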

## Test Plan
Re-ran the getting started guide multiple times to see the same output

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 16:12:29 -08:00
Xi Yan
06e5af1435 update test 2025-02-06 16:11:20 -08:00
Ashwin Bharambe
c79cc92b37 Update PR Template to be much more succinct 2025-02-06 15:57:22 -08:00
Maxime Lecanu
e964ec95e9
docs: Correct typos in Zero to Hero guide (#997)
# What does this PR do?

<!-- Provide a short summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue. -->
Corrects some typographical errors found in the
`docs/zero_to_hero_guide/README.md` file.

<!-- Uncomment this section with the issue number if an issue is being
resolved
**Issue resolved by this Pull Request:** Closes #
--->


## Test Plan

<!--
Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.
-->
N/A


<!--
## Sources

Please link relevant resources if necessary. 

-->

<!--
## Documentation

- [ ] Added a
[Changelog](https://github.com/meta-llama/llama-stack/blob/main/CHANGELOG.md)
entry if the change is significant (new feature, breaking change etc.).

-->

Co-authored-by: Maxime Lecanu <mlecanu@fb.com>
2025-02-06 17:29:52 -05:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation
- Updated docs
- [minor] uv fix I came across
- Codegen for all templates

Tested with 

```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321

# build the stack template 
llama stack build --template=dell 

# start the TGI inference server 
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES ghcr.io/huggingface/text-generation-inference --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0

# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)

# build docker 
llama stack build --template=dell --image-type=container

# run llama stack server ( via docker )
# NOTE: mount the llama-stack / llama-models directories if testing local changes
podman run -it \
  --network host \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  -v /home/hjshah/git/llama-stack:/app/llama-stack-source \
  -v /home/hjshah/git/llama-models:/app/llama-models-source \
  localhost/distribution-dell:dev \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env DEH_URL=$DEH_URL \
  --env CHROMA_URL=$CHROMA_URL

# test the server 
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py

```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Yuan Tang
dd1265bea7
ci: Add semantic PR title check (#979)
This adds a new workflow to check semantic PR titles to match the
[Conventional Commits spec](https://www.conventionalcommits.org/). This
will make it easier to browse commit history and enable automation in
the future.
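
Illustratively, the shape of titles such a check accepts (a sketch of the Conventional Commits rule, not the workflow's actual implementation):

```python
import re

# Conventional Commits titles: "<type>(optional scope)!: description".
TITLE_RE = re.compile(
    r"^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)"
    r"(\([\w.\-]+\))?!?: .+"
)

assert TITLE_RE.match("fix: List providers command prints out non-existing APIs")
assert TITLE_RE.match("docs: use uv in CONTRIBUTING guide")
assert not TITLE_RE.match("Add Kubernetes deployment guide")
```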

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-06 12:22:34 -08:00
Ashwin Bharambe
21f763c4f3 Reduce noise from PR templates further 2025-02-06 11:02:53 -08:00
Yuan Tang
0a0ee5ca96
Fix incorrect handling of chat completion endpoint in remote::vLLM (#951)
# What does this PR do?

Fixes https://github.com/meta-llama/llama-stack/issues/949.


## Test Plan

Verified that the correct chat completion endpoint is called after the
change.

Llama Stack server:
```
INFO:     ::1:32838 - "POST /v1/inference/chat-completion HTTP/1.1" 200 OK
18:36:28.187 [END] /v1/inference/chat-completion [StatusCode.OK] (1276.12ms)

```

vLLM server:
```
INFO:     ::1:36866 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```
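
The two logs show the corrected mapping: Llama Stack's `/v1/inference/chat-completion` now forwards to vLLM's OpenAI-compatible chat endpoint rather than the plain-completions one. A hedged sketch of the distinction (the base URL is an assumption; the provider's internal code may differ):

```python
from openai import OpenAI  # assumption: an OpenAI-compatible client against vLLM

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Chat requests must hit POST /v1/chat/completions (what the fix ensures),
# not POST /v1/completions, which serves plain text completions.
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[{"role": "user", "content": "Describe this image."}],
)
```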

```bash
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -s -v tests/client-sdk/inference/test_inference.py -k "test_image_chat_completion_base64 or test_image_chat_completion_non_streaming or test_image_chat_completion_streaming"
================================================================== test session starts ===================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10
cachedir: .pytest_cache
rootdir: /home/yutang/repos/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0
collected 16 items / 12 deselected / 4 selected                                                                                                          

tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64[meta-llama/Llama-3.2-11B-Vision-Instruct-url] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64[meta-llama/Llama-3.2-11B-Vision-Instruct-data] PASSED
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-06 10:45:19 -08:00
Yuan Tang
09ed0e9c9f
Add Kubernetes deployment guide (#899)
This PR moves some content from [the recent blog
post](https://blog.vllm.ai/2025/01/27/intro-to-llama-stack-with-vllm.html)
to here as a more official guide for users who'd like to deploy Llama
Stack on Kubernetes.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-06 10:28:02 -08:00
Yuan Tang
a25e3b405c
docs: Add license badge to README.md (#994)
This would be useful to know for people who arrive at the project for
the first time.
2025-02-06 10:22:02 -08:00
Sébastien Han
a764b823ee
docs: use uv in CONTRIBUTING guide (#970)
# What does this PR do?

Switch to uv for dependency management and update CONTRIBUTING.md with
new setup instructions. Add missing dev dependencies to pyproject.toml
and apply minor formatting fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-06 10:21:27 -08:00
Sébastien Han
403292fcf6
test: replace memory with vector_io fixture (#984)
# What does this PR do?

Replaced references to `memory` with `vector_io` in
`DEFAULT_PROVIDER_COMBINATIONS` and adjusted corresponding fixture
imports to ensure proper configuration for vector I/O during tests. This
change aligns with the new testing structure.

Followup of https://github.com/meta-llama/llama-stack/pull/830 when the
memory fixture was removed.
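
A hedged sketch of the kind of change (provider names and ids are illustrative):

```python
import pytest

# Fixture keys in the provider-combination matrix renamed from "memory"
# to "vector_io" after the memory fixture was removed.
DEFAULT_PROVIDER_COMBINATIONS = [
    pytest.param(
        {"inference": "ollama", "vector_io": "faiss"},  # was: {"inference": "ollama", "memory": "faiss"}
        id="ollama",
    ),
]
```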

Signed-off-by: Sébastien Han <seb@redhat.com>

## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-06 10:12:59 -08:00
Charlie Doern
f5e4bf2edf
chore: remove unused argument (#987)
# What does this PR do?

Very small fix: I noticed some unused arguments, and this seemed like the
easiest one to remove since it's passed in explicitly.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-06 10:05:35 -08:00
Ihar Hrachyshka
42c10da1c3
github: update PR template to use correct syntax to auto-close issues (#989)
Also, hiding guidance to the author inside comments to avoid polluting
the description with it.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

# What does this PR do?

Using `Closes #` syntax in PR template, as per:

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests

```
In short, provide a summary of what this PR does and why. Usually, the relevant context should be present in a linked issue.
```

Hides this ^.

```
Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.
```

And this ^.

```
Please link relevant resources if necessary.
```

And this ^.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-06 09:59:26 -08:00
Sébastien Han
610de1ba05
chore: update PR template to reinforce changelog (#988)
# What does this PR do?

- Added a checklist item in the PR template to ensure significant
changes are documented in the changelog.
- Updated `CHANGELOG.md` with a placeholder for version `0.2.0`.
- This is an effort to resurrect the consistent usage of the changelog
file.

Signed-off-by: Sébastien Han <seb@redhat.com>

## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-06 09:58:30 -08:00
ehhuang
3922999118
sys_prompt support in Agent (#938)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
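
A hedged sketch of how such an override might look (field names are assumptions based on this description, not verified against the final API):

```python
# Group tool settings in one config object and allow replacing the
# default system prompt when tool calling is not wanted.
agent_config = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "instructions": "You are a concise assistant. Answer directly.",
    "tool_config": {
        # assumption: "replace" substitutes the custom prompt for the default
        "system_message_behavior": "replace",
    },
}
```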

- [ ] Addresses issue (#issue)


## Test Plan


LLAMA_STACK_CONFIG=together pytest
--inference-model=meta-llama/Llama-3.3-70B-Instruct -s -v
tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-05 21:11:32 -08:00
Nathan Weinberg
e777d965a1
docs: add addn server guidance for Linux users in Quick Start (#972)
# What does this PR do?

- [x] Addresses issue #971


## Test Plan
Ran docs build locally

## Sources
See discussion linked in the issue

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Co-authored-by: Mert Parker <mertpaker@gmail.com>
2025-02-05 20:57:51 -08:00
Ihar Hrachyshka
f4343f7dc0
docs: clarify host.docker.internal works for recent podman (#977)
The host.docker.internal alias was implemented in podman 4.7.0:


b672ddc792

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

# What does this PR do?

Follow-up to the previous podman-specific doc update.

## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-05 16:02:05 -08:00
Aakanksha Duggal
8fa642835b
Fix README.md notebook links (#976)
# What does this PR do?

In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
2025-02-05 14:33:46 -08:00
Ryan Cook
2d9c8b549e
docs: missing T in import (#974)
# What does this PR do?

Missing T in import

## Test Plan

N/A doc update

## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-05 17:06:39 -05:00
Kamesh Akella
d9c0b4e3ba
[docs] update the zero_to_hero_guide llama stack version to 0.1.0 (#960)
# What does this PR do?

The Zero to Hero guide currently references the older llama-stack 0.0.61
release. Referencing the most recent stable release in the documentation
helps users avoid issues present in older llama-stack versions.

## Test Plan

I ran the workflow locally using the proposed version change and was
able to proceed without any issue.

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-05 11:49:26 -08:00
Yuan Tang
a79a083e39
Fix broken pgvector provider and memory leaks (#947)
This PR fixes the broken pgvector provider as well as wraps all cursor
object creations with context manager to ensure that they get properly
closed to avoid potential memory leaks.
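
The cursor fix follows the standard DB-API pattern; a minimal sketch assuming a psycopg2-style driver (connection parameters taken from the test command below):

```python
import psycopg2  # assumption: the provider uses a psycopg2-compatible driver

conn = psycopg2.connect(
    host="localhost", port=7432, dbname="db", user="user", password="pass"
)
# The context manager closes the cursor even if execute() raises,
# so repeated queries cannot leak server-side cursor resources.
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
```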

```
> pytest llama_stack/providers/tests/vector_io/test_vector_io.py   -m "pgvector" --env EMBEDDING_DIMENSION=384 --env PGVECTOR_PORT=7432 --env PGVECTOR_DB=db --env PGVECTOR_USER=user --env PGVECTOR_PASSWORD=pass   -v -s --tb=short --disable-warnings

llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_list[-pgvector] PASSED
llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_register[-pgvector] PASSED
llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_query_documents[-pgvector] The scores are: [0.8168284974053789, 0.8080469278964486, 0.8050996198466661]
PASSED
```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-05 09:32:05 -08:00
Ihar Hrachyshka
5c8e35a9e2
docs, tests: replace datasets.rst with memory_optimizations.rst (#968)
datasets.rst was removed from the torchtune repo.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

# What does this PR do?

Replace a missing (404) document with another one that exists. (Removed
it from the list, since memory_optimizations.rst was already being
pulled.)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-05 11:25:56 -05:00
Ihar Hrachyshka
529708215c
[docs] Make RAG example self-contained (#962)
Before the patch, the example could not be executed verbatim without
copy-pasting the client function from the inference example. I think it's
better to have examples self-contained, especially in a getting started
guide.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

# What does this PR do?

See above.

## Test Plan

Confirmed example can now be executed verbatim.

## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-04 16:22:50 -08:00
Ashwin Bharambe
474c4bdd7a
Make a couple properties optional (#963) 2025-02-04 16:20:24 -08:00
Ihar Hrachyshka
0cbb3e401c
docs: miscellaneous small fixes (#961)
- **[docs] Fix misc typos and formatting issues in intro docs**
- **[docs]: Export variables (e.g. INFERENCE_MODEL) in getting_started**
- **[docs] Show that `llama-stack-client configure` will ask for api
key**

# What does this PR do?

Miscellaneous fixes in the documentation; not worth reporting an issue.

## Test Plan

No code changes. Addressed issues spotted when walking through the
guide.
Confirmed locally.

## Sources

Please link relevant resources if necessary.

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-04 15:31:30 -08:00
Nathan Weinberg
b84ab6c6b8
github: issue templates automatically apply relevant label (#956)
# What does this PR do?
The `bug` and `enhancement` labels will now be applied automatically to
newly opened bug reports and feature requests.

## Test Plan
N/A

## Sources

https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/configuring-issue-templates-for-your-repository

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-02-04 14:44:03 -08:00
Bill Murdock
b0dec797a0
Add Podman instructions to Quick Start (#957)
Podman is a popular alternative to Docker, so it would be nice to make
it clear that it can also be used to deploy the container for the
server. The instructions are slightly different because you have to
create the directory yourself (unlike with Docker, which creates the
directory for you).

# What does this PR do?

- [ ] Add Podman instructions to Quick Start

## Test Plan

Documentation only.


## Sources

I tried it out and it worked.

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-04 14:37:02 -08:00
Ashwin Bharambe
d67401c644 Several documentation fixes and fix link to API reference 2025-02-04 14:00:43 -08:00
Charlie Doern
26aef50bc5
if client.initialize fails, the example should exit (#954)
# What does this PR do?

The example script now exits gracefully by checking the boolean returned
from `initialize`.
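
A minimal sketch of the pattern (the helper name is hypothetical; `initialize()` returning a boolean is per this description):

```python
import sys

def ensure_initialized(client) -> None:
    # Exit instead of continuing with a client that never connected.
    if not client.initialize():
        print("llama-stack client initialization failed", file=sys.stderr)
        sys.exit(1)
```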

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-04 13:54:21 -08:00
Ashwin Bharambe
981bb52b59 Quote the token properly 2025-02-04 11:44:29 -08:00
Ashwin Bharambe
5005939494 Use a secret again for the workflow 2025-02-04 11:42:47 -08:00
Ashwin Bharambe
7392daddee Try a new webhook 2025-02-04 11:36:54 -08:00
Ashwin Bharambe
2987fb37c3 fixes? 2025-02-04 11:34:27 -08:00
Ashwin Bharambe
766b11f1f8 Debug workflow 2025-02-04 11:09:16 -08:00
Ashwin Bharambe
5233666143 Debug workflow 2025-02-04 11:07:04 -08:00
Ashwin Bharambe
b35930a7e5 rename 2025-02-04 11:02:45 -08:00
Ashwin Bharambe
ea538e4b32 Add a workflow to trigger readthedocs rebuild 2025-02-04 11:02:06 -08:00
Ashwin Bharambe
b17277b06a Fix the OpenAPI HTML 2025-02-04 10:38:49 -08:00
ehhuang
c9ab72fa82
Support sys_prompt behavior in inference (#937)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.

- [ ] Addresses issue (#issue)


## Test Plan

python -m unittest
llama_stack.providers.tests.inference.test_prompt_adapter


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937).
* #938
* __->__ #937
2025-02-03 23:35:16 -08:00
Xi Yan
62cd3c391e notebook point to github as source of truth 2025-02-03 15:08:25 -08:00
Ashwin Bharambe
753a1aa7bc Update colab link to be pointing back to github source 2025-02-03 15:00:21 -08:00
Ashwin Bharambe
aefd5bb619 Test notebook update 2025-02-03 14:59:06 -08:00