forked from phoenix-oss/llama-stack-mirror
6 commits
**34ab7a3b6c | Fix precommit check after moving to ruff (#927)**

The lint check on the main branch is failing. This fixes the lint check after the move to ruff in https://github.com/meta-llama/llama-stack/pull/921. It requires moving to a `ruff.toml` file and fixing or ignoring some additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
**39c34dd25f | [#432] Groq Provider tool call tweaks (#811)**

# What does this PR do?
Follow-up for @ashwinb's comments in https://github.com/meta-llama/llama-stack/pull/630
- [x] Contributes to issue (#432)
## Test Plan
<details>
<summary>Environment</summary>
```shell
export GROQ_API_KEY=<api-key>
# Create environment if not already
conda create --name llamastack-groq python=3.10
conda activate llamastack-groq
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/meta-llama/llama-stack/918172c7fa92522c9ebc586bdb4f386b1d9ea224/run.yaml
# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda
# Activate built environment
conda activate llamastack-groq
# Test deps
pip install pytest pytest_html pytest_asyncio
```
</details>
<details>
<summary>Unit tests</summary>
```shell
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s
# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .......................
========================================= 23 passed, 11 warnings in 0.06s =========================================
```
</details>
<details>
<summary>Integration tests</summary>
```shell
# Tests
pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s
# Results
___________________________ TestInference.test_chat_completion_with_tool_calling[-groq] ___________________________
llama_stack/providers/tests/inference/test_text_inference.py:403: in test_chat_completion_with_tool_calling
    assert len(message.tool_calls) > 0
E   assert 0 > 0
E    +  where 0 = len([])
E    +  where [] = CompletionMessage(role='assistant', content='<function=get_weather>{"location": "San Francisco, CA"}', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]).tool_calls
============================================= short test summary info =============================================
FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-groq] - assert 0 > 0
======================== 1 failed, 3 passed, 5 skipped, 99 deselected, 7 warnings in 2.13s ========================
```
(One failure as expected from 3.2 3B - re: https://github.com/meta-llama/llama-stack/pull/630#discussion_r1914056503)
</details>
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
**07b87365ab | [inference api] modify content types so they follow a more standard structure (#841)**

Some small updates to the inference types to make them more standard. Specifically:
- image data is now located in an "image" subkey
- similarly, tool call data is located in a "tool_call" subkey

The pattern followed is `dict(type="foo", foo=<...>)`.
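As a rough illustration of that pattern, a hedged sketch (the inner field names such as `data`, `call_id`, `tool_name` and `arguments` are assumptions for illustration, not the exact llama-stack schema):

```python
# Hedged sketch of the dict(type="foo", foo=<...>) pattern described above.
# The inner field names ("data", "call_id", etc.) are illustrative assumptions,
# not the exact llama-stack types.
image_item = {
    "type": "image",
    "image": {"data": "<base64-encoded image bytes>"},
}

tool_call_item = {
    "type": "tool_call",
    "tool_call": {
        "call_id": "call-1",
        "tool_name": "get_weather",
        "arguments": {"location": "San Francisco, CA"},
    },
}

# Consumers can dispatch on the "type" key and read the payload from the
# subkey of the same name.
payload = image_item[image_item["type"]]
```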
**a51c8b4efc | Convert SamplingParams.strategy to a union (#767)**

# What does this PR do?
Cleans up how we provide sampling params. Earlier, strategy was an enum and all params (top_p, temperature, top_k) across all strategies were grouped together. We now have a strategy union object with each strategy (greedy, top_p, top_k) carrying its corresponding params.

Earlier:
```
class SamplingParams:
    strategy: enum ()
    top_p, temperature, top_k and other params
```
However, the `strategy` field was not being used in any providers, making it hard to know the exact sampling behavior purely from the params: you could pass temperature, top_p and top_k, but how a provider would interpret them was not clear. Hence we introduced a union where each strategy and its relevant params are clubbed together to avoid this confusion.

Updated all providers, tests, notebooks, the readme and other places where sampling params were used to use the new format.
## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`

// inference on ollama, fireworks and together
`with-proxy pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/inference/test_text_inference.py`

// agents on fireworks
`pytest -v -s -k 'fireworks and create_agent' --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/agents/test_agents.py --safety-shield="meta-llama/Llama-Guard-3-8B"`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
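To make the new shape concrete, a hedged sketch of what the union could look like (class and field names here are illustrative assumptions based on the description above, not the exact llama-stack definitions):

```python
# Illustrative sketch only: class and field names are assumptions based on the
# PR description, not the exact llama-stack type definitions.
from dataclasses import dataclass
from typing import Union


@dataclass
class GreedySamplingStrategy:
    type: str = "greedy"


@dataclass
class TopPSamplingStrategy:
    temperature: float = 1.0
    top_p: float = 0.95
    type: str = "top_p"


@dataclass
class TopKSamplingStrategy:
    top_k: int = 40
    type: str = "top_k"


SamplingStrategy = Union[GreedySamplingStrategy, TopPSamplingStrategy, TopKSamplingStrategy]


@dataclass
class SamplingParams:
    # Each strategy now carries only the params that apply to it, so the
    # provider no longer has to guess which ones matter.
    strategy: SamplingStrategy
    max_tokens: int = 0


params = SamplingParams(strategy=TopPSamplingStrategy(temperature=0.7, top_p=0.9))
```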
**fdcc74fda2 | [#432] Add Groq Provider - tool calls (#630)**
# What does this PR do?
Contributes to issue #432
- Adds tool calls to Groq provider (a hedged request sketch follows this list)
- Enables tool call integration tests
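Roughly, the provider maps llama-stack tool definitions onto Groq's OpenAI-compatible tools format. A hedged sketch of that kind of request using the `groq` Python SDK (the model name and the `get_weather` tool schema are illustrative assumptions, not taken from the PR):

```python
# Hedged sketch of an OpenAI-compatible tool-call request against Groq.
# The model name and the get_weather tool schema are illustrative assumptions.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "What is the weather in San Francisco, CA?"}],
    tools=tools,
    tool_choice="auto",
)
# Any tool calls the model decides to make show up on the message object.
print(response.choices[0].message.tool_calls)
```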
### PR Train
- https://github.com/meta-llama/llama-stack/pull/609
- https://github.com/meta-llama/llama-stack/pull/630 👈
## Test Plan
Environment:
```shell
export GROQ_API_KEY=<api-key>
# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/run.yaml
# Create environment if not already
conda create --prefix ./envs python=3.10
conda activate ./envs
# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda
# Activate built environment
conda activate llamastack-groq
```
<details>
<summary>Unit tests</summary>
```shell
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s
# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .....................
======================================== 21 passed, 1 warning in 0.05s ========================================
```
</details>
<details>
<summary>Integration tests</summary>
```shell
# Run
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s
# Result
llama_stack/providers/tests/inference/test_text_inference.py .sss.s.ss.sss.s...
========================== 8 passed, 10 skipped, 180 deselected, 7 warnings in 2.73s ==========================
```
</details>
<details>
<summary>Manual</summary>
```bash
llama stack run ./run.yaml --port 5001
```
Via this Jupyter notebook:
</details>
**e1f42eb5a5 | [#432] Add Groq Provider - chat completions (#609)**
# What does this PR do?
Contributes towards issue (#432)
- Groq text chat completions (a hedged request sketch follows this list)
- Streaming
- All the sampling params that Groq supports
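As a rough, hedged illustration of what this covers, a streaming Groq chat completion with sampling params via the `groq` Python SDK (model name and parameter values are illustrative assumptions, not taken from the PR):

```python
# Hedged sketch of a streaming chat completion against Groq with sampling params.
# Model name and parameter values are illustrative assumptions.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=128,
    stream=True,
)
for chunk in stream:
    # Each streamed chunk carries an incremental delta of the assistant message.
    print(chunk.choices[0].delta.content or "", end="")
```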
A lot of inspiration taken from @mattf's good work at
https://github.com/meta-llama/llama-stack/pull/355
**What this PR does not do**
- Tool calls (Future PR)
- Adding llama-guard model
- See if we can add embeddings
### PR Train
- https://github.com/meta-llama/llama-stack/pull/609 👈
- https://github.com/meta-llama/llama-stack/pull/630
## Test Plan
<details>
<summary>Environment</summary>
```bash
export GROQ_API_KEY=<api_key>
wget https://raw.githubusercontent.com/aidando73/llama-stack/240e6e2a9c20450ffdcfbabd800a6c0291f19288/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/92c9b5297f9eda6a6e901e1adbd894e169dbb278/run.yaml
# Build and run environment
pip install -e . \
&& llama stack build --config ./build.yaml --image-type conda \
&& llama stack run ./run.yaml \
--port 5001
```
</details>
<details>
<summary>Manual tests</summary>
Using this Jupyter notebook to test manually:
</details>