ehhuang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c9ab72fa82 
								
							 
						 
						
							
							
								
								Support sys_prompt behavior in inference ( #937 )  
							
							... 
							
							
							
							# What does this PR do?
The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.
This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
- [ ] Addresses issue (#issue)
## Test Plan
python -m unittest
llama_stack.providers.tests.inference.test_prompt_adapter
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md ),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com ). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937 ).
* #938 
* __->__ #937  
							
						 
						
							2025-02-03 23:35:16 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Yuan Tang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								34ab7a3b6c 
								
							 
						 
						
							
							
								
								Fix precommit check after moving to ruff ( #927 )  
							
							... 
							
							
							
							Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921 . We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com> 
							
						 
						
							2025-02-02 06:46:45 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Aidan Do 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								39c34dd25f 
								
							 
						 
						
							
							
								
								[ #432 ] Groq Provider tool call tweaks ( #811 )  
							
							... 
							
							
							
							# What does this PR do?
Follow up for @ashwinb's comments in
https://github.com/meta-llama/llama-stack/pull/630 
- [x] Contributes to issue (#432 )
## Test Plan
<details>
<summary>Environment</summary>
```shell
export GROQ_API_KEY=<api-key>
# Create environment if not already
conda create --name llamastack-groq python=3.10
conda activate llamastack-groq
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml 
wget https://raw.githubusercontent.com/meta-llama/llama-stack/918172c7fa92522c9ebc586bdb4f386b1d9ea224/run.yaml 
# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda
# Activate built environment
conda activate llamastack-groq
# Test deps
pip install pytest pytest_html pytest_asyncio
```
</details>
<details>
<summary>Unit tests</summary>
```shell
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s
# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .......................
========================================= 23 passed, 11 warnings in 0.06s =========================================
```
</details>
<details>
<summary>Integration tests</summary>
```shell
# Tests
 pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s
# Results
___________________________ TestInference.test_chat_completion_with_tool_calling[-groq] ___________________________
llama_stack/providers/tests/inference/test_text_inference.py:403: in test_chat_completion_with_tool_calling
    assert len(message.tool_calls) > 0
E   assert 0 > 0
E    +  where 0 = len([])
E    +    where [] = CompletionMessage(role='assistant', content='<function=get_weather>{"location": "San Francisco, CA"}', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]).tool_calls
============================================= short test summary info =============================================
FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-groq] - assert 0 > 0
======================== 1 failed, 3 passed, 5 skipped, 99 deselected, 7 warnings in 2.13s ========================
```
(One failure as expected from 3.2 3B - re:
https://github.com/meta-llama/llama-stack/pull/630#discussion_r1914056503 )
</details>
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md ),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> 
							
						 
						
							2025-01-29 12:02:12 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ashwin Bharambe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								07b87365ab 
								
							 
						 
						
							
							
								
								[inference api] modify content types so they follow a more standard structure ( #841 )  
							
							... 
							
							
							
							Some small updates to the inference types to make them more standard
Specifically:
- image data is now located in a "image" subkey
- similarly tool call data is located in a "tool_call" subkey
The pattern followed is `dict(type="foo", foo=<...>)` 
							
						 
						
							2025-01-22 12:16:18 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Hardik Shah 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a51c8b4efc 
								
							 
						 
						
							
							
								
								Convert SamplingParams.strategy to a union ( #767 )  
							
							... 
							
							
							
							# What does this PR do?
Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier, 
```
class SamplingParams: 
    strategy: enum ()
    top_p, temperature, top_k and other params
```
However, the `strategy` field was not being used in any providers making
it confusing to know the exact sampling behavior purely based on the
params since you could pass temperature, top_p, top_k and how the
provider would interpret those would not be clear.
Hence we introduced -- a union where the strategy and relevant params
are all clubbed together to avoid this confusion.
Have updated all providers, tests, notebooks, readme and otehr places
where sampling params was being used to use the new format.
   
## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together 
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks 
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md ),
      Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.
---------
Co-authored-by: Hardik Shah <hjshah@fb.com> 
							
						 
						
							2025-01-15 05:38:51 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ashwin Bharambe 
								
							 
						 
						
							
							
							
							
								
							
							
								d9d34433fc 
								
							 
						 
						
							
							
								
								Update spec  
							
							
							
						 
						
							2025-01-13 23:16:53 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ashwin Bharambe 
								
							 
						 
						
							
							
							
							
								
							
							
								9a5803a429 
								
							 
						 
						
							
							
								
								move all implementations to use updated type  
							
							
							
						 
						
							2025-01-13 23:16:53 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Aidan Do 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fdcc74fda2 
								
							 
						 
						
							
							
								
								[ #432 ] Add Groq Provider - tool calls ( #630 )  
							
							... 
							
							
							
							# What does this PR do?
Contributes to issue #432 
- Adds tool calls to Groq provider
- Enables tool call integration tests
### PR Train
- https://github.com/meta-llama/llama-stack/pull/609  
- https://github.com/meta-llama/llama-stack/pull/630  👈 
## Test Plan
Environment:
```shell
export GROQ_API_KEY=<api-key>
# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml 
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/run.yaml 
# Create environment if not already
conda create --prefix ./envs python=3.10
conda activate ./envs
# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda
# Activate built environment
conda activate llamastack-groq
```
<details>
<summary>Unit tests</summary>
```shell
# Setup
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s
# Result
llama_stack/providers/tests/inference/groq/test_groq_utils.py .....................
======================================== 21 passed, 1 warning in 0.05s ========================================
```
</details>
<details>
<summary>Integration tests</summary>
```shell
# Run
conda activate llamastack-groq
pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s
# Result
llama_stack/providers/tests/inference/test_text_inference.py .sss.s.ss.sss.s...
========================== 8 passed, 10 skipped, 180 deselected, 7 warnings in 2.73s ==========================
```
</details>
<details>
<summary>Manual</summary>
```bash
llama stack run ./run.yaml --port 5001
```
Via this Jupyter notebook:
9165502582/hello.ipynbhttps://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md ),
      Pull Request section?
- [x] Updated relevant documentation. (no relevant documentation it
seems)
- [x] Wrote necessary unit or integration tests. 
							
						 
						
							2025-01-13 18:17:38 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Aidan Do 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e1f42eb5a5 
								
							 
						 
						
							
							
								
								[ #432 ] Add Groq Provider - chat completions ( #609 )  
							
							... 
							
							
							
							# What does this PR do?
Contributes towards issue (#432 )
- Groq text chat completions
- Streaming
- All the sampling params that Groq supports
A lot of inspiration taken from @mattf's good work at
https://github.com/meta-llama/llama-stack/pull/355 
**What this PR does not do**
- Tool calls (Future PR)
- Adding llama-guard model
- See if we can add embeddings
### PR Train
- https://github.com/meta-llama/llama-stack/pull/609  👈  
- https://github.com/meta-llama/llama-stack/pull/630 
## Test Plan
<details>
<summary>Environment</summary>
```bash
export GROQ_API_KEY=<api_key>
wget https://raw.githubusercontent.com/aidando73/llama-stack/240e6e2a9c20450ffdcfbabd800a6c0291f19288/build.yaml 
wget https://raw.githubusercontent.com/aidando73/llama-stack/92c9b5297f9eda6a6e901e1adbd894e169dbb278/run.yaml 
# Build and run environment
pip install -e . \
&& llama stack build --config ./build.yaml --image-type conda \
&& llama stack run ./run.yaml \
  --port 5001
```
</details>
<details>
<summary>Manual tests</summary>
Using this jupyter notebook to test manually:
2140976d76/hello.ipynbhttp://localhost:5001 ",
)
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, world client!"},
    ],
    # Test passing in groq_api_key from the client
    # Need to comment out the groq_api_key in the run.yaml file
    x_llama_stack_provider_data='{"groq_api_key": "<api-key>"}',
    # stream=True,
)
response
```
</details>
<details>
<summary>Integration</summary>
`pytest llama_stack/providers/tests/inference/test_text_inference.py -v
-k groq`
(run in same environment)
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_3b-groq] PASSED                 [  6%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_3b-groq] SKIPPED (Other inf...) [ 12%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_3b-groq] SKIPPED [ 18%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_3b-groq] PASSED [ 25%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_3b-groq] SKIPPED (Ot...) [ 31%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_3b-groq] PASSED  [ 37%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_3b-groq] SKIPPED [ 43%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_3b-groq] SKIPPED [ 50%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_8b-groq] PASSED                 [ 56%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_8b-groq] SKIPPED (Other inf...) [ 62%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_8b-groq] SKIPPED [ 68%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_8b-groq] PASSED [ 75%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_8b-groq] SKIPPED (Ot...) [ 81%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_8b-groq] PASSED  [ 87%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_8b-groq] SKIPPED [ 93%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_8b-groq] SKIPPED [100%]
======================================= 6 passed, 10 skipped, 160 deselected, 7 warnings in 2.05s ========================================
```
</details>
<details>
<summary>Unit tests</summary>
`pytest llama_stack/providers/tests/inference/groq/ -v`
```
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_sets_model PASSED            [  5%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_user_message PASSED [ 10%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_system_message PASSED [ 15%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_completion_message PASSED [ 20%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_logprobs PASSED [ 25%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_response_format PASSED [ 30%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_repetition_penalty PASSED [ 35%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_stream PASSED       [ 40%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_n_is_1 PASSED                [ 45%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_if_max_tokens_is_0_then_it_is_not_included PASSED [ 50%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_max_tokens_if_set PASSED [ 55%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_temperature PASSED  [ 60%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_top_p PASSED        [ 65%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_returns_response PASSED [ 70%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_stop_to_end_of_message PASSED [ 75%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_length_to_end_of_message PASSED [ 80%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertStreamChatCompletionResponse::test_returns_stream PASSED [ 85%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_raises_runtime_error_if_config_is_not_groq_config PASSED [ 90%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_returns_groq_adapter PASSED                            [ 95%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqConfig::test_api_key_defaults_to_env_var PASSED                   [100%]
==================================================== 20 passed, 11 warnings in 0.08s =====================================================
```
</details>
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md ),
      Pull Request section?
- [x] Updated relevant documentation
- [x] Wrote necessary unit or integration tests. 
							
						 
						
							2025-01-03 08:27:49 -08:00