Commit graph

838 commits

Author SHA1 Message Date
Xi Yan
97d31d7ab3 add back requirements 2025-01-09 18:37:07 -08:00
Xi Yan
1ea46660a5 add back requirements 2025-01-09 18:35:05 -08:00
Xi Yan
2847d70f38 remove dispatch on push 2025-01-09 17:25:35 -08:00
Xi Yan
5f051b210c final workflow 2025-01-09 17:24:31 -08:00
Xi Yan
4387863a19 final workflow 2025-01-09 17:24:15 -08:00
Xi Yan
dc74675dc8 add ver 2025-01-09 17:19:46 -08:00
Xi Yan
cca27819b9 fix versions 2025-01-09 17:15:47 -08:00
Xi Yan
63232d7771 remove double quotes 2025-01-09 17:09:46 -08:00
Xi Yan
d8c9798ca8 test 2025-01-09 17:07:07 -08:00
Xi Yan
0b0446f219 fix 2025-01-09 17:02:35 -08:00
Xi Yan
df55ec654e fix 2025-01-09 16:59:49 -08:00
Xi Yan
2644e096d6 bugfix 2025-01-09 16:54:04 -08:00
Xi Yan
19887139b4 update requirements 2025-01-09 16:51:49 -08:00
Xi Yan
ccd3ec142a test 2025-01-09 16:45:20 -08:00
Xi Yan
7ca2f5edb1 llama-stack-client-python 2025-01-09 16:34:20 -08:00
Xi Yan
16af87c822 test trigger 2025-01-09 16:33:18 -08:00
Xi Yan
620250324c initial test 2025-01-09 16:15:37 -08:00
Xi Yan
8527b79bfd test 2025-01-09 15:37:43 -08:00
Xi Yan
20dc1860c6 test 2025-01-09 15:22:25 -08:00
Xi Yan
45cf46e62f rebase 2025-01-09 11:45:51 -08:00
Yuan Tang
b8df87bd85
Add automatic PyPI release GitHub workflow (#618)
This PR adds a workflow to automatically publish the package (including
attestations) to PyPI upon tag/release creation.

Note that this relies on trusted publishing:
https://docs.pypi.org/trusted-publishers/

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-09 11:29:26 -08:00
Xi Yan
a45ce85ec1 change schedule 2025-01-08 17:24:01 -08:00
Xi Yan
6c3b9fa09b back to rc 2025-01-08 17:22:22 -08:00
Xi Yan
3ce9601f9d nightly 2025-01-08 17:20:42 -08:00
Xi Yan
10b136055a remove hash 2025-01-08 17:19:08 -08:00
Xi Yan
8640a30e6a rc? 2025-01-08 17:17:47 -08:00
Xi Yan
8ffdff1c7a rc? 2025-01-08 17:16:42 -08:00
Xi Yan
a6e1740464 test 0.0.64 2025-01-08 17:13:58 -08:00
Xi Yan
c7becdaffc test 2025-01-08 17:07:19 -08:00
Xi Yan
e855291d3b test 2025-01-08 17:04:49 -08:00
Xi Yan
87e2cb8029 test 2025-01-08 17:03:16 -08:00
Xi Yan
94d619b58e nightly 2025-01-08 17:02:24 -08:00
Xi Yan
efb14c154e cleanup setup 2025-01-08 16:57:37 -08:00
Xi Yan
665c088adb on workflow dispatch 2025-01-08 16:56:15 -08:00
Xi Yan
074d8561e5 test 2025-01-08 16:52:51 -08:00
Xi Yan
bc27343c75 test workflow 2025-01-08 16:45:44 -08:00
Xi Yan
596afc6497
add --version to llama stack CLI & /version endpoint (#732)
# What does this PR do?

- add --version to llama stack CLI 
- add /version endpoint
- run OpenAPI generator for the new endpoint
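A minimal sketch of hitting the new endpoint once the server is up; the base URL, port, and exact route prefix here are assumptions for illustration, not confirmed server defaults:

```python
import requests

# Hypothetical smoke test for the new version endpoint; adjust the base
# URL/port and route prefix to match your running stack.
BASE_URL = "http://localhost:5000"

resp = requests.get(f"{BASE_URL}/version")
resp.raise_for_status()
print(resp.json())  # response shape (e.g. a version string) is an assumption
```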

## Test Plan

**CLI**
<img width="184" alt="image"
src="https://github.com/user-attachments/assets/3acb1d22-453e-4b79-baf6-e98e88d0671c"
/>



**endpoint**
<img width="430" alt="image"
src="https://github.com/user-attachments/assets/79cdd670-493b-40cf-8f9e-28a4ac0988ac"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-08 16:30:06 -08:00
Xi Yan
a5e6f10e33
fix links for distro (#733)
# What does this PR do?

- fix links for distro docs


## Test Plan

<img width="653" alt="image"
src="https://github.com/user-attachments/assets/a546a11e-2071-4d72-8232-8f30552b7341"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-08 14:47:09 -08:00
Sixian Yi
ca66a1b188
Update CODEOWNERS - add sixianyi0721 as the owner (#731)
# What does this PR do?

Add my own github id to CODEOWNERS file
- [ ] Addresses issue (#issue)


## Test Plan


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-07 21:11:59 -08:00
Xi Yan
7a4383e4c1
add 3.3 to together inference provider (#729)
# What does this PR do?

- add llama3.3 model for together
- fix fireworks distro_codegen

```
python llama_stack/scripts/distro_codegen.py
```

## Test Plan

<img width="1132" alt="image"
src="https://github.com/user-attachments/assets/bf94b933-9200-4e73-878e-d1a95d450a88"
/>

**Tests**
```
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.3-70B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```
<img width="1139" alt="image"
src="https://github.com/user-attachments/assets/407dc98b-8de3-4841-8cb1-75e4b5128544"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-06 15:39:41 -08:00
Xi Yan
7a90fc5854
move DataSchemaValidatorMixin into standalone utils (#720)
# What does this PR do?

- there's no value in keeping data schema validation logic in a
DataSchemaValidatorMixin
- move the data schema validation logic into standalone utils (see the sketch below)
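A minimal before/after sketch of the refactor; the function name and signature below are hypothetical placeholders, not the exact utils introduced in the diff:

```python
# Before (hypothetical shape): validation lives on a mixin, so every
# scoring/eval class must inherit it just to run one check.
class DataSchemaValidatorMixin:
    def validate_dataset_schema(self, schema: dict, expected_schemas: list[dict]) -> None:
        if schema not in expected_schemas:
            raise ValueError(f"Dataset schema {schema} not in expected schemas {expected_schemas}")


# After (hypothetical shape): the same check as a standalone util any
# provider can import directly, no inheritance required.
def validate_dataset_schema(schema: dict, expected_schemas: list[dict]) -> None:
    if schema not in expected_schemas:
        raise ValueError(f"Dataset schema {schema} not in expected schemas {expected_schemas}")
```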

## Test Plan
```
pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py

pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```



## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-06 13:25:09 -08:00
Dinesh Yeduguru
0bc5d05243
remove default logger handlers when using libcli with notebook (#718)
# What does this PR do?

Remove the default log handlers for the notebook to avoid polluting logs
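A hedged sketch of the idea using only the standard logging module; how the library actually scopes this for notebook use may differ:

```python
import logging

# Clear any default handlers on the root logger so library log records
# aren't duplicated into notebook cell output.
root = logging.getLogger()
for handler in list(root.handlers):
    root.removeHandler(handler)
```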
2025-01-06 13:06:22 -08:00
Botao Chen
e86271aeac
support llama3.1 8B instruct in post training (#698)
## What does this PR do? 
- Change to support the llama3.1 8B instruct model instead of the llama3 8B
model, since the llama3.1 8B instruct model is a better base to finetune on
top of
- Make the copy-files logic in the checkpointer safer in case a file to be
copied doesn't exist in the source path (see the sketch below)
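A hedged sketch of the safer copy-files behavior; the helper name and paths are hypothetical, and the real checkpointer code may handle this differently:

```python
import shutil
from pathlib import Path

def copy_if_exists(src: Path, dst: Path) -> None:
    # Skip missing source files instead of raising, so checkpointing
    # doesn't fail when an optional file is absent from the source path.
    if not src.exists():
        return
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
```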

## test
Issue a post training request from the client and verify that training works
as expected.
<img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM"
src="https://github.com/user-attachments/assets/47cc4df9-3edc-4afd-b5dd-abe1f039f1ed"
/>

<img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM"
src="https://github.com/user-attachments/assets/b9435274-ef1d-4570-bd8e-0880c3a4b2e9"
/>
2025-01-03 17:33:05 -08:00
Aidan Do
485476c29a
Fix Groq invalid self.config reference (#719)
# What does this PR do?

Contributes towards: #432

RE: https://github.com/meta-llama/llama-stack/pull/609

I missed this one while refactoring. Fixes:

```python
Traceback (most recent call last):
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/server/server.py", line 191, in endpoint
    return await maybe_await(value)
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/server/server.py", line 155, in maybe_await
    return await value
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/aidand/dev/llama-stack/llama_stack/distribution/routers/routers.py", line 156, in chat_completion
    return await provider.chat_completion(**params)
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 101, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/remote/inference/groq/groq.py", line 127, in chat_completion
    response = self._get_client().chat.completions.create(**request)
  File "/Users/aidand/dev/llama-stack/llama_stack/providers/remote/inference/groq/groq.py", line 143, in _get_client
    return Groq(api_key=self.config.api_key)
AttributeError: 'GroqInferenceAdapter' object has no attribute 'config'. Did you mean: '_config'?
```
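Based on the traceback above, a minimal sketch of the corrected accessor; everything outside `_get_client` is abbreviated and partly assumed:

```python
from groq import Groq

class GroqInferenceAdapter:
    def __init__(self, config) -> None:
        self._config = config  # stored with a leading underscore

    def _get_client(self) -> Groq:
        # Fix: read the key from self._config instead of the
        # non-existent self.config attribute flagged above.
        return Groq(api_key=self._config.api_key)
```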


## Test Plan

Environment:

```shell
export GROQ_API_KEY=<api-key>

# build.yaml and run.yaml files
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/run.yaml

# Create environment if not already
conda create --prefix ./envs python=3.10
conda activate ./envs

# Build
pip install -e . && llama stack build --config ./build.yaml --image-type conda

# Activate built environment
conda activate llamastack-groq
```
<details>
<summary>Manual</summary>

```bash
llama stack run ./run.yaml --port 5001
```

Via this Jupyter notebook:
9165502582/hello.ipynb
</details>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-03 15:47:10 -08:00
Yuan Tang
04d5b9814f
Fix assert message and call to completion_request_to_prompt in remote:vllm (#709)
The current message is incorrect, and the model arg is not needed in
`completion_request_to_prompt`.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-03 13:44:49 -08:00
Yuan Tang
96d8375663
Fix incorrect entrypoint for broken llama stack run (#706)
This fixes the issue when using `llama stack run` by correctly
specifying the entrypoint:

```
LLAMA_STACK_DIR=. llama stack run /home/yutang/.llama/distributions/llamastack-vllm/vllm-run.yaml
Using config file: /home/yutang/.llama/distributions/llamastack-vllm/vllm-run.yaml
+ command -v selinuxenabled
+ selinuxenabled
+ DOCKER_OPTS=' --security-opt label=disable'
+ mounts=
+ '[' -n . ']'
++ readlink -f .
+ mounts=' -v /home/yutang/repos/llama-stack:/app/llama-stack-source'
+ '[' -n '' ']'
+ version_tag=latest
+ '[' -n '' ']'
+ '[' -n . ']'
+ version_tag=dev
+ podman run --security-opt label=disable -it -p 5000:5000 -v /home/yutang/.llama/distributions/llamastack-vllm/vllm-run.yaml:/app/config.yaml -v /home/yutang/repos/llama-stack:/app/llama-stack-source localhost/distribution-vllm:dev python -m llama_stack.distribution.server.server --yaml-config /app/config.yaml --port 5000
usage: server.py
       [-h]
       [--yaml-config YAML_CONFIG]
       [--template TEMPLATE]
       [--port PORT]
       [--disable-ipv6]
       [--env ENV]
server.py: error: unrecognized arguments: python -m llama_stack.distribution.server.server
++ error_handler 88
++ echo 'Error occurred in script at line: 88'
Error occurred in script at line: 88
++ exit 1

```

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-03 09:47:10 -08:00
Ashwin Bharambe
21357a6dee Kill autocomplete slop 2025-01-03 09:29:25 -08:00
Botao Chen
4320b0ebb2
[Post training] make validation steps configurable (#715)
## what does this PR do?
The current code hardcodes the validation steps to run (forgot to change
it after testing). In this PR, we make it configurable via the training
config (see the sketch below).
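A hedged sketch of what surfacing this in the training config could look like; the class and field names are assumptions for illustration, not the actual llama-stack schema:

```python
from pydantic import BaseModel

class TrainingConfig(BaseModel):
    # Hypothetical fields: the real config may name or nest these differently.
    n_epochs: int = 1
    max_steps_per_epoch: int = 100
    validation_steps: int = 20  # previously hardcoded, now read from config

def run_validation(config: TrainingConfig, validate_batch) -> None:
    for _ in range(config.validation_steps):
        validate_batch()  # evaluate one validation batch per configured step
```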

## test 
On client side, issue a post training request with 20 validation steps,
server side logging shows that it runs 20 validation steps successfully
<img width="1128" alt="Screenshot 2025-01-02 at 8 21 06 PM"
src="https://github.com/user-attachments/assets/7a757516-c6ba-41d4-85c5-361a80ecf46e"
/>
2025-01-03 08:43:24 -08:00
Botao Chen
f450a0fd32
Change post training run.yaml inference config (#710)
## Context
Colab notebooks provide a limited free T4 GPU.

Making the post training template work e2e with a Colab notebook T4 is
critical for early adoption of the stack post training APIs. However, we
found that the existing LlamaModelParallelGenerator
(https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/inline/inference/meta_reference/inference.py#L82)
in the meta-reference inference implementation isn't compatible with a T4
machine.

In this PR, we disable create_distributed_process_group for the
inference api in the post training run.yaml config and set up the
distributed env variables in the notebook
<img width="493" alt="Screenshot 2025-01-02 at 3 48 08 PM"
src="https://github.com/user-attachments/assets/dd159f70-4cff-475c-b459-1fc6e2c720ba"
/>

to make meta reference inference compatible with the free T4 machine
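A hedged sketch of setting the distributed environment variables in the notebook for a single T4; the variable names follow standard torch.distributed conventions and are an assumption about what the notebook sets:

```python
import os

# Single-process "distributed" setup for one free Colab T4 GPU.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
os.environ["LOCAL_RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"
```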

 ## test
Test with the WIP post training showcase colab notebook
https://colab.research.google.com/drive/1K4Q2wZq232_Bpy2ud4zL9aRxvCWAwyQs?usp=sharing
2025-01-03 08:37:48 -08:00
Aidan Do
e1f42eb5a5
[#432] Add Groq Provider - chat completions (#609)
# What does this PR do?

Contributes towards issue (#432)

- Groq text chat completions
- Streaming
- All the sampling params that Groq supports

A lot of inspiration taken from @mattf's good work at
https://github.com/meta-llama/llama-stack/pull/355

**What this PR does not do**

- Tool calls (Future PR)
- Adding llama-guard model
- See if we can add embeddings

### PR Train

- https://github.com/meta-llama/llama-stack/pull/609 👈 
- https://github.com/meta-llama/llama-stack/pull/630


## Test Plan

<details>

<summary>Environment</summary>

```bash
export GROQ_API_KEY=<api_key>

wget https://raw.githubusercontent.com/aidando73/llama-stack/240e6e2a9c20450ffdcfbabd800a6c0291f19288/build.yaml
wget https://raw.githubusercontent.com/aidando73/llama-stack/92c9b5297f9eda6a6e901e1adbd894e169dbb278/run.yaml

# Build and run environment
pip install -e . \
&& llama stack build --config ./build.yaml --image-type conda \
&& llama stack run ./run.yaml \
  --port 5001
```

</details>

<details>

<summary>Manual tests</summary>

Using this jupyter notebook to test manually:
2140976d76/hello.ipynb

Use this code to test passing in the api key from provider_data

```
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5001",
)

response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, world client!"},
    ],
    # Test passing in groq_api_key from the client
    # Need to comment out the groq_api_key in the run.yaml file
    x_llama_stack_provider_data='{"groq_api_key": "<api-key>"}',
    # stream=True,
)
response
```

</details>

<details>
<summary>Integration</summary>

`pytest llama_stack/providers/tests/inference/test_text_inference.py -v
-k groq`

(run in same environment)

```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_3b-groq] PASSED                 [  6%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_3b-groq] SKIPPED (Other inf...) [ 12%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_3b-groq] SKIPPED [ 18%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_3b-groq] PASSED [ 25%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_3b-groq] SKIPPED (Ot...) [ 31%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_3b-groq] PASSED  [ 37%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_3b-groq] SKIPPED [ 43%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_3b-groq] SKIPPED [ 50%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[llama_8b-groq] PASSED                 [ 56%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[llama_8b-groq] SKIPPED (Other inf...) [ 62%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[llama_8b-groq] SKIPPED [ 68%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[llama_8b-groq] PASSED [ 75%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[llama_8b-groq] SKIPPED (Ot...) [ 81%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[llama_8b-groq] PASSED  [ 87%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[llama_8b-groq] SKIPPED [ 93%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[llama_8b-groq] SKIPPED [100%]

======================================= 6 passed, 10 skipped, 160 deselected, 7 warnings in 2.05s ========================================
```
</details>

<details>
<summary>Unit tests</summary>

`pytest llama_stack/providers/tests/inference/groq/ -v`

```
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_sets_model PASSED            [  5%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_user_message PASSED [ 10%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_system_message PASSED [ 15%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_converts_completion_message PASSED [ 20%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_logprobs PASSED [ 25%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_response_format PASSED [ 30%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_does_not_include_repetition_penalty PASSED [ 35%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_stream PASSED       [ 40%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_n_is_1 PASSED                [ 45%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_if_max_tokens_is_0_then_it_is_not_included PASSED [ 50%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_max_tokens_if_set PASSED [ 55%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_temperature PASSED  [ 60%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertChatCompletionRequest::test_includes_top_p PASSED        [ 65%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_returns_response PASSED [ 70%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_stop_to_end_of_message PASSED [ 75%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertNonStreamChatCompletionResponse::test_maps_length_to_end_of_message PASSED [ 80%]
llama_stack/providers/tests/inference/groq/test_groq_utils.py::TestConvertStreamChatCompletionResponse::test_returns_stream PASSED [ 85%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_raises_runtime_error_if_config_is_not_groq_config PASSED [ 90%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqInit::test_returns_groq_adapter PASSED                            [ 95%]
llama_stack/providers/tests/inference/groq/test_init.py::TestGroqConfig::test_api_key_defaults_to_env_var PASSED                   [100%]

==================================================== 20 passed, 11 warnings in 0.08s =====================================================
```

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation
- [x] Wrote necessary unit or integration tests.
2025-01-03 08:27:49 -08:00