Commit graph

132 commits

Author SHA1 Message Date
Dinesh Yeduguru
b46d94d87d do not pass memory tools to inference 2025-01-08 18:53:32 -08:00
Dinesh Yeduguru
a7a55748ca address feedback 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
e08b7f4432 move _interpret_content_as_attachment to outside 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
67b35613bb test turn overrides in unit tests 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
854fef7478 add unit tests for chat agent 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
db2ec110a1 fix failing code interpreter tests 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
82395ba654 fix the rag query generator types 2025-01-08 18:25:52 -08:00
Dinesh Yeduguru
efe3189728 client sdk test fixes 2025-01-08 18:25:51 -08:00
Dinesh Yeduguru
c3865faf37 minor fixes 2025-01-08 18:25:51 -08:00
Dinesh Yeduguru
6632d7e410 fix list tools method name 2025-01-08 18:25:51 -08:00
Dinesh Yeduguru
94cca7a72a add wolfram alpha, bing search 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
f9a98c278a simplify toolgroups registration 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
ba242c04cc remove memory from available tools to agent 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
e3775eb6f6 rename UserDefinedToolDef to ToolDef 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
db0b2a60c1 remove breakpoints 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
17abffb505 fix handle_docs 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
9efe30c9d3 add documents to turn 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
d0e8e1647b add matplotlib_custom_backend.py 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
229999c572 add __init__.py 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
0bc876c130 minor fixes to agent instance 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
16d1f66f55 address feedback 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
ac46bd5eb4 address feedback 2025-01-08 18:25:21 -08:00
Dinesh Yeduguru
70b2a58bef linter fixes 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
9a3d7fa33c rebase fixes 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
f408fd3aca remove attachments, move memory bank to tool metadata 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
97798c8442 add a RAG test to client SDK 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
c76f5f418f move brave and tavily to remote 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
4dd2f4c363 working end to end client sdk tests with custom tools 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
1a66ddc1b5 add support for built in tool type 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
40f35f3a8d add code interpreter 2025-01-08 18:25:20 -08:00
Dinesh Yeduguru
2ad67529ef fix agents to run custom tools 2025-01-08 18:24:53 -08:00
Dinesh Yeduguru
9192a9bbb4 add tavily 2025-01-08 18:24:53 -08:00
Dinesh Yeduguru
dcdf9da6ef remove all usages of builtin tools in agents 2025-01-08 18:24:53 -08:00
Dinesh Yeduguru
f90e9c2003 agents to use tools api 2025-01-08 18:24:53 -08:00
Xi Yan
7a90fc5854
move DataSchemaValidatorMixin into standalone utils (#720)
# What does this PR do?

- there's no value in keeping data schema validation logic in a
DataSchemaValidatorMixin
- move the data schema validation logic into standalone utils (a sketch of the shape of this change follows below)
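A hedged sketch of the refactor direction; the standalone function name below is an assumption, not the actual util:

```python
# Before: every consumer inherits validation from a mixin.
class DataSchemaValidatorMixin:
    def validate_dataset_schema(self, dataset_schema: dict, expected_schemas: list) -> None:
        if dataset_schema not in expected_schemas:
            raise ValueError(f"Dataset schema {dataset_schema} not in {expected_schemas}")

# After: the same check is a plain function, imported where needed.
def validate_dataset_schema(dataset_schema: dict, expected_schemas: list) -> None:
    if dataset_schema not in expected_schemas:
        raise ValueError(f"Dataset schema {dataset_schema} not in {expected_schemas}")
```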

## Test Plan
```
pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py

pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```

2025-01-06 13:25:09 -08:00
Botao Chen
e86271aeac
support llama3.1 8B instruct in post training (#698)
## What does this PR do? 
- Change to support the Llama 3.1 8B Instruct model instead of the Llama 3 8B model, since the 3.1 Instruct model is a better base to finetune on top of
- Make the file-copy logic in the checkpointer safer in case a file to be copied doesn't exist in the source path (see the sketch after this list)
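A minimal sketch of the safer copy logic, assuming a shutil-based checkpointer; the actual torchtune checkpointer code differs:

```python
import shutil
from pathlib import Path

def copy_if_exists(src: Path, dst_dir: Path) -> None:
    # Skip, rather than crash, when an optional file is missing from the
    # source checkpoint directory (hypothetical helper, for illustration).
    if not src.exists():
        return
    shutil.copy2(src, dst_dir / src.name)
```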

## Test
Issue a post-training request from the client and verify training works as expected.
<img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM"
src="https://github.com/user-attachments/assets/47cc4df9-3edc-4afd-b5dd-abe1f039f1ed"
/>

<img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM"
src="https://github.com/user-attachments/assets/b9435274-ef1d-4570-bd8e-0880c3a4b2e9"
/>
2025-01-03 17:33:05 -08:00
Ashwin Bharambe
21357a6dee Kill autocomplete slop 2025-01-03 09:29:25 -08:00
Botao Chen
4320b0ebb2
[Post training] make validation steps configurable (#715)
## What does this PR do?
The current code hardcodes the number of validation steps to run (left over from testing). This PR makes it configurable via the training config (a hedged sketch follows below).
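A hedged illustration of what "configurable via the training config" means; the field name `max_validation_steps` is an assumption:

```python
from pydantic import BaseModel

class TrainingConfig(BaseModel):
    # Previously hardcoded; now read from the request's training config.
    max_validation_steps: int = 100

def run_validation(config: TrainingConfig) -> None:
    for _step in range(config.max_validation_steps):
        ...  # evaluate one validation batch
```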

## Test
On the client side, issue a post-training request with 20 validation steps; server-side logging shows that all 20 validation steps run successfully.
<img width="1128" alt="Screenshot 2025-01-02 at 8 21 06 PM"
src="https://github.com/user-attachments/assets/7a757516-c6ba-41d4-85c5-361a80ecf46e"
/>
2025-01-03 08:43:24 -08:00
Botao Chen
d9f75cc98f
Import from the right path (#708)
Import BaseModel and Field from pydantic
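The fix is presumably a one-line import correction along these lines (the "wrong" import shown is illustrative):

```python
# Wrong (illustrative): relying on a module that re-exported these names
# before the `import *` cleanup.
# from some_reexporting_module import BaseModel, Field

# Right: import directly from pydantic.
from pydantic import BaseModel, Field
```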
2025-01-02 13:15:31 -08:00
Botao Chen
750604c7af
[Post Training] Fix missing import (#705)
## Context
Post-training APIs were broken by the `import *` refactor in
https://github.com/meta-llama/llama-stack/pull/689. This PR adds the
missing import back.

## Test
Issue a post-training request from the client; the training finishes successfully.

<img width="1101" alt="Screenshot 2025-01-02 at 12 18 45 PM"
src="https://github.com/user-attachments/assets/8c781459-f340-4021-85e1-fc68b1dcb8c8"
/>

<img width="782" alt="Screenshot 2025-01-02 at 12 18 52 PM"
src="https://github.com/user-attachments/assets/14b04b7d-e5c7-4662-8fa6-748446ad3511"
/>
2025-01-02 13:08:20 -08:00
Xi Yan
3a269c4635
[rag evals] refactor & add ability to eval retrieval + generation in agentic eval pipeline (#664)
# What does this PR do?

- See https://github.com/meta-llama/llama-stack/pull/666 &
https://github.com/meta-llama/llama-stack/pull/668

- Refactor BaseScoringFn to be just a minimal interface; add a new
RegistrableBaseScoring
- Refactor the data schema check
- To separately evaluate the retrieval component in RAG, scoring
functions will additionally need a "context" column (see the sketch after this list)
- Refactor braintrust eval (more scoring fns added & tested in a
follow-up PR)
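A hedged sketch of a scoring-input row carrying the extra "context" column; the key names are illustrative, not the exact llama-stack schema:

```python
# One retrieval-eval example row: "context" holds the retrieved chunks so
# a scoring fn can judge retrieval separately from generation.
row = {
    "input_query": "What is the capital of France?",
    "expected_answer": "Paris",
    "generated_answer": "Paris",
    "context": "France's capital and largest city is Paris.",
}
```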

## Test Plan

```
pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py
```

<img width="847" alt="image"
src="https://github.com/user-attachments/assets/d099cb2d-6f9c-4bdf-9d0d-f388cf758c0f"
/>

```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```
<img width="850" alt="image"
src="https://github.com/user-attachments/assets/dce28fc3-0493-4d34-820a-567260873cc8"
/>

2025-01-02 11:21:33 -08:00
Aidan Do
49ad168336
[#407] Agents: Avoid calling tools that haven't been explicitly enabled (#637)
# What does this PR do?

Contributes to issue (#407)

tl;dr: @subramen was getting a 500 error because llama-stack called
`code_interpreter` when it was never defined as a tool.

Prevents failures like:

<img width="544" alt="image"
src="https://github.com/user-attachments/assets/392683d2-4670-414c-aaba-07ebc006d748"
/>

```
# Server side
Traceback (most recent call last):
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
    async for item in await event_gen:
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 560, in _run
    result_messages = await execute_tool_call_maybe(
  File "/opt/conda/envs/llamastack-vllm-stack/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 824, in execute_tool_call_maybe
    assert name in tools_dict, f"Tool {name} not found"
AssertionError: Tool code_interpreter not found
```

Instead, if the model hallucinates a tool call, we let it hallucinate and
let the client know (a sketch of the guard follows below).
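A hedged sketch of the guard, modeled on the `assert name in tools_dict` visible in the traceback above; the surrounding signatures are simplified:

```python
# Simplified from agent_instance.execute_tool_call_maybe: return an error
# payload for unknown tools instead of asserting (which produced the 500).
def execute_tool_call_maybe(name: str, args: dict, tools_dict: dict):
    if name not in tools_dict:
        return {"error": f"Tool {name} not found"}
    return tools_dict[name].run(**args)
```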

<img width="544" alt="image"
src="https://github.com/user-attachments/assets/d2418583-d45a-48db-b476-45a584f2986f"
/>

## Test Plan

<details>
<summary>pytest llama_stack/providers/tests/agents/test_agents.py -k ollama</summary>

```
llama stack build --template ollama --image-type conda 
conda activate llamastack-ollama
```

```
llama_stack/providers/tests/agents/test_agents.py ..Fss                                                                                          [100%]

======================================================================= FAILURES =======================================================================
_________________________________________ TestAgents.test_rag_agent_as_attachments[--ollama][ollama] __________________________________________
llama_stack/providers/tests/agents/test_agents.py:261: in test_rag_agent_as_attachments
    turn_response = [
llama_stack/providers/tests/agents/test_agents.py:261: in <listcomp>
    turn_response = [
llama_stack/providers/inline/agents/meta_reference/agents.py:153: in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:179: in create_and_execute_turn
    async for chunk in self.run(
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:250: in run
    async for res in self._run(
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:363: in _run
    rag_context, bank_ids = await self._retrieve_context(
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:698: in _retrieve_context
    bank_id = await self._ensure_memory_bank(session_id)
llama_stack/providers/inline/agents/meta_reference/agent_instance.py:653: in _ensure_memory_bank
    await self.memory_banks_api.register_memory_bank(
llama_stack/providers/utils/telemetry/trace_protocol.py:101: in async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/distribution/routers/routing_tables.py:312: in register_memory_bank
    raise ValueError(
E   ValueError: Embeddings are now served via Inference providers. Please upgrade your run.yaml to include inline::sentence-transformer as an additional inference provider. See https://github.com/meta-llama/llama-stack/blob/main/llama_stack/templates/together/run.yaml for an example.
=============================================================== short test summary info ================================================================
FAILED llama_stack/providers/tests/agents/test_agents.py::TestAgents::test_rag_agent_as_attachments[--ollama] - ValueError: Embeddings are now served via Inference providers. Please upgrade your run.yaml to include inline::sentence-transformer as an additiona...
========================================== 1 failed, 2 passed, 2 skipped, 20 deselected, 5 warnings in 14.24s ==========================================
```

An unrelated test is failing (it also fails on main)
</details>

<details>
<summary>Manual</summary>

Using this client code:
7ebc257b27/client.py

<img width="544" alt="Screenshot 2024-12-16 at 17 41 31"
src="https://github.com/user-attachments/assets/7425deaf-c94a-4dda-a635-922728e373f1"
/>

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-02 09:21:35 -08:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as the title says, cleaning up `import *`'s
- upgrade tests to make them more robust to bad model outputs
- remove `import *`'s in llama_stack/apis/* (skip __init__ modules); see the sketch after this list
![image](https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2)
- run `sh run_openapi_generator.sh`; no types get affected
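An illustrative before/after for the cleanup; the module and imported names are representative examples, not an exhaustive diff:

```python
# Before: a wildcard import pulls an unknown set of names into scope.
# from llama_stack.apis.inference import *

# After: each consumer names exactly what it uses.
from llama_stack.apis.inference import ChatCompletionRequest, Inference
```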

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

2024-12-27 15:45:44 -08:00
Botao Chen
bae197c37e
Fix post training apis broken by torchtune release (#674)
There was a torchtune release this morning
(https://github.com/pytorch/torchtune/releases/tag/v0.5.0) that breaks the
post-training APIs.
## Test
Spun up the server; post training works again after the fix.
<img width="1314" alt="Screenshot 2024-12-20 at 4 08 54 PM"
src="https://github.com/user-attachments/assets/dfae724d-ebf0-4846-9715-096efa060cee"
/>


## Note
We need to think hard about how to avoid this happening again, and do a
fast follow-up after the holidays; one possible mitigation is sketched below.
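One possible mitigation, as an assumption rather than this repo's actual fix: fail fast at startup when an untested torchtune version is installed, instead of breaking at request time after an upstream release:

```python
from importlib.metadata import version

# Hypothetical guard: the supported version range is an assumption.
installed = version("torchtune")
if not installed.startswith("0.4."):
    raise RuntimeError(
        f"torchtune {installed} is untested with the post-training provider; "
        "pin torchtune<0.5.0 or update the integration."
    )
```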
2024-12-20 16:12:02 -08:00
Botao Chen
06cb0c837e
[torchtune integration] post training + eval (#670)
## What does this PR do?

- Add the related APIs to the experimental-post-training template to enable
eval on the finetuned checkpoint in the template
- A small bug fix on meta reference eval
- A small error-handling improvement on post training


## Test Plan
From the client side, issued an E2E post-training request
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and got
eval results successfully.

<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM"
src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a"
/>
2024-12-20 13:43:13 -08:00
Dinesh Yeduguru
c8be0bf1c9
Tools API with brave and MCP providers (#639)
This PR adds a new Tools API and two tool runtime providers: Brave and MCP.

Test plan:
```
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{ "tool_group_id": "simple_tool",
  "tool_group": {
    "type": "model_context_protocol",
    "endpoint": {"uri": "http://localhost:56000/sse"}
  },
  "provider_id": "model-context-protocol"
}'

 curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{
  "tool_group_id": "search", "provider_id": "brave-search",
  "tool_group": {
    "type": "user_defined",
    "tools": [
      {
        "name": "brave_search",
        "description": "A web search tool",
        "parameters": [
          {
            "name": "query",
            "parameter_type": "string",
            "description": "The query to search"
          }
        ],
        "metadata": {},
        "tool_prompt_format": "json"
      }
    ]
  }
}'

 curl -X GET http://localhost:5000/alpha/tools/list | jq .
[
  {
    "identifier": "brave_search",
    "provider_resource_id": "brave_search",
    "provider_id": "brave-search",
    "type": "tool",
    "tool_group": "search",
    "description": "A web search tool",
    "parameters": [
      {
        "name": "query",
        "parameter_type": "string",
        "description": "The query to search"
      }
    ],
    "metadata": {},
    "tool_prompt_format": "json"
  },
  {
    "identifier": "fetch",
    "provider_resource_id": "fetch",
    "provider_id": "model-context-protocol",
    "type": "tool",
    "tool_group": "simple_tool",
    "description": "Fetches a website and returns its content",
    "parameters": [
      {
        "name": "url",
        "parameter_type": "string",
        "description": "URL to fetch"
      }
    ],
    "metadata": {
      "endpoint": "http://localhost:56000/sse"
    },
    "tool_prompt_format": "json"
  }
]

curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' \
-d '{
    "tool_name": "fetch",
    "args": {
        "url": "http://google.com/"
    }
}'

 curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \
-d '{
    "tool_name": "brave_search",
    "args": {
        "query": "who is meta ceo"
    }
}'
```
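For reference, the first registration call above expressed with Python's `requests` instead of curl; the endpoint and payload are taken verbatim from the test plan:

```python
import requests

# Register the MCP toolgroup against a locally running stack.
resp = requests.post(
    "http://localhost:5000/alpha/toolgroups/register",
    json={
        "tool_group_id": "simple_tool",
        "tool_group": {
            "type": "model_context_protocol",
            "endpoint": {"uri": "http://localhost:56000/sse"},
        },
        "provider_id": "model-context-protocol",
    },
    timeout=30,
)
resp.raise_for_status()
```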
2024-12-19 21:25:17 -08:00
Ashwin Bharambe
540fc4d717
Fix Meta reference GPU implementation (#663)
By performing in-place mutations, we lost. Never do that.
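The message is terse; the general hazard, as an illustration unrelated to the actual diff:

```python
# Illustrative only -- not the actual GPU-impl change. Mutating a shared
# structure in place leaks the change to every holder of the reference.
def bad(tokens: list) -> list:
    tokens.append(0)      # mutates the caller's list
    return tokens

def good(tokens: list) -> list:
    out = list(tokens)    # copy first, then modify
    out.append(0)
    return out
```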
2024-12-19 14:09:45 -08:00
Ashwin Bharambe
f19eb8eee3 Update types in parallel_utils for meta-reference-gpu impl 2024-12-19 13:58:41 -08:00
Xi Yan
5be2ea37b1 fix context_retriever model->model_id 2024-12-19 12:52:00 -08:00
Dinesh Yeduguru
03607a68c7
remove unused telemetry related code for console (#659)
# What does this PR do?
Remove unused code since this now exists in the meta reference provider
as a sink


## Test Plan

llama stack run ~/.llama/distributions/llamastack-together/together-run.yaml
2024-12-19 11:21:11 -08:00