Commit graph

21 commits

Author SHA1 Message Date
Ashwin Bharambe
abfbaf3c1b
refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)
All of the tests from `llama_stack/providers/tests/` are now moved to
`tests/integration`.

I converted the `tools`, `scoring` and `datasetio` tests to use API.
However, `eval` and `post_training` proved to be a bit challenging to
leaving those. I think `post_training` should be relatively
straightforward also.

As part of this, I noticed that `wolfram_alpha` tool wasn't added to
some of our commonly used distros so I added it. I am going to remove a
lot of code duplication from distros next so while this looks like a
one-off right now, it will go away and be there uniformly for all
distros.
2025-03-04 14:53:47 -08:00
Ashwin Bharambe
04de2f84e9
fix: register provider model name and HF alias in run.yaml (#1304)
Each model known to the system has two identifiers: 

- the `provider_resource_id` (what the provider calls it) -- e.g.,
`accounts/fireworks/models/llama-v3p1-8b-instruct`
- the `identifier` (`model_id`) under which it is registered and gets
routed to the appropriate provider.

We have so far used the HuggingFace repo alias as the standardized
identifier you can use to refer to the model. So in the above example,
we'd use `meta-llama/Llama-3.1-8B-Instruct` as the name under which it
gets registered. This makes it convenient for users to refer to these
models across providers.

However, we forgot to register the _actual_ provider model ID also. You
should be able to route via `provider_resource_id` also, of course.

This change fixes this (somewhat grave) omission.

*Note*: this change is additive -- more aliases work now compared to
before.

## Test Plan

Run the following for distro=(ollama fireworks together)
```
LLAMA_STACK_CONFIG=$distro \
   pytest -s -v tests/client-sdk/inference/test_text_inference.py \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct --vision-inference-model=""
```
2025-02-27 16:39:23 -08:00
Ashwin Bharambe
2608b6074f Update embedding dimension singular 2025-02-20 16:14:46 -08:00
Ashwin Bharambe
9436dd570d
feat: register embedding models for ollama, together, fireworks (#1190)
# What does this PR do?

We have support for embeddings in our Inference providers, but so far we
haven't done the final step of actually registering the known embedding
models and making sure they are extremely easy to use. This is one step
towards that.

## Test Plan

Run existing inference tests.

```bash

$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
   --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$  pytest -s -v -k together test_embeddings.py \
   --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
   --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```

The value of the EMBEDDING_DIMENSION isn't actually used in these tests,
it is merely used by the test fixtures to check if the model is an LLM
or Embedding.
2025-02-20 15:39:08 -08:00
Xi Yan
8b655e3cd2
fix!: update eval-tasks -> benchmarks (#1032)
# What does this PR do?

- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove differentiation between `app` v.s. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and do not add any value. Backward compatibility is being
kept as the "type" is not being used anywhere.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- This change is backward compatible 
- Run notebook test with

```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```

<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>



[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 16:40:58 -08:00
Ellis Tarn
ab9516c789
fix: Gaps in doc codegen (#1035)
# What does this PR do?
Catches docs up to source with:
```
python llama_stack/scripts/distro_codegen.py
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Manually checked
```
sphinx-autobuild docs/source build/html
```

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
2025-02-10 13:24:15 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. the code looks much nicer and simpler (expectedly.)
- there are a number of hacks in the server resolver implementation
still, we will live with some and fix some

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.

## Test Plan

Our RAG test is sad (doesn't actually test for actual RAG output) but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00
Dinesh Yeduguru
3d4c53dfec
add mcp runtime as default to all providers (#816)
# What does this PR do?

This is needed to have the notebook work with MCP
2025-01-17 16:40:58 -08:00
Xi Yan
d1f3b032c9
cerebras template update for memory (#792)
# What does this PR do?

- we no longer have meta-reference as memory provider, update cerebras
template


## Test Plan

```
python llama_stack/scripts/distro_codegen.py
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 16:07:53 -08:00
Dinesh Yeduguru
a5c57cd381
agents to use tools api (#673)
# What does this PR do?

PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and pass along to the model
3) Attachements are now named as Documents and their behavior is mostly
unchanged from user perspective
4) You can specify args that can be injected to a tool call through
Agent config. This is especially useful in case of memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator


## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

pytest -s -v -k together  llama_stack/providers/tests/tools/test_tools.py \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994

Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing
2025-01-08 19:01:00 -08:00
Xi Yan
7a4383e4c1
add 3.3 to together inference provider (#729)
# What does this PR do?

- add llama3.3 model for together
- fix fireworks distro_codegen

```
python llama_stack/scripts/distro_codegen.py
```

## Test Plan

<img width="1132" alt="image"
src="https://github.com/user-attachments/assets/bf94b933-9200-4e73-878e-d1a95d450a88"
/>

**Tests**
```
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.3-70B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```
<img width="1139" alt="image"
src="https://github.com/user-attachments/assets/407dc98b-8de3-4841-8cb1-75e4b5128544"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-06 15:39:41 -08:00
Dinesh Yeduguru
516e1a3e59
add embedding model by default to distribution templates (#617)
# What does this PR do?
Adds the sentence transformer provider and the `all-MiniLM-L6-v2`
embedding model to the default models to register in the run.yaml for
all providers.

## Test Plan
llama stack build --template together --image-type conda
llama stack run
~/.llama/distributions/llamastack-together/together-run.yaml
2024-12-13 12:48:00 -08:00
Xi Yan
cd40a5fdbf
update template run.yaml to include openai api key for braintrust (#590)
# What does this PR do?

**Why**
- braintrust provider needs OpenAI API Key set in config for
DirectClient to work

## Test Plan
```
python llama_stack/scripts/distro_codegen.py 
```

<img width="340" alt="image"
src="https://github.com/user-attachments/assets/eae38296-f880-40f0-9a9e-46a12038db64">

- set API key in client via provider_data
<img width="907" alt="image"
src="https://github.com/user-attachments/assets/3d74cd7c-dc7e-4a42-8a40-c22f19b0c534">


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-09 15:40:59 -08:00
Ashwin Bharambe
e951852848 Miscellaneous fixes around telemetry, library client and run yaml autogen
Also add a `venv` image-type for llama stack build
2024-12-08 20:40:22 -08:00
Xi Yan
7301403ce3
Add eval/scoring/datasetio API providers to distribution templates & UI developer guide (#564)
# What does this PR do?

- add /eval, /scoring, /datasetio API providers to distribution
templates
- regenerate build.yaml / run.yaml files
- fix `template.py` to take in list of providers instead of only first
one
- override memory provider as faiss default for all distro (as only 1
memory provider is needed to start basic flow, chromadb/pgvector need
additional setup step).
```
python llama_stack/scripts/distro_codegen.py
```

- updated README to start UI via conda builds. 

## Test Plan

```
python llama_stack/scripts/distro_codegen.py
```

- Use newly generated `run.yaml` to start server
```
llama stack run ./llama_stack/templates/together/run.yaml
```
<img width="1191" alt="image"
src="https://github.com/user-attachments/assets/62f7d179-0cd0-427c-b6e8-e087d4648f09">


#### Registration
```
❯ llama-stack-client datasets register \
--dataset-id "mmlu" \
--provider-id "huggingface" \
--url "https://huggingface.co/datasets/llamastack/evals" \
--metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string", "chat_completion_input": {"type": "string"}}}'
❯ llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata                                ┃ type    ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ mmlu       │ huggingface │ {'path': 'llamastack/evals', 'name':    │ dataset │
│            │             │ 'evals__mmlu__details', 'split':        │         │
│            │             │ 'train'}                                │         │
└────────────┴─────────────┴─────────────────────────────────────────┴─────────┘
```

```
❯ llama-stack-client datasets register \
--dataset-id "simpleqa" \
--provider-id "huggingface" \
--url "https://huggingface.co/datasets/llamastack/evals" \
--metadata '{"path": "llamastack/evals", "name": "evals__simpleqa", "split": "train"}' \
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string", "chat_completion_input": {"type": "string"}}}'
❯ llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata                                                      ┃ type    ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ mmlu       │ huggingface │ {'path': 'llamastack/evals', 'name': 'evals__mmlu__details',  │ dataset │
│            │             │ 'split': 'train'}                                             │         │
│ simpleqa   │ huggingface │ {'path': 'llamastack/evals', 'name': 'evals__simpleqa',       │ dataset │
│            │             │ 'split': 'train'}                                             │         │
└────────────┴─────────────┴───────────────────────────────────────────────────────────────┴─────────┘
```

```
❯ llama-stack-client eval_tasks register \
> --eval-task-id meta-reference-mmlu \
> --provider-id meta-reference \
> --dataset-id mmlu \
> --scoring-functions basic::regex_parser_multiple_choice_answer
❯ llama-stack-client eval_tasks register \
--eval-task-id meta-reference-simpleqa \
--provider-id meta-reference \
--dataset-id simpleqa \
--scoring-functions llm-as-judge::405b-simpleqa
❯ llama-stack-client eval_tasks list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ dataset_id ┃ identifier       ┃ metadata ┃ provider_id    ┃ provider_resour… ┃ scoring_functio… ┃ type      ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ mmlu       │ meta-reference-… │ {}       │ meta-reference │ meta-reference-… │ ['basic::regex_… │ eval_task │
│ simpleqa   │ meta-reference-… │ {}       │ meta-reference │ meta-reference-… │ ['llm-as-judge:… │ eval_task │
└────────────┴──────────────────┴──────────┴────────────────┴──────────────────┴──────────────────┴───────────┘
```

#### Test with UI
```
streamlit run app.py
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-05 16:29:32 -08:00
Ashwin Bharambe
d463d68e1e Update docs 2024-11-18 23:21:25 -08:00
Ashwin Bharambe
7693786322 Use HF names for registering fireworks and together models 2024-11-18 22:34:47 -08:00
Ashwin Bharambe
1fb61137ad Add conda_env 2024-11-18 16:08:14 -08:00
Ashwin Bharambe
2a31163178
Auto-generate distro yamls + docs (#468)
# What does this PR do?

Automatically generates
- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of markdown docs

for the distributions.

## Test Plan

At this point, this only updates the YAMLs and the docs. Some testing
(especially with ollama and vllm) has been performed but needs to be
much more tested.
2024-11-18 14:57:06 -08:00