Embedding models are tiny and can be pulled on-demand. Let's do that so
the user doesn't have to do "yet another thing" to get themselves set
up.
Thanks @hardikjshah for the suggestion.
Also fixed a build dependency miss (TODO: distro_codegen needs to
actually check that the build template contains all providers mentioned
for the run.yaml file)
## Test Plan
First run `ollama rm all-minilm:latest`.
Run `llama stack build --template ollama && llama stack run ollama --env
INFERENCE_MODEL=llama3.2:3b-instruct-fp16`. See that it outputs a
"Pulling embedding model `all-minilm:latest`" output and the stack
starts up correctly. Verify that `ollama list` shows the model is
correctly downloaded.
# What does this PR do?
Catches a bug in the previous codegen which was removing newlines.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
python llama_stack/scripts/distro_codegen.py
```
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
# What does this PR do?
Catches docs up to source with:
```
python llama_stack/scripts/distro_codegen.py
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Manually checked
```
sphinx-autobuild docs/source build/html
```
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
- **[docs] Fix misc typos and formatting issues in intro docs**
- **[docs]: Export variables (e.g. INFERENCE_MODEL) in getting_started**
- **[docs] Show that `llama-stack-client configure` will ask for api
key**
# What does this PR do?
Miscellaneous fixes in the documentation; not worth reporting an issue.
## Test Plan
No code changes. Addressed issues spotted when walking through the
guide.
Confirmed locally.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Rename environment var for consistency
## Test Plan
No regressions
## Sources
## Before submitting
- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and pass along to the model
3) Attachements are now named as Documents and their behavior is mostly
unchanged from user perspective
4) You can specify args that can be injected to a tool call through
Agent config. This is especially useful in case of memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator
## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
pytest -s -v -k together llama_stack/providers/tests/tools/test_tools.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994
Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing