# What does this PR do?
Serialize objects to built in types to avoid otel warnings
## Test Plan
╰─❯ llama stack run
~/.llama/distributions/llamastack-together/together-run.yaml
# What does this PR do?
Cerebras is rolling out support for llama 3.3 70b and deprecating llama
3.1 70b. This PR updates the documentation, config, and internal mapping
to reflect this change.
cc: @ashwinb @raghotham
## What does this PR do?
This is a long-pending change and particularly important to get done
now.
Specifically:
- we cannot "localize" (aka download) any URLs from media attachments
anywhere near our modeling code. it must be done within llama-stack.
- `PIL.Image` is infesting all our APIs via `ImageMedia ->
InterleavedTextMedia` and that cannot be right at all. Anything in the
API surface must be "naturally serializable". We need a standard `{
type: "image", image_url: "<...>" }` which is more extensible
- `UserMessage`, `SystemMessage`, etc. are moved completely to
llama-stack from the llama-models repository.
See https://github.com/meta-llama/llama-models/pull/244 for the
corresponding PR in llama-models.
## Test Plan
```bash
cd llama_stack/providers/tests
pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py
pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py
pytest -s -v -k chroma memory/test_memory.py \
--env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar
pytest -s -v -k fireworks agents/test_agents.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
```
Updated the client sdk (see PR ...), installed the SDK in the same
environment and then ran the SDK tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py
LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py
# this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly
INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py
```
# What does this PR do?
This PR fixes a broken link in the README.md that was causing a 404
error. The link to `getting_started.ipynb` was pointing to a
non-existent file. Updated it to point to the correct notebook
`Llama_Stack_Building_AI_Applications.ipynb` which contains the
walk-through for text and vision inference llama_stack_client APIs.
- [x] Addresses issue (#633 )
## Test Plan
1. Verified that the new notebook path exists:
```bash
ls docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb
```
2. Verified the notebook content contains text and vision inference
examples by:
- Checking the notebook contents
- Confirming the presence of vision models like
Llama-3.2-11B-Vision-Instruct
- Verifying llama_stack_client API usage examples
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section.
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests (N/A - documentation
change only).
# What does this PR do?
**Why**
- When AgentConfig has no `input_shields` / `output_shields` defined, we
still outputs a shield_call step with violation=None. This is impossible
to distinguish the case b/w (1) no violation from running shields v.s.
(2) no shields call
**What**
- We should not have a shield_call step when no `input_shields` /
`output_shields` are defined.
- Also removes a never reached try/catch code block in agent loop.
`run_multiple_shields` is never called in the try block (verified by
stacktrace print)
**Side Note**
- pre-commit fix
## Test Plan
Tested w/ DirectClient via:
https://gist.github.com/yanxi0830/b48f2a53b6f5391b9ff1e39992bc05b3
**No Shields**
<img width="858" alt="image"
src="https://github.com/user-attachments/assets/67319370-329f-4954-bd16-d21ce54c6ebf"
/>
**With Input + Output Shields**
<img width="854" alt="image"
src="https://github.com/user-attachments/assets/75ab1bee-3ba9-4549-ab51-23210be83da7"
/>
**Input Shields Only**
<img width="858" alt="image"
src="https://github.com/user-attachments/assets/1897206b-13dd-4ea5-92c2-b39bf68e9286"
/>
E2E pytest
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk/agents/test_agents.py
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
We cannot use recursive types because not only does our OpenAPI
generator not like them, even if it did, it is not easy for all client
languages to automatically construct proper APIs (especially considering
garbage collection) around them. For now, we can return a `Dict[str,
SpanWithStatus]` instead of `SpanWithChildren` and rely on the client to
reconstruct the tree.
Also fixed a super subtle issue with the OpenAPI generation process
(monkey-patching of json_schema_type wasn't working because of import
reordering.)
# What does this PR do?
**Why**
- Clean up examples which we will not maintain; reduce the surface area
to the minimal showcases
**What**
- Delete `client.py` in /apis/*
- Move all scripts to unit tests
- SDK sync in the future will just require running pytests
**Side notes**
- `bwrap` not available on Mac so code_interpreter will not work
## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/36bfe537-628d-43c3-8479-dcfcfe2e4035"
/>
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
### Context
In this PR, we
- Implement the post training job management and get training artifacts
apis
- get_training_jobs
- get_training_job_status
- get_training_job_artifacts
- get_training_job_logstream is deleted since the trace can be directly
accessed by UI with Jaeger
https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#jaeger-to-visualize-traces
- Refactor the post training and training types definition to make them
more intuitive.
- Rewrite the checkpointer to make it compatible with llama-stack file
system and can be recognized during inference
### Test
Unit test
`pytest llama_stack/providers/tests/post_training/test_post_training.py
-m "torchtune_post_training_huggingface_datasetio" -v -s --tb=short
--disable-warnings`
<img width="1506" alt="Screenshot 2024-12-10 at 4 06 17 PM"
src="https://github.com/user-attachments/assets/16225029-bdb7-48c4-9d13-e580cc769c0a">
e2e test with client side call
<img width="888" alt="Screenshot 2024-12-10 at 4 09 44 PM"
src="https://github.com/user-attachments/assets/de375e4c-ef67-4dcc-a045-4037d9489191">
# What does this PR do?
Adds the sentence transformer provider and the `all-MiniLM-L6-v2`
embedding model to the default models to register in the run.yaml for
all providers.
## Test Plan
llama stack build --template together --image-type conda
llama stack run
~/.llama/distributions/llamastack-together/together-run.yaml