# What does this PR do?
- Update the README for the TypeScript SDK.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
- Fix the SambaNovaImpl loading issue
- Add LlamaGuard model support for inference
## Test Plan
Ran the following unit test scripts; the results are shown below.
### Embedding
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_batch_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
=================================================================================================================== 2 skipped, 1 warning in 0.32s ===================================================================================================================
```
### Vision
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_vision_inference.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image1-expected_strings1] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-sambanova] PASSED
=================================================================================================================== 3 passed, 1 warning in 2.68s ====================================================================================================================
```
### Text
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED
=================================================================================================================== 1 passed, 1 warning in 0.46s ====================================================================================================================
```
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED
=================================================================================================================== 1 passed, 1 warning in 0.48s ====================================================================================================================
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
Update README and other documentation
## Before submitting
- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
I noticed that the documentation for other providers has this header,
so I have added it to the Cerebras docs too.
````
---
orphan: true
---
# TGI Distribution
```{toctree}
:maxdepth: 2
:hidden:
self
```
````
This also fixes a typo in README.md where the link to the Cerebras docs included an extra `getting_started` section.
I did notice however that https://hub.docker.com/r/llamastack/distribution-cerebras still does not exist. How do I get the Cerebras Docker image uploaded?
cc: @ashwinb @raghotham
## Before submitting
- [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR fixes a broken link in the README.md that was causing a 404
error. The link to `getting_started.ipynb` was pointing to a
non-existent file. Updated it to point to the correct notebook
`Llama_Stack_Building_AI_Applications.ipynb` which contains the
walk-through for text and vision inference llama_stack_client APIs.
- [x] Addresses issue (#633 )
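For context, here is a minimal sketch of the kind of llama_stack_client inference call the notebook walks through; the port, model id, and exact parameter names are assumptions and may differ between client versions and from the notebook itself.
```python
# Hedged sketch, not copied from the notebook. Assumes a Llama Stack server is
# already running locally and the model below is registered with it.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")  # port is an assumption

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",  # example model id
    messages=[{"role": "user", "content": "Summarize the Llama Stack APIs in one sentence."}],
)
print(response.completion_message.content)
```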
## Test Plan
1. Verified that the new notebook path exists:
```bash
ls docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb
```
2. Verified the notebook content contains text and vision inference
examples by:
- Checking the notebook contents
- Confirming the presence of vision models like
Llama-3.2-11B-Vision-Instruct
- Verifying llama_stack_client API usage examples
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section.
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests (N/A - documentation
change only).
# What does this PR do?
I think I misunderstood the meaning of "single node" when describing the
type of the Cerebras integration. It should be hosted rather than single
node, since the inference is done via an API call.
cc: @ashwinb @raghotham
- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
#525 introduced a telemetry configuration named jaeger, but what it really
points to is an OTLP HTTP endpoint, which is supported by most servers in the
ecosystem, including raw OpenTelemetry collectors, several APMs, and even
https://github.com/ymtdzzz/otel-tui
I chose to rename this to "otel" as it will bring more people into the
ecosystem, rather than giving the impression it only works with Jaeger. Later,
we can use the [standard
ENV](https://opentelemetry.io/docs/specs/otel/protocol/exporter/) to configure
this if we like, so that things can be overridden with the variables people
expect.
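To illustrate the point, here is a generic OpenTelemetry sketch (not the llama-stack telemetry code): the same OTLP/HTTP exporter works against Jaeger, a raw collector, an APM, or otel-tui, and the spec-defined `OTEL_EXPORTER_OTLP_ENDPOINT` variable is the usual way to override the endpoint.
```python
# Generic OpenTelemetry example, not llama-stack code: the same OTLP/HTTP exporter
# works against Jaeger, an OpenTelemetry Collector, an APM, or otel-tui.
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# The spec-defined env var; 4318 is the default OTLP/HTTP port.
base = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=f"{base}/v1/traces")))
trace.set_tracer_provider(provider)

with trace.get_tracer(__name__).start_as_current_span("demo-span"):
    pass  # the span is exported to whichever OTLP backend is listening
```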
Note: I also added to the README that you have to install conda. Depending
on the user's experience level, and especially with miniforge vs other
installation methods, I felt this helps.
## Test Plan
I would like to test this, but I actually got a little lost. The previous
PRs referenced YAML that doesn't seem to be published anywhere. It would be
nice to have a pre-canned setup that uses ollama and turns on otel, but I
would also appreciate a hand with instructions in the meantime.
## Sources
https://github.com/meta-llama/llama-stack/pull/525
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
# What does this PR do?
Many of the URLs pointing to Llama Stack's Read the Docs pages were broken,
presumably due to a recent refactor of the documentation. This PR fixes all
affected URLs throughout the repository.
# What does this PR do?
- updated the notebooks to reflect changes up to llama-stack 0.0.53
- updated the README to provide accurate and up-to-date info
- improved the current Zero to Hero guide by integrating an example using the
Together API
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Co-authored-by: Sanyam Bhutani <sanyambhutani@meta.com>
# What does this PR do?
Add the Kotlin package link to the README docs.
# What does this PR do?
It shows a complete zero-setup Colab using the Llama Stack server
implemented and powered by together.ai: it uses the Llama Stack Client API to
run inference, agents, and Llama 3.2 models. Good for a quick start guide.
- [ ] Addresses issue (#issue)
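As a rough sketch of the agent flow the Colab covers (module paths, the port, and the model id follow typical llama_stack_client examples but are assumptions and may not match the notebook exactly):
```python
# Hedged sketch, not taken from the Colab: a single agent turn through the
# Llama Stack server, using the llama_stack_client agent helpers.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

client = LlamaStackClient(base_url="http://localhost:5000")  # port is an assumption

agent = Agent(
    client,
    AgentConfig(
        model="meta-llama/Llama-3.2-3B-Instruct",  # example model id
        instructions="You are a helpful assistant.",
        enable_session_persistence=False,
    ),
)
session_id = agent.create_session("quickstart")
response = agent.create_turn(
    messages=[{"role": "user", "content": "What can Llama Stack do?"}],
    session_id=session_id,
)
for log in EventLogger().log(response):  # stream and pretty-print the turn events
    log.print()
```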
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
* API Keys passed from Client instead of distro configuration
* delete distribution registry
* Rename the "package" word away
* Introduce a "Router" layer for providers
Some providers need to be factored out and treated as thin routing
layers on top of other providers. Consider two examples:
- The inference API should be a routing layer over inference providers,
routed using the "model" key
- The memory banks API is another instance where various memory bank
types will be provided by independent providers (e.g., a vector store
is served by Chroma while a key-value memory can be served by Redis or
PGVector)
This commit introduces a generalized routing layer for this purpose (a
rough sketch of the idea appears after this commit list).
* update `apis_to_serve`
* llama_toolchain -> llama_stack
* Codemod from llama_toolchain -> llama_stack
- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent
* Moved some stuff out of common/; re-generated OpenAPI spec
* llama-toolchain -> llama-stack (hyphens)
* add control plane API
* add redis adapter + sqlite provider
* move core -> distribution
* Some more toolchain -> stack changes
* small naming shenanigans
* Removing custom tool and agent utilities and moving them client side
* Move control plane to distribution server for now
* Remove control plane from API list
* no codeshield dependency randomly plzzzzz
* Add "fire" as a dependency
* add back event loggers
* stack configure fixes
* use brave instead of bing in the example client
* add init file so it gets packaged
* add init files so it gets packaged
* Update MANIFEST
* bug fix
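As a purely illustrative sketch of the "Router" layer idea above (not the actual llama-stack implementation; all names here are made up): a router keeps a table from a routing key, such as the model name, to the concrete provider that serves it, and forwards each call while exposing the same API surface as a regular provider.
```python
# Illustrative only -- not llama-stack code. A thin routing layer dispatches
# each request to the provider registered for its routing key (the model name).
from typing import Dict, Protocol


class InferenceProvider(Protocol):
    def chat_completion(self, model: str, prompt: str) -> str: ...


class InferenceRouter:
    """Looks like a provider, but only forwards to the provider behind each model."""

    def __init__(self, routing_table: Dict[str, InferenceProvider]) -> None:
        self.routing_table = routing_table

    def chat_completion(self, model: str, prompt: str) -> str:
        provider = self.routing_table.get(model)
        if provider is None:
            raise ValueError(f"no provider registered for model {model!r}")
        return provider.chat_completion(model, prompt)
```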
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
* Add distribution CLI scaffolding
* More progress towards `llama distribution install`
* getting closer to a distro definition, distro install + configure works
* Distribution server now functioning
* read existing configuration, save enums properly
* Remove inference uvicorn server entrypoint and llama inference CLI command
* updated dependency and client model name
* Improved exception handling
* local imports for faster cli
* undo a typo, add a passthrough distribution
* implement full-passthrough in the server
* add safety adapters, configuration handling, server + clients
* cleanup, moving stuff to common, nuke utils
* Add a Path() wrapper at the earliest place
* fixes
* Bring agentic system api to toolchain
Add adapter dependencies and resolve adapters using a topological sort (see
the sketch at the end of this commit list)
* refactor to reduce size of `agentic_system`
* move straggler files and fix some important existing bugs
* ApiSurface -> Api
* refactor a method out
* Adapter -> Provider
* Make each inference provider into its own subdirectory
* installation fixes
* Rename Distribution -> DistributionSpec, simplify RemoteProviders
* dict key instead of attr
* update inference config to take model and not model_dir
* Fix passthrough streaming, send headers properly not part of body :facepalm
* update safety to use model sku ids and not model dirs
* Update cli_reference.md
* minor fixes
* add DistributionConfig, fix a bug in model download
* Make install + start scripts do proper configuration automatically
* Update CLI_reference
* Nuke fp8_requirements, fold fbgemm into common requirements
* Update README, add newline between API surface configurations
* Refactor download functionality out of the Command so can be reused
* Add `llama model download` alias for `llama download`
* Show message about checksum file so users can check themselves
* Simpler intro statements
* get ollama working
* Reduce a bunch of dependencies from toolchain
Some improvements to the distribution install script
* Avoid using `conda run` since it buffers everything
* update dependencies and rely on LLAMA_TOOLCHAIN_DIR for dev purposes
* add validation for configuration input
* resort imports
* make optional subclasses default to yes for configuration
* Remove additional_pip_packages; move deps to providers
* for inline make 8b model the default
* Add scripts to MANIFEST
* allow installing from test.pypi.org
* Fix#2 to help with testing packages
* Must install llama-models at that same version first
* fix PIP_ARGS
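For the "resolve adapters using a topological sort" item above, here is a small illustrative sketch: Kahn's algorithm over a made-up provider dependency map, not the llama-stack code.
```python
# Illustrative only: resolve providers in dependency order with Kahn's algorithm.
from collections import deque
from typing import Dict, List, Set


def topological_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a list of names such that every dependency precedes its dependents."""
    indegree = {name: len(d) for name, d in deps.items()}
    dependents: Dict[str, List[str]] = {name: [] for name in deps}
    for name, d in deps.items():
        for dep in d:
            dependents[dep].append(name)
    ready = deque(name for name, count in indegree.items() if count == 0)
    order: List[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for dependent in dependents[name]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected among providers")
    return order


# Hypothetical provider dependencies: agents needs inference, memory, and safety.
print(topological_order({
    "inference": set(),
    "memory": set(),
    "safety": {"inference"},
    "agents": {"inference", "memory", "safety"},
}))
```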
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Hardik Shah <hjshah@meta.com>