# List of Distributions
Here is a list of distributions, provided out of the box, that you can use to start a Llama Stack server.

## Selection of a Distribution / Template

Which template / distribution to choose depends on the hardware you have available for running LLM inference. Whichever one you pick, client code connects to the resulting server the same way (see the sketch after this list).

- **Do you want a hosted Llama Stack endpoint?** If so, we suggest leveraging our partners who host Llama Stack endpoints, namely _fireworks.ai_ and _together.xyz_.
  - Read more about it here: [Remote-Hosted Endpoints](remote_hosted_distro/index).

- **Do you have access to machines with GPUs?** If you wish to run Llama Stack locally or on a cloud instance and host your own Llama Stack endpoint, we suggest:
  - {dockerhub}`distribution-remote-vllm` ([Guide](self_hosted_distro/remote-vllm))
  - {dockerhub}`distribution-meta-reference-gpu` ([Guide](self_hosted_distro/meta-reference-gpu))
  - {dockerhub}`distribution-tgi` ([Guide](self_hosted_distro/tgi))
  - {dockerhub}`distribution-nvidia` ([Guide](self_hosted_distro/nvidia))

- **Are you running on a "regular" desktop or laptop?** We suggest using the ollama template for quick prototyping and getting started without having to worry about needing GPUs.
  - {dockerhub}`distribution-ollama` ([Guide](self_hosted_distro/ollama))

- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
  - {dockerhub}`distribution-together` ([Guide](self_hosted_distro/together))
  - {dockerhub}`distribution-fireworks` ([Guide](self_hosted_distro/fireworks))

- **Do you want to run Llama Stack inference on your iOS / Android device?** We also provide templates for on-device inference:
  - [iOS SDK](ondevice_distro/ios_sdk)
  - [Android](ondevice_distro/android_sdk)

- **If none of the above fit your needs, you can also build your own [custom distribution](building_distro).**
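
Whichever distribution you choose, the resulting server exposes the same Llama Stack API, so client code stays the same. Below is a minimal sketch, assuming the `llama-stack-client` Python package is installed, a distribution server is already running on `localhost:5000`, and the model ID shown is one your distribution actually serves; exact parameter names can vary across client versions.

```python
from llama_stack_client import LlamaStackClient

# Point the client at your running distribution server. The port is an
# assumption; use whatever port you started the server with.
client = LlamaStackClient(base_url="http://localhost:5000")

# The model ID below is a placeholder; substitute a model that your
# distribution is configured to serve.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.completion_message.content)
```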
### Distribution Details
```{toctree}
:maxdepth: 1

remote_hosted_distro/index
self_hosted_distro/remote-vllm
self_hosted_distro/meta-reference-gpu
self_hosted_distro/tgi
self_hosted_distro/nvidia
self_hosted_distro/ollama
self_hosted_distro/together
self_hosted_distro/fireworks
```
### On-Device Distributions

```{toctree}
:maxdepth: 1

ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```