# List of Distributions
Here is a list of the distributions we provide out of the box that you can use to start a Llama Stack server.
## Selection of a Distribution / Template
Which template / distribution to choose depends on the hardware you have available for running LLM inference.
- **Do you want a hosted Llama Stack endpoint?** If so, we suggest leveraging our partners who host Llama Stack endpoints, namely fireworks.ai and together.xyz (see the client example after this list).
  - Read more about it here: [Remote-Hosted Endpoints](remote_hosted_distro/index).
- **Do you have access to machines with GPUs?** If you wish to run Llama Stack locally or on a cloud instance and host your own Llama Stack endpoint, we suggest one of the self-hosted GPU distributions listed under Distribution Details below (remote vLLM, Meta reference GPU, TGI, or NVIDIA).
- **Are you running on a "regular" desktop or laptop?** We suggest using the ollama template for quick prototyping and getting started without having to worry about needing GPUs (see the sketch after this list).
  - {dockerhub}`distribution-ollama` ([link](self_hosted_distro/ollama))
- **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest the Together or Fireworks distributions listed under Distribution Details below.
- **Do you want to run Llama Stack inference on your iOS / Android device?** We also provide templates for running inference on-device: see the [iOS SDK](ondevice_distro/ios_sdk) and [Android SDK](ondevice_distro/android_sdk).
- If none of the above fits your needs, you can also build your own custom distribution.
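
For the quick-prototyping path above (the ollama template), here is a minimal sketch of starting the distribution with Docker. The model id, port, and Ollama URL are illustrative assumptions, not fixed values; consult the ollama distribution page for the exact flags supported by your version.

```bash
# Sketch: run the ollama distribution in Docker and expose the Llama Stack API.
# Assumes an Ollama server is already running on the host with the chosen model pulled.
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"  # example model id (assumption)
export LLAMA_STACK_PORT=8321                               # example port (assumption)

docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -e INFERENCE_MODEL=$INFERENCE_MODEL \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llamastack/distribution-ollama \
  --port $LLAMA_STACK_PORT
```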
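
Once any distribution is running, whether self-hosted as above or a remote-hosted endpoint from fireworks.ai or together.xyz, you can point the `llama-stack-client` CLI at it; `llama-stack-client configure` will prompt for an API key when the endpoint requires one. The URL below is a local placeholder, not an official endpoint address.

```bash
# Point the CLI at a running Llama Stack endpoint (local example; swap in your
# provider's URL for a remote-hosted endpoint and enter the API key when prompted).
llama-stack-client configure --endpoint http://localhost:8321
llama-stack-client models list   # quick check that the endpoint responds
```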
## Distribution Details
```{toctree}
:maxdepth: 1

remote_hosted_distro/index
self_hosted_distro/remote-vllm
self_hosted_distro/meta-reference-gpu
self_hosted_distro/tgi
self_hosted_distro/nvidia
self_hosted_distro/ollama
self_hosted_distro/together
self_hosted_distro/fireworks
```
## On-Device Distributions
```{toctree}
:maxdepth: 1

ondevice_distro/ios_sdk
ondevice_distro/android_sdk
```