From 1e6006c5993b2adb8040e76fe83a404ac1f20602 Mon Sep 17 00:00:00 2001
From: Ashwin Bharambe
Date: Fri, 22 Nov 2024 22:38:53 -0800
Subject: [PATCH] More simplification of the "Starting a Llama Stack" doc

---
 .../distributions/importing_as_library.md     |  4 +--
 docs/source/distributions/index.md            | 28 ++++++++++---------
 .../distributions/ondevice_distro/index.md    | 12 --------
 .../distributions/self_hosted_distro/index.md | 27 ------------------
 4 files changed, 17 insertions(+), 54 deletions(-)
 delete mode 100644 docs/source/distributions/ondevice_distro/index.md
 delete mode 100644 docs/source/distributions/self_hosted_distro/index.md

diff --git a/docs/source/distributions/importing_as_library.md b/docs/source/distributions/importing_as_library.md
index 63191981a..573779f82 100644
--- a/docs/source/distributions/importing_as_library.md
+++ b/docs/source/distributions/importing_as_library.md
@@ -1,6 +1,6 @@
-# Importing Llama Stack as a Python Library
+# Using Llama Stack as a Library
 
-Llama Stack is typically utilized in a client-server configuration. To get started quickly, you can import Llama Stack as a library and call the APIs directly without needing to set up a server. For [example](https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/lib/direct/test.py):
+If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server. For [example](https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/lib/direct/test.py):
 
 ```python
 from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
diff --git a/docs/source/distributions/index.md b/docs/source/distributions/index.md
index 8e4a75d08..04c495418 100644
--- a/docs/source/distributions/index.md
+++ b/docs/source/distributions/index.md
@@ -4,31 +4,33 @@
 :hidden:
 
 importing_as_library
-self_hosted_distro/index
-remote_hosted_distro/index
 building_distro
-ondevice_distro/index
 ```
 
-You can start a Llama Stack server using "distributions" (see [Concepts](../concepts/index)) in one of the following ways:
-- **Docker**: we provide a number of pre-built Docker containers allowing you to get started instantly. If you are focused on application development, we recommend this option. You can also build your own custom Docker container.
-- **Conda**: the `llama` CLI provides a simple set of commands to build, configure and run a Llama Stack server containing the exact combination of providers you wish. We have provided various templates to make getting started easier.
+ 
+ 
+ 
 
-Which distribution to choose depends on the hardware you have for running LLM inference.
+You can instantiate a Llama Stack in one of the following ways:
+- **As a Library**: this is the simplest option, especially if you are using an external inference service. See [Using Llama Stack as a Library](importing_as_library).
+- **Docker**: we provide a number of pre-built Docker containers so you can start a Llama Stack server instantly. You can also build your own custom Docker container.
+- **Conda**: finally, you can use `llama stack build` to create a custom Llama Stack server containing the exact combination of providers you wish. We have provided various templates to make getting started easier.
+
+Which templates / distributions to choose depends on the hardware you have for running LLM inference.
 - **Do you have access to a machine with powerful GPUs?** If so, we suggest:
-  - [distribution-remote-vllm](self_hosted_distro/remote-vllm)
-  - [distribution-meta-reference-gpu](self_hosted_distro/meta-reference-gpu)
-  - [distribution-tgi](self_hosted_distro/tgi)
+  - {dockerhub}`distribution-remote-vllm` ([Guide](self_hosted_distro/remote-vllm))
+  - {dockerhub}`distribution-meta-reference-gpu` ([Guide](self_hosted_distro/meta-reference-gpu))
+  - {dockerhub}`distribution-tgi` ([Guide](self_hosted_distro/tgi))
 
 - **Are you running on a "regular" desktop machine?** If so, we suggest:
-  - [distribution-ollama](self_hosted_distro/ollama)
+  - {dockerhub}`distribution-ollama` ([Guide](self_hosted_distro/ollama))
 
 - **Do you have an API key for a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
-  - [distribution-together](remote_hosted_distro/index)
-  - [distribution-fireworks](remote_hosted_distro/index)
+  - {dockerhub}`distribution-together` ([Guide](remote_hosted_distro/index))
+  - {dockerhub}`distribution-fireworks` ([Guide](remote_hosted_distro/index))
 
 - **Do you want to run Llama Stack inference on your iOS / Android device** If so, we suggest:
   - [iOS](ondevice_distro/ios_sdk)
diff --git a/docs/source/distributions/ondevice_distro/index.md b/docs/source/distributions/ondevice_distro/index.md
deleted file mode 100644
index cb2fe1959..000000000
--- a/docs/source/distributions/ondevice_distro/index.md
+++ /dev/null
@@ -1,12 +0,0 @@
-# On-Device Distributions
-
-```{toctree}
-:maxdepth: 1
-:hidden:
-
-ios_sdk
-```
-
-On device distributions are Llama Stack distributions that run locally on your iOS / Android device.
-
-Currently, we only support the [iOS SDK](ios_sdk); support for Android is coming soon.
diff --git a/docs/source/distributions/self_hosted_distro/index.md b/docs/source/distributions/self_hosted_distro/index.md
deleted file mode 100644
index d2d4e365d..000000000
--- a/docs/source/distributions/self_hosted_distro/index.md
+++ /dev/null
@@ -1,27 +0,0 @@
-# Self-Hosted Distributions
-```{toctree}
-:maxdepth: 1
-:hidden:
-
-ollama
-tgi
-remote-vllm
-meta-reference-gpu
-meta-reference-quantized-gpu
-together
-fireworks
-bedrock
-```
-
-We offer deployable distributions where you can host your own Llama Stack server using local inference.
-
-| **Distribution** | **Llama Stack Docker** | Start This Distribution |
-|:----------------: |:------------------------------------------: |:-----------------------: |
-| Ollama | {dockerhub}`distribution-ollama` | [Guide](ollama) |
-| TGI | {dockerhub}`distribution-tgi` | [Guide](tgi) |
-| vLLM | {dockerhub}`distribution-remote-vllm` | [Guide](remote-vllm) |
-| Meta Reference | {dockerhub}`distribution-meta-reference-gpu` | [Guide](meta-reference-gpu) |
-| Meta Reference Quantized | {dockerhub}`distribution-meta-reference-quantized-gpu` | [Guide](meta-reference-quantized-gpu) |
-| Together | {dockerhub}`distribution-together` | [Guide](together) |
-| Fireworks | {dockerhub}`distribution-fireworks` | [Guide](fireworks) |
-| Bedrock | {dockerhub}`distribution-bedrock` | [Guide](bedrock) |
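
For anyone who wants to try the library mode that the updated `importing_as_library.md` points at, here is a minimal sketch of what in-process usage could look like. It is not the patch's own example: the `from_config` constructor, the `initialize()` call, the `run.yaml` path, and the model identifier are all assumptions, so check the linked `test.py` and your installed `llama-stack-client` version for the exact names.

```python
# Hypothetical sketch of running Llama Stack in-process ("library mode").
# Assumptions: an async `from_config` constructor and an `initialize()`
# method on LlamaStackDirectClient, a local `run.yaml` stack config, and
# the placeholder model id below -- verify all of these against the
# linked test.py / your llama-stack-client release.
import asyncio

from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import UserMessage


async def main() -> None:
    # Build the client from a stack run config; providers run inside this
    # process, so no separate server is needed.
    client = await LlamaStackDirectClient.from_config("run.yaml")
    await client.initialize()

    # The same APIs the HTTP client exposes are available in-process.
    print(await client.models.list())

    response = await client.inference.chat_completion(
        # The argument may be named `model` in older client releases.
        model_id="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
        messages=[UserMessage(role="user", content="Hello!")],
    )
    print(response.completion_message.content)


if __name__ == "__main__":
    asyncio.run(main())
```

Even if the constructor or argument names differ in a given client version, the intent is the same as in the doc: the direct client mirrors the server-backed client's API surface, just without the server.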