diff --git a/docs/source/distribution_dev/building_distro.md b/docs/source/distribution_dev/building_distro.md
index 49daba71c..a5f09a1c7 100644
--- a/docs/source/distribution_dev/building_distro.md
+++ b/docs/source/distribution_dev/building_distro.md
@@ -1,4 +1,6 @@
-# Building a Llama Stack Distribution
+# Developer Guide: Assemble a Llama Stack Distribution
+
+> NOTE: This doc is out-of-date.

 This guide will walk you through the steps to get started with building a Llama Stack distributiom from scratch with your choice of API providers. Please see the [Getting Started Guide](./getting_started.md) if you just want the basic steps to start a Llama Stack distribution.

@@ -237,27 +239,3 @@ INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
 > You might need to use the flag `--disable-ipv6` to Disable IPv6 support

 This server is running a Llama model locally.
-
-## Step 4. Test with Client
-Once the server is setup, we can test it with a client to see the example outputs.
-
-```
-curl http://localhost:5000/inference/chat_completion \
--H "Content-Type: application/json" \
--d '{
-    "model": "Llama3.1-8B-Instruct",
-    "messages": [
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": "Write me a 2 sentence poem about the moon"}
-    ],
-    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
-}'
-
-Output:
-{'completion_message': {'role': 'assistant',
-  'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
-  'stop_reason': 'out_of_tokens',
-  'tool_calls': []},
- 'logprobs': null}
-
-```
diff --git a/docs/source/distribution_dev/index.md b/docs/source/distribution_dev/index.md
new file mode 100644
index 000000000..3b7f3b01d
--- /dev/null
+++ b/docs/source/distribution_dev/index.md
@@ -0,0 +1,19 @@
+# Llama Stack Developer Guide
+
+## Key Concepts
+
+### API Provider
+A Provider is what makes the API real -- it supplies the actual implementation backing the API.
+
+As an example, the Inference API could be backed by open source libraries such as `[ torch | vLLM | TensorRT ]`.
+
+A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
+
+### Distribution
+A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some backed by local code, some remote. As a hobbyist, you can serve a small model locally while choosing a cloud provider for a large model; either way, the higher-level APIs your app works with do not need to change. You can even move across the server / mobile-device boundary while always using the same uniform set of APIs for developing Generative AI applications.
+
+```{toctree}
+:maxdepth: 1
+
+building_distro
+```
diff --git a/docs/source/getting_started/developer_cookbook.md b/docs/source/getting_started/developer_cookbook.md
index a85055a39..3aef150a5 100644
--- a/docs/source/getting_started/developer_cookbook.md
+++ b/docs/source/getting_started/developer_cookbook.md
@@ -13,13 +13,13 @@ Based on your developer needs, below are references to guides to help you get st

 * Developer Need: I want to start a local Llama Stack server with my GPU using meta-reference implementations.
 * Effort: 5min
 * Guide:
-  - Please see our [Getting Started Guide](./getting_started.md) on starting up a meta-reference Llama Stack server.
+  - Please see our [meta-reference-gpu](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html) guide on starting up a meta-reference Llama Stack server.

 ### Llama Stack Server with Remote Providers
 * Developer need: I want a Llama Stack distribution with a remote provider.
 * Effort: 10min
 * Guide
-  - Please see our [Distributions Guide](../../../distributions/) on starting up distributions with remote providers.
+  - Please see our [Distributions Guide](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/index.html) on starting up distributions with remote providers.

 ### On-Device (iOS) Llama Stack

@@ -38,4 +38,4 @@ Based on your developer needs, below are references to guides to help you get st
 * Developer Need: I want to add a new API provider to Llama Stack.
 * Effort: 3hr
 * Guide
-  - Please see our [Adding a New API Provider](./new_api_provider.md) guide for adding a new API provider.
+  - Please see our [Adding a New API Provider](https://llama-stack.readthedocs.io/en/latest/api_providers/new_api_provider.html) guide for adding a new API provider.
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 52e2ad514..eb1d404ce 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -52,8 +52,8 @@ If so, we suggest:
    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#docker-start-the-distribution-single-node-cpu)
    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#conda-llama-stack-run-single-node-cpu)
 - `distribution-fireworks`:
-   - [Docker]()
-   - [Conda]()
+   - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/fireworks.html)
+   - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/fireworks.html#conda-llama-stack-run-single-node-cpu)

 ## Build Your Llama Stack App
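
The new `index.md` describes a Distribution as a mix-and-match assembly of providers behind a fixed set of APIs. As an illustrative sketch only -- the provider names and the exact schema below are assumptions modeled on the `llama stack build` configuration described in `building_distro.md`, not part of this diff -- a build config that mixes a remote inference provider with local meta-reference implementations might look like:

```yaml
# Hypothetical build configuration (names and schema are assumptions):
# inference is served by a remote provider while the remaining APIs
# run on local meta-reference code. The application keeps calling the
# same Llama Stack APIs regardless of which providers back them.
name: my-mixed-stack
distribution_spec:
  description: Example of mixing local and remote providers
  providers:
    inference: remote::fireworks   # remote REST provider (assumed id)
    memory: meta-reference         # local implementations
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference
image_type: conda
```

Swapping a provider in this configuration changes only the backend; the higher-level APIs the application works with stay the same, which is the point the Distribution section of `index.md` is making.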