Xi Yan 2024-10-29 16:33:47 -07:00
parent 980f2ae039
commit acefea7821
4 changed files with 27 additions and 30 deletions


@@ -1,4 +1,6 @@
# Building a Llama Stack Distribution
# Developer Guide: Assemble a Llama Stack Distribution
> NOTE: This doc is out-of-date.
This guide will walk you through the steps to get started with building a Llama Stack distribution from scratch with your choice of API providers. Please see the [Getting Started Guide](./getting_started.md) if you just want the basic steps to start a Llama Stack distribution.
@@ -237,27 +239,3 @@ INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
> You might need to use the flag `--disable-ipv6` to disable IPv6 support.
This server is running a Llama model locally.
## Step 4. Test with Client
Once the server is set up, we can test it with a client to see example outputs.
```
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2 sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
```

Output:
```
{'completion_message': {'role': 'assistant',
'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
'stop_reason': 'out_of_tokens',
'tool_calls': []},
'logprobs': null}
```
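If you prefer Python, the same request can be made with the `requests` library. This is a minimal sketch assuming the server is listening on `localhost:5000` as shown above:
```
# Minimal sketch: the same chat_completion request as the curl example above,
# assuming the Llama Stack server is running on localhost:5000.
import requests

response = requests.post(
    "http://localhost:5000/inference/chat_completion",
    headers={"Content-Type": "application/json"},
    json={
        "model": "Llama3.1-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write me a 2 sentence poem about the moon"},
        ],
        "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512},
    },
)
print(response.json())
```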


@@ -0,0 +1,19 @@
# Llama Stack Developer Guide
## Key Concepts
### API Provider
A Provider is what makes the API real -- it supplies the actual implementation backing the API.
For example, the Inference API could be backed by an open source library such as `[ torch | vLLM | TensorRT ]`.
A provider can also be just a pointer to a remote REST service -- for example, cloud providers or dedicated inference providers could serve these APIs.
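As a rough mental model (an illustrative sketch only -- the class and method names below are hypothetical, not the actual Llama Stack interfaces), a provider is an interchangeable implementation sitting behind a fixed API surface, either in-process code or a thin client for a remote REST service:
```
# Illustrative sketch only -- class and method names are hypothetical,
# not the real Llama Stack provider interfaces.
from abc import ABC, abstractmethod

class InferenceAPI(ABC):
    """The fixed API surface that applications program against."""
    @abstractmethod
    def chat_completion(self, model: str, messages: list[dict]) -> dict: ...

class LocalLibraryProvider(InferenceAPI):
    """Backed by an in-process open source library (e.g. torch / vLLM / TensorRT)."""
    def chat_completion(self, model, messages):
        raise NotImplementedError("run the model locally")

class RemoteRESTProvider(InferenceAPI):
    """Just a pointer to a remote REST service implementing the same API."""
    def __init__(self, base_url: str):
        self.base_url = base_url

    def chat_completion(self, model, messages):
        raise NotImplementedError(f"POST {self.base_url}/inference/chat_completion")
```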
### Distribution
A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally but choose a cloud provider for a large model. Regardless, the higher-level APIs your app works with don't need to change at all. You can even imagine moving across the server / mobile-device boundary while always using the same uniform set of APIs for developing Generative AI applications.
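To make the mix-and-match idea concrete, here is a small hypothetical sketch (illustrative names only, not real Llama Stack code or configuration): a distribution simply assigns a concrete provider to each API, and swapping a local provider for a remote one leaves the application code untouched.
```
# Illustrative sketch only: names are hypothetical, not the Llama Stack API.
# A distribution wires each API name to a concrete provider; the app only
# ever calls the API, so local and remote providers are interchangeable.

def local_inference(model: str, prompt: str) -> str:
    return f"(ran {model} locally) reply to: {prompt}"

def remote_inference(model: str, prompt: str) -> str:
    return f"(called a remote REST service for {model}) reply to: {prompt}"

# Two distributions exposing the same "inference" API with different backings.
hobbyist_distro = {"inference": local_inference}
cloud_distro = {"inference": remote_inference}

def app(distro):
    # Application code is identical regardless of which distribution it runs on.
    return distro["inference"]("Llama3.1-8B-Instruct", "Write a haiku about the moon")

print(app(hobbyist_distro))
print(app(cloud_distro))
```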
```{toctree}
:maxdepth: 1
building_distro
```