refactor structure

Xi Yan 2024-10-29 14:04:41 -07:00
parent 9ddc28eca7
commit 42104361a3
13 changed files with 293 additions and 562 deletions


@@ -1,2 +0,0 @@
# Conda
WIP


@@ -0,0 +1,41 @@
# Llama Stack Developer Cookbook
Based on your developer needs, below are references to guides to help you get started.
### Hosted Llama Stack Endpoint
* Developer Need: I want to connect to a Llama Stack endpoint to build my applications.
* Effort: 1min
* Guide:
- Check out our [DeepLearning course](https://www.deeplearning.ai/short-courses/introducing-multimodal-llama-3-2) on building with Llama Stack apps on a pre-hosted Llama Stack endpoint.
### Local meta-reference Llama Stack Server
* Developer Need: I want to start a local Llama Stack server with my GPU using meta-reference implementations.
* Effort: 5min
* Guide:
- Please see our [Getting Started Guide](./getting_started.md) on starting up a meta-reference Llama Stack server.
### Llama Stack Server with Remote Providers
* Developer Need: I want a Llama Stack distribution with a remote provider.
* Effort: 10min
* Guide:
- Please see our [Distributions Guide](../../../distributions/) on starting up distributions with remote providers.
### On-Device (iOS) Llama Stack
* Developer Need: I want to use Llama Stack on-device.
* Effort: 1.5hr
* Guide:
- Please see our [iOS Llama Stack SDK](./ios_setup.md) implementation guide.
### Assemble your own Llama Stack Distribution
* Developer Need: I want to assemble my own distribution with API providers to my liking.
* Effort: 30min
* Guide:
- Please see our [Building Distribution](./building_distro.md) guide for assembling your own Llama Stack distribution with your choice of API providers.
### Adding a New API Provider
* Developer Need: I want to add a new API provider to Llama Stack.
* Effort: 3hr
* Guide:
- Please see our [Adding a New API Provider](./new_api_provider.md) guide for adding a new API provider.


@@ -0,0 +1,9 @@
# Llama Stack Distribution
A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally but choose a cloud provider for a large model. Regardless, the higher-level APIs your app works with don't need to change at all. You can even imagine moving across the server / mobile-device boundary while always using the same uniform set of APIs for developing Generative AI applications.
```{toctree}
:maxdepth: 2
meta-reference-gpu
```


@@ -0,0 +1,111 @@
# Meta Reference Distribution
The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations.
| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
|----------------- |--------------- |---------------- |-------------------------------------------------- |---------------- |---------------- |
| **Provider(s)** | meta-reference | meta-reference | meta-reference, remote::pgvector, remote::chroma | meta-reference | meta-reference |
### Prerequisite
Please make sure you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide]() for downloading the models.
```
$ ls ~/.llama/checkpoints
Llama3.1-8B Llama3.2-11B-Vision-Instruct Llama3.2-1B-Instruct Llama3.2-90B-Vision-Instruct Llama-Guard-3-8B
Llama3.1-8B-Instruct Llama3.2-1B Llama3.2-3B-Instruct Llama-Guard-3-1B Prompt-Guard-86M
```
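If you still need to fetch checkpoints, the `llama model list` and `llama model download` commands referenced at the end of this page can be used. A minimal sketch follows; the `--source` and `--model-id` flags are assumptions and may differ across CLI versions, so verify with `llama model download --help`.
```
# Sketch only: list downloadable models, then fetch one into ~/.llama
# (--source/--model-id are assumed flags; verify with `llama model download --help`)
llama model list
llama model download --source meta --model-id Llama3.2-11B-Vision-Instruct
```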
### Start the Distribution (Single Node GPU)
```
$ cd distributions/meta-reference-gpu
$ ls
build.yaml compose.yaml README.md run.yaml
$ docker compose up
```
> [!NOTE]
> This assumes you have access to a GPU and will start a local server with access to it.
> [!NOTE]
> `~/.llama` should be the path containing downloaded weights of Llama models.
This will download and start running a pre-built docker container. Alternatively, you may use the following command:
```
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v ./run.yaml:/root/my-run.yaml --gpus=all distribution-meta-reference-gpu --yaml_config /root/my-run.yaml
```
### Alternative (Build and start distribution locally via conda)
- You may check out the [Getting Started](../../docs/getting_started.md) guide for more details on building locally via conda and starting up a meta-reference distribution.
### Start Distribution With pgvector/chromadb Memory Provider
##### pgvector
1. Start running the pgvector server:
```
docker run --network host --name mypostgres -it -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres pgvector/pgvector:pg16
```
2. Edit the `run.yaml` file to point to the pgvector server.
```
memory:
- provider_id: pgvector
provider_type: remote::pgvector
config:
host: 127.0.0.1
port: 5432
db: postgres
user: postgres
password: mysecretpassword
```
> [!NOTE]
> If you get a `RuntimeError: Vector extension is not installed.` error, you will need to run `CREATE EXTENSION IF NOT EXISTS vector;` to install the vector extension. E.g.
```
docker exec -it mypostgres ./bin/psql -U postgres
postgres=# CREATE EXTENSION IF NOT EXISTS vector;
postgres=# SELECT extname from pg_extension;
extname
```
3. Run `docker compose up` with the updated `run.yaml` file.
##### chromadb
1. Start running the chromadb server:
```
docker run -it --network host --name chromadb -p 6000:6000 -v ./chroma_vdb:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest
```
2. Edit the `run.yaml` file to point to the chromadb server.
```
memory:
- provider_id: chromadb
provider_type: remote::chromadb
config:
host: localhost
port: 6000
```
3. Run `docker compose up` with the updated `run.yaml` file.
### Serving a new model
You may change the `config.model` in `run.yaml` to update the model currently being served by the distribution. Make sure you have the model checkpoint downloaded in your `~/.llama`.
```
inference:
- provider_id: meta0
provider_type: meta-reference
config:
model: Llama3.2-11B-Vision-Instruct
quantization: null
torch_seed: null
max_seq_len: 4096
max_batch_size: 1
```
Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.


@@ -1,3 +0,0 @@
# Docker
WIP


@@ -0,0 +1,81 @@
# Getting Started with Llama Stack
At the end of the guide, you will have learned how to:
- get a Llama Stack server up and running
- get an agent (with tool-calling and vector stores) that works with the above server
To see more example apps built using Llama Stack, see [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main).
## Starting Up Llama Stack Server
### Decide Your Setup: Docker or Conda
There are two ways to start a Llama Stack server:
- **Docker**: we provide a number of pre-built Docker containers allowing you to get started instantly. If you are focused on application development, we recommend this option.
- **Conda**: the `llama` CLI provides a simple set of commands to build, configure and run a Llama Stack server containing the exact combination of providers you wish. We have provided various templates to make getting started easier.
Both of these provide options to run model inference using our reference implementations, Ollama, TGI, vLLM or even remote providers like Fireworks, Together, Bedrock, etc.
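For the Conda path, a minimal sketch of the build-and-run flow is shown below. The template name and flags here are assumptions; see the [Building Distribution](./building_distro.md) guide for the exact commands for your setup.
```
# Sketch only: build a server from a template, then start it
# (template name and flags are assumptions; run `llama stack build --help` to confirm)
llama stack build --template meta-reference-gpu --image-type conda
llama stack run ./run.yaml --port 5000
```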
### Decide Your Inference Provider
Running inference of the underlying Llama model is one of the most critical requirements. Depending on what hardware you have available, you have various options; a Docker example for one of them is sketched after this list:
- **Do you have access to a machine with powerful GPUs?**
If so, we suggest:
- `distribution-meta-reference-gpu`:
- [Docker]()
- [Conda]()
- `distribution-tgi`:
- [Docker]()
- [Conda]()
- **Are you running on a "regular" desktop machine?**
If so, we suggest:
- `distribution-ollama`:
- [Docker]()
- [Conda]()
- **Do you have access to a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
- `distribution-fireworks`:
- [Docker]()
- [Conda]()
- `distribution-together`:
- [Docker]()
- [Conda]()
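As a concrete illustration of the Docker route, below is a hedged sketch of running the `distribution-ollama` option; the `llamastack/distribution-ollama` image name, port, and flags are assumptions patterned after the meta-reference command in these docs, and the `run.yaml` should come from the matching folder under `distributions/`.
```
# Sketch only: start a pre-built distribution container
# (image name, port, and --yaml_config flag are assumptions; adapt run.yaml from distributions/)
docker run -it -p 5000:5000 \
  -v ~/.llama:/root/.llama \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-ollama \
  --yaml_config /root/my-run.yaml
```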
## Testing with a Client
Once the server is set up, we can test it with a client to see example outputs. The example below queries the distribution's `/inference/chat_completion` API directly. Send a POST request to the server:
```
curl http://localhost:5000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
"model": "Llama3.1-8B-Instruct",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Write me a 2 sentence poem about the moon"}
],
"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
Output:
{'completion_message': {'role': 'assistant',
'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
'stop_reason': 'out_of_tokens',
'tool_calls': []},
'logprobs': null}
```
Check out our client SDKs for connecting to the Llama Stack server in your preferred language. You can choose from [python](https://github.com/meta-llama/llama-stack-client-python), [node](https://github.com/meta-llama/llama-stack-client-node), [swift](https://github.com/meta-llama/llama-stack-client-swift), and [kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) to quickly build your applications.
You can find more example scripts with client SDKs to talk with the Llama Stack server in our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repo.
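As a quick start with the Python SDK, a minimal sketch is below; the `llama-stack-client` package name is an assumption, so see the [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) repo for current install instructions.
```
# Sketch only: install the Python client SDK (package name assumed),
# then follow the repo examples to connect to http://localhost:5000
pip install llama-stack-client
```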
```{toctree}
:hidden:
:maxdepth: 2
developer_cookbook
distributions/index
```