forked from phoenix-oss/llama-stack-mirror
docs: Updated documentation and Sphinx configuration (#1845)
# What does this PR do?

The goal of this PR is to make the pages easier to navigate by surfacing the child pages on the navbar, updating some of the copy, and moving some of the files around.

Some changes:

1. Clarifying Titles
2. Restructuring "Distributions" more formally in its own page to be consistent with Providers and adding some clarity to the child pages to surface them and make them easier to navigate
3. Updated Sphinx config to not collapse navigation by default
4. Updated copyright year to be calculated dynamically
5. Moved `docs/source/distributions/index.md` -> `docs/source/distributions/starting_llama_stack_server.md`

Another for https://github.com/meta-llama/llama-stack/issues/1815

## Test Plan

Tested locally and pages build (screenshots for example).

## Documentation

### Before:

### After:

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
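Items 3 and 4 in the change list concern the Sphinx `conf.py`, whose diff is not shown on this page. The following is only a minimal sketch of how such settings are commonly written. It assumes the Read the Docs theme (`sphinx_rtd_theme`) and uses a placeholder copyright holder; neither is confirmed by this commit.

```python
# Hypothetical excerpt of a Sphinx conf.py; all values are illustrative assumptions.
from datetime import datetime

# Compute the copyright year at build time instead of hard-coding it.
# "Example Project Authors" is a placeholder, not the project's real holder.
copyright = f"{datetime.now().year}, Example Project Authors"

# Assumes sphinx_rtd_theme, whose `collapse_navigation` option controls
# whether sidebar entries are collapsed by default.
html_theme = "sphinx_rtd_theme"
html_theme_options = {
    "collapse_navigation": False,  # keep child pages expanded in the sidebar
}
```

Computing the year with `datetime.now()` means the footer updates on the next documentation build, with no yearly manual edit.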
parent 60430da48a
commit d495922949
10 changed files with 69 additions and 32 deletions
@@ -1,6 +1,9 @@
# Kubernetes Deployment Guide
Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
### Prerequisites
In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
First, create a local Kubernetes cluster via Kind:
@@ -33,6 +36,7 @@ data:
token: $(HF_TOKEN)
```
Next, start the vLLM server as a Kubernetes Deployment and Service:
```bash
@@ -127,6 +131,7 @@ EOF
podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
```
### Deploying Llama Stack Server in Kubernetes
We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
@@ -187,6 +192,7 @@ spec:
EOF
```
### Verifying the Deployment
We can check that the Llama Stack server has started:
```bash