chore: use uvicorn to start llama stack server everywhere (#3625)

# What does this PR do?
https://github.com/llamastack/llama-stack/pull/3462 added support for starting the Llama Stack server with uvicorn, which can spawn multiple worker processes.

This PR lays the groundwork for launching more than one worker from `llama stack run` (the worker-count parameter will be added in a follow-up PR; this one is limited to simplification) by removing the old way of launching the stack server and consolidating all launches on `uvicorn.run`.
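
As a rough sketch of what "consolidating on `uvicorn.run`" looks like (this is illustrative, not the PR's actual code; the import string and the `create_app` factory name are assumptions):

```python
import uvicorn

# Minimal sketch, assuming a hypothetical app factory. uvicorn can only
# spawn multiple worker processes when the app is passed as an import
# string, so the "module:factory" form is used here.
uvicorn.run(
    "llama_stack.core.server.server:create_app",  # assumed entry point
    host="0.0.0.0",
    port=8321,
    workers=1,  # >1 becomes possible; the CLI flag lands in a follow-up PR
    factory=True,  # assumption: the app is produced by a factory function
)
```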


## Test Plan
- Ran `llama stack run starter`
- CI
8 changed files with 92 additions and 149 deletions


````diff
@@ -357,7 +357,7 @@ server:
 8. Run the server:
 ```bash
-python -m llama_stack.core.server.server --yaml-config ~/.llama/run-byoa.yaml
+llama stack run ~/.llama/run-byoa.yaml
 ```
 9. Test the API:
````
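
For reference, the consolidated CLI takes the run config path directly, and `--port` can override the configured port (the Kubernetes hunks below pass it the same way):

```bash
# Equivalent invocation with an explicit port override
llama stack run ~/.llama/run-byoa.yaml --port 8321
```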


```diff
@@ -170,7 +170,7 @@ spec:
       - name: llama-stack
         image: localhost/llama-stack-run-k8s:latest
         imagePullPolicy: IfNotPresent
-        command: ["python", "-m", "llama_stack.core.server.server", "--config", "/app/config.yaml"]
+        command: ["llama", "stack", "run", "/app/config.yaml"]
         ports:
         - containerPort: 5000
         volumeMounts:
```


```diff
@@ -52,7 +52,7 @@ spec:
           value: "${SAFETY_MODEL}"
         - name: TAVILY_SEARCH_API_KEY
           value: "${TAVILY_SEARCH_API_KEY}"
-        command: ["python", "-m", "llama_stack.core.server.server", "/etc/config/stack_run_config.yaml", "--port", "8321"]
+        command: ["llama", "stack", "run", "/etc/config/stack_run_config.yaml", "--port", "8321"]
         ports:
         - containerPort: 8321
         volumeMounts:
```