fix: Default to port 8321 everywhere (#1734)

As titled: moved all instances of port 5001 to 8321.

parent 581e8ae562
commit 127bac6869

56 changed files with 2352 additions and 2305 deletions
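The practical effect for anything that reads the port from the environment: an explicit LLAMA_STACK_PORT still wins, and the fallback is now 8321 rather than 5001. A minimal sketch of that resolution logic (illustrative only, not the project's actual config code):

```python
import os

# Fallback changed by this commit: 5001 -> 8321.
DEFAULT_LLAMA_STACK_PORT = 8321

# An explicit LLAMA_STACK_PORT still takes precedence over the default.
port = int(os.environ.get("LLAMA_STACK_PORT", DEFAULT_LLAMA_STACK_PORT))
base_url = f"http://localhost:{port}"
print(base_url)  # http://localhost:8321 when the variable is unset
```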
@@ -96,7 +96,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 3. **Set the ENV variables by exporting them to the terminal**:
 ```bash
 export OLLAMA_URL="http://localhost:11434"
-export LLAMA_STACK_PORT=5001
+export LLAMA_STACK_PORT=8321
 export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
 export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
 ```
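If you script this step rather than exporting the variables by hand, a small check that the four variables from the hunk above are actually set avoids confusing failures later. A sketch, assuming the same variable names (OLLAMA_URL, LLAMA_STACK_PORT, INFERENCE_MODEL, SAFETY_MODEL):

```python
import os

# The four variables exported in the step above; fail fast if any is missing.
required = ["OLLAMA_URL", "LLAMA_STACK_PORT", "INFERENCE_MODEL", "SAFETY_MODEL"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise ValueError(f"Missing environment variables: {', '.join(missing)}")

print({name: os.environ[name] for name in required})
```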
@@ -112,7 +112,7 @@ If you're looking for more specific topics, we have a [Zero to Hero Guide](#next
 ```
 Note: Every time you run a new model with `ollama run`, you will need to restart the llama stack. Otherwise it won't see the new model.
 
-The server will start and listen on `http://localhost:5001`.
+The server will start and listen on `http://localhost:8321`.
 
 ---
 ## Test with `llama-stack-client` CLI
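Before pointing any client at the server, a quick connectivity check against the new default port confirms something is actually listening there. A sketch using only the standard library, with the host and port assumed to match the docs above:

```python
import socket

# Try to open a TCP connection to the Llama Stack server's default address.
host, port = "localhost", 8321
try:
    with socket.create_connection((host, port), timeout=2):
        print(f"Server is listening on http://{host}:{port}")
except OSError as err:
    print(f"Nothing listening on {host}:{port}: {err}")
```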
@@ -120,11 +120,11 @@ After setting up the server, open a new terminal window and configure the llama-
 
 1. Configure the CLI to point to the llama-stack server.
 ```bash
-llama-stack-client configure --endpoint http://localhost:5001
+llama-stack-client configure --endpoint http://localhost:8321
 ```
 **Expected Output:**
 ```bash
-Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5001
+Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
 ```
 2. Test the CLI by running inference:
 ```bash
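The same endpoint check can also be done from Python instead of the CLI. A sketch, assuming the llama-stack-client package is installed and that the client exposes a models.list() call (verify against the version you have installed):

```python
from llama_stack_client import LlamaStackClient

# Point the client at the same endpoint the CLI was configured with above.
client = LlamaStackClient(base_url="http://localhost:8321")

# Listing models is a lightweight way to confirm the endpoint is reachable.
# models.list() is assumed here; check your installed client's API surface.
for model in client.models.list():
    print(model)
```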
@@ -218,7 +218,7 @@ if INFERENCE_MODEL is None:
     raise ValueError("The environment variable 'INFERENCE_MODEL' is not set.")
 
 # Initialize the client
-client = LlamaStackClient(base_url="http://localhost:5001")
+client = LlamaStackClient(base_url="http://localhost:8321")
 
 # Create a chat completion request
 response = client.inference.chat_completion(
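The hunk above cuts off at the chat_completion call; for context, a completed request might look like the sketch below. The model_id and messages parameter names follow the pattern used elsewhere in the llama-stack docs, so treat them as assumptions and check them against your installed client version.

```python
import os

from llama_stack_client import LlamaStackClient

INFERENCE_MODEL = os.environ["INFERENCE_MODEL"]

# Initialize the client against the new default port.
client = LlamaStackClient(base_url="http://localhost:8321")

# Create a chat completion request; parameter names are assumed, not verified.
response = client.inference.chat_completion(
    model_id=INFERENCE_MODEL,
    messages=[{"role": "user", "content": "Write a haiku about coding."}],
)
print(response)
```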