From 39e99b39fe60b0064f91cacd52911b9863da54c9 Mon Sep 17 00:00:00 2001
From: Henry Tai
Date: Wed, 20 Nov 2024 02:32:19 +0800
Subject: [PATCH] update quick start to have the working instruction (#467)

# What does this PR do?

Fix the instructions in the quickstart README so that new developers and users can run it without issues.

## Test Plan

None

## Sources

Please link relevant resources if necessary.

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Co-authored-by: Henry Tai
---
 docs/zero_to_hero_guide/quickstart.md | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/docs/zero_to_hero_guide/quickstart.md b/docs/zero_to_hero_guide/quickstart.md
index 54a01e219..df8e9abc4 100644
--- a/docs/zero_to_hero_guide/quickstart.md
+++ b/docs/zero_to_hero_guide/quickstart.md
@@ -22,14 +22,22 @@ If you're looking for more specific topics like tool calling or agent setup, we
    - Download and unzip `Ollama-darwin.zip`.
    - Run the `Ollama` application.
 
-2. **Download the Ollama CLI**:
+1. **Download the Ollama CLI**:
    - Ensure you have the `ollama` command line tool by downloading and installing it from the same website.
 
-3. **Verify Installation**:
+1. **Start the Ollama server**:
+   - Open the terminal and run:
+     ```
+     ollama serve
+     ```
+
+1. **Run the model**:
    - Open the terminal and run:
      ```bash
-     ollama run llama3.2:1b
+     ollama run llama3.2:3b-instruct-fp16
      ```
+   **Note**: The models currently supported by Llama Stack are listed [here](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/ollama/ollama.py#L43).
+
 ---
@@ -84,6 +92,8 @@ If you're looking for more specific topics like tool calling or agent setup, we
    ```bash
    llama stack run /path/to/your/distro/llamastack-ollama/ollama-run.yaml --port 5050
    ```
+   Note:
+   1. Every time you run a new model with `ollama run`, you need to restart the Llama Stack server; otherwise it won't see the new model.
 
    The server will start and listen on `http://localhost:5050`.
 
@@ -97,7 +107,7 @@ After setting up the server, open a new terminal window and verify it's working
 curl http://localhost:5050/inference/chat_completion \
 -H "Content-Type: application/json" \
 -d '{
-    "model": "llama3.2:1b",
+    "model": "Llama3.2-3B-Instruct",
     "messages": [
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Write me a 2-sentence poem about the moon"}
@@ -106,6 +116,8 @@ curl http://localhost:5050/inference/chat_completion \
 }'
 ```
 
+You can check the available models with the command `llama-stack-client models list`.
+
 **Expected Output:**
 ```json
 {
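
Two quick sanity checks can save time when following the updated quickstart. First, before starting Llama Stack, confirm that the Ollama server is up and that the model has actually been pulled. A minimal sketch, assuming Ollama is listening on its default port 11434 (the quickstart does not pin a port):

```bash
# List the models the local Ollama server has installed; a JSON response
# confirms both that the server is running and which models are present.
curl http://localhost:11434/api/tags

# Pull the model explicitly in case `ollama run` has not fetched it yet.
ollama pull llama3.2:3b-instruct-fp16
```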
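
Second, the `"model"` value in the curl example must match an identifier the stack has registered. A minimal sketch of listing those identifiers with the `llama-stack-client` CLI mentioned in the patch, assuming the client supports a `configure --endpoint` step and that the server runs on port 5050 as above:

```bash
# Point the client at the running stack, then list its registered models;
# use one of the printed identifiers in the "model" field of the request.
llama-stack-client configure --endpoint http://localhost:5050
llama-stack-client models list
```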