diff --git a/distributions/fireworks/README.md b/distributions/fireworks/README.md
index fcf74d809..e3987e1e2 100644
--- a/distributions/fireworks/README.md
+++ b/distributions/fireworks/README.md
@@ -49,7 +49,7 @@ inference:
 **Via Conda**
 
 ```bash
-llama stack build --config ./build.yaml
+llama stack build --template fireworks --image-type conda
 # -- modify run.yaml to a valid Fireworks server endpoint
 llama stack run ./run.yaml
 ```
diff --git a/distributions/ollama/README.md b/distributions/ollama/README.md
index d59c3f9e1..70bc27a85 100644
--- a/distributions/ollama/README.md
+++ b/distributions/ollama/README.md
@@ -86,6 +86,6 @@ inference:
 **Via Conda**
 
 ```
-llama stack build --config ./build.yaml
+llama stack build --template ollama --image-type conda
 llama stack run ./gpu/run.yaml
 ```
diff --git a/distributions/tgi/README.md b/distributions/tgi/README.md
index 86d2636d7..886252ecd 100644
--- a/distributions/tgi/README.md
+++ b/distributions/tgi/README.md
@@ -88,7 +88,7 @@ inference:
 **Via Conda**
 
 ```bash
-llama stack build --config ./build.yaml
+llama stack build --template tgi --image-type conda
 # -- start a TGI server endpoint
 llama stack run ./gpu/run.yaml
 ```
diff --git a/distributions/together/README.md b/distributions/together/README.md
index 227c7a450..b964673e0 100644
--- a/distributions/together/README.md
+++ b/distributions/together/README.md
@@ -62,7 +62,7 @@ memory:
 **Via Conda**
 
 ```bash
-llama stack build --config ./build.yaml
+llama stack build --template together --image-type conda
 # -- modify run.yaml to a valid Together server endpoint
 llama stack run ./run.yaml
 ```
diff --git a/docs/cli_reference.md b/docs/cli_reference.md
index f0f67192f..ddc8e6b3e 100644
--- a/docs/cli_reference.md
+++ b/docs/cli_reference.md
@@ -279,11 +279,11 @@ llama stack build --list-templates
 You may then pick a template to build your distribution with providers fitted to your liking.
 
 ```
-llama stack build --template local-tgi --name my-tgi-stack
+llama stack build --template local-tgi --name my-tgi-stack --image-type conda
 ```
 
 ```
-$ llama stack build --template local-tgi --name my-tgi-stack
+$ llama stack build --template local-tgi --name my-tgi-stack --image-type conda
 ...
 ...
 Build spec configuration saved at ~/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml
@@ -293,10 +293,10 @@ You may now run `llama stack configure my-tgi-stack` or `llama stack configure ~
 #### Building from config file
 
 - In addition to templates, you may customize the build to your liking through editing config files and build from config files with the following command.
-- The config file will be of contents like the ones in `llama_stack/distributions/templates/`.
+- The config file will be of contents like the ones in `llama_stack/templates/`.
 
 ```
-$ cat llama_stack/distribution/templates/local-ollama-build.yaml
+$ cat build.yaml
 
 name: local-ollama
 distribution_spec:
@@ -311,7 +311,7 @@ image_type: conda
 ```
 
 ```
-llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml
+llama stack build --config build.yaml
 ```
 
 #### How to build distribution with Docker image
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 4f06f5d47..2a90301d0 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -35,11 +35,7 @@ You have two ways to start up Llama stack server:
 
 1. **Starting up server via docker**:
 
- We provide 2 pre-built Docker image of Llama Stack distribution, which can be found in the following links.
-  - [llamastack-local-gpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-gpu/general)
-    - This is a packaged version with our local meta-reference implementations, where you will be running inference locally with downloaded Llama model checkpoints.
-  - [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general)
-    - This is a lite version with remote inference where you can hook up to your favourite remote inference framework (e.g. ollama, fireworks, together, tgi) for running inference without GPU.
+ We provide pre-built Docker images of Llama Stack distributions, which can be found in the [distributions](../distributions/) folder.
 
 > [!NOTE]
 > For GPU inference, you need to set these environment variables for specifying local directory containing your model checkpoints, and enable GPU inference to start running docker container.
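
For reference, the docs touched above all describe the same conda workflow: build a distribution from a named template (or from a config file), then launch it with a run config. The sketch below strings the commands together using the `tgi` template and the `./gpu/run.yaml` path shown in the hunks above; substitute `fireworks`, `ollama`, or `together` for the other distributions.

```bash
# Build the distribution into a conda environment from a bundled template
llama stack build --template tgi --image-type conda

# Alternatively, build from a config file such as build.yaml (see the cli_reference example)
# llama stack build --config build.yaml

# Edit run.yaml to point at a valid inference server endpoint, then start the stack
llama stack run ./gpu/run.yaml
```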