diff --git a/distributions/fireworks/README.md b/distributions/fireworks/README.md
index a753de429..5a6da5a0c 100644
--- a/distributions/fireworks/README.md
+++ b/distributions/fireworks/README.md
@@ -8,7 +8,7 @@ The `llamastack/distribution-` distribution consists of the following provider c
 | **Provider(s)** | remote::fireworks | meta-reference | meta-reference | meta-reference | meta-reference |
 
 
-### Start the Distribution (Single Node CPU)
+### Docker: Start the Distribution (Single Node CPU)
 
 > [!NOTE]
 > This assumes you have a hosted endpoint at Fireworks with an API key.
@@ -30,21 +30,7 @@ inference:
       api_key:
 ```
 
-### (Alternative) llama stack run (Single Node CPU)
-
-```
-docker run --network host -it -p 5000:5000 -v ./run.yaml:/root/my-run.yaml --gpus=all llamastack/distribution-fireworks --yaml_config /root/my-run.yaml
-```
-
-Make sure in you `run.yaml` file, you inference provider is pointing to the correct Fireworks URL server endpoint. E.g.
-```
-inference:
-  - provider_id: fireworks
-    provider_type: remote::fireworks
-    config:
-      url: https://api.fireworks.ai/inference
-      api_key:
-```
+### Conda: llama stack run (Single Node CPU)
 
 **Via Conda**
 
@@ -54,6 +40,7 @@ llama stack build --template fireworks --image-type conda
 llama stack run ./run.yaml
 ```
 
+
 ### Model Serving
 
 Use `llama-stack-client models list` to check the available models served by Fireworks.
diff --git a/docs/source/getting_started/distributions/fireworks.md b/docs/source/getting_started/distributions/fireworks.md
new file mode 100644
index 000000000..5a6da5a0c
--- /dev/null
+++ b/docs/source/getting_started/distributions/fireworks.md
@@ -0,0 +1,66 @@
+# Fireworks Distribution
+
+The `llamastack/distribution-fireworks` distribution consists of the following provider configurations.
+
+
+| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
+|----------------- |--------------- |---------------- |-------------------------------------------------- |---------------- |---------------- |
+| **Provider(s)** | remote::fireworks | meta-reference | meta-reference | meta-reference | meta-reference |
+
+
+### Docker: Start the Distribution (Single Node CPU)
+
+> [!NOTE]
+> This assumes you have a hosted endpoint at Fireworks with an API key.
+
+```
+$ cd distributions/fireworks
+$ ls
+compose.yaml  run.yaml
+$ docker compose up
+```
+
+Make sure the inference provider in your `run.yaml` file points to the correct Fireworks endpoint URL, e.g.
+```
+inference:
+  - provider_id: fireworks
+    provider_type: remote::fireworks
+    config:
+      url: https://api.fireworks.ai/inference
+      api_key:
+```
+
+### Conda: llama stack run (Single Node CPU)
+
+**Via Conda**
+
+```bash
+llama stack build --template fireworks --image-type conda
+# -- modify run.yaml to a valid Fireworks server endpoint
+llama stack run ./run.yaml
+```
+
+
+### Model Serving
+
+Use `llama-stack-client models list` to check the available models served by Fireworks.
+```
+$ llama-stack-client models list
++------------------------------+------------------------------+---------------+------------+
+| identifier                   | llama_model                  | provider_id   | metadata   |
++==============================+==============================+===============+============+
+| Llama3.1-8B-Instruct         | Llama3.1-8B-Instruct         | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.1-70B-Instruct        | Llama3.1-70B-Instruct        | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.1-405B-Instruct       | Llama3.1-405B-Instruct       | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-1B-Instruct         | Llama3.2-1B-Instruct         | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-3B-Instruct         | Llama3.2-3B-Instruct         | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-11B-Vision-Instruct | Llama3.2-11B-Vision-Instruct | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+| Llama3.2-90B-Vision-Instruct | Llama3.2-90B-Vision-Instruct | fireworks0    | {}         |
++------------------------------+------------------------------+---------------+------------+
+```
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index 2872289eb..3d914eb46 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -1,6 +1,7 @@
 # Getting Started with Llama Stack
 
 ```{toctree}
+:hidden:
 :maxdepth: 2
 
 distributions/index
@@ -34,23 +35,23 @@ Running inference of the underlying Llama model is one of the most critical requ
 
 - **Do you have access to a machine with powerful GPUs?** If so, we suggest:
   - `distribution-meta-reference-gpu`:
-    - [Docker]()
-    - [Conda]()
+    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html#docker-start-the-distribution)
+    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html#docker-start-the-distribution)
   - `distribution-tgi`:
-    - [Docker]()
-    - [Conda]()
+    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html#docker-start-the-distribution-single-node-gpu)
+    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html#conda-tgi-server-llama-stack-run)
 
 - **Are you running on a "regular" desktop machine?** If so, we suggest:
   - `distribution-ollama`:
-    - [Docker]()
-    - [Conda]()
+    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html#docker-start-a-distribution-single-node-gpu)
+    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html#conda-ollama-run-llama-stack-run)
 
 - **Do you have access to a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
-  - `distribution-fireworks`:
-    - [Docker]()
-    - [Conda]()
   - `distribution-together`:
+    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#docker-start-the-distribution-single-node-cpu)
+    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#conda-llama-stack-run-single-node-cpu)
+  - `distribution-fireworks`:
     - [Docker]()
     - [Conda]()
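
Once either the Docker or the Conda flow above is running, a quick way to confirm that the Fireworks provider is wired up is to request a chat completion for one of the models reported by `llama-stack-client models list`. Below is a minimal sketch using the `llama-stack-client` Python package; it assumes the stack is listening on `localhost:5000` and that the `inference.chat_completion` method and response fields match your installed client version.

```python
from llama_stack_client import LlamaStackClient

# Assumes the stack started via `docker compose up` or `llama stack run ./run.yaml`
# is reachable on localhost:5000; adjust the port if your compose.yaml differs.
client = LlamaStackClient(base_url="http://localhost:5000")

# Use one of the identifiers reported by `llama-stack-client models list`.
response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about llamas."}],
)

# The response field names here are assumptions and may differ between client versions.
print(response.completion_message.content)
```

If the call returns a completion, the `remote::fireworks` provider in `run.yaml` is reachable; an authentication error usually means the Fireworks `api_key` is missing or incorrect.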