fireworks
parent 39872ca4b4
commit 7d953d5ee5

3 changed files with 79 additions and 25 deletions
@@ -8,7 +8,7 @@ The `llamastack/distribution-fireworks` distribution consists of the following provider configurations.

| **Provider(s)** | remote::fireworks | meta-reference | meta-reference | meta-reference | meta-reference |

### Start the Distribution (Single Node CPU)
### Docker: Start the Distribution (Single Node CPU)

> [!NOTE]
> This assumes you have a hosted endpoint at Fireworks with an API key.
@@ -30,21 +30,7 @@ inference:
      api_key: <optional api key>
```

### (Alternative) llama stack run (Single Node CPU)

```
docker run --network host -it -p 5000:5000 -v ./run.yaml:/root/my-run.yaml --gpus=all llamastack/distribution-fireworks --yaml_config /root/my-run.yaml
```

Make sure the inference provider in your `run.yaml` file points to the correct Fireworks server endpoint, e.g.
```
inference:
  - provider_id: fireworks
    provider_type: remote::fireworks
    config:
      url: https://api.fireworks.ai/inference
      api_key: <enter your api key>
```

### Conda: llama stack run (Single Node CPU)

**Via Conda**
@@ -54,6 +40,7 @@ llama stack build --template fireworks --image-type conda
llama stack run ./run.yaml
```

### Model Serving

Use `llama-stack-client models list` to check the available models served by Fireworks.
docs/source/getting_started/distributions/fireworks.md (new file, 66 lines)
@@ -0,0 +1,66 @@
# Fireworks Distribution

The `llamastack/distribution-fireworks` distribution consists of the following provider configurations.


| **API** | **Inference** | **Agents** | **Memory** | **Safety** | **Telemetry** |
|----------------- |------------------ |---------------- |---------------- |---------------- |---------------- |
| **Provider(s)** | remote::fireworks | meta-reference | meta-reference | meta-reference | meta-reference |


### Docker: Start the Distribution (Single Node CPU)

> [!NOTE]
> This assumes you have a hosted endpoint at Fireworks with an API key.

```
$ cd distributions/fireworks
$ ls
compose.yaml  run.yaml
$ docker compose up
```

Make sure the inference provider in your `run.yaml` file points to the correct Fireworks server endpoint, e.g.
```
inference:
  - provider_id: fireworks
    provider_type: remote::fireworks
    config:
      url: https://api.fireworks.ai/inference
      api_key: <optional api key>
```
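
If you have customized `run.yaml`, the container can also be started directly with your file mounted in, mirroring the `docker run` invocation shown earlier in this commit:

```
docker run --network host -it -p 5000:5000 -v ./run.yaml:/root/my-run.yaml --gpus=all llamastack/distribution-fireworks --yaml_config /root/my-run.yaml
```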

### Conda: llama stack run (Single Node CPU)

**Via Conda**

```bash
llama stack build --template fireworks --image-type conda
# -- modify run.yaml to a valid Fireworks server endpoint
llama stack run ./run.yaml
```


### Model Serving
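
Before listing models, point the CLI client at the running stack. A minimal sketch, assuming the stack is serving on port 5000 (as in the Docker example above) and that your installed `llama-stack-client` version provides the `configure --endpoint` command:

```
$ llama-stack-client configure --endpoint http://localhost:5000
```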

Use `llama-stack-client models list` to check the available models served by Fireworks.
```
$ llama-stack-client models list
+------------------------------+------------------------------+---------------+------------+
| identifier                   | llama_model                  | provider_id   | metadata   |
+==============================+==============================+===============+============+
| Llama3.1-8B-Instruct         | Llama3.1-8B-Instruct         | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.1-70B-Instruct        | Llama3.1-70B-Instruct        | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.1-405B-Instruct       | Llama3.1-405B-Instruct       | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-1B-Instruct         | Llama3.2-1B-Instruct         | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-3B-Instruct         | Llama3.2-3B-Instruct         | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-11B-Vision-Instruct | Llama3.2-11B-Vision-Instruct | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
| Llama3.2-90B-Vision-Instruct | Llama3.2-90B-Vision-Instruct | fireworks0    | {}         |
+------------------------------+------------------------------+---------------+------------+
```
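
Any identifier from this table can then be exercised through the stack's inference API. Below is a minimal sketch using the Python `llama_stack_client` package; the port and the exact parameter names (`model` vs. `model_id` in newer client versions) are assumptions, so adjust to your installed version:

```python
# Hedged example: assumes the stack started above is listening on localhost:5000
# and that the installed llama_stack_client accepts `model=` for chat_completion.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Ask the Fireworks-backed inference provider for a chat completion.
response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",  # any identifier from `llama-stack-client models list`
    messages=[{"role": "user", "content": "Write a haiku about remote inference."}],
)

# The generated text lives on the completion message.
print(response.completion_message.content)
```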
@@ -1,6 +1,7 @@
# Getting Started with Llama Stack

```{toctree}
:hidden:
:maxdepth: 2

distributions/index
@@ -34,23 +35,23 @@ Running inference of the underlying Llama model is one of the most critical requirements.
- **Do you have access to a machine with powerful GPUs?**
  If so, we suggest:
  - `distribution-meta-reference-gpu`:
    - [Docker]()
    - [Conda]()
    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html#docker-start-the-distribution)
    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/meta-reference-gpu.html#docker-start-the-distribution)
  - `distribution-tgi`:
    - [Docker]()
    - [Conda]()
    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html#docker-start-the-distribution-single-node-gpu)
    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/tgi.html#conda-tgi-server-llama-stack-run)

- **Are you running on a "regular" desktop machine?**
  If so, we suggest:
  - `distribution-ollama`:
    - [Docker]()
    - [Conda]()
    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html#docker-start-a-distribution-single-node-gpu)
    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/ollama.html#conda-ollama-run-llama-stack-run)

- **Do you have access to a remote inference provider like Fireworks, Together, etc.?** If so, we suggest:
  - `distribution-fireworks`:
    - [Docker]()
    - [Conda]()
  - `distribution-together`:
    - [Docker](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#docker-start-the-distribution-single-node-cpu)
    - [Conda](https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/together.html#conda-llama-stack-run-single-node-cpu)
  - `distribution-fireworks`:
    - [Docker]()
    - [Conda]()