# What does this PR do?
The `nvidia` distro was previously collapsed into the `starter` distro.
However, the `nvidia` distro was setup specifically to use NVIDIA NeMo
microservices as providers for all APIs and not just inference, which
means it was doing quite a bit more than what the `starter` distro
covers today.
We should work with our friends at NVIDIA to determine the best place to
maintain this distro long-term, but for now this restores the `nvidia`
distro and its docs back to where they were so that things continue to
work for their users.
## Test Plan
I ensure the `nvidia` distro could build, and run at least to the point
of complaining that I didn't provide the necessary API keys.
```
uv run llama stack build --template nvidia --image-type venv
uv run llama stack run llama_stack/templates/nvidia/run.yaml
```
I also made sure the docs website built and looks reasonable, with the
`nvidia` distro docs at the same URL it was previously (because it has
incoming links from official NVIDIA NeMo docs, among other places).
```
uv run --group docs sphinx-autobuild docs/source docs/build/html --write-all
```
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
* Removes a bunch of distros
* Removed distros were added into the "starter" distribution
* Doc for "starter" has been added
* Partially reverts https://github.com/meta-llama/llama-stack/pull/2482
since inference providers are disabled by default and can be turned on
manually via env variable.
* Disables safety in starter distro
Closes: https://github.com/meta-llama/llama-stack/issues/2502.
~Needs: https://github.com/meta-llama/llama-stack/pull/2482 for Ollama
to work properly in the CI.~
TODO:
- [ ] We can only update `install.sh` when we get a new release.
- [x] Update providers documentation
- [ ] Update notebooks to reference starter instead of ollama
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
adds nvidia template for creating a distribution using inference adapter
for NVIDIA NIMs.
## Test Plan
Please describe:
Build llama stack distribution for nvidia using the template, docker and
conda.
```bash
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client configure --endpoint http://localhost:5000
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5000
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ provider_resource_id ┃ metadata ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Llama3.1-8B-Instruct │ nvidia │ meta/llama-3.1-8b-instruct │ {} │
│ meta-llama/Llama-3.2-3B-Instruct │ nvidia │ meta/llama-3.2-3b-instruct │ {} │
└──────────────────────────────────┴─────────────┴────────────────────────────┴──────────┘
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client inference chat-completion --message "hello, write me a 2 sentence poem"
ChatCompletionResponse(
completion_message=CompletionMessage(
content='Here is a 2 sentence poem:\n\nThe sun sets slow and paints the sky, \nA gentle hue of pink that makes me sigh.',
role='assistant',
stop_reason='end_of_turn',
tool_calls=[]
),
logprobs=None
)
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>