commit cc61fd8083
parent 0c14761453
Author: Xi Yan
Date: 2024-11-09 09:00:18 -08:00


@@ -244,16 +244,6 @@ $ llama stack build --template meta-reference-gpu --image-type conda
```
$ llama stack run ~/.llama/distributions/llamastack-meta-reference-gpu/meta-reference-gpu-run.yaml
```
:::
:::{tab-item} tgi
1. Install the `llama` CLI. See [CLI Reference](https://llama-stack.readthedocs.io/en/latest/cli_reference/index.html)
2. Build the `tgi` distribution
```bash
llama stack build --template tgi --image-type conda
```
Note: If you wish to use pgvector or chromadb as the memory provider, you may need to update the generated `run.yaml` file to point to the desired memory provider; see [Memory Providers](https://llama-stack.readthedocs.io/en/latest/api_providers/memory_api.html) for more details. Alternatively, comment out the pgvector or chromadb memory provider in the `run.yaml` file to use the default inline memory provider, keeping only the following section:
```
@@ -267,6 +257,17 @@ memory:
db_path: ~/.llama/runtime/faiss_store.db
```
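For reference, a hedged sketch of what the full inline (faiss) memory section in `run.yaml` might look like; only the `db_path` line above comes from the docs, and the surrounding field names (`provider_id`, `provider_type`, `kvstore`) are assumptions:
```yaml
# Hedged sketch of an inline faiss memory provider entry in run.yaml.
# Field names other than db_path are assumptions, not copied from the docs.
memory:
  - provider_id: faiss0        # hypothetical provider id
    provider_type: faiss       # assumed inline faiss provider type
    config:
      kvstore:
        type: sqlite
        db_path: ~/.llama/runtime/faiss_store.db
```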
:::
:::{tab-item} tgi
1. Install the `llama` CLI. See [CLI Reference](https://llama-stack.readthedocs.io/en/latest/cli_reference/index.html)
2. Build the `tgi` distribution
```bash
llama stack build --template tgi --image-type conda
```
3. Start a TGI server endpoint (a hedged launch sketch follows this list)
4. Make sure that in your `run.yaml` file, `conda_env` points to the conda environment you built and the inference provider points to the correct TGI server endpoint, e.g. as sketched below.
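Step 3 assumes a TGI endpoint is already running. A minimal sketch of one way to start one locally with Docker; the model id, port mapping, and cache path are placeholders, not taken from these docs:
```bash
# Hedged sketch: start a local TGI endpoint with Docker.
# Model id, port, and cache path are placeholders -- adjust for your setup.
docker run --gpus all --shm-size 1g -p 5009:80 \
  -v $HOME/.cache/huggingface:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-3.1-8B-Instruct
```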
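And a hedged sketch of the `run.yaml` fragments referenced in step 4; the provider id, provider type, and URL are assumptions and should match your actual build and TGI endpoint:
```yaml
# Hedged sketch of the run.yaml fragments referenced in step 4.
# Values are placeholders; provider_id/provider_type names are assumptions.
conda_env: llamastack-tgi        # the conda environment created by `llama stack build`
inference:
  - provider_id: tgi0            # hypothetical provider id
    provider_type: remote::tgi   # assumed remote TGI provider type
    config:
      url: http://127.0.0.1:5009 # URL of the TGI endpoint started in step 3
```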