distro readmes with model serving instructions (#339)

* readme updates

* quantized compose

* dell tgi

* config update

* readme

* update model serving readmes

* update

* update

* config
This commit is contained in:
Xi Yan 2024-10-28 17:47:14 -07:00 committed by GitHub
parent a70a4706fc
commit ae671eaf7a
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
8 changed files with 136 additions and 4 deletions

@@ -84,3 +84,19 @@ memory:
```
3. Run `docker compose up` with the updated `run.yaml` file.
### Serving a new model
You may change `config.model` in `run.yaml` to update the model served by the distribution. Make sure the corresponding model checkpoint has been downloaded under `~/.llama`.
```
inference:
- provider_id: meta0
provider_type: meta-reference
config:
model: Llama3.2-11B-Vision-Instruct
quantization: null
torch_seed: null
max_seq_len: 4096
max_batch_size: 1
```
Run `llama model list` to see the models available for download, and `llama model download` to fetch the checkpoints.
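The `config.model` swap above amounts to rewriting one key under the inference provider's `config`. A minimal sketch of that structure as a plain Python dict (the `serve_model` helper is hypothetical, for illustration only; in practice you edit `run.yaml` directly):

```python
# Mirrors the inference section of run.yaml shown above.
inference_config = {
    "provider_id": "meta0",
    "provider_type": "meta-reference",
    "config": {
        "model": "Llama3.2-11B-Vision-Instruct",
        "quantization": None,
        "torch_seed": None,
        "max_seq_len": 4096,
        "max_batch_size": 1,
    },
}

def serve_model(cfg: dict, model_id: str) -> dict:
    """Return a copy of cfg pointing config.model at a different checkpoint.

    Hypothetical helper: only the nested "model" key changes; all other
    provider settings are preserved.
    """
    return {**cfg, "config": {**cfg["config"], "model": model_id}}
```

Only the `model` value changes; `quantization`, `max_seq_len`, and the rest carry over untouched, which is exactly what editing the single line in `run.yaml` does.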

@@ -36,6 +36,15 @@ providers:
- provider_id: meta0
provider_type: meta-reference
config: {}
# Uncomment to use pgvector
# - provider_id: pgvector
# provider_type: remote::pgvector
# config:
# host: 127.0.0.1
# port: 5432
# db: postgres
# user: postgres
# password: mysecretpassword
agents:
- provider_id: meta0
provider_type: meta-reference
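Uncommenting the pgvector block above assumes a Postgres instance with the pgvector extension is reachable at the configured host and port. A minimal way to provision one locally, matching the credentials in the snippet (the `pgvector/pgvector` community image and `pg16` tag are assumptions; use whatever Postgres-with-pgvector image you prefer):

```shell
# Start a local Postgres with pgvector, matching the commented config above.
docker run -d --name pgvector \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_DB=postgres \
  -p 5432:5432 \
  pgvector/pgvector:pg16
```

With the container running, the commented provider block can be uncommented as-is, since host, port, db, user, and password all line up.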