llama-stack/llama_stack/providers/remote
Kai Wu d2b7c5aeae
add quantized model ollama support (#471)
# What does this PR do?
Adds support for more quantized models when running inference through the Ollama provider.
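For context, the Ollama inference provider keeps a mapping from Llama Stack model descriptors to Ollama model tags, and supporting a quantized variant amounts to registering another entry in that mapping. The sketch below is illustrative only: the dict name, descriptor strings, tags, and helper function are assumptions, not the exact code in this PR.

```
# Hypothetical sketch of the descriptor -> Ollama tag registry; the real
# provider code under llama_stack/providers/remote/inference/ollama may
# use different names and structures.
OLLAMA_SUPPORTED_MODELS = {
    # pre-existing full-precision entry (illustrative)
    "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16",
    # quantized entry of the kind this PR adds (tag is illustrative)
    "Llama3.2-3B-Instruct:int4": "llama3.2:3b",
}


def ollama_tag_for(descriptor: str) -> str:
    """Resolve a Llama Stack model descriptor to the Ollama tag to serve."""
    try:
        return OLLAMA_SUPPORTED_MODELS[descriptor]
    except KeyError:
        raise ValueError(
            f"model {descriptor} is not supported by the Ollama provider"
        ) from None
```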


- [ ] Addresses issue (#issue)


## Test Plan
Tested with the Ollama Docker container running the Llama 3.2 3B 4-bit model:
```
root@docker-desktop:/# ollama ps
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b    a80c4f17acd5    3.5 GB    100% CPU     3 minutes from now
```
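As an extra end-to-end check outside of Llama Stack, one way to confirm the quantized model loads and responds is to call Ollama's HTTP generate endpoint directly. The sketch below assumes Ollama is listening on its default port 11434:

```
import requests

# Sanity-check the quantized model directly against Ollama's HTTP API,
# independent of Llama Stack. With "stream": False, Ollama returns a
# single JSON object whose "response" field holds the generated text.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",
        "prompt": "Say hello in one sentence.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```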
## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-18 18:55:23 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| agents | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |
| datasetio/huggingface | move hf addapter->remote (#459) | 2024-11-14 22:41:19 -05:00 |
| inference | add quantized model ollama support (#471) | 2024-11-18 18:55:23 -08:00 |
| memory | unregister for memory banks and remove update API (#458) | 2024-11-14 17:12:11 -08:00 |
| safety | Remove the "ShieldType" concept (#430) | 2024-11-12 12:37:24 -08:00 |
| telemetry | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |