llama-stack/llama_stack
Kai Wu d2b7c5aeae
add quantized model ollama support (#471)
# What does this PR do?
Add support for more quantized models in the ollama inference provider.
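For context, "quantized model support" here means the ollama provider can resolve Llama model descriptors to ollama's quantized image tags, not just the fp16 ones. A minimal sketch of the mapping idea, assuming a plain dict (the dict name, keys, and helper below are illustrative assumptions, not the provider's actual code):

```
# Illustrative sketch: map llama-stack model descriptors to ollama tags.
# The real provider code differs; names and entries here are assumptions.
SUPPORTED_OLLAMA_MODELS = {
    # full-precision tag (the pre-existing style of entry)
    "Llama3.2-3B-Instruct:fp16": "llama3.2:3b-instruct-fp16",
    # quantized tag: ollama's default llama3.2:3b image ships 4-bit weights
    "Llama3.2-3B-Instruct:int4": "llama3.2:3b",
}

def resolve_ollama_tag(descriptor: str) -> str:
    """Look up the ollama tag for a model descriptor (hypothetical helper)."""
    if descriptor not in SUPPORTED_OLLAMA_MODELS:
        raise ValueError(f"{descriptor!r} is not supported by the ollama provider")
    return SUPPORTED_OLLAMA_MODELS[descriptor]
```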


- [ ] Addresses issue (#issue)


## Test Plan
Tested with the ollama Docker container running the llama3.2 3B 4-bit model.
```
root@docker-desktop:/# ollama ps
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b    a80c4f17acd5    3.5 GB    100% CPU     3 minutes from now
```
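The ~3.5 GB resident size on 100% CPU is consistent with the 4-bit weights being loaded: ollama's default `llama3.2:3b` tag is roughly a 2 GB Q4 download, with the remainder going to KV cache and runtime overhead. The model would have been fetched beforehand with `ollama pull llama3.2:3b`.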
## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-18 18:55:23 -08:00
| Name | Last commit | Date |
|------|-------------|------|
| apis | No more model_id warnings | 2024-11-15 12:20:18 -08:00 |
| cli | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| distribution | get stack run config based on template name (#477) | 2024-11-18 18:05:05 -08:00 |
| providers | add quantized model ollama support (#471) | 2024-11-18 18:55:23 -08:00 |
| scripts | Add a pre-commit for distro_codegen but it does not work yet | 2024-11-18 15:21:13 -08:00 |
| templates | More documentation fixes | 2024-11-18 17:06:13 -08:00 |
| `__init__.py` | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |