llama-stack/llama_stack/providers/remote/inference/ollama
Kai Wu d2b7c5aeae
add quantized model ollama support (#471)
# What does this PR do?
Add support for more quantized models in the Ollama provider.
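For context, the change boils down to extending the provider's mapping from Ollama model tags to Llama Stack model descriptors so that Ollama's default (quantized) tags are covered alongside the existing fp16 tags. The sketch below is a hedged illustration of that idea; `OLLAMA_TAG_TO_DESCRIPTOR`, `ollama_tag_for`, and the descriptor strings are assumptions for illustration, not the actual code in ollama.py.

```python
# Illustrative sketch only: names and descriptor strings are assumptions,
# not the exact identifiers in ollama.py. Ollama's default tags (e.g.
# "llama3.2:3b") serve 4-bit quantized weights, so the effect of this PR
# is to register those tags in addition to the existing fp16 ones.
OLLAMA_TAG_TO_DESCRIPTOR = {
    "llama3.2:1b-instruct-fp16": "Llama3.2-1B-Instruct",  # existing fp16 tag
    "llama3.2:1b": "Llama3.2-1B-Instruct",                # quantized default tag (new)
    "llama3.2:3b-instruct-fp16": "Llama3.2-3B-Instruct",
    "llama3.2:3b": "Llama3.2-3B-Instruct",                # the tag used in the test plan
}


def ollama_tag_for(descriptor: str, prefer_quantized: bool = True) -> str:
    """Resolve a Llama Stack descriptor to an Ollama tag, preferring the
    quantized default tag when one is registered."""
    candidates = [tag for tag, d in OLLAMA_TAG_TO_DESCRIPTOR.items() if d == descriptor]
    if not candidates:
        raise ValueError(f"no Ollama tag registered for {descriptor}")
    quantized = [t for t in candidates if not t.endswith("-fp16")]
    return (quantized or candidates)[0] if prefer_quantized else candidates[0]
```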


- [ ] Addresses issue (#issue)


## Test Plan
Tested with the Ollama Docker image running the llama3.2 3B 4-bit model:
```
root@docker-desktop:/# ollama ps
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b    a80c4f17acd5    3.5 GB    100% CPU     3 minutes from now
```
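Beyond confirming the model is loaded with `ollama ps`, a quick end-to-end check is to request a completion from the quantized model through Ollama's standard HTTP API (default port 11434). The snippet below is a minimal sketch assuming that default setup and the `llama3.2:3b` tag from the test above.

```python
# Sanity-check sketch: ask the quantized model for a completion through
# Ollama's HTTP API on the default port. Assumes "llama3.2:3b" is pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",              # default tag serves 4-bit quantized weights
        "prompt": "Say hello in one sentence.",
        "stream": False,                     # return a single JSON object
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```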
## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-18 18:55:23 -08:00
| File | Last commit | Commit date |
| --- | --- | --- |
| `__init__.py` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `config.py` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `ollama.py` | add quantized model ollama support (#471) | 2024-11-18 18:55:23 -08:00 |