llama-stack/llama_stack/providers/remote/inference
Kai Wu d2b7c5aeae
add quantized model ollama support (#471)
# What does this PR do?
Add support for more quantized models via ollama.
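For context, here is a minimal sketch of the kind of change involved: extending the ollama provider's model map so quantized Llama Stack descriptors resolve to ollama tags. The descriptor strings, tags, and helper name below are illustrative assumptions, not the PR's verbatim diff.

```python
# Hypothetical sketch of the provider-side model map (descriptors and tags
# are assumptions for illustration, not the exact contents of this PR).
OLLAMA_SUPPORTED_MODELS = {
    "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16",
    # Quantized variants resolve to ollama's 4-bit builds.
    "Llama3.2-3B-Instruct:int4-qlora-eo8": "llama3.2:3b",
    "Llama3.2-3B-Instruct:int4-spinquant-eo8": "llama3.2:3b",
}


def resolve_ollama_model(descriptor: str) -> str:
    """Map a registered Llama Stack model descriptor to an ollama tag."""
    if descriptor not in OLLAMA_SUPPORTED_MODELS:
        raise ValueError(f"Model {descriptor} is not supported by the ollama provider")
    return OLLAMA_SUPPORTED_MODELS[descriptor]
```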


- [ ] Addresses issue (#issue)


## Test Plan
Tested with the ollama Docker container running the llama3.2 3B 4-bit model:
```
root@docker-desktop:/# ollama ps
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b    a80c4f17acd5    3.5 GB    100% CPU     3 minutes from now
```
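As a further check, a small smoke test against the same server can confirm the quantized model answers chat requests. This is a sketch, assuming the `ollama` Python client is installed (`pip install ollama`) and the model has already been pulled (e.g. `ollama pull llama3.2:3b`).

```python
# Hypothetical smoke test for the 4-bit llama3.2:3b model shown in the
# `ollama ps` output above; assumes a running ollama server.
import ollama

resp = ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Reply with one word: hello"}],
)
print(resp["message"]["content"])
```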
## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-18 18:55:23 -08:00
| Name | Last commit | Date |
|------|-------------|------|
| `bedrock` | Inference to use provider resource id to register and validate (#428) | 2024-11-12 20:02:00 -08:00 |
| `databricks` | Inference to use provider resource id to register and validate (#428) | 2024-11-12 20:02:00 -08:00 |
| `fireworks` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `ollama` | add quantized model ollama support (#471) | 2024-11-18 18:55:23 -08:00 |
| `sample` | migrate model to Resource and new registration signature (#410) | 2024-11-08 16:12:57 -08:00 |
| `tgi` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `together` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `vllm` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `__init__.py` | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |