llama-stack/llama_toolchain/inference

Latest commit 156bfa0e15 by Hardik Shah:
Added Ollama as an inference impl (#20)
* Fix the non-streaming API in the inference server
* Add a unit test for inline inference
* Add a non-streaming Ollama inference impl
* Add streaming support for Ollama inference, with tests
* Address review comments

Co-authored-by: Hardik Shah <hjshah@fb.com>
Committed: 2024-07-31 22:08:37 -07:00
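As a rough illustration of the upstream surface that the new ollama.py presumably wraps, here is a minimal streaming sketch against a local Ollama daemon using the official ollama Python package. The model tag and the daemon's default address are assumptions about a local setup; this is not llama_toolchain's own API.

    # Minimal streaming sketch against a local Ollama daemon, via the
    # official `ollama` Python package (pip install ollama). Illustrates
    # the upstream API an Ollama inference impl would wrap; it is NOT
    # the toolchain's own interface.
    import ollama

    stream = ollama.chat(
        model="llama3",  # assumed tag; check `ollama list` for what is pulled locally
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
        stream=True,  # yield incremental chunks instead of one final message
    )

    for chunk in stream:
        # Each chunk carries a partial assistant message; print as it arrives.
        print(chunk["message"]["content"], end="", flush=True)
    print()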
Name               Last commit                               Committed
api/               Added Ollama as an inference impl (#20)   2024-07-31 22:08:37 -07:00
quantization/      Initial commit                            2024-07-23 08:32:33 -07:00
__init__.py        Initial commit                            2024-07-23 08:32:33 -07:00
api_instance.py    Added Ollama as an inference impl (#20)   2024-07-31 22:08:37 -07:00
client.py          Added Ollama as an inference impl (#20)   2024-07-31 22:08:37 -07:00
event_logger.py    Added Ollama as an inference impl (#20)   2024-07-31 22:08:37 -07:00
generation.py      Initial commit                            2024-07-23 08:32:33 -07:00
inference.py       Added Ollama as an inference impl (#20)   2024-07-31 22:08:37 -07:00
model_parallel.py  Initial commit                            2024-07-23 08:32:33 -07:00
ollama.py          Added Ollama as an inference impl (#20)   2024-07-31 22:08:37 -07:00
parallel_utils.py  Initial commit                            2024-07-23 08:32:33 -07:00
server.py          Initial commit                            2024-07-23 08:32:33 -07:00
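Judging by the file names, ollama.py holds the new Ollama adapter, client.py and server.py the toolchain's own client and server, and event_logger.py the printer for streamed events; the exact responsibilities are an inference from the listing. For the non-streaming path the commit also fixes, a one-shot request against Ollama's documented HTTP API looks roughly like this (the default address http://localhost:11434 and the model tag are assumptions about a local setup, not toolchain configuration):

    # Non-streaming sketch: one POST to Ollama's /api/chat endpoint,
    # one complete JSON reply. Endpoint and model tag assume a default
    # local install; this is not llama_toolchain configuration.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/chat",  # Ollama's default listen address
        json={
            "model": "llama3",  # assumed tag
            "messages": [{"role": "user", "content": "Say hello."}],
            "stream": False,  # ask for a single complete message
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])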