forked from phoenix-oss/llama-stack-mirror
* fix non-streaming api in inference server
* unit test for inline inference
* Added non-streaming ollama inference impl
* add streaming support for ollama inference with tests
* addressing comments

Co-authored-by: Hardik Shah <hjshah@fb.com>
| Name |
|---|
| api |
| quantization |
| __init__.py |
| api_instance.py |
| client.py |
| event_logger.py |
| generation.py |
| inference.py |
| model_parallel.py |
| ollama.py |
| parallel_utils.py |
| server.py |