Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-03 09:53:45 +00:00)
* fix non-streaming api in inference server
* unit test for inline inference
* Added non-streaming ollama inference impl
* add streaming support for ollama inference with tests
* addressing comments

Co-authored-by: Hardik Shah <hjshah@fb.com>
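As a rough illustration of the two code paths this commit covers, here is a minimal sketch of non-streaming versus streaming generation against Ollama's documented REST API. This is not the llama-stack implementation itself: `httpx` stands in for whatever HTTP client the repo uses, and `OLLAMA_URL`, `MODEL`, `generate`, and `generate_stream` are illustrative names.

```python
# Sketch: non-streaming vs. streaming calls to Ollama's /api/generate.
# Assumes a local Ollama server; endpoint and payload shape follow
# Ollama's documented REST API, not llama-stack's internal interfaces.
import json

import httpx

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama
MODEL = "llama3"  # hypothetical model name


def generate(prompt: str) -> str:
    """Non-streaming: one request, one JSON body with the full completion."""
    resp = httpx.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["response"]


def generate_stream(prompt: str):
    """Streaming: Ollama sends newline-delimited JSON chunks until done."""
    with httpx.stream(
        "POST",
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": True},
        timeout=60.0,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk["response"]
```

The structural difference the commit's tests would exercise is visible here: the non-streaming path returns a single complete string, while the streaming path yields incremental text chunks until the server reports `done`.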
Files in this directory:

* api/
* quantization/
* __init__.py
* api_instance.py
* client.py
* event_logger.py
* generation.py
* inference.py
* model_parallel.py
* ollama.py
* parallel_utils.py
* server.py