Mirror of https://github.com/meta-llama/llama-stack.git
We should use the Inference API to execute Llama Guard instead of depending directly on HuggingFace modeling code. The actual inference work is handled by the Inference provider.
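A minimal sketch of what this direction implies: the safety shield builds the Llama Guard prompt and delegates generation to an inference abstraction, so no HuggingFace model loading happens in the shield itself. The `Inference` protocol, `Message` class, and method signature below are illustrative assumptions for this sketch, not the actual llama-stack interfaces.

```python
from typing import List, Protocol


class Message:
    """Simple chat message container (hypothetical stand-in)."""

    def __init__(self, role: str, content: str) -> None:
        self.role = role
        self.content = content


class Inference(Protocol):
    # Hypothetical inference-provider interface; the real implementation
    # owns model loading, parallelism, and generation details.
    async def chat_completion(self, model: str, messages: List[Message]) -> str: ...


class LlamaGuardShield:
    """Safety shield that delegates generation to an Inference provider."""

    def __init__(self, inference: Inference, model: str = "Llama-Guard-3-8B") -> None:
        self.inference = inference
        self.model = model

    async def run(self, messages: List[Message]) -> bool:
        # Ask the Inference API to classify the conversation; no HF
        # modeling code is imported or executed here.
        completion = await self.inference.chat_completion(
            model=self.model, messages=messages
        )
        # Llama Guard responds with "safe" or "unsafe" plus any
        # violated category codes; treat anything else as unsafe.
        return completion.strip().lower().startswith("safe")
```

The design benefit is that the shield becomes provider-agnostic: any backend that satisfies the inference interface (local model-parallel runner, remote endpoint, quantized build) can serve Llama Guard without the shield code changing.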
quantization/
__init__.py
config.py
generation.py
inference.py
model_parallel.py
parallel_utils.py