Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-08-12 04:50:39 +00:00)
# What does this PR do?

## Test Plan

Ran a stress test on the chat completion endpoint locally with 10 concurrent users over 3 minutes.

Before:
<img width="1440" height="201" alt="image" src="https://github.com/user-attachments/assets/24e0d580-186e-4e24-931e-2b936c5859b6" />

After:
<img width="1434" height="204" alt="image" src="https://github.com/user-attachments/assets/4b806d88-f822-41e9-b25a-018cc4bec866" />

(Will send scripts in a future PR.)
| Name |
|---|
| bedrock |
| common |
| datasetio |
| inference |
| kvstore |
| memory |
| responses |
| scoring |
| sqlstore |
| telemetry |
| tools |
| vector_io |
| `__init__.py` |
| `pagination.py` |
| `scheduler.py` |