llama-stack-mirror/docs/source/distributions
ehhuang e980436a2e
chore: introduce write queue for inference_store (#3383)
# What does this PR do?
Adds a worker queue for writes to the inference store, so that slow
inference writes no longer overwhelm request processing.
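
The general idea can be sketched as follows. This is a minimal illustration of a background write queue built on `asyncio`, not the code from this PR: the names `QueuedWriter`, `write_fn`, `num_workers`, and `max_queue_size` are hypothetical, and the real inference store may differ in how it batches writes, handles errors, and shuts down.

```python
import asyncio


class QueuedWriter:
    """Hypothetical sketch: queue writes and persist them from background workers."""

    def __init__(self, write_fn, num_workers=2, max_queue_size=1000):
        self._write_fn = write_fn  # assumed: a slow coroutine that persists one record
        self._queue = asyncio.Queue(maxsize=max_queue_size)
        self._num_workers = num_workers
        self._workers = []

    async def start(self):
        # Spawn the background workers that drain the queue.
        self._workers = [
            asyncio.create_task(self._worker()) for _ in range(self._num_workers)
        ]

    async def enqueue(self, record):
        # Returns once the record is queued; it only waits if the queue is full.
        await self._queue.put(record)

    async def _worker(self):
        while True:
            record = await self._queue.get()
            try:
                await self._write_fn(record)
            except Exception as exc:
                # A failed write is reported but does not stop the worker.
                print(f"write failed: {exc!r}")
            finally:
                self._queue.task_done()

    async def shutdown(self):
        # Wait for all queued writes to finish, then stop the workers.
        await self._queue.join()
        for worker in self._workers:
            worker.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)


async def demo():
    stored = []

    async def slow_write(record):
        await asyncio.sleep(0.01)  # stand-in for a slow database write
        stored.append(record)

    writer = QueuedWriter(slow_write, num_workers=2)
    await writer.start()
    for i in range(5):
        await writer.enqueue({"id": i})  # the request path returns quickly
    await writer.shutdown()
    return stored
```

The request handler only pays the cost of `enqueue`, while the slow write runs in the workers. The bounded queue applies backpressure if writes fall too far behind.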

## Test Plan

Benchmark:
```
cd docs/source/distributions/k8s-benchmark
# start mock server
python openai-mock-server.py --port 8000
# start stack server
LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml
# run benchmark script
uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct
```
## RPS from 21 -> 57
2025-09-10 11:57:42 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| eks | fix: update k8s templates (#2645) | 2025-07-08 15:57:01 -07:00 |
| k8s | fix(k8s): unwedge run.yaml to add files | 2025-09-09 23:02:26 -07:00 |
| k8s-benchmark | chore: introduce write queue for inference_store (#3383) | 2025-09-10 11:57:42 -07:00 |
| ondevice_distro | chore: remove absolute paths (#3263) | 2025-08-27 12:04:25 -07:00 |
| remote_hosted_distro | refactor: remove Conda support from Llama Stack (#2969) | 2025-08-02 15:52:59 -07:00 |
| self_hosted_distro | docs: add VLM NIM example (#3277) | 2025-08-29 16:23:52 -07:00 |
| building_distro.md | fix(docs): update llama stack build CLI doc (#3050) | 2025-08-06 09:32:09 -07:00 |
| configuration.md | feat: Add Kubernetes auth provider to use SelfSubjectReview and kubernetes api server (#2559) | 2025-09-08 11:25:10 +02:00 |
| customizing_run_yaml.md | docs: clarify run.yaml files are starting points for customization (#2746) | 2025-07-14 09:53:13 -07:00 |
| importing_as_library.md | chore: remove absolute paths (#3263) | 2025-08-27 12:04:25 -07:00 |
| index.md | docs: part 1 - fix warnings in documentation generation (#2861) | 2025-07-30 10:50:10 -07:00 |
| list_of_distributions.md | fix: Restore the nvidia distro (#2639) | 2025-07-07 15:50:05 -07:00 |
| starting_llama_stack_server.md | refactor: remove Conda support from Llama Stack (#2969) | 2025-08-02 15:52:59 -07:00 |