# What does this PR do?

This is to avoid errors like the following when running the inference integration tests:

```
ERROR tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] - llama_stack.distribution.stack.EnvVarError: Environment variable 'VLLM_URL' not set or empty at providers.inference[0].config.url
```

It is also good to have a default, which is consistent with the vLLM API server.

## Test Plan

Integration tests can run without the error above.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
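Below is a minimal Python sketch of the idea behind this change: fall back to a default endpoint when `VLLM_URL` is unset or empty instead of raising `EnvVarError`. The helper name `resolve_vllm_url`, the constant `DEFAULT_VLLM_URL`, and the assumed default of `http://localhost:8000/v1` (the vLLM OpenAI-compatible API server's usual local address) are illustrative assumptions, not the repository's actual implementation.

```python
import os

# Assumed default: the address where `vllm serve` exposes its
# OpenAI-compatible API by default (port 8000, "/v1" prefix).
DEFAULT_VLLM_URL = "http://localhost:8000/v1"


def resolve_vllm_url() -> str:
    """Return VLLM_URL if set and non-empty, otherwise the default.

    Sketches the behavior described in the PR: instead of failing with
    EnvVarError when the variable is missing or empty, fall back to a
    value consistent with the vLLM API server's own default.
    """
    return os.getenv("VLLM_URL") or DEFAULT_VLLM_URL


if __name__ == "__main__":
    print(resolve_vllm_url())
```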
Directory listing:

- `bedrock`
- `cerebras`
- `ci-tests`
- `dell`
- `dev`
- `experimental-post-training`
- `fireworks`
- `groq`
- `hf-endpoint`
- `hf-serverless`
- `meta-reference-gpu`
- `meta-reference-quantized-gpu`
- `nvidia`
- `ollama`
- `open-benchmark`
- `passthrough`
- `remote-vllm`
- `sambanova`
- `tgi`
- `together`
- `vllm-gpu`
- `__init__.py`
- `template.py`