llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Sébastien Han	7cf1e24c4e	feat(logging): implement category-based logging (#1362 ) # What does this PR do? This commit introduces a new logging system that allows loggers to be assigned a category while retaining the logger name based on the file name. The log format includes both the logger name and the category, producing output like: ``` INFO 2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by tavily-search ``` Key features include: - Category-based logging: Loggers can be assigned a category (e.g., "core", "server") when programming. The logger can be loaded like this: `logger = get_logger(name=__name__, category="server")` - Environment variable control: Log levels can be configured per-category using the `LLAMA_STACK_LOGGING` environment variable. For example: `LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for the "server" and "core" categories. - `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all categories and third-party libraries. This provides fine-grained control over logging levels while maintaining a clean and informative log format. The formatter uses the rich library which provides nice colors better stack traces like so: ``` ERROR 2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146> exception=UnboundLocalError("local variable 'loop' referenced before assignment")> ╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮ │ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown │ │ │ │ 175 │ │ except asyncio.CancelledError: │ │ 176 │ │ │ pass │ │ 177 │ │ finally: │ │ ❱ 178 │ │ │ loop.stop() │ │ 179 │ │ │ 180 │ loop = asyncio.get_running_loop() │ │ 181 │ loop.create_task(shutdown()) │ ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ UnboundLocalError: local variable 'loop' referenced before assignment ``` Co-authored-by: Ashwin Bharambe <@ashwinb> Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml INFO 2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration: INFO 2025-03-03 21:55:35,928 __main__:380 [server]: apis: - agents ``` [//]: # (## Documentation) --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-07 11:34:30 -08:00
Ashwin Bharambe	290cc843fc	test: first unit test for resolver (#1475 ) Starting to create unit tests to cover critical (and mostly undocumented) provider resolution and routing logic. ## Test Plan Unit tests	2025-03-07 10:20:51 -08:00
Ben Browning	9c4074ed49	fix: Gracefully handle no choices in remote vLLM response (#1424 ) # What does this PR do? This gracefully handles the case where the vLLM server responded to a completion request with no choices, which can happen in certain vLLM error situations. Previously, we'd error out with a stack trace about a list index out of range. Now, we just log a warning to the user and move past any chunks with an empty choices list. A specific example of the type of stack trace this fixes: ``` File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 170, in _process_vllm_chat_completion_stream_response choice = chunk.choices[0] ~~~~~~~~~~~~~^^^ IndexError: list index out of range ``` Now, instead of erroring out with that stack trace, we log a warning that vLLM failed to generate any completions and alert the user to check the vLLM server logs for details. This is related to #1277 and addresses the stack trace shown in that issue, although does not in and of itself change the functional behavior of vLLM tool calling. ## Test Plan As part of this fix, I added new unit tests to trigger this same error and verify it no longer happens. That is `test_process_vllm_chat_completion_stream_response_no_choices` in the new `tests/unit/providers/inference/test_remote_vllm.py`. I also added a couple of more tests to trigger and verify the last couple of remote vllm provider bug fixes - specifically a test for #1236 (builtin tool calling) and #1325 (vLLM <= v0.6.3). This required fixing the signature of `_process_vllm_chat_completion_stream_response` to accept the actual type of chunks it was getting passed - specifically changing from our openai_compat `OpenAICompatCompletionResponse` to `openai.types.chat.chat_completion_chunk.ChatCompletionChunk`. It was not actually getting passed `OpenAICompatCompletionResponse` objects before, and was using attributes that didn't exist on those objects. So, the signature now matches the type of object it's actually passed. Run these new unit tests like this: ``` pytest tests/unit/providers/inference/test_remote_vllm.py ``` Additionally, I ensured the existing `test_text_inference.py` tests passed via: ``` VLLM_URL="http://localhost:8000/v1" \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ LLAMA_STACK_CONFIG=remote-vllm \ python -m pytest -v tests/integration/inference/test_text_inference.py \ --inference-model "meta-llama/Llama-3.2-3B-Instruct" \ --vision-inference-model "" ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-03-05 15:07:54 -05:00
Ben Browning	00570fde31	chore: Get sqlite_vec and vector_store unit tests passing (#1413 )	2025-03-05 13:20:13 -05:00
Ashwin Bharambe	dd0db8038b	refactor(test): unify vector_io tests and make them configurable (#1398 ) ## Test Plan `LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2 --inference-model='' --vision-inference-model=''` ``` test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED ``` Same thing with: - LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss - LLAMA_STACK_CONFIG=fireworks (Note that ergonomics will soon be improved re: cmd-line options and env variables)	2025-03-04 13:37:45 -08:00
Ashwin Bharambe	4ca58eb987	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00

1 2

56 commits