llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
ehhuang	ab2b46e528	feat: log start, complete time to Agent steps (#1116 )	2025-02-14 17:48:06 -08:00
Hardik Shah	ab210ec59e	Update README.md	2025-02-14 15:45:08 -08:00
Ashwin Bharambe	314ee09ae3	chore: move all Llama Stack types from llama-models to llama-stack (#1098 ) llama-models should have extremely minimal cruft. Its sole purpose should be didactic -- show the simplest implementation of the llama models and document the prompt formats, etc. This PR is the complement to https://github.com/meta-llama/llama-models/pull/279 ## Test Plan Ensure all `llama` CLI `model` sub-commands work: ```bash llama model list llama model download --model-id ... llama model prompt-format -m ... ``` Ran tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/ LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/ LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/ ``` Create a fresh venv `uv venv && source .venv/bin/activate` and run `llama stack build --template fireworks --image-type venv` followed by `llama stack run together --image-type venv` <-- the server runs Also checked that the OpenAPI generator can run and there is no change in the generated files as a result. ```bash cd docs/openapi_generator sh run_openapi_generator.sh ```	2025-02-14 09:10:59 -08:00
Xi Yan	b27c41fe39	fix: disable sqlite-vec test (#1090 ) # What does this PR do? - sqlite_vec not added to all template yet, disable test for now to unblock release cut [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan <img width="846" alt="image" src="https://github.com/user-attachments/assets/fa896497-f37c-4cdf-bc62-21893afbd392" /> [//]: # (## Documentation)	2025-02-13 18:40:16 -08:00
ehhuang	225dd38e5c	test: add test for Agent.create_turn non-streaming response (#1078 ) Summary: This tests the fix to the SDK in https://github.com/meta-llama/llama-stack-client-python/pull/141 Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B	2025-02-13 16:17:50 -08:00
Yuan Tang	efdd60014d	test: Enable logprobs top_k tests for remote::vllm (#1080 ) top_k supported was added in https://github.com/meta-llama/llama-stack/pull/1074. The tests should be enabled as well. Verified that tests pass for remote::vllm: ``` LLAMA_STACK_BASE_URL=http://localhost:5003 pytest -v tests/client-sdk/inference/test_text_inference.py -k " test_completion_log_probs_non_streaming or test_completion_log_probs_streaming" ================================================================ test session starts ================================================================ platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 14 items / 12 deselected / 2 selected tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [100%] =================================================== 2 passed, 12 deselected, 1 warning in 10.03s ==================================================== ``` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 13:44:57 -05:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028 ) # What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Francisco Arceo	119fe8742a	feat: Adding sqlite-vec as a vectordb (#1040 ) # What does this PR do? This PR adds `sqlite_vec` as an additional inline vectordb. Tested with `ollama` by adding the `vector_io` object in `./llama_stack/templates/ollama/run.yaml` : ```yaml vector_io: - provider_id: sqlite_vec provider_type: inline::sqlite_vec config: kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db ``` I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test file with: ```python INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"] ``` And parameterized the relevant tests. [//]: # (If resolving an issue, uncomment and update the line below) # Closes https://github.com/meta-llama/llama-stack/issues/1005 ## Test Plan I ran the tests with: ```bash INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py ``` Which outputs: ```python ... PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED ``` In addition, I ran the `rag_with_vector_db.py` [example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py) using the script below with `uv run rag_example.py`. <details> <summary>CLICK TO SHOW SCRIPT 👋 </summary> ```python #!/usr/bin/env python3 import os import uuid from termcolor import cprint # Set environment variables os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16' os.environ['LLAMA_STACK_CONFIG'] = 'ollama' # Import libraries after setting environment variables from llama_stack.distribution.library_client import LlamaStackAsLibraryClient from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types.agent_create_params import AgentConfig from llama_stack_client.types import Document def main(): # Initialize the client client = LlamaStackAsLibraryClient("ollama") vector_db_id = f"test-vector-db-{uuid.uuid4().hex}" _ = client.initialize() model_id = 'llama3.2:3b-instruct-fp16' # Define the list of document URLs and create Document objects urls = [ "chat.rst", "llama3.rst", "memory_optimizations.rst", "lora_finetune.rst", ] documents = [ Document( document_id=f"num-{i}", content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", mime_type="text/plain", metadata={}, ) for i, url in enumerate(urls) ] # (Optional) Use the documents as needed with your client here client.vector_dbs.register( provider_id='sqlite_vec', vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, ) client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) # Create agent configuration agent_config = AgentConfig( model=model_id, instructions="You are a helpful assistant", enable_session_persistence=False, toolgroups=[ { "name": "builtin::rag", "args": { "vector_db_ids": [vector_db_id], } } ], ) # Instantiate the Agent agent = Agent(client, agent_config) # List of user prompts user_prompts = [ "What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.", "Was anything related to 'Llama3' discussed, if so what?", "Tell me how to use LoRA", "What about Quantization?", ] # Create a session for the agent session_id = agent.create_session("test-session") # Process each prompt and display the output for prompt in user_prompts: cprint(f"User> {prompt}", "green") response = agent.create_turn( messages=[ { "role": "user", "content": prompt, } ], session_id=session_id, ) # Log and print events from the response for log in EventLogger().log(response): log.print() if __name__ == "__main__": main() ``` </details> Which outputs a large summary of RAG generation. # Documentation Will handle documentation updates in follow-up PR. # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-02-12 10:50:03 -08:00
Xi Yan	66d7e15c93	perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041 ) # What does this PR do? Problem - Using script: https://gist.github.com/thoraxe/6163b2145ce7b1c24c6026b64cf90085 - This hits an issue on server with `code_interpreter` not found, as we do not pass "builtin::code_interpreter" in AgentConfig's `toolgroups`. This is a general issue where model always tries to output `code_interpreter` in `ToolCall` even when we do not have `code_interpreter` available for execution. Reproduce Deeper Problem in chat-completion - Use script: https://gist.github.com/yanxi0830/163a9ad7b5db10556043fbfc7ecd7603 1. We currently always populate `code_interpreter` in `ToolCall` in ChatCompletionResponse if the model's response begins with `<\|python_tag\|>`. See `c5f5958498/models/llama3/api/chat_format.py (L200-L213)` <img width="913" alt="image" src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6" /> 2. This happens even if we do not pass the `code_interpreter` as a `tools` in ChatCompletionRequest. This PR Explicitly make sure that the tools returned in `ChatCompletionResponse.tool_calls` is always a tool requested by `ChatCompletionRequest.tools`. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Before <img width="913" alt="image" src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6" /> <img width="997" alt="image" src="https://github.com/user-attachments/assets/d3e82b62-b142-4939-954c-62843bec7110" /> After <img width="856" alt="image" src="https://github.com/user-attachments/assets/2c70ce55-c8d0-45ea-b10f-f70adc50d3d9" /> <img width="1000" alt="image" src="https://github.com/user-attachments/assets/b5e81826-c35b-4052-bf81-7afff93ce2ef" /> Unit Test ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/ ``` <img width="1002" alt="image" src="https://github.com/user-attachments/assets/04808517-eded-4122-97f5-7e5142de9779" /> Streaming - Chat Completion <img width="902" alt="image" src="https://github.com/user-attachments/assets/f477bc86-bd38-4729-b49e-a0a6ed3f835a" /> - Agent <img width="916" alt="image" src="https://github.com/user-attachments/assets/f4cc3417-23cd-46b1-953d-3a2271e79bbb" /> [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant)	2025-02-11 18:31:35 -08:00
ehhuang	96c88397da	fix: agent config validation (#1053 ) Summary: Fixes AgentConfig init bug introduced with ToolConfig. Namely, the below doesn't work ``` agent_config = AgentConfig( **common_params, tool_config=ToolConfig( tool_choice="required", ), ) ``` bvecause tool_choice was defaulted to 'auto' leading to validation check failing. Test Plan: added unittests LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B	2025-02-11 14:48:42 -08:00
Sébastien Han	b34c1dd8ad	test: replace blocked image URLs with GitHub-hosted (#1025 ) # What does this PR do? The previous image URLs were sometimes blocked by Cloudflare, causing test failures for some users. This update replaces them with a GitHub-hosted image (`dog.png`) from the `llama-stack` repository, ensuring more reliable access during testing. Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` $ ollama run llama3.2-vision:latest --keep-alive 2m & $ uv run pytest -v -s -k "ollama" --inference-model=llama3.2-vision:latest llama_stack/providers/tests/inference/test_vision_inference.py /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ============================================ test session starts ============================================= platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0 asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None collected 39 items / 36 deselected / 3 selected llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image0-expected_strings0] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image1-expected_strings1] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-ollama] PASSED ========================== 3 passed, 36 deselected, 2 warnings in 62.23s (0:01:02) ========================== ``` [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-10 22:38:11 -05:00
Yuan Tang	b981b49bfa	test: Use JSON tool prompt format for remote::vllm provider (#1019 ) # What does this PR do? This PR removes the warnings when running tests for `remote::vllm` provider: ``` Detected the chat template content format to be 'openai'. You can set `--chat-template-content-format` to override this. ``` ## Test Plan All tests passed without the warning messages shown above. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-08 20:42:57 -08:00
Yuan Tang	413099ef6a	test: Make text-based chat completion tests run 10x faster (#1016 ) # What does this PR do? This significantly shortens the test time (about 10x faster) since most of the time is spent on outputing the tokens "there are several planets in our solar system that have...". We want to have an answer quicker, especially when testing even larger models. ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py -k "test_text_chat_completion_non_streaming or test_text_chat_completion_streaming" ================================================================== test session starts =================================================================== platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.7.0 collected 12 items / 8 deselected / 4 selected tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 25%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 75%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [100%] ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-08 11:49:46 -08:00
Yuan Tang	c97e05f75e	test: Split inference tests to text and vision (#1008 ) # What does this PR do? This PR splits the inference tests into text and vision to make testing on vLLM provider easier as mentioned in https://github.com/meta-llama/llama-stack/pull/951 since serving multiple models (e.g. Llama-3.2-11B-Vision-Instruct and Llama-3.1-8B-Instruct) on a single port using the OpenAI API is [not supported yet](https://docs.vllm.ai/en/v0.5.5/serving/faq.html) so it's a bit tricky to test both at the same time. ## Test Plan All previously passing tests related to text still pass: `LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py` All vision tests passed via `LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_vision_inference.py`. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-07 09:35:49 -08:00
ehhuang	a9950ce806	test: remove flaky agent test (#1006 ) Summary: Test Plan:	2025-02-07 09:35:38 -08:00
ehhuang	d0d568c5ba	test: fix flaky agent test (#1002 ) Summary: Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8 all tests passed	2025-02-06 20:19:38 -08:00
Hardik Shah	28a0fe57cc	fix: Update rag examples to use fresh faiss index every time (#998 ) # What does this PR do? In several examples we use the same faiss index , which means running it multiple times fills up the index with duplicates which eventually degrades the model performance on RAG as multiple copies of the same irrelevant chunks might be picked up several times. Fix is to ensure we create a new index each time. Resolves issue in this discussion - https://github.com/meta-llama/llama-stack/discussions/995 ## Test Plan Re-ran the getting started guide multiple times to see the same output Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-02-06 16:12:29 -08:00
Xi Yan	06e5af1435	update test	2025-02-06 16:11:20 -08:00
ehhuang	3922999118	sys_prompt support in Agent (#938 ) # What does this PR do? The current default system prompt for llama3.2 tends to overindex on tool calling and doesn't work well when the prompt does not require tool calling. This PR adds an option to override the default system prompt, and organizes tool-related configs into a new config object. - [ ] Addresses issue (#issue) ## Test Plan LLAMA_STACK_CONFIG=together pytest \-\-inference\-model=meta\-llama/Llama\-3\.3\-70B\-Instruct -s -v tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-05 21:11:32 -08:00
Ihar Hrachyshka	5c8e35a9e2	docs, tests: replace datasets.rst with memory_optimizations.rst (#968 ) datasets.rst was removed from torchtune repo. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> # What does this PR do? Replace a missing 404 document with another one that exists. (Removed it from the list when memory_optimizations.rst was already pulled.) ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-05 11:25:56 -05:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927 ) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
Ashwin Bharambe	75abe48cd0	completions can randomly blurt out something else	2025-02-01 16:01:21 -08:00
Ashwin Bharambe	1ac0d8306b	Remove test parameterization for safety tests, too much noise	2025-02-01 08:38:44 -08:00
Ashwin Bharambe	f0ba367877	Update client-sdk test config option handling	2025-01-31 15:30:07 -08:00
Hardik Shah	589a6911ba	fix rag tests (#918 ) make more deterministic	2025-01-31 15:29:29 -08:00
Matthew Farrellee	2f11c7c203	add test for user message w/ image.data content (#906 ) # What does this PR do? a test exists for image.url content, but not image.data content. this adds the former. ## Test Plan `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_inference.py` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2025-01-30 17:35:27 -08:00
Hardik Shah	97eb3eecea	Fix Agents to support code and rag simultaneously (#908 ) # What does this PR do? Fixes a bug where agents were not working when both rag and code-interpreter were added as tools. ## Test Plan Added a new client_sdk test which tests for this scenario ``` LLAMA_STACK_CONFIG=together pytest -s -v tests/client-sdk -k 'test_rag_and_code_agent' ``` --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-01-30 17:09:34 -08:00
Sixian Yi	836f47a82d	log probs - mark pytests as xfail for unsupported providers + add support for together (#883 ) # What does this PR do? 1) As per @mattf's suggestion, we want to mark the pytest as xfail for providers that do not support the functionality. In this diff, we xfail the logProbs inference tests for providers who does not support log probs. ( log probs is only supported by together, fireworks and vllm) 2) Added logProbs support for together according to their developer [doc](https://docs.together.ai/docs/logprobs). ## Test Plan 1) Together & Fireworks ``` export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/together/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py ``` ``` tests/client-sdk/inference/test_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of planets in our solar system?-Earth] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of the planets that have rings around them?-Saturn] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED ========================================================================================== 15 passed, 2 warnings in 19.46s =========================================================================================== ``` ``` export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/fireworks/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py ``` All tests passed 2) Ollama - LogProbs tests are marked as xfailed. ``` tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet) tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet) ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-29 23:41:25 -08:00
Sixian Yi	ba453c3487	Report generation minor fixes (#884 ) # What does this PR do? fixed report generation: 1) do not initialize a new client in report.py - instead get it from pytest fixture 2) Add "provider" for "safety" and "agents" section 3) add logprobs functionality in "inference" section ## Test Plan See the regenerated report ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 04:58:12 -08:00
Hardik Shah	632e60439a	Fix report generation for url endpoints (#876 ) Earlier, we would have some unknown magic to identify the path for remote endpoints when testing client_sdk/tests. Removed that and now you have to explicitly pass a path	2025-01-24 13:15:44 -08:00
Dinesh Yeduguru	a78f1fc70d	make default tool prompt format none in agent config (#863 ) # What does this PR do? Previously the tests hard coded the tool prompt format to be json which will cause it to fail when using 3.2/3.3 family of models. This change make the default to be none for the agent config and just remove the specification in the tests. ## Test Plan LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py	2025-01-23 14:44:59 -08:00
Sixian Yi	82a28f3a24	update doc for client-sdk testing (#849 ) As title ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-23 00:17:16 -08:00
Ashwin Bharambe	3d14a3d46f	Kill colons	2025-01-22 22:59:30 -08:00
Ashwin Bharambe	35c71d5bbe	Update OpenAPI generator to output discriminator (#848 ) oneOf should have discriminators so Stainless can generate better code ## Test Plan Going to generate the SDK now and check.	2025-01-22 22:15:23 -08:00
Ashwin Bharambe	f3d8864c36	Rename builtin::memory -> builtin::rag	2025-01-22 20:22:51 -08:00
Sixian Yi	597869a2aa	add distro report (#847 ) # What does this PR do? Generate distro reports to cover inference, agents, and vector_io. ## Test Plan Report generated through `/opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/ --report` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-22 19:20:49 -08:00
Ashwin Bharambe	23f1980f9c	Fix meta-reference GPU implementation for inference	2025-01-22 18:31:59 -08:00
Ashwin Bharambe	f4b0f2af8b	If initialization fails for library client, error the test	2025-01-22 18:12:15 -08:00
Ashwin Bharambe	72a1b27d01	nitpick	2025-01-22 18:09:46 -08:00
Sixian Yi	f4f47970e5	[client sdk test] add options for inference_model, safety_shield, embedding_model (#843 ) # What does this PR do? Default inference_model for testing: "meta-llama/Llama-3.1-8B-Instruct" Default vision inference_model for testing: "meta-llama/Llama-3.2-11B-Vision-Instruct" ## Test Plan `/opt/miniconda3/envs/stack/bin/pytest -s -v --inference-model=meta-llama/Llama-3.2-3B-Instruct tests/client-sdk/agents` `/opt/miniconda3/envs/stack/bin/pytest -s -v --embedding-model=all-MiniLM-L6-v2 tests/client-sdk/vector_io` `/opt/miniconda3/envs/stack/bin/pytest -s -v --safety-shield=meta-llama/Llama-Guard-3-1B tests/client-sdk/safety` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-22 15:35:19 -08:00
Ashwin Bharambe	4dd4f09fc5	Rename a test and add some comments	2025-01-22 15:28:45 -08:00
Hardik Shah	deab4f57dd	Improved report generation for providers (#844 ) # What does this PR do? Automates the model list check by querying the distro. Added support for both remote hosted and templates. ## Test Plan Run on a remote hosted distro via `LLAMA_STACK_BASE_URL="https://llamastack-preview.fireworks.ai" pytest -s -v tests/client-sdk --report` Run on a template via `LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk --report`	2025-01-22 15:27:09 -08:00
Ashwin Bharambe	07b87365ab	[inference api] modify content types so they follow a more standard structure (#841 ) Some small updates to the inference types to make them more standard Specifically: - image data is now located in a "image" subkey - similarly tool call data is located in a "tool_call" subkey The pattern followed is `dict(type="foo", foo=<...>)`	2025-01-22 12:16:18 -08:00
Ashwin Bharambe	a63a43c646	[memory refactor][6/n] Update naming and routes (#839 ) Making a few small naming changes as per feedback: - RAGToolRuntime methods are called `insert` and `query` to keep them more general - The tool names are changed to non-namespaced forms `insert_into_memory` and `query_from_memory` - The REST endpoints are more REST-ful	2025-01-22 10:39:13 -08:00
Ashwin Bharambe	63f37f9b7c	[memory refactor][4/n] Update the client-sdk test for RAG (#834 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. Update client-sdk tests	2025-01-22 10:15:19 -08:00
Ashwin Bharambe	78a481bb22	[memory refactor][2/n] Update faiss and make it pass tests (#830 ) See https://github.com/meta-llama/llama-stack/issues/827 for the broader design. Second part: - updates routing table / router code - updates the faiss implementation ## Test Plan ``` pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384 ```	2025-01-22 10:02:15 -08:00
Sixian Yi	35a00d004a	bug fix for distro report generation (#836 ) # What does this PR do? Minor bug fix and simplify code - [ ] Addresses issue (#issue) ## Test Plan See the updated `llama_stack/templates/fireworks/report.md` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-21 21:44:06 -08:00
Sixian Yi	edf56884a7	add pytest option to generate a functional report for distribution (#833 ) # What does this PR do? add pytest option (`--report`) to support generating a functional report for llama stack distribution ## Test Plan ``` export LLAMA_STACK_CONFIG=./llama_stack/templates/fireworks/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/ --report ``` See a report file was generated under `./llama_stack/templates/fireworks/report.md` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-21 21:18:23 -08:00
Sixian Yi	e41873f268	[ez] structured output for /completion ollama & enable tests (#822 ) # What does this PR do? 1) enabled structured output for ollama /completion API. It seems we missed this one. 2) fixed ollama structured output test in client sdk - ollama does not support list format for structured output 3) enable structured output unit test as the result was stable on Llama-3.1-8B-Instruct and ollama, fireworks, together. ## Test Plan 1) Run `test_completion_structured_output` on /completion API with 3 providers: ollama, fireworks, together. pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output ``` (base) sxyi@sxyi-mbp llama-stack % pytest -s -v llama_stack/providers/tests/inference --config=ci_test_config.yaml /Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ================================================================================================ test session starts ================================================================================================= platform darwin -- Python 3.13.0, pytest-8.3.4, pluggy-1.5.0 -- /Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13 cachedir: .pytest_cache metadata: {'Python': '3.13.0', 'Platform': 'macOS-15.1.1-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'html': '4.1.1', 'metadata': '3.1.1', 'md': '0.2.0', 'dependency': '0.6.0', 'md-report': '0.6.3', 'anyio': '4.6.2.post1'}} rootdir: /Users/sxyi/llama-stack configfile: pyproject.toml plugins: asyncio-0.24.0, html-4.1.1, metadata-3.1.1, md-0.2.0, dependency-0.6.0, md-report-0.6.3, anyio-4.6.2.post1 asyncio: mode=Mode.STRICT, default_loop_scope=None collected 85 items / 82 deselected / 3 selected llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-ollama] PASSED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-fireworks] PASSED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct-together] PASSED ==================================================================================== 3 passed, 82 deselected, 8 warnings in 5.67s ==================================================================================== ``` 2) ` LLAMA_STACK_CONFIG="./llama_stack/templates/ollama/run.yaml" /opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/inference` Before: ``` ________________________________________________________________________________________ test_completion_structured_output __________________________________________________________________________________________ tests/client-sdk/inference/test_inference.py:174: in test_completion_structured_output answer = AnswerFormat.model_validate_json(response.content) E pydantic_core._pydantic_core.ValidationError: 1 validation error for AnswerFormat E Invalid JSON: expected value at line 1 column 2 [type=json_invalid, input_value=' The year he retired, he...5\n\nThe best answer is', input_type=str] E For further information visit https://errors.pydantic.dev/2.10/v/json_invalid ``` After: test consistently passes ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-21 21:10:24 -08:00
Xi Yan	3e7496e835	fix vllm base64 image inference (#815 ) # What does this PR do? - fix base64 based image url for vllm - add a test case for base64 based image_url - fixes issue: https://github.com/meta-llama/llama-stack/issues/571 ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v ./tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url ``` <img width="991" alt="image" src="https://github.com/user-attachments/assets/d56381ba-6777-4d23-9da9-81f73ce93566" /> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-17 17:07:28 -08:00

... 2 3 4 5 6

296 commits