llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-05 02:17:31 +00:00

Author	SHA1	Message	Date
Yuan Tang	8ff27b58fa	chore: Consistent naming for VectorIO providers (#1023 ) # What does this PR do? This changes all VectorIO providers classes to follow the pattern `<ProviderName>VectorIOConfig` and `<ProviderName>VectorIOAdapter`. All API endpoints for VectorIOs are currently consistent with `/vector-io`. Note that API endpoint for VectorDB stay unchanged as `/vector-dbs`. ## Test Plan I don't have a way to test all providers. This is a simple renaming so things should work as expected. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 13:15:49 -05:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028 ) # What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Yuan Tang	f9ca441974	chore: Link to Groq docs in the warning message for preview model (#1060 ) This should be `llama-3.2-3b` instead of `llama-3.2-3b-instruct`.	2025-02-13 12:14:57 -05:00
Sébastien Han	418645696a	fix: improve signal handling and update dependencies (#1044 ) # What does this PR do? This commit enhances the signal handling mechanism in the server by improving the `handle_signal` (previously handle_sigint) function. It now properly retrieves the signal name, ensuring clearer logging when a termination signal is received. Additionally, it cancels all running tasks and waits for their completion before stopping the event loop, allowing for a more graceful shutdown. Support for handling SIGTERM has also been added alongside SIGINT. Before the changes, handle_sigint used asyncio.run(run_shutdown()). However, asyncio.run() is meant to start a new event loop, and calling it inside an existing one (like when running Uvicorn) raises an error. The fix replaces asyncio.run(run_shutdown()) with an async function scheduled on the existing loop using loop.create_task(shutdown()). This ensures that the shutdown coroutine runs within the current event loop instead of trying to create a new one. Furthermore, this commit updates the project dependencies. `fastapi` and `uvicorn` have been added to the development dependencies in `pyproject.toml` and `uv.lock`, ensuring that the necessary packages are available for development and execution. Closes: https://github.com/meta-llama/llama-stack/issues/1043 Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run a server and send SIGINT: ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml Using config file: llama_stack/templates/ollama/run.yaml Run configuration: apis: - agents - datasetio - eval - inference - safety - scoring - telemetry - tool_runtime - vector_io container_image: null datasets: [] eval_tasks: [] image_name: ollama metadata_store: db_path: /Users/leseb/.llama/distributions/ollama/registry.db namespace: null type: sqlite models: - metadata: {} model_id: meta-llama/Llama-3.2-3B-Instruct model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - llm provider_id: ollama provider_model_id: null - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - embedding provider_id: sentence-transformers provider_model_id: null providers: agents: - config: persistence_store: db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db namespace: null type: sqlite provider_id: meta-reference provider_type: inline::meta-reference datasetio: - config: {} provider_id: huggingface provider_type: remote::huggingface - config: {} provider_id: localfs provider_type: inline::localfs eval: - config: {} provider_id: meta-reference provider_type: inline::meta-reference inference: - config: url: http://localhost:11434 provider_id: ollama provider_type: remote::ollama - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers safety: - config: {} provider_id: llama-guard provider_type: inline::llama-guard scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '******' provider_id: braintrust provider_type: inline::braintrust telemetry: - config: service_name: llama-stack sinks: console,sqlite sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db provider_id: meta-reference provider_type: inline::meta-reference tool_runtime: - config: api_key: '****' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '******' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: code-interpreter provider_type: inline::code-interpreter - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime vector_io: - config: kvstore: db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db namespace: null type: sqlite provider_id: faiss provider_type: inline::faiss scoring_fns: [] server: port: 8321 tls_certfile: null tls_keyfile: null shields: [] tool_groups: - args: null mcp_endpoint: null provider_id: tavily-search toolgroup_id: builtin::websearch - args: null mcp_endpoint: null provider_id: rag-runtime toolgroup_id: builtin::rag - args: null mcp_endpoint: null provider_id: code-interpreter toolgroup_id: builtin::code_interpreter vector_dbs: [] version: '2' INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => ollama INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => sentence-transformers INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: models => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inference => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-vector_io => faiss INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-safety => llama-guard INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: shields => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: safety => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_io => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => code-interpreter INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => rag-runtime INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_groups => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_runtime => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: agents => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => huggingface INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => localfs INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasets => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasetio => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: telemetry => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => basic INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => llm-as-judge INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => braintrust INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring_functions => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-eval => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval_tasks => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inspect => __builtin__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216: INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`... INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK" INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss. INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss. INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes. Warning: `bwrap` is not available. Code interpreter tool will not work correctly. INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available. INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK" INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2... INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2 INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106: Serving API eval POST /v1/eval/tasks/{task_id}/evaluations DELETE /v1/eval/tasks/{task_id}/jobs/{job_id} GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result GET /v1/eval/tasks/{task_id}/jobs/{job_id} POST /v1/eval/tasks/{task_id}/jobs Serving API agents POST /v1/agents POST /v1/agents/{agent_id}/session POST /v1/agents/{agent_id}/session/{session_id}/turn DELETE /v1/agents/{agent_id} DELETE /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id} Serving API scoring_functions GET /v1/scoring-functions/{scoring_fn_id} GET /v1/scoring-functions POST /v1/scoring-functions Serving API safety POST /v1/safety/run-shield Serving API inspect GET /v1/health GET /v1/inspect/providers GET /v1/inspect/routes GET /v1/version Serving API tool_runtime POST /v1/tool-runtime/invoke GET /v1/tool-runtime/list-tools POST /v1/tool-runtime/rag-tool/insert POST /v1/tool-runtime/rag-tool/query Serving API datasetio POST /v1/datasetio/rows GET /v1/datasetio/rows Serving API shields GET /v1/shields/{identifier} GET /v1/shields POST /v1/shields Serving API eval_tasks GET /v1/eval-tasks/{eval_task_id} GET /v1/eval-tasks POST /v1/eval-tasks Serving API models GET /v1/models/{model_id} GET /v1/models POST /v1/models DELETE /v1/models/{model_id} Serving API datasets GET /v1/datasets/{dataset_id} GET /v1/datasets POST /v1/datasets DELETE /v1/datasets/{dataset_id} Serving API vector_io POST /v1/vector-io/insert POST /v1/vector-io/query Serving API inference POST /v1/inference/chat-completion POST /v1/inference/completion POST /v1/inference/embeddings Serving API tool_groups GET /v1/tools/{tool_name} GET /v1/toolgroups/{toolgroup_id} GET /v1/toolgroups GET /v1/tools POST /v1/toolgroups DELETE /v1/toolgroups/{toolgroup_id} Serving API vector_dbs GET /v1/vector-dbs/{vector_db_id} GET /v1/vector-dbs POST /v1/vector-dbs DELETE /v1/vector-dbs/{vector_db_id} Serving API scoring POST /v1/scoring/score POST /v1/scoring/score-batch Serving API telemetry GET /v1/telemetry/traces/{trace_id}/spans/{span_id} GET /v1/telemetry/spans/{span_id}/tree GET /v1/telemetry/traces/{trace_id} POST /v1/telemetry/events GET /v1/telemetry/spans GET /v1/telemetry/traces POST /v1/telemetry/spans/export Listening on ['::', '0.0.0.0']:5001 INFO: Started server process [65372] INFO: Waiting for application startup. INFO: ASGI 'lifespan' protocol appears unsupported. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit) ^CINFO: Shutting down INFO: Finished server process [65372] Received signal SIGINT (2). Exiting gracefully... INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl ``` [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 08:07:59 -08:00
Ben Browning	dd1a366347	fix: logprobs support in remote-vllm provider (#1074 ) # What does this PR do? The remote-vllm provider was not passing logprobs options from CompletionRequest or ChatCompletionRequests through to the OpenAI client parameters. I manually verified this, as well as observed this provider failing `TestInference::test_completion_logprobs`. This was filed as issue #1073. This fixes that by passing the `logprobs.top_k` value through to the parameters we pass into the OpenAI client. Additionally, this fixes a bug in `test_text_inference.py` where it mistakenly assumed chunk.delta were of type `ContentDelta` for completion requests. The deltas are of type `ContentDelta` for chat completion requests, but for basic completion requests the deltas are of type string. This test was likely failing for other providers that did properly support logprobs because of this latter issue in the test, which was hit while fixing the above issue with the remote-vllm provider. (Closes #1073) ## Test Plan First, you need a vllm running. I ran one locally like this: ``` vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json ``` Next, run test_text_inference.py against this vllm using the remote vllm provider like this: ``` VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote" ``` Before my change, the test failed with this error: ``` llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs assert 1 <= len(response.logprobs) <= 5 E TypeError: object of type 'NoneType' has no len() ``` After my change, the test passes. [//]: # (## Documentation) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-02-13 11:00:00 -05:00
Francisco Arceo	119fe8742a	feat: Adding sqlite-vec as a vectordb (#1040 ) # What does this PR do? This PR adds `sqlite_vec` as an additional inline vectordb. Tested with `ollama` by adding the `vector_io` object in `./llama_stack/templates/ollama/run.yaml` : ```yaml vector_io: - provider_id: sqlite_vec provider_type: inline::sqlite_vec config: kvstore: type: sqlite namespace: null db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/ollama}/sqlite_vec.db ``` I also updated the `./tests/client-sdk/vector_io/test_vector_io.py` test file with: ```python INLINE_VECTOR_DB_PROVIDERS = ["faiss", "sqlite_vec"] ``` And parameterized the relevant tests. [//]: # (If resolving an issue, uncomment and update the line below) # Closes https://github.com/meta-llama/llama-stack/issues/1005 ## Test Plan I ran the tests with: ```bash INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py ``` Which outputs: ```python ... PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED ``` In addition, I ran the `rag_with_vector_db.py` [example](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py) using the script below with `uv run rag_example.py`. <details> <summary>CLICK TO SHOW SCRIPT 👋 </summary> ```python #!/usr/bin/env python3 import os import uuid from termcolor import cprint # Set environment variables os.environ['INFERENCE_MODEL'] = 'llama3.2:3b-instruct-fp16' os.environ['LLAMA_STACK_CONFIG'] = 'ollama' # Import libraries after setting environment variables from llama_stack.distribution.library_client import LlamaStackAsLibraryClient from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types.agent_create_params import AgentConfig from llama_stack_client.types import Document def main(): # Initialize the client client = LlamaStackAsLibraryClient("ollama") vector_db_id = f"test-vector-db-{uuid.uuid4().hex}" _ = client.initialize() model_id = 'llama3.2:3b-instruct-fp16' # Define the list of document URLs and create Document objects urls = [ "chat.rst", "llama3.rst", "memory_optimizations.rst", "lora_finetune.rst", ] documents = [ Document( document_id=f"num-{i}", content=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}", mime_type="text/plain", metadata={}, ) for i, url in enumerate(urls) ] # (Optional) Use the documents as needed with your client here client.vector_dbs.register( provider_id='sqlite_vec', vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, ) client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=vector_db_id, chunk_size_in_tokens=512, ) # Create agent configuration agent_config = AgentConfig( model=model_id, instructions="You are a helpful assistant", enable_session_persistence=False, toolgroups=[ { "name": "builtin::rag", "args": { "vector_db_ids": [vector_db_id], } } ], ) # Instantiate the Agent agent = Agent(client, agent_config) # List of user prompts user_prompts = [ "What are the top 5 topics that were explained in the documentation? Only list succinct bullet points.", "Was anything related to 'Llama3' discussed, if so what?", "Tell me how to use LoRA", "What about Quantization?", ] # Create a session for the agent session_id = agent.create_session("test-session") # Process each prompt and display the output for prompt in user_prompts: cprint(f"User> {prompt}", "green") response = agent.create_turn( messages=[ { "role": "user", "content": prompt, } ], session_id=session_id, ) # Log and print events from the response for log in EventLogger().log(response): log.print() if __name__ == "__main__": main() ``` </details> Which outputs a large summary of RAG generation. # Documentation Will handle documentation updates in follow-up PR. # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-02-12 10:50:03 -08:00
Yuan Tang	5e97dd9919	feat: Support tool calling for streaming chat completion in remote vLLM provider (#1063 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Closes https://github.com/meta-llama/llama-stack/issues/1046. ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py ================================================================= test session starts ================================================================= platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 14 items tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 7%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 14%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 21%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 28%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 35%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 42%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 57%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 64%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 71%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 78%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 85%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] PASSED [ 92%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] PASSED [100%] =============================================== 12 passed, 2 xfailed, 1 warning in 366.56s (0:06:06) ================================================ ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-12 06:17:21 -08:00
Sébastien Han	bf11cc0450	chore: update return type to Optional[str] (#982 )	2025-02-11 22:10:28 -08:00
Xi Yan	66d7e15c93	perf: ensure ToolCall in ChatCompletionResponse is subset of ChatCompletionRequest.tools (#1041 ) # What does this PR do? Problem - Using script: https://gist.github.com/thoraxe/6163b2145ce7b1c24c6026b64cf90085 - This hits an issue on server with `code_interpreter` not found, as we do not pass "builtin::code_interpreter" in AgentConfig's `toolgroups`. This is a general issue where model always tries to output `code_interpreter` in `ToolCall` even when we do not have `code_interpreter` available for execution. Reproduce Deeper Problem in chat-completion - Use script: https://gist.github.com/yanxi0830/163a9ad7b5db10556043fbfc7ecd7603 1. We currently always populate `code_interpreter` in `ToolCall` in ChatCompletionResponse if the model's response begins with `<\|python_tag\|>`. See `c5f5958498/models/llama3/api/chat_format.py (L200-L213)` <img width="913" alt="image" src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6" /> 2. This happens even if we do not pass the `code_interpreter` as a `tools` in ChatCompletionRequest. This PR Explicitly make sure that the tools returned in `ChatCompletionResponse.tool_calls` is always a tool requested by `ChatCompletionRequest.tools`. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Before <img width="913" alt="image" src="https://github.com/user-attachments/assets/328d313d-0a0b-495c-8715-61cca9ccc4a6" /> <img width="997" alt="image" src="https://github.com/user-attachments/assets/d3e82b62-b142-4939-954c-62843bec7110" /> After <img width="856" alt="image" src="https://github.com/user-attachments/assets/2c70ce55-c8d0-45ea-b10f-f70adc50d3d9" /> <img width="1000" alt="image" src="https://github.com/user-attachments/assets/b5e81826-c35b-4052-bf81-7afff93ce2ef" /> Unit Test ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/ ``` <img width="1002" alt="image" src="https://github.com/user-attachments/assets/04808517-eded-4122-97f5-7e5142de9779" /> Streaming - Chat Completion <img width="902" alt="image" src="https://github.com/user-attachments/assets/f477bc86-bd38-4729-b49e-a0a6ed3f835a" /> - Agent <img width="916" alt="image" src="https://github.com/user-attachments/assets/f4cc3417-23cd-46b1-953d-3a2271e79bbb" /> [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant)	2025-02-11 18:31:35 -08:00
Yuan Tang	dd37e58868	feat: Support tool calling for non-streaming chat completion in remote vLLM provider (#1034 ) # What does this PR do? This PR adds support for tool calling for non-streaming chat completion. Prior to this, tool calls were not passed to chat completion requests and the tools object needs to be restructured properly to be compatible with vLLM provider. ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/inference/test_text_inference.py ================================================================= test session starts ================================================================= platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 12 items tests/client-sdk/inference/test_text_inference.py::test_text_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 8%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 16%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote:...) [ 25%] tests/client-sdk/inference/test_text_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::vll...) [ 33%] tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 41%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet do humans live on?-Earth] PASSED [ 50%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-Which planet has rings around it with a name starting with letter S?-Saturn] PASSED [ 58%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED [ 66%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED [ 75%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED [ 83%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] FAILED [ 91%] tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED [100%] ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-11 21:08:29 -05:00
Ihar Hrachyshka	24385cfd03	fix: filter out remote::sample providers when listing (#1057 ) # What does this PR do? Before: ```  llama stack list-providers agents +------------------------+-----------------------------------------------------------------------+ \| Provider Type \| PIP Package Dependencies \| +------------------------+-----------------------------------------------------------------------+ \| inline::meta-reference \| matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis \| +------------------------+-----------------------------------------------------------------------+ \| remote::sample \| \| +------------------------+-----------------------------------------------------------------------+ ``` After: ```  llama stack list-providers agents +------------------------+-----------------------------------------------------------------------+ \| Provider Type \| PIP Package Dependencies \| +------------------------+-----------------------------------------------------------------------+ \| inline::meta-reference \| matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis \| +------------------------+-----------------------------------------------------------------------+ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Manually. [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-11 16:12:46 -08:00
Dinesh Yeduguru	d8a20e034b	feat: make telemetry attributes be dict[str,PrimitiveType] (#1055 ) # What does this PR do? Make attributes in telemetry be only primitive types and avoid arbitrary nesting. ## Test Plan ``` LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py -k "test_builtin_tool_web_search" # Verified that attributes still show up correclty in jaeger ```	2025-02-11 15:10:17 -08:00
ehhuang	96c88397da	fix: agent config validation (#1053 ) Summary: Fixes AgentConfig init bug introduced with ToolConfig. Namely, the below doesn't work ``` agent_config = AgentConfig( **common_params, tool_config=ToolConfig( tool_choice="required", ), ) ``` bvecause tool_choice was defaulted to 'auto' leading to validation check failing. Test Plan: added unittests LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B	2025-02-11 14:48:42 -08:00
Yuan Tang	d954f2752e	fix: Added missing `tool_config` arg in SambaNova `chat_completion()` (#1042 ) # What does this PR do? `tool_config` is missing from the signature but is used in `ChatCompletionRequest()`. ## Test Plan This is a small fix. I don't have SambaNova to test the change but I doubt that this is currently working. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-10 21:20:50 -08:00
Sébastien Han	b34c1dd8ad	test: replace blocked image URLs with GitHub-hosted (#1025 ) # What does this PR do? The previous image URLs were sometimes blocked by Cloudflare, causing test failures for some users. This update replaces them with a GitHub-hosted image (`dog.png`) from the `llama-stack` repository, ensuring more reliable access during testing. Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` $ ollama run llama3.2-vision:latest --keep-alive 2m & $ uv run pytest -v -s -k "ollama" --inference-model=llama3.2-vision:latest llama_stack/providers/tests/inference/test_vision_inference.py /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ============================================ test session starts ============================================= platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0 asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None collected 39 items / 36 deselected / 3 selected llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image0-expected_strings0] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image1-expected_strings1] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-ollama] PASSED ========================== 3 passed, 36 deselected, 2 warnings in 62.23s (0:01:02) ========================== ``` [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-10 22:38:11 -05:00
Bill Murdock	3856927ee8	fix: Update Qdrant support post-refactor (#1022 ) # What does this PR do? I tried running the Qdrant provider and found some bugs. See #1021 for details. @terrytangyuan wrote there: > Please feel free to submit your changes in a PR. I fixed similar issues for pgvector provider. This might be an issue introduced from a refactoring. So I am submitting this PR. Closes #1021 ## Test Plan Here are the highlights for what I did to test this: References: - https://llama-stack.readthedocs.io/en/latest/getting_started/index.html - https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py - https://github.com/meta-llama/llama-stack/blob/main/docs/zero_to_hero_guide/README.md#build-configure-and-run-llama-stack Install and run Qdrant server: ``` podman pull qdrant/qdrant mkdir qdrant-data podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant ``` Install and run Llama Stack from the venv-support PR (mainly because I didn't want to install conda): ``` brew install cmake # Should just need this once git clone https://github.com/meta-llama/llama-models.git gh repo clone cdoern/llama-stack cd llama-stack gh pr checkout 1018 # This is the checkout that introduces venv support for build/run. Otherwise you have to use conda. Eventually this wil be part of main, hopefully. uv sync --extra dev uv pip install -e . source .venv/bin/activate uv pip install qdrant_client LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack build --template ollama --image-type venv ``` ``` edit llama_stack/templates/ollama/run.yaml ``` in that editor under: ``` vector_io: ``` add: ``` - provider_id: qdrant provider_type: remote::qdrant config: {} ``` see https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/vector_io/qdrant/config.py#L14 for config options (but I didn't need any) ``` LLAMA_STACK_DIR=$(pwd) LLAMA_MODELS_DIR=../llama-models llama stack run ollama --image-type venv \ --port $LLAMA_STACK_PORT \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env SAFETY_MODEL=$SAFETY_MODEL \ --env OLLAMA_URL=$OLLAMA_URL ``` Then I tested it out in a notebook. Key highlights included: ``` qdrant_provider = None for provider in client.providers.list(): if provider.api == "vector_io" and provider.provider_id == "qdrant": qdrant_provider = provider qdrant_provider assert qdrant_provider is not None, "QDrant is not a provider. You need to edit the run yaml file you use in your `llama stack run` call" vector_db_id = f"test-vector-db-{uuid.uuid4().hex}" client.vector_dbs.register( vector_db_id=vector_db_id, embedding_model="all-MiniLM-L6-v2", embedding_dimension=384, provider_id=qdrant_provider.provider_id, ) ``` Other than that, I just followed what was in https://llama-stack.readthedocs.io/en/latest/getting_started/index.html It would be good to have automated tests for this in the future, but that would be a big undertaking. Signed-off-by: Bill Murdock <bmurdock@redhat.com>	2025-02-10 18:08:33 -05:00
Ashwin Bharambe	10bda65b94	Nuke use_proxy from code execution	2025-02-07 09:55:55 -08:00
Sébastien Han	316c43fdaf	refactor(ollama): model availability check (#986 ) # What does this PR do? Moved model availability check logic into a dedicated check_model_availability function. Eliminated redundant code by reusing the helper function in both embedding and non-embedding model registration. Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan Run Ollama and serve 2 models to get most the unit test pass: ``` ollama run llama3.2:3b-instruct-fp16 --keepalive 2m & ollama run llama3.1:8b --keepalive 2m & ``` Run the unit test: ``` uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ============================================ test session starts ============================================= platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0 asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None collected 65 items / 60 deselected / 5 selected llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED [ 20%] llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED [ 40%] llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] FAILED [ 60%] llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_initialize_model_during_registering[-ollama] FAILED [ 80%] llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED [100%] ================================================== FAILURES ================================================== _______________________ TestModelRegistration.test_register_with_llama_model[-ollama] ________________________ llama_stack/providers/tests/inference/test_model_registration.py:54: in test_register_with_llama_model _ = await models_impl.register_model( llama_stack/providers/utils/telemetry/trace_protocol.py:91: in async_wrapper result = await method(self, args, kwargs) llama_stack/distribution/routers/routing_tables.py:245: in register_model registered_model = await self.register_object(model) llama_stack/distribution/routers/routing_tables.py:192: in register_object registered_obj = await register_object_with_provider(obj, p) llama_stack/distribution/routers/routing_tables.py:53: in register_object_with_provider return await p.register_model(obj) llama_stack/providers/utils/telemetry/trace_protocol.py:91: in async_wrapper result = await method(self, args, **kwargs) llama_stack/providers/remote/inference/ollama/ollama.py:368: in register_model await check_model_availability(model.provider_resource_id) llama_stack/providers/remote/inference/ollama/ollama.py:359: in check_model_availability raise ValueError( E ValueError: Model 'custom-model' is not available in Ollama. Available models: llama3.1:8b, llama3.2:3b-instruct-fp16 __________________ TestModelRegistration.test_initialize_model_during_registering[-ollama] ___________________ llama_stack/providers/tests/inference/test_model_registration.py:85: in test_initialize_model_during_registering mock_load_model.assert_called_once() /opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/unittest/mock.py:956: in assert_called_once raise AssertionError(msg) E AssertionError: Expected 'load_model' to have been called once. Called 0 times. -------------------------------------------- Captured stderr call -------------------------------------------- W0207 11:55:26.777000 90854 .venv/lib/python3.13/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs. ========================================== short test summary info =========================================== FAILED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] - ValueError: Model 'custom-model' is not available in Ollama. Available models: llama3.1:8b, llama3.2:3b-i... FAILED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_initialize_model_during_registering[-ollama] - AssertionError: Expected 'load_model' to have been called once. Called 0 times. =========================== 2 failed, 3 passed, 60 deselected, 2 warnings in 1.84s =========================== ``` We only "care" about the `test_register_nonexistent_model` for this code. ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-07 09:52:16 -08:00
Sébastien Han	0b7098493a	test: encode image data as base64 (#1003 ) # What does this PR do? Previously, the test was failing due to a pydantic validation error caused by passing raw binary image data instead of a valid Unicode string. This fix encodes the image data as base64, ensuring it is a valid string format compatible with `ImageContentItem`. Error: ``` ______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________ llama_stack/providers/tests/inference/test_vision_inference.py:31: in <module> class TestVisionModelInference: llama_stack/providers/tests/inference/test_vision_inference.py:37: in TestVisionModelInference ImageContentItem(image=dict(data=PASTA_IMAGE)), E pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem E image.data E Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes] E For further information visit https://errors.pydantic.dev/2.10/v/string_unicode ``` Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Execute the following: ``` ollama run llama3.2-vision --keepalive 2m & uv run pytest -v -s -k "ollama" --inference-model=llama3.2-vision:latest llama_stack/providers/tests/inference/test_vision_inference.py llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image0-expected_strings0] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image1-expected_strings1] FAILED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-ollama] FAILED ``` The last two tests are failing because Cloudflare blocked me from accessing https://www.healthypawspetinsurance.com/Images/V3/DogAndPuppyInsurance/Dog_CTA_Desktop_HeroImage.jpg but this has no impact on the current fix. [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-07 09:44:16 -08:00
Sébastien Han	657f24b964	chore: add missing ToolConfig import in groq.py (#983 ) # What does this PR do? Imported `ToolConfig` from the `llama_stack.apis.inference` module to resolve missing reference and ensure proper functionality within the `groq.py` file. Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan Without the change, pytest will run with the following error: ``` uv run pytest -v -s -k "ollama" llama_stack/providers/tests/ /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ============================================ test session starts ============================================= platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0 asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None collected 379 items / 1 error / 349 deselected / 30 selected =================================================== ERRORS =================================================== __________________ ERROR collecting llama_stack/providers/tests/inference/groq/test_init.py __________________ llama_stack/providers/tests/inference/groq/test_init.py:11: in <module> from llama_stack.providers.remote.inference.groq.groq import GroqInferenceAdapter llama_stack/providers/remote/inference/groq/groq.py:72: in <module> class GroqInferenceAdapter(Inference, ModelRegistryHelper, NeedsRequestProviderData): llama_stack/providers/remote/inference/groq/groq.py:102: in GroqInferenceAdapter tool_config: Optional[ToolConfig] = None, E NameError: name 'ToolConfig' is not defined ========================================== short test summary info =========================================== ERROR llama_stack/providers/tests/inference/groq/test_init.py - NameError: name 'ToolConfig' is not defined !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! =============================== 349 deselected, 22 warnings, 1 error in 0.28s ================================ ``` With the change the test continues to run and fails with a different error: ``` uv run pytest -v -s llama_stack/providers/tests/ /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ============================================ test session starts ============================================= platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/leseb/Documents/AI/llama-stack configfile: pyproject.toml plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0 asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None collected 342 items / 1 error =================================================== ERRORS =================================================== ______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________ llama_stack/providers/tests/inference/test_vision_inference.py:29: in <module> class TestVisionModelInference: llama_stack/providers/tests/inference/test_vision_inference.py:35: in TestVisionModelInference ImageContentItem(image=dict(data=PASTA_IMAGE)), E pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem E image.data E Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes] E For further information visit https://errors.pydantic.dev/2.10/v/string_unicode ========================================== short test summary info =========================================== ERROR llama_stack/providers/tests/inference/test_vision_inference.py - pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ======================================= 22 warnings, 1 error in 0.25s ======================================== ``` Which is fixed in https://github.com/meta-llama/llama-stack/pull/1003. ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-07 09:35:00 -08:00
Sébastien Han	840344975d	test: rm unused exception alias in pytest.raises (#991 ) # What does this PR do? Refactored tests by removing unused exception alias (as exc_info) in pytest.raises, improving code clarity and reducing lint warnings. exc_info was never used. Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-07 08:04:25 -08:00
Hardik Shah	a84e7669f0	feat: Add a new template for `dell` (#978 ) - Added new template `dell` and its documentation - Update docs - [minor] uv fix i came across - codegen for all templates Tested with ```bash export INFERENCE_PORT=8181 export DEH_URL=http://0.0.0.0:$INFERENCE_PORT export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct export CHROMADB_HOST=localhost export CHROMADB_PORT=6601 export CHROMA_URL=[http://$CHROMADB_HOST:$CHROMADB_PORT](about:blank) export CUDA_VISIBLE_DEVICES=0 export LLAMA_STACK_PORT=8321 # build the stack template llama stack build --template=dell # start the TGI inference server podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES [ghcr.io/huggingface/text-generation-inference](http://ghcr.io/huggingface/text-generation-inference) --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0 # start chroma-db for vector-io ( aka RAG ) podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname) # build docker llama stack build --template=dell --image-type=container # run llama stack server ( via docker ) podman run -it \ --network host \ -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ -v ~/.llama:/root/.llama \ # NOTE: mount the llama-stack / llama-model directories if testing local changes -v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \ localhost/distribution-dell:dev \ --port $LLAMA_STACK_PORT \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env DEH_URL=$DEH_URL \ --env CHROMA_URL=$CHROMA_URL # test the server cd <PATH_TO_LLAMA_STACK_REPO> LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py ``` --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-02-06 14:14:39 -08:00
Yuan Tang	0a0ee5ca96	Fix incorrect handling of chat completion endpoint in remote::vLLM (#951 ) # What does this PR do? Fixes https://github.com/meta-llama/llama-stack/issues/949. ## Test Plan Verified that the correct chat completion endpoint is called after the change. Llama Stack server: ``` INFO: ::1:32838 - "POST /v1/inference/chat-completion HTTP/1.1" 200 OK 18:36:28.187 [END] /v1/inference/chat-completion [StatusCode.OK] (1276.12ms) ``` vLLM server: ``` INFO: ::1:36866 - "POST /v1/chat/completions HTTP/1.1" 200 OK ``` ```bash LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -s -v tests/client-sdk/inference/test_inference.py -k "test_image_chat_completion_base64 or test_image_chat_completion_non_streaming or test_image_chat_completion_streaming" ================================================================== test session starts =================================================================== platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/yutang/.conda/envs/distribution-myenv/bin/python3.10 cachedir: .pytest_cache rootdir: /home/yutang/repos/llama-stack configfile: pyproject.toml plugins: anyio-4.8.0 collected 16 items / 12 deselected / 4 selected tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64[meta-llama/Llama-3.2-11B-Vision-Instruct-url] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64[meta-llama/Llama-3.2-11B-Vision-Instruct-data] PASSED ``` Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-06 10:45:19 -08:00
Sébastien Han	403292fcf6	test: replace memory with vector_io fixture (#984 ) # What does this PR do? Replaced references to `memory` with `vector_io` in `DEFAULT_PROVIDER_COMBINATIONS` and adjusted corresponding fixture imports to ensure proper configuration for vector I/O during tests. This change aligns with the new testing structure. Followup of https://github.com/meta-llama/llama-stack/pull/830 when the memory fixture was removed. Signed-off-by: Sébastien Han <seb@redhat.com> ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-06 10:12:59 -08:00
ehhuang	3922999118	sys_prompt support in Agent (#938 ) # What does this PR do? The current default system prompt for llama3.2 tends to overindex on tool calling and doesn't work well when the prompt does not require tool calling. This PR adds an option to override the default system prompt, and organizes tool-related configs into a new config object. - [ ] Addresses issue (#issue) ## Test Plan LLAMA_STACK_CONFIG=together pytest \-\-inference\-model=meta\-llama/Llama\-3\.3\-70B\-Instruct -s -v tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-05 21:11:32 -08:00
Yuan Tang	a79a083e39	Fix broken pgvector provider and memory leaks (#947 ) This PR fixes the broken pgvector provider as well as wraps all cursor object creations with context manager to ensure that they get properly closed to avoid potential memory leaks. ``` > pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "pgvector" --env EMBEDDING_DIMENSION=384 --env PGVECTOR_PORT=7432 --env PGVECTOR_DB=db --env PGVECTOR_USER=user --env PGVECTOR_PASSWORD=pass -v -s --tb=short --disable-warnings llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_list[-pgvector] PASSED llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_banks_register[-pgvector] PASSED llama_stack/providers/tests/vector_io/test_vector_io.py::TestVectorIO::test_query_documents[-pgvector] The scores are: [0.8168284974053789, 0.8080469278964486, 0.8050996198466661] PASSED ``` --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-05 09:32:05 -08:00
Ihar Hrachyshka	5c8e35a9e2	docs, tests: replace datasets.rst with memory_optimizations.rst (#968 ) datasets.rst was removed from torchtune repo. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> # What does this PR do? Replace a missing 404 document with another one that exists. (Removed it from the list when memory_optimizations.rst was already pulled.) ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-05 11:25:56 -05:00
ehhuang	c9ab72fa82	Support sys_prompt behavior in inference (#937 ) # What does this PR do? The current default system prompt for llama3.2 tends to overindex on tool calling and doesn't work well when the prompt does not require tool calling. This PR adds an option to override the default system prompt, and organizes tool-related configs into a new config object. - [ ] Addresses issue (#issue) ## Test Plan python -m unittest llama_stack.providers.tests.inference.test_prompt_adapter ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937). * #938 * __->__ #937	2025-02-03 23:35:16 -08:00
Ashwin Bharambe	f98efe68c9	Misc fixes (#944 ) - Make sure torch + torchvision go together as deps, otherwise bad stuff happens - Add a pre-commit for requirements.txt	2025-02-03 14:08:47 -08:00
Yuan Tang	83a51c7bfb	Properly close PGVector DB connection during shutdown() (#931 ) The connection to the DB was not closed during shutdown. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 21:23:13 -08:00
Jeff Tang	587753da2f	LocalInferenceImpl update for LS 0.1 (#911 ) # What does this PR do? To work with the updated iOSCalendarAssistantWithLocalInf [here](https://github.com/meta-llama/llama-stack-apps/compare/ios_local). In short, provide a summary of what this PR does and why. Usually, the relevant context should be present in a linked issue. - [ ] Addresses issue (#issue) ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-02 09:49:40 -08:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927 ) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
Matthew Farrellee	e21c8b6d80	add image support to NVIDIA inference provider (#907 ) # What does this PR do? add support to the NVIDIA Inference provider for image inputs ## Test Plan 1. Run local [Llama 3.2 11b vision instruct](https://build.nvidia.com/meta/llama-3.2-11b-vision-instruct?snippet_tab=Docker) NIM 2. Start a stack, e.g. `llama stack run llama_stack/templates/nvidia/run.yaml --env NVIDIA_BASE_URL=http://localhost:8000` 3. Run image tests, e.g. `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_inference.py --vision-inference-model meta-llama/Llama-3.2-11B-Vision-Instruct -k image` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests.	2025-02-01 09:02:27 -08:00
Ashwin Bharambe	95786d5bdc	Update client-sdk test config option handling Fix test	2025-01-31 15:37:25 -08:00
Hardik Shah	a7b929f17e	Sec fixes as raised by bandit (#917 ) minor fixes to hashlib and jinja	2025-01-31 13:44:26 -08:00
Dmitry Rogozhkin	7ea14ae62e	feat: enable xpu support for meta-reference stack (#558 ) This commit adds support for XPU and CPU devices into meta-reference stack for text models. On creation stack automatically identifies which device to use checking available accelerate capabilities in the following order: CUDA, then XPU, finally CPU. This behaviour can be overwritten with the `DEVICE` environment variable. In this case explicitly specified device will be used. Tested with: ``` torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference ``` Results: * Tested on: system with single CUDA device, system with single XPU device and on pure CPU system * Results: all test pass except `test_completion_logprobs` * `test_completion_logprobs` fails in the same way as on a baseline, i.e. unrelated with this change: `AssertionError: Unexpected top_k=3` Requires: https://github.com/meta-llama/llama-models/pull/233 Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-01-31 12:11:49 -08:00
Hardik Shah	97eb3eecea	Fix Agents to support code and rag simultaneously (#908 ) # What does this PR do? Fixes a bug where agents were not working when both rag and code-interpreter were added as tools. ## Test Plan Added a new client_sdk test which tests for this scenario ``` LLAMA_STACK_CONFIG=together pytest -s -v tests/client-sdk -k 'test_rag_and_code_agent' ``` --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-01-30 17:09:34 -08:00
Xi Yan	94051cfe9e	fix ImageContentItem to take base64 string as image.data (#909 ) # What does this PR do? - Discussion in https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819 - image.data should accept base64 string as input instead of binary bytes, change prompt_adapter to account for that. ## Test Plan ``` pytest -v tests/client-sdk/inference/test_inference.py ``` with test in https://github.com/meta-llama/llama-stack/pull/906 ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-30 15:58:23 -08:00
snova-edwardm	7fe2592795	SambaNova supports Llama 3.3 (#905 ) # What does this PR do? - Fix typo - Support Llama 3.3 70B ## Test Plan Run the following scripts and obtain the test results Script ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY} ``` Result ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED =========================================== 1 passed, 1 warning in 1.26s ============================================ ``` Script ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY} ``` Result ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED =========================================== 1 passed, 1 warning in 0.52s ============================================ ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [N] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [N] Wrote necessary unit or integration tests.	2025-01-30 09:24:46 -08:00
Sixian Yi	836f47a82d	log probs - mark pytests as xfail for unsupported providers + add support for together (#883 ) # What does this PR do? 1) As per @mattf's suggestion, we want to mark the pytest as xfail for providers that do not support the functionality. In this diff, we xfail the logProbs inference tests for providers who does not support log probs. ( log probs is only supported by together, fireworks and vllm) 2) Added logProbs support for together according to their developer [doc](https://docs.together.ai/docs/logprobs). ## Test Plan 1) Together & Fireworks ``` export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/together/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py ``` ``` tests/client-sdk/inference/test_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of planets in our solar system?-Earth] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of the planets that have rings around them?-Saturn] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED ========================================================================================== 15 passed, 2 warnings in 19.46s =========================================================================================== ``` ``` export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/fireworks/run.yaml /opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py ``` All tests passed 2) Ollama - LogProbs tests are marked as xfailed. ``` tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet) tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet) ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-29 23:41:25 -08:00
Aidan Do	39c34dd25f	[#432 ] Groq Provider tool call tweaks (#811 ) # What does this PR do? Follow up for @ashwinb's comments in https://github.com/meta-llama/llama-stack/pull/630 - [x] Contributes to issue (#432) ## Test Plan <details> <summary>Environment</summary> ```shell export GROQ_API_KEY=<api-key> # Create environment if not already conda create --name llamastack-groq python=3.10 conda activate llamastack-groq wget https://raw.githubusercontent.com/aidando73/llama-stack/9165502582cd7cb178bc1dcf89955b45768ab6c1/build.yaml wget https://raw.githubusercontent.com/meta-llama/llama-stack/918172c7fa92522c9ebc586bdb4f386b1d9ea224/run.yaml # Build pip install -e . && llama stack build --config ./build.yaml --image-type conda # Activate built environment conda activate llamastack-groq # Test deps pip install pytest pytest_html pytest_asyncio ``` </details> <details> <summary>Unit tests</summary> ```shell # Setup conda activate llamastack-groq pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py -vv -k groq -s # Result llama_stack/providers/tests/inference/groq/test_groq_utils.py ....................... ========================================= 23 passed, 11 warnings in 0.06s ========================================= ``` </details> <details> <summary>Integration tests</summary> ```shell # Tests pytest llama_stack/providers/tests/inference/test_text_inference.py -k groq -s # Results ___________________________ TestInference.test_chat_completion_with_tool_calling[-groq] ___________________________ llama_stack/providers/tests/inference/test_text_inference.py:403: in test_chat_completion_with_tool_calling assert len(message.tool_calls) > 0 E assert 0 > 0 E + where 0 = len([]) E + where [] = CompletionMessage(role='assistant', content='<function=get_weather>{"location": "San Francisco, CA"}', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]).tool_calls ============================================= short test summary info ============================================= FAILED llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-groq] - assert 0 > 0 ======================== 1 failed, 3 passed, 5 skipped, 99 deselected, 7 warnings in 2.13s ======================== ``` (One failure as expected from 3.2 3B - re: https://github.com/meta-llama/llama-stack/pull/630#discussion_r1914056503) </details> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-01-29 12:02:12 -08:00
Matthew Farrellee	1a5c17a92f	align with CompletionResponseStreamChunk.delta as str (instead of TextDelta) (#900 ) # What does this PR do? fix type mismatch in /v1/inference/completion ## Test Plan `llama stack run ./llama_stack/templates/nvidia/run.yaml` `LLAMA_STACK_BASE_URL="http://localhost:8321" pytest -v tests/client-sdk/inference/test_inference.py` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-29 09:25:50 -08:00
Ashwin Bharambe	f2feb7d15c	Fix Chroma adapter (#893 ) Chroma method had the wrong signature. ## Test Plan Start Chroma: `chroma run --path /tmp/foo/chroma2 --host localhost --port 6001` Modify run.yaml to include Chroma server pointing to localhost:6001 and run `llama stack run` Then: ```bash LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v agents/test_agents.py -k rag ``` passes	2025-01-28 13:19:47 -08:00
Zhonglin Han	229f0d5f7c	Agent response format (#660 ) # What does this PR do? Add response format for agents structured output. - [ ] Using structured output for agents (interior_design app as an example) (#issue) https://github.com/meta-llama/llama-stack-apps/issues/122 ## Test Plan E2E test plan with llama-stack-apps interior_design Please describe: Test ran: - provide instructions so it can be reproduced. Start your distro: llama stack run llama_stack/templates/fireworks/run.yaml --env FIREWORKS_API_KEY=<API_KEY> Run api test: ```PYTHONPATH=. python examples/interior_design_assistant/api.py localhost 5000 examples/interior_design_assistant/resources/documents/ examples/interior_design_assistant/resources/images/fireplaces``` ## Sources Results: https://github.com/meta-llama/llama-stack-client-python/pull/72 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-28 05:05:38 -08:00
snova-edwardm	aa65610e75	Sambanova - LlamaGuard (#886 ) # What does this PR do? - Fix loading SambaNovaImpl issue - Add LlamaGuard model support for inference ## Test Plan Run the following unit test scripts and results ### Embedding ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models) llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_batch_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models) =================================================================================================================== 2 skipped, 1 warning in 0.32s =================================================================================================================== ``` ### Vision ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_vision_inference.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image0-expected_strings0] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image1-expected_strings1] PASSED llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-sambanova] PASSED =================================================================================================================== 3 passed, 1 warning in 2.68s ==================================================================================================================== ``` ### Text ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED =================================================================================================================== 1 passed, 1 warning in 0.46s ==================================================================================================================== ``` ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY} ``` ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED =================================================================================================================== 1 passed, 1 warning in 0.48s ==================================================================================================================== ``` ## Before submitting - [] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [Y] Wrote necessary unit or integration tests.	2025-01-27 15:46:30 -08:00
Dinesh Yeduguru	3c1a2c3d66	Fix telemetry init (#885 ) # What does this PR do? When you re-initialize the library client in a notebook, we were seeing this error: ``` Getting traces for session_id=5c8d1969-0957-49d2-b852-32cbb8ef8caf --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) [<ipython-input-11-d74bb6cdd3ab>](https://localhost:8080/#) in <cell line: 0>() 7 agent_logs = [] 8 ----> 9 for span in client.telemetry.query_spans( 10 attribute_filters=[ 11 {"key": "session_id", "op": "eq", "value": session_id}, 10 frames [/usr/local/lib/python3.11/dist-packages/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py](https://localhost:8080/#) in query_traces(self, attribute_filters, limit, offset, order_by) 246 ) -> QueryTracesResponse: 247 return QueryTracesResponse( --> 248 data=await self.trace_store.query_traces( 249 attribute_filters=attribute_filters, 250 limit=limit, AttributeError: 'TelemetryAdapter' object has no attribute 'trace_store' ``` This is happening because the we were skipping some required steps for the object state as part of the global _TRACE_PROVIDER check. This PR moves the initialization of the object state out of the TRACE_PROVIDER init.	2025-01-27 11:20:28 -08:00
Ashwin Bharambe	087a83f673	Bump key for faiss	2025-01-24 12:08:36 -08:00
Ashwin Bharambe	d78027f3b5	Move runpod provider to the correct directory Also cleanup the test code to avoid skipping tests. Let failures be known and public.	2025-01-23 12:25:12 -08:00
snova-edwardm	22dc684da6	Sambanova inference provider (#555 ) # What does this PR do? This PR adds SambaNova as one of the Provider - Add SambaNova as a provider ## Test Plan Test the functional command ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key> ``` Test the distribution template: ``` # Docker LLAMA_STACK_PORT=5001 docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ llamastack/distribution-sambanova \ --port $LLAMA_STACK_PORT \ --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY # Conda llama stack build --template sambanova --image-type conda llama stack run ./run.yaml \ --port $LLAMA_STACK_PORT \ --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY ``` ## Source [SambaNova API Documentation](https://cloud.sambanova.ai/apis) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [Y ] Wrote necessary unit or integration tests. --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-01-23 12:20:28 -08:00
Marut Pandya	e2b5456e48	Add Runpod Provider + Distribution (#362 ) Add Runpod as a inference provider for openAI compatible managed endpoints. Testing - Configured llama stack from scratch, set `remote::runpod` as a inference provider. - Added Runpod Endpoint URL and API key. - Started llama-stack server - llama stack run my-local-stack --port 3000 ``` curl http://localhost:5000/inference/chat_completion \ -H "Content-Type: application/json" \ -d '{ "model": "Llama3.1-8B-Instruct", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the moon"} ], "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512} }' ``` --------- Signed-off-by: pandyamarut <pandyamarut@gmail.com>	2025-01-23 12:19:02 -08:00

1 2 3 4 5 ...

456 commits