Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-23 04:02:25 +00:00)
chore: remove vision model URL workarounds and simplify client creation
The vision models are now available at the standard URL, so the workaround code has been removed. This also simplifies the codebase by eliminating the need for per-model client caching.

- Remove special URL handling for the meta/llama-3.2-11b/90b-vision-instruct models
- Convert the _get_client method to a _client property for a cleaner API
- Remove the now-unnecessary lru_cache decorator and the functools import
- Simplify client creation to use a single base URL for all models
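To make the refactor concrete, here is a minimal sketch of the resulting adapter shape. The class name and the _client property match the diff below; the config field names (url, api_key) and the use of the openai AsyncOpenAI client are assumptions for illustration, not the repository's exact code.

    from openai import AsyncOpenAI

    class NVIDIAInferenceAdapter:
        def __init__(self, config):
            # `config` with `url` and `api_key` fields is assumed here.
            self._config = config

        @property
        def _client(self) -> AsyncOpenAI:
            # One client for every model, built from a single base URL:
            # no per-model URL lookup, no lru_cache, no functools import.
            return AsyncOpenAI(
                base_url=f"{self._config.url}/v1",
                api_key=self._config.api_key,
            )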
This commit is contained in:
parent 95fdc8ea94, commit 8cc3fe7669
2 changed files with 15 additions and 35 deletions
@@ -54,7 +54,8 @@ class TestNvidiaPostTraining(unittest.TestCase):
         self.mock_client.chat.completions.create = unittest.mock.AsyncMock()
         self.inference_mock_make_request = self.mock_client.chat.completions.create
         self.inference_make_request_patcher = patch(
-            "llama_stack.providers.remote.inference.nvidia.nvidia.NVIDIAInferenceAdapter._get_client",
+            "llama_stack.providers.remote.inference.nvidia.nvidia.NVIDIAInferenceAdapter._client",
+            new_callable=unittest.mock.PropertyMock,
             return_value=self.mock_client,
         )
         self.inference_make_request_patcher.start()
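Note the test change that accompanies the refactor: because _client is now a property rather than a method, the patch must be created with new_callable=unittest.mock.PropertyMock so that plain attribute access on the adapter returns the mock. A self-contained sketch of that standard-library pattern, using a hypothetical Adapter class:

    from unittest.mock import PropertyMock, patch

    class Adapter:
        @property
        def _client(self):
            return "real client"

    # new_callable=PropertyMock replaces the property on the class itself,
    # so ordinary attribute access on an instance goes through the mock.
    with patch.object(Adapter, "_client", new_callable=PropertyMock) as mock_client:
        mock_client.return_value = "mock client"
        assert Adapter()._client == "mock client"

    # Outside the patch context, the real property is restored.
    assert Adapter()._client == "real client"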