Fix for issue that occurred when proxying to ollama

In the text_completion() function, it previously threw an exception at:
raw_response = response._hidden_params.get("original_response", None)

This happened because response was a coroutine object returned by an
ollama_acompletion call, so I added an asyncio.iscoroutine() check for
response and now handle that case by calling response = asyncio.run(response)

I also had to fix atext_completion(), where init_response was an instance
of TextCompletionResponse.

Since this case was not handled by the if-elif, which only checks whether
init_response is a coroutine, a dict, or a ModelResponse instance, response
was left unbound, which threw an exception on the "return response" line.
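A minimal sketch of the bug, assuming a stand-in TextCompletionResponse class (the real one lives in litellm):

```python
import asyncio

class TextCompletionResponse:
    # Hypothetical stand-in for litellm's TextCompletionResponse
    pass

async def old_atext_completion(init_response):
    # Mirrors the old if-elif: no else branch, so a TextCompletionResponse
    # instance matches neither arm and `response` is never assigned.
    if isinstance(init_response, dict):
        response = init_response
    elif asyncio.iscoroutine(init_response):
        response = await init_response
    return response  # raises UnboundLocalError for unhandled types

try:
    asyncio.run(old_atext_completion(TextCompletionResponse()))
except UnboundLocalError:
    print("response was unbound")  # response was unbound
```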

Note that a regular pyright-based linter detects that response is possibly
unbound, and that the same code pattern is used in multiple other places
in main.py.

I would suggest changing these cases:

init_response = await loop.run_in_executor(...
if isinstance(init_response, ...
    response = init_response
elif asyncio.iscoroutine(init_response):
    response = await init_response

To either just:

response = await loop.run_in_executor(
if asyncio.iscoroutine(response):
    response = await response

Or at the very least, include an else statement and set response = init_response,
so that response is never unbound when the code proceeds.
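The fallback variant can be sketched as follows; `fake_acompletion` is a made-up coroutine used only to exercise both branches:

```python
import asyncio

async def resolve(init_response):
    # The suggested fallback: an else branch guarantees `response` is
    # bound no matter what type init_response turns out to be.
    if asyncio.iscoroutine(init_response):
        response = await init_response
    else:
        response = init_response
    return response

async def fake_acompletion():
    # Hypothetical coroutine standing in for an async backend call
    return "from coroutine"

print(asyncio.run(resolve(fake_acompletion())))  # from coroutine
print(asyncio.run(resolve("plain value")))       # plain value
```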
This commit is contained in:
Joel Eriksson 2023-12-17 17:27:47 +02:00
parent c703fb2f2c
commit a419d59542


@@ -2016,11 +2016,9 @@ async def atext_completion(*args, **kwargs):
             response = text_completion(*args, **kwargs)
         else:
             # Await normally
-            init_response = await loop.run_in_executor(None, func_with_context)
-            if isinstance(init_response, dict) or isinstance(init_response, ModelResponse): ## CACHING SCENARIO
-                response = init_response
-            elif asyncio.iscoroutine(init_response):
-                response = await init_response
+            response = await loop.run_in_executor(None, func_with_context)
+            if asyncio.iscoroutine(response):
+                response = await response
     else:
         # Call the synchronous function using run_in_executor
         response = await loop.run_in_executor(None, func_with_context)
@@ -2196,6 +2194,9 @@ def text_completion(
         response = TextCompletionStreamWrapper(completion_stream=response, model=model)
         return response
+    if asyncio.iscoroutine(response):
+        response = asyncio.run(response)
     transformed_logprobs = None
     # only supported for TGI models
     try: