forked from phoenix/litellm-mirror
Fix for an issue that occurred when proxying to Ollama

In the text_completion() function, an exception was previously thrown at:

    raw_response = response._hidden_params.get("original_response", None)

because response was a coroutine object from an ollama_acompletion call. I added an asyncio.iscoroutine() check for the response and handle that case by calling response = asyncio.run(response).

I also had to fix atext_completion(), where init_response was an instance of TextCompletionResponse. Since this case was not handled by the if-elif that checks whether init_response is a coroutine, a dict, or a ModelResponse instance, response was left unbound, which threw an exception on the "return response" line.

Note that a regular pyright-based linter detects that response is possibly unbound, and that the same code pattern is used in multiple other places in main.py. I would suggest changing these cases:

    init_response = await loop.run_in_executor(...
    if isinstance(init_response, ...
        response = init_response
    elif asyncio.iscoroutine(init_response):
        response = await init_response

to either just:

    response = await loop.run_in_executor(...
    if asyncio.iscoroutine(response):
        response = await response

or, at the very least, include an else branch that sets response = init_response, so that response is never unbound when the code proceeds.
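The hazard described above can be sketched in isolation. This is a minimal, self-contained illustration of the suggested pattern, not litellm's actual code: the class and helper names (FakeResponse, fake_acompletion, func_with_context) are hypothetical stand-ins. Binding the executor result directly to response and awaiting it only if it is a coroutine means response can never be left unbound.

```python
import asyncio

class FakeResponse:
    # hypothetical stand-in for TextCompletionResponse: neither a dict,
    # nor a ModelResponse, nor a coroutine, so the original if-elif
    # would have left `response` unbound for this case
    pass

async def fake_acompletion():
    # hypothetical stand-in for an ollama_acompletion call, which
    # yields a coroutine object when invoked
    return {"choices": []}

async def atext_completion_fixed(func_with_context):
    loop = asyncio.get_running_loop()
    # suggested pattern: assign the executor result straight to
    # `response`, then await it only if it turned out to be a coroutine
    response = await loop.run_in_executor(None, func_with_context)
    if asyncio.iscoroutine(response):
        response = await response
    return response  # always bound, whatever func_with_context returned

# plain-object case: returned unchanged
plain = asyncio.run(atext_completion_fixed(lambda: FakeResponse()))

# coroutine case: detected and awaited
coro = fake_acompletion()
awaited = asyncio.run(atext_completion_fixed(lambda: coro))
```

With the original if-elif, the first call would raise UnboundLocalError at the return; with this pattern both cases are handled by the same two lines.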
This commit is contained in:
parent
c703fb2f2c
commit
a419d59542
1 changed file with 6 additions and 5 deletions
@@ -2016,11 +2016,9 @@ async def atext_completion(*args, **kwargs):
             response = text_completion(*args, **kwargs)
         else:
             # Await normally
-            init_response = await loop.run_in_executor(None, func_with_context)
-            if isinstance(init_response, dict) or isinstance(init_response, ModelResponse): ## CACHING SCENARIO
-                response = init_response
-            elif asyncio.iscoroutine(init_response):
-                response = await init_response
+            response = await loop.run_in_executor(None, func_with_context)
+            if asyncio.iscoroutine(response):
+                response = await response
     else:
         # Call the synchronous function using run_in_executor
         response = await loop.run_in_executor(None, func_with_context)
@@ -2196,6 +2194,9 @@ def text_completion(
         response = TextCompletionStreamWrapper(completion_stream=response, model=model)
         return response

+    if asyncio.iscoroutine(response):
+        response = asyncio.run(response)
+
     transformed_logprobs = None
     # only supported for TGI models
     try: