Fix for issue that occurred when proxying to ollama

In the text_completion() function, it previously threw an exception at:
raw_response = response._hidden_params.get("original_response", None)

This happened because response was a coroutine object returned by an
ollama_acompletion call, so I added an asyncio.iscoroutine() check for
response and now handle that case by calling response = asyncio.run(response)

I also had to fix atext_completion(), where init_response was an instance
of TextCompletionResponse.

Since this case was not handled by the if-elif, which only checks whether
init_response is a coroutine, a dict, or a ModelResponse instance, response
was left unbound, which threw an exception on the "return response" line.
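A minimal sketch of the bug, assuming a stand-in TextCompletionResponse class (the real one lives in litellm):

```python
import asyncio

class TextCompletionResponse:
    # Hypothetical stand-in for litellm's TextCompletionResponse
    pass

async def old_atext_completion(init_response):
    # Mirrors the old if-elif: no else branch, so a TextCompletionResponse
    # instance matches neither arm and `response` is never assigned.
    if isinstance(init_response, dict):
        response = init_response
    elif asyncio.iscoroutine(init_response):
        response = await init_response
    return response  # raises UnboundLocalError for unhandled types

try:
    asyncio.run(old_atext_completion(TextCompletionResponse()))
except UnboundLocalError:
    print("response was unbound")  # response was unbound
```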

Note that a regular pyright-based linter detects that response is possibly
unbound, and that the same code pattern is used in multiple other places
in main.py.

I would suggest changing these cases:

init_response = await loop.run_in_executor(...
if isinstance(init_response, ...
    response = init_response
elif asyncio.iscoroutine(init_response):
    response = await init_response

To either just:

response = await loop.run_in_executor(
if asyncio.iscoroutine(response):
    response = await response

Or at the very least, include an else statement and set response = init_response,
so that response is never unbound when the code proceeds.
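The fallback variant can be sketched as follows; `fake_acompletion` is a made-up coroutine used only to exercise both branches:

```python
import asyncio

async def resolve(init_response):
    # The suggested fallback: an else branch guarantees `response` is
    # bound no matter what type init_response turns out to be.
    if asyncio.iscoroutine(init_response):
        response = await init_response
    else:
        response = init_response
    return response

async def fake_acompletion():
    # Hypothetical coroutine standing in for an async backend call
    return "from coroutine"

print(asyncio.run(resolve(fake_acompletion())))  # from coroutine
print(asyncio.run(resolve("plain value")))       # plain value
```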
This commit is contained in:
Joel Eriksson 2023-12-17 17:27:47 +02:00
parent c703fb2f2c
commit a419d59542


@@ -2016,11 +2016,9 @@ async def atext_completion(*args, **kwargs):
             response = text_completion(*args, **kwargs)
         else:
             # Await normally
-            init_response = await loop.run_in_executor(None, func_with_context)
-            if isinstance(init_response, dict) or isinstance(init_response, ModelResponse): ## CACHING SCENARIO
-                response = init_response
-            elif asyncio.iscoroutine(init_response):
-                response = await init_response
+            response = await loop.run_in_executor(None, func_with_context)
+            if asyncio.iscoroutine(response):
+                response = await response
     else:
         # Call the synchronous function using run_in_executor
         response = await loop.run_in_executor(None, func_with_context)
@@ -2196,6 +2194,9 @@ def text_completion(
         response = TextCompletionStreamWrapper(completion_stream=response, model=model)
         return response
+    if asyncio.iscoroutine(response):
+        response = asyncio.run(response)
     transformed_logprobs = None
     # only supported for TGI models
     try: