This adds the vLLM-specific extra_body parameters prompt_logprobs and
guided_choice to our openai_completion inference endpoint. The plan is
to expand this to support the common optional parameters of all the
OpenAI providers, with each provider using or ignoring a given
parameter based on whether its server supports it.
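
As a rough illustration, a client could exercise the new parameters
like this (a minimal sketch; the base URL, API key, and model id are
placeholder assumptions, not values from this change):

```python
from openai import OpenAI

# Point the stock OpenAI client at our server; URL/model are assumptions.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    prompt="The capital of France is",
    max_tokens=5,
    # extra_body fields are passed through to the provider; vLLM honors
    # these two, and providers that lack support can ignore them.
    extra_body={
        "prompt_logprobs": 0,
        "guided_choice": ["Paris", "London"],
    },
)
print(response.choices[0].text)
```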
Signed-off-by: Ben Browning <bbrownin@redhat.com>
When called via the OpenAI API, ollama gives briefer responses than
when called via its native API. This adjusts the prompting for its
OpenAI calls to ask for more verbose output.
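
For illustration, the adjustment could amount to something like the
sketch below; the names here are hypothetical, not the actual provider
code:

```python
# Hypothetical sketch: append a verbosity hint to prompts routed
# through ollama's OpenAI-compatible endpoint.
VERBOSITY_HINT = (
    "\nRespond with complete, detailed answers rather than short summaries."
)

def adjust_prompt_for_ollama_openai(prompt: str) -> str:
    """Nudge ollama toward the longer responses its native API gives."""
    return prompt + VERBOSITY_HINT
```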
This stubs in initial integration tests for the OpenAI-compatible
server APIs, driving them with an OpenAI client.
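
A hedged sketch of what one such stub might look like, assuming pytest
and a locally running server (the base URL and model id are
placeholders, not the actual test code):

```python
import pytest
from openai import OpenAI

@pytest.fixture
def openai_client() -> OpenAI:
    # Stock OpenAI client pointed at the OpenAI-compatible server.
    return OpenAI(base_url="http://localhost:8321/v1", api_key="none")

def test_openai_completion_non_streaming(openai_client: OpenAI):
    response = openai_client.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        prompt="Say hello.",
        stream=False,
    )
    # The server should return at least one non-empty completion choice.
    assert len(response.choices) > 0
    assert response.choices[0].text
```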
Signed-off-by: Ben Browning <bbrownin@redhat.com>