llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-31 01:13:53 +00:00

Author	SHA1	Message	Date
Ben Browning	ac5dc8fae2	Add prompt_logprobs and guided_choice to OpenAI completions This adds the vLLM-specific extra_body parameters of prompt_logprobs and guided_choice to our openai_completion inference endpoint. The plan here would be to expand this to support all common optional parameters of any of the OpenAI providers, allowing each provider to use or ignore these parameters based on whether their server supports them. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-09 15:47:02 -04:00
Ben Browning	fcdeb3d7bf	OpenAI completion prompt can also include tokens The OpenAI completion API supports strings, array of strings, array of tokens, or array of token arrays. So, expand our type hinting to support all of these types. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-09 15:47:02 -04:00
Ben Browning	a6cf8fa12b	OpenAI completion prompt can also be an array The OpenAI completion prompt field can be a string or an array, so update things to use and pass that properly. This also stubs in a basic conversion of OpenAI non-streaming completion requests to Llama Stack completion calls, for those providers that don't actually have an OpenAI backend to allow them to still accept requests via the OpenAI APIs. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-09 15:47:02 -04:00
Ben Browning	de01b1455b	Passthrough inference support for OpenAI-compatible APIs Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-09 15:47:02 -04:00
yyymeta	fb418813fc	fix: passthrough impl response.content.text (#1665) # What does this PR do? The current passthrough impl returns chatcompletion_message.content as a TextItem(), not a plain string, so it is not compatible with other providers and causes a parsing error downstream. Change away from the generic pydantic conversion and explicitly parse out content.text. ## Test Plan Set up a llama server with passthrough. ``` llama-stack-client eval run-benchmark "MMMU_Pro_standard" --model-id meta-llama/Llama-3-8B --output-dir /tmp/ --num-examples 20 ``` works without a parsing error.	2025-03-17 13:42:08 -07:00
Botao Chen	90ca4d94de	fix: fix passthrough inference provider to make it work for agent (#1577) ## What does this PR do? We noticed that the passthrough inference provider doesn't work with agents due to a type mismatch between client and server. We manually cast the llama stack client type to the llama stack server type to fix the issue. ## test Run `python -m examples.agents.hello localhost 8321` within llama-stack-apps. <img width="1073" alt="Screenshot 2025-03-11 at 8 43 44 PM" src="https://github.com/user-attachments/assets/bd1bdd31-606a-420c-a249-95f6184cc0b1" /> Fixes https://github.com/meta-llama/llama-stack/issues/1560	2025-03-12 11:16:17 -07:00
Sébastien Han	803bf0e029	fix: solve ruff B008 warnings (#1444) # What does this PR do? The commit addresses the Ruff warning B008 by refactoring the code to avoid calling SamplingParams() directly in function argument defaults. Instead, it either uses Field(default_factory=SamplingParams) for Pydantic models or sets the default to None and instantiates SamplingParams inside the function body when the argument is None. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-06 16:48:35 -08:00
Ashwin Bharambe	81ce39a607	feat(api): Add options for supporting various embedding models (#1192) We need to support: - asymmetric embedding models (#934) - truncation policies (#933) - varying dimensional output (#932) ## Test Plan ```bash $ cd llama_stack/providers/tests/inference $ pytest -s -v -k fireworks test_embeddings.py \ --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k together test_embeddings.py \ --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k ollama test_embeddings.py \ --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784 ```	2025-02-20 22:27:12 -08:00
Botao Chen	2b995c22eb	feat: inference passthrough provider (#1166) ## What does this PR do? In this PR, we implement a passthrough inference provider that works for any endpoint that respects the llama stack inference API definition. ## Test Plan Configured an endpoint that respects the llama stack inference API definition and got the inference results successfully. <img width="1268" alt="Screenshot 2025-02-19 at 8 52 51 PM" src="https://github.com/user-attachments/assets/447816e4-ea7a-4365-b90c-386dc7dcf4a1" />	2025-02-19 21:47:00 -08:00

9 commits
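
The two B008 fixes described in commit 803bf0e029 can be sketched roughly as follows. This is a minimal illustration, not the code from that commit: the `SamplingParams` fields, the `CompletionRequest` model, and the `complete` function are hypothetical stand-ins, and the standard-library `dataclasses.field(default_factory=...)` is used here in place of the Pydantic `Field(default_factory=SamplingParams)` the commit mentions, so that the example runs without third-party packages.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SamplingParams:
    # Hypothetical stand-in; the real class has different fields.
    temperature: float = 0.0
    max_tokens: int = 0


@dataclass
class CompletionRequest:
    # Hypothetical request model. default_factory builds a fresh
    # SamplingParams for every instance instead of sharing one object
    # created once at class-definition time.
    prompt: str
    sampling_params: SamplingParams = field(default_factory=SamplingParams)


def complete(
    prompt: str, sampling_params: Optional[SamplingParams] = None
) -> SamplingParams:
    # B008-safe signature: no function call in the argument default.
    # The default object is created in the body, once per call.
    if sampling_params is None:
        sampling_params = SamplingParams()
    return sampling_params
```

Both forms avoid evaluating `SamplingParams()` once at definition time, which is what Ruff's B008 rule flags: a default created that way is shared by every call, so a mutation in one call would leak into the next.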