llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-04 02:03:44 +00:00

Author	SHA1	Message	Date
Matthew Farrellee	aa5bef8e05	feat: expand set of known openai models, allow using openai canonical model names (#2164 ) note: the openai provider exposes the litellm specific model names to the user. this change is compatible with that. the litellm names should be deprecated.	2025-05-14 13:18:15 -07:00
Ashwin Bharambe	473a07f624	fix: revert "feat(provider): adding llama4 support in together inference provider (#2123 )" (#2124 ) This reverts commit `0f878ad87a`. The llama4 models already existed for Together. cc @yogishbaliga @bbrowning	2025-05-08 15:18:16 -07:00
Yogish Baliga	0f878ad87a	feat(provider): adding llama4 support in together inference provider (#2123 ) # What does this PR do? Adding Llama4 model support in TogetherAI provider	2025-05-08 14:27:56 -07:00
Sébastien Han	6371bb1b33	chore(refact)!: simplify config management (#1105 ) # What does this PR do? We are dropping configuration via CLI flag almost entirely. If any server configuration has to be tweak it must be done through the server section in the run.yaml. This is unfortunately a breaking change for whover was using: * `--tls-` `--disable_ipv6` `--port` stays around and get a special treatment since we believe, it's common for user dev to change port for quick experimentations. Closes: https://github.com/meta-llama/llama-stack/issues/1076 ## Test Plan Simply do `llama stack run <config>` nothing should break :) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-05-07 09:18:12 -07:00
Jorge Piedrahita Ortiz	b2b00a216b	feat(providers): sambanova updated to use LiteLLM openai-compat (#1596 ) # What does this PR do? switch sambanova inference adaptor to LiteLLM usage to simplify integration and solve issues with current adaptor when streaming and tool calling, models and templates updated ## Test Plan pytest -s -v tests/integration/inference/test_text_inference.py --stack-config=sambanova --text-model=sambanova/Meta-Llama-3.3-70B-Instruct pytest -s -v tests/integration/inference/test_vision_inference.py --stack-config=sambanova --vision-model=sambanova/Llama-3.2-11B-Vision-Instruct	2025-05-06 16:50:22 -07:00
Ashwin Bharambe	272d3359ee	fix: remove code interpeter implementation (#2087 ) # What does this PR do? The builtin implementation of code interpreter is not robust and has a really weak sandboxing shell (the `bubblewrap` container). Given the availability of better MCP code interpreter servers coming up, we should use them instead of baking an implementation into the Stack and expanding the vulnerability surface to the rest of the Stack. This PR only does the removal. We will add examples with how to integrate with MCPs in subsequent ones. ## Test Plan Existing tests.	2025-05-01 14:35:08 -07:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred for latest Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worth of note can be found under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py` where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Sébastien Han	4412694018	chore: Remove zero-width space characters from OTEL service name env var defaults (#2060 ) # What does this PR do? Replaced `${env.OTEL_SERVICE_NAME:\u200B}` and similar variants with properly formatted `${env.OTEL_SERVICE_NAME:}` across all YAML templates and TelemetryConfig. This prevents silent parsing issues and ensures consistent environment variable resolution. Slipped in https://github.com/meta-llama/llama-stack/pull/2058 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-30 17:56:46 +02:00
Roland Huß	5a2bfd6ad5	refactor: Replace SQLITE_DB_PATH by SQLITE_STORE_DIR env in templates (#2055 ) # What does this PR do? The telemetry provider configs is the only one who leverages the env var `SQLITE_DB_PATH` for pointing to persistent data in the respective templates, whereas usually `SQLITE_STORE_DIR` is used. This PR modifies the `sqlite_db_path` in various telemetry configuration files to use the environment variable `SQLITE_STORE_DIR` instead of `SQLITE_DB_PATH`. This change ensures that _only_ the SQLITE_STORE_DIR needs to be set to point to a different persistence location for providers. All references to `SQLITE_DB_PATH` have been removed. Another improvement could be to move `sqlite_db_path` to `db_path` in the telemetry provider config, to align with the other provider configurations. That could be done by another PR (if wanted).	2025-04-29 15:28:10 -07:00
Ben Browning	7641a5cd0b	fix: 100% OpenAI API verification for together and fireworks (#1946 ) # What does this PR do? TLDR: Changes needed to get 100% passing tests for OpenAI API verification tests when run against Llama Stack with the `together`, `fireworks`, and `openai` providers. And `groq` is better than before, at 88% passing. This cleans up the OpenAI API support for image message types (specifically `image_url` types) and handling of the `response_format` chat completion parameter. Both of these required a few more Pydantic model definitions in our Inference API, just to move from the not-quite-right stubs I had in place to something fleshed out to match the actual OpenAI API specs. As part of testing this, I also found and fixed a bug in the litellm implementation of openai_completion and openai_chat_completion, so the providers based on those should actually be working now. The method `prepare_openai_completion_params` in `llama_stack/providers/utils/inference/openai_compat.py` was improved to actually recursively clean up input parameters, including handling of lists, dicts, and dumping of Pydantic models to dicts. These changes were required to get to 100% passing tests on the OpenAI API verification against the `openai` provider. With the above, the together.ai provider was passing as well as it is without Llama Stack. But, since we have Llama Stack in the middle, I took the opportunity to clean up the together.ai provider so that it now also passes the OpenAI API spec tests we have at 100%. That means together.ai is now passing our verification test better when using an OpenAI client talking to Llama Stack than it is when hitting together.ai directly, without Llama Stack in the middle. And, another round of work for Fireworks to improve translation of incoming OpenAI chat completion requests to Llama Stack chat completion requests gets the fireworks provider passing at 100%. The server-side fireworks.ai tool calling support with OpenAI chat completions and Llama 4 models isn't great yet, but by pointing the OpenAI clients at Llama Stack's API we can clean things up and get everything working as expected for Llama 4 models. ## Test Plan ### OpenAI API Verification Tests I ran the OpenAI API verification tests as below and 100% of the tests passed. First, start a Llama Stack server that runs the `openai` provider with the `gpt-4o` and `gpt-4o-mini` models deployed. There's not a template setup to do this out of the box, so I added a `tests/verifications/openai-api-verification-run.yaml` to do this. First, ensure you have the necessary API key environment variables set: ``` export TOGETHER_API_KEY="..." export FIREWORKS_API_KEY="..." export OPENAI_API_KEY="..." ``` Then, run a Llama Stack server that serves up all these providers: ``` llama stack run \ --image-type venv \ tests/verifications/openai-api-verification-run.yaml ``` Finally, generate a new verification report against all these providers, both with and without the Llama Stack server in the middle. ``` python tests/verifications/generate_report.py \ --run-tests \ --provider \ together \ fireworks \ groq \ openai \ together-llama-stack \ fireworks-llama-stack \ groq-llama-stack \ openai-llama-stack ``` You'll see that most of the configurations with Llama Stack in the middle now pass at 100%, even though some of them do not pass at 100% when hitting the backend provider's API directly with an OpenAI client. ### OpenAI Completion Integration Tests with vLLM: I also ran the smaller `test_openai_completion.py` test suite (that's not yet merged with the verification tests) on multiple of the providers, since I had to adjust the method signature of openai_chat_completion a bit and thus had to touch lots of these providers to match. Here's the tests I ran there, all passing: ``` VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` ### OpenAI Completion Integration Tests with ollama ``` INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0" ``` ### OpenAI Completion Integration Tests with together.ai ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" llama stack build --template together --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct-Turbo" ``` ### OpenAI Completion Integration Tests with fireworks.ai ``` INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack build --template fireworks --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.1-8B-Instruct" --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-14 08:56:29 -07:00
ehhuang	031a40bec0	fix: type (#1898 ) # What does this PR do? ## Test Plan	2025-04-08 09:07:25 -07:00
ehhuang	7b4eb0967e	test: verification on provider's OAI endpoints (#1893 ) # What does this PR do? ## Test Plan export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic; LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference --vision-model $MODEL --text-model $MODEL	2025-04-07 23:06:28 -07:00

12 commits