llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-03 19:57:35 +00:00

Author	SHA1	Message	Date
Matthew Farrellee	466ef6f490	feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin - remove auto-download of ollama embedding models - add embedding model metadata to dynamic listing w/ unit test - add support and tests for allowed_models - removed inference provider models.py files where dynamic listing is enabled - store embedding metadata in embedding_model_metadata field on inference providers - make model_entries optional on ModelRegistryHelper and LiteLLMOpenAIMixin - make OpenAIMixin a ModelRegistryHelper - skip base64 embedding test for remote::ollama, always returns floats - only use OpenAI client for ollama model listing - remove unused build_model_entry function - remove unused get_huggingface_repo function	2025-09-25 04:56:54 -04:00
Matthew Farrellee	d6c3b36390	chore: update the gemini inference impl to use openai-python for openai-compat functions (#3351) # What does this PR do? update the Gemini inference provider to use openai-python for the openai-compat endpoints partially addresses #3349, does not address /inference/completion or /inference/chat-completion ## Test Plan ci	2025-09-06 12:22:20 -07:00
Ashwin Bharambe	9583f468f8	feat(starter)!: simplify starter distro; litellm model registry changes (#2916)	2025-07-25 15:02:04 -07:00
Ashwin Bharambe	928a39d17b	feat(providers): Groq now uses LiteLLM openai-compat (#1303) Groq has never supported raw completions anyhow. So this makes it easier to switch it to LiteLLM. All our test suite passes. I also updated all the openai-compat providers so they work with api keys passed from headers. `provider_data` ## Test Plan ```bash LLAMA_STACK_CONFIG=groq \ pytest -s -v tests/client-sdk/inference/test_text_inference.py \ --inference-model=groq/llama-3.3-70b-versatile --vision-inference-model="" ``` Also tested (openai, anthropic, gemini) providers. No regressions.	2025-02-27 13:16:50 -08:00
Ashwin Bharambe	63e6acd0c3	feat: add (openai, anthropic, gemini) providers via litellm (#1267) # What does this PR do? This PR introduces more non-llama model support to llama stack. Providers introduced: openai, anthropic and gemini. All of these providers use essentially the same piece of code -- the implementation works via the `litellm` library. We will expose only specific models for providers we enable making sure they all work well and pass tests. This setup (instead of automatically enabling _all_ providers and models allowed by LiteLLM) ensures we can also perform any needed prompt tuning on a per-model basis as needed (just like we do it for llama models.) ## Test Plan ```bash #!/bin/bash args=("$@") for model in openai/gpt-4o anthropic/claude-3-5-sonnet-latest gemini/gemini-1.5-flash; do LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/inference/test_text_inference.py \ --embedding-model=all-MiniLM-L6-v2 \ --vision-inference-model="" \ --inference-model=$model "${args[@]}" done ```	2025-02-25 22:07:33 -08:00

5 commits