llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Ashwin Bharambe	c54164556a	fix: update notebooks to avoid using the nutsy --image-name __system__ thing (#1308 ) The `--image-name __system__` thing was a hack and a bad one at that. The actual intent was to somehow automatically detect the notebook environment so we could avoid unnecessarily confusing things in the llama stack build cmd-line. But I failed which led us to use the backup `__system__` thing. Let's just do the simple thing. Note that `build_venv.sh` I haven't changed for now (so it still honors the __system__ special name just that no new user should use it.) ## Test Plan Open the notebooks from this branch in Colab (see example url below) and ensure the builds work. https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb In the notebook, install llama-stack from this branch directly using: ``` !pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip ``` Verify that `!UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv` afterwards succeeds and the library client initialization also works.	2025-02-27 16:39:04 -08:00
Ashwin Bharambe	928a39d17b	feat(providers): Groq now uses LiteLLM openai-compat (#1303 ) Groq has never supported raw completions anyhow. So this makes it easier to switch it to LiteLLM. All our test suite passes. I also updated all the openai-compat providers so they work with api keys passed from headers. `provider_data` ## Test Plan ```bash LLAMA_STACK_CONFIG=groq \ pytest -s -v tests/client-sdk/inference/test_text_inference.py \ --inference-model=groq/llama-3.3-70b-versatile --vision-inference-model="" ``` Also tested (openai, anthropic, gemini) providers. No regressions.	2025-02-27 13:16:50 -08:00
Xi Yan	fc5aff3ccf	feat: ability to retrieve agents session, turn, step by ids (#1286 ) # What does this PR do? - Fix up rotten implementation for retrieving agent's Session, Turn, Step with actual working implementation. - Update `getting_started` notebook with retrieving by agent session_id. https://github.com/meta-llama/llama-stack/blob/export_agent_dataset/docs/getting_started.ipynb ## Test Plan Test with script: https://gist.github.com/yanxi0830/657cecee8f1f0e39d322963d9c0f598e <img width="503" alt="image" src="https://github.com/user-attachments/assets/5ea9bc33-83d1-40bc-98e1-b68393158387" />	2025-02-27 09:45:14 -08:00
Matthew Farrellee	99b6925ad8	feat: add nemo retriever text embedding models to nvidia inference provider (#1218 ) # What does this PR do? add the NeMo Retriever Embedding models from https://docs.nvidia.com/nim/nemo-retriever/text-embedding/latest/support-matrix.html	2025-02-26 21:18:34 -08:00
Shrey	30ef1c3680	feat: Add model context protocol tools with ollama provider (#1283 ) # What does this PR do? Model context protocol (MCP) allows for remote tools to be connected with Agents. The current Ollama provider does not support it. This PR adds necessary code changes to ensure that the integration between Ollama backend and MCP works. This PR is an extension of #816 for Ollama. ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] 1. Run llama-stack server with the command: ``` llama stack build --template ollama --image-type conda llama stack run ./templates/ollama/run.yaml \ --port $LLAMA_STACK_PORT \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env OLLAMA_URL=http://localhost:11434 ``` 2. Run the sample client agent with MCP tool: ``` from llama_stack_client.lib.agents.agent import Agent from llama_stack_client.lib.agents.event_logger import EventLogger from llama_stack_client.types.agent_create_params import AgentConfig from llama_stack_client.types.shared_params.url import URL from llama_stack_client import LlamaStackClient from termcolor import cprint ## Start the local MCP server # git clone https://github.com/modelcontextprotocol/python-sdk # Follow instructions to get the env ready # cd examples/servers/simple-tool # uv run mcp-simple-tool --transport sse --port 8000 # Connect to the llama stack server base_url="http://localhost:8321" model_id="meta-llama/Llama-3.2-3B-Instruct" client = LlamaStackClient(base_url=base_url) # Register MCP tools client.toolgroups.register( toolgroup_id="mcp::filesystem", provider_id="model-context-protocol", mcp_endpoint=URL(uri="http://localhost:8000/sse")) # Define an agent with MCP toolgroup agent_config = AgentConfig( model=model_id, instructions="You are a helpful assistant", toolgroups=["mcp::filesystem"], input_shields=[], output_shields=[], enable_session_persistence=False, ) agent = Agent(client, agent_config) 
user_prompts = [ "Fetch content from https://www.google.com and print the response" ] # Run a session with the agent session_id = agent.create_session("test-session") for prompt in user_prompts: cprint(f"User> {prompt}", "green") response = agent.create_turn( messages=[ { "role": "user", "content": prompt, } ], session_id=session_id, ) for log in EventLogger().log(response): log.print() ``` # Documentation The file docs/source/distributions/self_hosted_distro/ollama.md is updated to indicate the MCP tool runtime availability. Signed-off-by: Shreyanand <shanand@redhat.com>	2025-02-26 15:38:18 -08:00
ehhuang	c8a20b8ed0	feat: allow specifying specific tool within toolgroup (#1239 ) Summary: E.g. `builtin::rag::knowledge_search` Test Plan: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/ --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-02-26 14:07:05 -08:00
Ashwin Bharambe	6b075e5075	feat: automatically update documentation version based on pyproject.toml source of truth	2025-02-26 13:42:12 -08:00
Botao Chen	9a3db9a290	feat: update the post training notebook (#1280 ) ## What does this PR do? - add 'open in colab' icon that links to the notebook - update the pip install llama-stack pkg part ## test preview <img width="938" alt="Screenshot 2025-02-26 at 1 25 34 PM" src="https://github.com/user-attachments/assets/951b7f0f-a15e-4618-ad02-07c77c65a5ad" /> <img width="934" alt="Screenshot 2025-02-26 at 1 25 38 PM" src="https://github.com/user-attachments/assets/de872530-84b9-4f8b-ae93-06aa7d2e5bd8" />	2025-02-26 13:39:16 -08:00
Reid	abfc4b3bce	fix: the pre-commit new line issue (#1272 ) # What does this PR do? `3783861877` ``` diff --git a/docs/notebooks/Alpha_Llama_Stack_Post_Training.ipynb b/docs/notebooks/Alpha_Llama_Stack_Post_Training.ipynb index c55c8da..3979088 100644 --- a/docs/notebooks/Alpha_Llama_Stack_Post_Training.ipynb +++ b/docs/notebooks/Alpha_Llama_Stack_Post_Training.ipynb @@ -6431,4 +6431,4 @@ }, "nbformat": 4, "nbformat_minor": 0 -} \ No newline at end of file +} Error: Process completed with exit code 1. ``` --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-26 04:25:41 -05:00
Botao Chen	123fb9eb24	feat: [post training] support save hf safetensor format checkpoint (#845 ) ## context Now, in llama stack, we only support inference / eval a finetuned checkpoint with meta-reference as inference provider. This is sub-optimal since meta-reference is pretty slow. Our vision is that developer can inference / eval a finetuned checkpoint produced by post training apis with all the inference providers on the stack. To achieve this, we'd like to define an unified output checkpoint format for post training providers. So that, all the inference provider can respect that format for customized model inference. By spotting check how [ollama](https://github.com/ollama/ollama/blob/main/docs/import.md) and [fireworks](https://docs.fireworks.ai/models/uploading-custom-models) do inference on a customized model, we defined the output checkpoint format as /adapter/adapter_config.json and /adapter/adapter_model.safetensors (as we only support LoRA post training now, we begin from adapter only checkpoint) ## test we kick off a post training job and configured checkpoint format as 'huggingface'. Output files ![Screenshot 2025-02-24 at 11 54 33 PM](https://github.com/user-attachments/assets/fb45a5d7-f288-4d30-82f8-b7a8da2859be) we did a proof of concept with ollama to see if ollama can inference our finetuned checkpoint 1. create Modelfile like <img width="799" alt="Screenshot 2025-01-22 at 5 04 18 PM" src="https://github.com/user-attachments/assets/7fca9ac3-a294-44f8-aab1-83852c600609" /> 2. create a customized model with `ollama create llama_3_2_finetuned` and run inference successfully ![Screenshot 2025-02-24 at 11 55 17 PM](https://github.com/user-attachments/assets/1abe7c52-c6a7-491a-b07c-b7a8e3fd1ddd) This is just a proof of concept with ollama cmd line. As next step, we'd like to wrap loading / inference customized model logic in the inference provider implementation.	2025-02-25 23:29:08 -08:00
Reid	55eb257459	chore: update the zero_to_hero_guide doc link (#1220 ) # What does this PR do? The link should have been changed by `8585b95a28`, so it shows a `404` when clicked. Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-25 17:16:02 -08:00
Vladislav Bronzov	967cff4533	feat: Add Groq distribution template (#1173 ) # What does this PR do? Create a distribution template using Groq as inference provider. Link to issue: https://github.com/meta-llama/llama-stack/issues/958 ## Test Plan Run `python llama_stack/scripts/distro_codegen.py` to generate run.yaml and build.yaml Test the newly created template by running `llama stack build --template <template-name>` `llama stack run <template-name>`	2025-02-25 14:16:56 -08:00
Kelly Brown	99c1d4c456	docs: Remove $ from client CLI ref to add valid copy and paste ability (#1260 ) Description: This PR removes the "$" symbol from the client CLI reference so that users can use the copy and paste code function without copying over the "$" symbol. I know the "$" prompts are good for showing user permissions, but I noticed they're not really used in other parts of the docs, and removing them makes copying and pasting code blocks easier. Very small nit PR; this is not a huge deal if the PR is not needed.	2025-02-25 13:50:00 -08:00
raghotham	0885f959f1	fix: update index.md to include 0.1.4 (#1259 )	2025-02-25 13:34:29 -08:00
Hardik Shah	30f79fafcb	fix: Update Llama_Stack_Benchmark_Evals.ipynb (#1246 ) Update eval notebook to use `--image-name __system__`	2025-02-24 18:22:42 -08:00
Hardik Shah	a1fe3c30dd	fix: Update getting_started.ipynb (#1245 ) update to install properly in system python in colab	2025-02-24 18:22:32 -08:00
Ashwin Bharambe	d6356f822a	fix: remove UV_SYSTEM_PYTHON from getting started notebook since llama stack build detects notebook environment	2025-02-24 10:05:02 -08:00
Reid	1842eeb96f	docs: small fixes (#1224 )	2025-02-24 07:59:58 -05:00
Yuan Tang	17162b9978	docs: Add vLLM to the list of inference providers in concepts and providers pages (#1227 ) This increases visibility of the vLLM provider.	2025-02-23 20:16:30 -08:00
Francisco Arceo	19ae4b35d9	docs: Adding Provider sections to docs (#1195 ) # What does this PR do? Adding Provider sections to docs (some of these will be empty and need updating). This PR is still a draft while I seek feedback from other contributors. I opened it to make the structure visible in the linked GitHub Issue. # Closes https://github.com/meta-llama/llama-stack/issues/1189 - Providers Overview Page ![Screenshot 2025-02-21 at 12 15 09 PM](https://github.com/user-attachments/assets/e83e5a17-0d96-4de0-8251-68161799a054) - SQLite-Vec specific page ![Screenshot 2025-02-21 at 12 15 34 PM](https://github.com/user-attachments/assets/14773900-fc8f-49e9-832a-b060b7ca010a) ## Test Plan N/A [//]: # (## Documentation) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-02-22 11:59:34 -08:00
ehhuang	25fddccfd8	feat: tool outputs metadata (#1155 ) Summary: Allows tools to output metadata. This is useful for evaluating tool outputs, e.g. RAG tool will output document IDs, which can be used to score recall. Will need to make a similar change on the client side to support ClientTool outputting metadata. Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py	2025-02-21 13:15:31 -08:00
Xi Yan	0fe071764f	feat(1/n): api: unify agents for handling server & client tools (#1178 ) # Problem Our current Agent framework is inconsistent in how it defines the handling of server-side and client-side tools. 1. Server Tools: a single Turn is returned including `ToolExecutionStep` in agents 2. Client Tools: `create_agent_turn` is called in a loop, with the client agent lib yielding the agent chunk `ad6ffc63df/src/llama_stack_client/lib/agents/agent.py (L186-L211)` This makes it inconsistent to work with server & client tools. It also complicates sending logs to telemetry to get information about an agent's turn / history for observability. #### Principle The same `turn_id` should be used to represent the steps required to complete a user message including client tools. ## Solution 1. `AgentTurnResponseEventType.turn_awaiting_input` status to indicate that the current turn is not completed, and awaiting tool input 2. `continue_agent_turn` endpoint to update agent turn with client's tool response. # What does this PR do? - Skeleton API as example ## Test Plan - Just API update, no functionality change ``` llama stack run + client-sdk test ``` <img width="842" alt="image" src="https://github.com/user-attachments/assets/7ac56b5f-f424-4632-9476-7e0f57555bc3" />	2025-02-21 11:48:27 -08:00
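The principle in #1178 (one `turn_id` covering every step of a user message, including client tool calls) can be illustrated with a toy state machine. This is only a sketch: `ToyAgentServer`, `Turn`, and the step strings are hypothetical names invented for illustration and are not the llama-stack API; only the "awaiting input, then continue the same turn" flow comes from the commit message.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    """Hypothetical turn record; not the real llama-stack Turn type."""
    turn_id: str
    steps: List[str] = field(default_factory=list)
    status: str = "in_progress"  # or "awaiting_input" / "completed"


class ToyAgentServer:
    """Toy stand-in for an agent server, for illustration only."""

    def __init__(self) -> None:
        self.turns: Dict[str, Turn] = {}

    def create_turn(self, message: str, needs_client_tool: bool) -> Turn:
        turn = Turn(turn_id=str(uuid.uuid4()))
        turn.steps.append(f"inference: {message}")
        # A client tool call pauses the turn instead of ending it.
        turn.status = "awaiting_input" if needs_client_tool else "completed"
        self.turns[turn.turn_id] = turn
        return turn

    def continue_turn(self, turn_id: str, tool_response: str) -> Turn:
        turn = self.turns[turn_id]
        if turn.status != "awaiting_input":
            raise ValueError("turn is not awaiting tool input")
        # The tool result and the follow-up inference join the SAME turn.
        turn.steps.append(f"tool_execution: {tool_response}")
        turn.steps.append("inference: final answer")
        turn.status = "completed"
        return turn
```

With this shape, a client that executes a tool locally resumes the paused turn rather than creating a new one, so telemetry sees a single turn with all of its steps.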
Ashwin Bharambe	992f865b2e	chore: move embedding deps to RAG tool where they are needed (#1210 ) `EMBEDDING_DEPS` were wrongly associated with `vector_io` providers. They are needed by https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142 and related code and is used by the RAG tool and as such should only be needed by the `inline::rag-runtime` provider.	2025-02-21 11:33:41 -08:00
Ashwin Bharambe	11697f85c5	fix: pull ollama embedding model if necessary (#1209 ) Embedding models are tiny and can be pulled on-demand. Let's do that so the user doesn't have to do "yet another thing" to get themselves set up. Thanks @hardikjshah for the suggestion. Also fixed a build dependency miss (TODO: distro_codegen needs to actually check that the build template contains all providers mentioned for the run.yaml file) ## Test Plan First run `ollama rm all-minilm:latest`. Run `llama stack build --template ollama && llama stack run ollama --env INFERENCE_MODEL=llama3.2:3b-instruct-fp16`. See that it outputs a "Pulling embedding model `all-minilm:latest`" output and the stack starts up correctly. Verify that `ollama list` shows the model is correctly downloaded.	2025-02-21 10:35:56 -08:00
Reid	c9c4a3c921	feat: model remove cmd (#1128 ) # What does this PR do? Add a subcommand that helps clean up unneeded models: ``` $ llama model --help usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ... Work with llama models options: -h, --help show this help message and exit $ llama model remove --help usage: llama model remove [-h] -m MODEL [-f] Remove the downloaded llama model options: -h, --help show this help message and exit -m MODEL, --model MODEL Specify the llama downloaded model name -f, --force Used to forcefully remove the llama model from the storage without further confirmation $ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8 Are you sure you want to remove Llama3.2-1B-Instruct:int4-qlora-eo8? (y/n): n Removal aborted. $ llama model remove -m Llama3.2-1B-Instruct:int4-qlora-eo8 -f Llama3.2-1B-Instruct:int4-qlora-eo8 removed. ``` --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-21 08:05:12 -08:00
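The flag behavior shown in the help text for #1128 (a required `-m/--model`, and `-f/--force` to skip the confirmation prompt) could be wired up roughly as follows. This is a minimal sketch with `argparse`, not the actual llama CLI code; `build_parser` and `should_remove` are hypothetical helper names, and the prompt wording is copied from the commit message.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the usage line: llama model remove [-h] -m MODEL [-f]
    parser = argparse.ArgumentParser(
        prog="llama model remove",
        description="Remove the downloaded llama model",
    )
    parser.add_argument("-m", "--model", required=True,
                        help="Specify the llama downloaded model name")
    parser.add_argument("-f", "--force", action="store_true",
                        help="Remove without further confirmation")
    return parser


def should_remove(args: argparse.Namespace,
                  ask: Callable[[str], str] = input) -> bool:
    """Return True when removal should proceed."""
    if args.force:
        return True  # --force skips the interactive prompt
    answer = ask(f"Are you sure you want to remove {args.model}? (y/n): ")
    return answer.strip().lower() == "y"
```

Passing `ask` as a parameter keeps the confirmation logic testable without a terminal.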
Ashwin Bharambe	81ce39a607	feat(api): Add options for supporting various embedding models (#1192 ) We need to support: - asymmetric embedding models (#934) - truncation policies (#933) - varying dimensional output (#932) ## Test Plan ```bash $ cd llama_stack/providers/tests/inference $ pytest -s -v -k fireworks test_embeddings.py \ --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k together test_embeddings.py \ --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k ollama test_embeddings.py \ --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784 ```	2025-02-20 22:27:12 -08:00
Ashwin Bharambe	6f9d622340	fix(api): update embeddings signature so inputs and outputs list align (#1161 ) See Issue #922 The change is slightly backwards incompatible but no callsite (in our client codebases or stack-apps) ever passes a depth-2 `List[List[InterleavedContentItem]]` (which is now disallowed.) ## Test Plan ```bash $ cd llama_stack/providers/tests/inference $ pytest -s -v -k fireworks test_embeddings.py \ --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k together test_embeddings.py \ --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k ollama test_embeddings.py \ --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784 ``` Also ran `tests/client-sdk/inference/test_embeddings.py`	2025-02-20 21:43:13 -08:00
Matthew Farrellee	832c535aaf	feat(providers): add NVIDIA Inference embedding provider and tests (#935 ) # What does this PR do? add /v1/inference/embeddings implementation to NVIDIA provider open topics - - asymmetric models. NeMo Retriever includes asymmetric models, which are models that embed differently depending on if the input is destined for storage or lookup against storage. the /v1/inference/embeddings api does not allow the user to indicate the type of embedding to perform. see https://github.com/meta-llama/llama-stack/issues/934 - truncation. embedding models typically have a limited context window, e.g. 1024 tokens is common though newer models have 8k windows. when the input is larger than this window the endpoint cannot perform its designed function. two options: 0. return an error so the user can reduce the input size and retry; 1. perform truncation for the user and proceed (common strategies are left or right truncation). many users encounter context window size limits and will struggle to write reliable programs. this struggle is especially acute without access to the model's tokenizer. the /v1/inference/embeddings api does not allow the user to delegate truncation policy. see https://github.com/meta-llama/llama-stack/issues/933 - dimensions. "Matryoshka" embedding models are available. they allow users to control the number of embedding dimensions the model produces. this is a critical feature for managing storage constraints. embeddings of 1024 dimensions that achieve 95% recall for an application may not be worth the storage cost if 512 dimensions can achieve 93% recall. controlling embedding dimensions allows applications to determine their recall and storage tradeoffs. the /v1/inference/embeddings api does not allow the user to control the output dimensions. 
see https://github.com/meta-llama/llama-stack/issues/932 ## Test Plan - `llama stack run llama_stack/templates/nvidia/run.yaml` - `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_embedding.py --embedding-model baai/bge-m3` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-02-20 16:59:48 -08:00
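Two of the open topics in #935, truncation policy and output dimensions, can be sketched in plain Python. This is an illustration under stated assumptions, not provider code: `truncate_tokens` and `shrink_embedding` are hypothetical helpers, tokens are modeled as a list of integer ids, and shrinking a Matryoshka-style embedding is assumed to mean keeping the leading dimensions and re-normalizing to unit length.

```python
import math
from typing import List


def truncate_tokens(tokens: List[int], max_len: int, side: str = "right") -> List[int]:
    """Fit `tokens` into a context window of `max_len` tokens.

    side="right" drops tokens from the end; side="left" drops from the start.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if len(tokens) <= max_len:
        return list(tokens)
    return tokens[:max_len] if side == "right" else tokens[-max_len:]


def shrink_embedding(vector: List[float], dims: int) -> List[float]:
    """Keep the first `dims` components and L2-normalize the result."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head
```

Halving the dimension count halves the storage per vector, which is the recall versus storage tradeoff the commit message describes.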
Ashwin Bharambe	9436dd570d	feat: register embedding models for ollama, together, fireworks (#1190 ) # What does this PR do? We have support for embeddings in our Inference providers, but so far we haven't done the final step of actually registering the known embedding models and making sure they are extremely easy to use. This is one step towards that. ## Test Plan Run existing inference tests. ```bash $ cd llama_stack/providers/tests/inference $ pytest -s -v -k fireworks test_embeddings.py \ --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k together test_embeddings.py \ --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784 $ pytest -s -v -k ollama test_embeddings.py \ --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784 ``` The value of the EMBEDDING_DIMENSION isn't actually used in these tests, it is merely used by the test fixtures to check if the model is an LLM or Embedding.	2025-02-20 15:39:08 -08:00
ehhuang	1166afdf76	fix: some telemetry APIs don't currently work (#1188 ) Summary: This bug is surfaced by using the http LS client. The issue is that non-scalar values in 'GET' method are `body` params in fastAPI, but our spec generation script doesn't respect that. We fix by just making them POST method instead. Test Plan: Test API call with newly sync'd client (https://github.com/meta-llama/llama-stack-client-python/pull/149) <img width="1114" alt="image" src="https://github.com/user-attachments/assets/7710aca5-d163-4e00-a465-14e6fcaac2b2" />	2025-02-20 14:09:25 -08:00
Xi Yan	ea1faae50e	chore!: deprecate eval/tasks (#1186 ) # What does this PR do? - Fully deprecate eval/tasks [//]: # (If resolving an issue, uncomment and update the line below) Closes #1088 NOTE: this will be a breaking change. We have introduced the new API in 0.1.3 . Notebook has been updated to use the new endpoints. ## Test Plan ``` pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` <img width="611" alt="image" src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f" /> cc @SLR722 for awareness [//]: # (## Documentation)	2025-02-20 14:06:21 -08:00
Ashwin Bharambe	07ccf908f7	ModelAlias -> ProviderModelEntry	2025-02-20 14:02:36 -08:00
Kevin Cogan	561295af76	docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script (#1129 ) # What does this PR do? This PR improves the documentation in several ways: - Fixed an incorrect link in `tools.md` to ensure all references point to the correct resources. - Added instructions for running the `code-interpreter` agent in a Podman container, helping users configure and execute the tool in containerized environments. - Introduced an unregister command for single and multiple vector databases, making it easier to manage vector DBs. - Provided a simple example script for using the `code-interpreter` agent, giving users a practical reference for implementation. These updates enhance the clarity, usability, and completeness of the documentation. ## Test Plan The following steps were performed to verify the accuracy of the changes: 1. Validated all fixed links by checking their destinations to ensure correctness. 2. Ran the `code-interpreter` agent in a Podman container following the new instructions to confirm functionality. 3. Executed the vector database unregister commands and verified that both single and multiple databases were correctly removed. 4. Tested the new example script for `code-interpreter`, ensuring it runs without errors. All changes were reviewed and tested successfully, improving the documentation's accuracy and ease of use.	2025-02-20 13:52:14 -08:00
Vladimir Ivić	f7161611c6	feat: adding endpoints for files and uploads (#1070 ) Summary: Adds spec definitions for file upload operations. This API centers on two high-level operations: * Initiating and managing upload session * Accessing uploaded file information Usage examples: To start a file upload session: ``` curl -X POST https://localhost:8321/v1/files \ -d '{ "key": "image123.jpg", "bucket": "images", "mime_type": "image/jpg", "size": 12345 }' # Returns { "id": <session_id>, "url": "https://localhost:8321/v1/files/session:<session_id>", "offset": 0, "size": 12345 } ``` To upload file content to an existing session ``` curl -i -X POST "https://localhost:8321/v1/files/session:<session_id>" \ --data-binary @<path_to_local_file> # Returns { "key": "image123.jpg", "bucket": "images", "mime_type": "image/jpg", "bytes": 12345, "created_at": 1737492240 } # Implementing on server side (Flask example for simplicity): @app.route('/uploads/<upload_id>', methods=['POST']) def upload_content_to_session(upload_id): try: # Get the binary file data from the request body file_data = request.data # Save the file to disk save_path = f"./uploads/{upload_id}" with open(save_path, 'wb') as f: f.write(file_data) return {__uploaded_file_json__}, 200 except Exception as e: return {"error": str(e)}, 500 ``` To read information about an existing upload session ``` curl -i -X GET "https://localhost:8321/v1/files/session:<session_id>" # Returns { "id": <session_id>, "url": "https://localhost:8321/v1/files/session:<session_id>", "offset": 1024, "size": 12345 } ``` To list buckets ``` GET /files # Returns { "data": [ {"name": "bucket1"}, {"name": "bucket2"}, ] } ``` To list all files in a bucket ``` GET /files/{bucket} # Returns { "data": [ { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240, }, { "key": "persian_cat.jpg", "mime_type": "image/jpg", "bucket": "cats", "bytes": 39924, "created_at": 1727493440, }, ] } ``` To get specific file info ``` GET 
/files/{bucket}/{key} { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240, } ``` To delete specific file ``` DELETE /files/{bucket}/{key} { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240, } ```	2025-02-20 13:09:00 -08:00
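The session fields in the #1070 examples (`offset` starting at 0, later 1024, against a declared `size`) suggest that a session tracks how many bytes have arrived so far. A minimal in-memory sketch of that bookkeeping is below. It is an assumption-laden illustration: `UploadSession` and its methods are hypothetical, chunked uploads are assumed rather than confirmed by the spec, and the relative `url` is a simplification of the one in the examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class UploadSession:
    """Hypothetical in-memory model of an upload session."""
    session_id: str
    key: str
    bucket: str
    mime_type: str
    size: int          # total bytes declared when the session was created
    offset: int = 0    # bytes received so far
    chunks: List[bytes] = field(default_factory=list)

    def append(self, data: bytes) -> None:
        """Accept more content, refusing anything past the declared size."""
        if self.offset + len(data) > self.size:
            raise ValueError("upload exceeds declared size")
        self.chunks.append(data)
        self.offset += len(data)

    @property
    def complete(self) -> bool:
        return self.offset == self.size

    def info(self) -> Dict[str, Union[str, int]]:
        """Shape of the session-info response shown in the examples."""
        return {
            "id": self.session_id,
            "url": f"/v1/files/session:{self.session_id}",
            "offset": self.offset,
            "size": self.size,
        }
```

Tracking `offset` on the server lets a client query the session, see how much was received, and decide what to send next.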
Ben Browning	fbec826883	docs: Add note about distro_codegen.py and provider dependencies (#1175 ) # What does this PR do? This expands upon the existing distro_codegen.py text in the new API provider documentation to include a note about not including provider-specific dependencies in the code path that builds the distribution's template. Our distro_codegen pre-commit hook will catch this case anyway, but this attempts to inform provider authors ahead of time about that. ## Test Plan I built the docs website locally via the following: ``` pip install -r docs/requirements.txt sphinx-build -M html docs/source docs_output ``` Then, I opened that newly generated `docs_output/html/contributing/new_api_provider.html` in my browser and confirmed everything rendered correctly. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-02-20 09:23:46 -08:00
Sixian Yi	531940aea9	script for running client sdk tests (#895 ) # What does this PR do? Create a script for running all client-sdk tests on Async Library client, with the option to generate report ## Test Plan ``` python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report ``` ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-19 22:38:06 -08:00
Yuan Tang	25cdab5b28	docs: Remove unused python-openapi and json-strong-typing in openapi_generator (#1167 ) These are no longer required to generate the API reference after `5e7904ef6c`	2025-02-19 22:06:29 -08:00
Ashwin Bharambe	d39f8de619	Pin sphinx	2025-02-19 20:20:46 -08:00
Ashwin Bharambe	89fdb2c9e9	Try a different css file API for sphinx	2025-02-19 20:14:40 -08:00
Sébastien Han	26503ca1a4	docs: fix Python llama_stack_client SDK links (#1150 ) # What does this PR do? It seems that the llama_stack_client repo and the main repo were originally the same, causing links to point to local references. We’ve now updated them to use the correct llama_stack_client repo links. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-19 19:05:14 -08:00
Ben Browning	e9b8259cf9	fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks (#1123 ) # What does this PR do? Before this change, `distro_codegen.py` would only work if the user manually installed multiple provider-specific dependencies (see #1122). Now, users can run `distro_codegen.py` without any provider-specific dependencies because we avoid importing the entire provider implementations just to get the config needed to build the provider template. Concretely, this mostly means moving the MODEL_ALIASES (and related variants) definitions to a new models.py class within the provider implementation for those providers that require additional dependencies. It also meant moving a couple of imports from top-level imports to inside `get_adapter_impl` for some providers, which follows the pattern used by multiple existing providers. To ensure we don't regress and accidentally add new imports that cause distro_codegen.py to fail, the stubbed-in pre-commit hook for distro_codegen.py was uncommented and slightly tweaked to run via `uv run python ...` to ensure it runs with only the project's default dependencies and to run automatically instead of manually. Lastly, this updates distro_codegen.py itself to keep track of paths it might have changed and to only `git diff` those specific paths when checking for changed files instead of doing a diff on the entire working tree. The latter was overly broad and would require a user have no other unstaged changes in their working tree, even if those unstaged changes were unrelated to generated code. Now it only flags uncommitted changes for paths distro_codegen.py actually writes to. Our generated code was also out-of-date, presumably because of these issues, so this commit also has some updates to the generated code purely because it was out of sync, and the pre-commit hook now enforces things to be updated. 
(Closes #1122) ## Test Plan I manually tested distro_codegen.py and the pre-commit hook to verify those work as expected, flagging any uncommitted changes and catching any imports that attempt to pull in provider-specific dependencies. However, I do not have valid API keys to the impacted provider implementations, and am unable to easily run the inference tests against each changed provider. There are no functional changes to the provider implementations here, but I'd appreciate a second set of eyes on the changed import statements and moving of MODEL_ALIASES type code to a separate models.py to ensure I didn't make any obvious errors. --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-02-19 18:39:20 -08:00
Alessandro Sangiorgi	9e03df983e	fix(rag-example): add provider_id to avoid llama_stack_client 400 error (#1114 ) # What does this PR do? Add provider_id to avoid errors using the rag example with llama_stack_client `llama_stack_client.BadRequestError: Error code: 400 - {'detail': 'Invalid value: No provider specified and multiple providers available. Please specify a provider_id.'}` --------- Co-authored-by: Xi Yan <yanxi970830@gmail.com>	2025-02-19 15:37:25 -08:00
Ashwin Bharambe	034ece0011	Ensure that deprecations for fields follow through to OpenAPI	2025-02-19 13:54:04 -08:00
Ashwin Bharambe	31a5ba5268	Add title to the json schemas	2025-02-19 13:26:39 -08:00
Ashwin Bharambe	5e7904ef6c	Kill the older strong_typing code	2025-02-19 12:24:21 -08:00
ehhuang	8de7cf103b	feat: support tool_choice = {required, none, <function>} (#1059 ) Summary: titled Test Plan: added tests and LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B	2025-02-18 23:25:15 -05:00
Xi Yan	8585b95a28	rename	2025-02-18 16:02:44 -08:00
Reid	4e76d312fa	fix: modify the model id title for model list (#1095 ) # What does this PR do? After re-checking against the docs, the model id that `llama download` expects is actually the model descriptor (also without the `meta-llama/` prefix). https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html ``` $ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor Fetching 8 files: 0%\| \| 0/8 [00:00<?, ?it/s] LICENSE.txt: 100%\|█████████████████████████████████████████████████████████████████████████████████████████████████████████\| 7.71k/7.71k [00:00<00:00, 10.5MB/s] $ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/ usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<--- $ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found $ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8 Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/?Policy...): ^CTraceback (most recent call last): $ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 usage: llama download [-h] [--source {meta,huggingface}] [--model-id 
MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE] llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found ``` Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-18 10:26:41 -08:00
Reid	89d37687dd	chore: remove --no-list-templates option (#1121 ) # What does this PR do? Judging from the code and its usage, there appears to be no need for `--no-list-templates`, and its help text confuses users, so this PR removes it. ``` $ llama stack build --no-list-templates > Enter a name for your Llama Stack (e.g. my-local-stack): $ llama stack build > Enter a name for your Llama Stack (e.g. my-local-stack): before: $ llama stack build --help --list-templates, --no-list-templates Show the available templates for building a Llama Stack distribution (default: False) after: --list-templates Show the available templates for building a Llama Stack distribution ``` Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-18 10:13:46 -08:00
Yuan Tang	6b1773d530	docs: Fix incorrect link and command for generating API reference (#1124 )	2025-02-15 22:05:23 -05:00

393 commits