llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-06-27 18:50:41 +00:00

Author	SHA1	Message	Date
ehhuang	31a3ae60f4	feat: openai files api (#2321) # What does this PR do? * Adds the OpenAI-compatible Files API * Modifies the doc gen script to support multipart parameters ## Test Plan --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/2321). * #2330 * __->__ #2321	2025-06-02 11:45:53 -07:00
ehhuang	047303e339	feat: introduce APIs for retrieving chat completion requests (#2145) # What does this PR do? This PR introduces APIs to retrieve past chat completion requests, which will be used in the LS UI. Our current `Telemetry` is ill-suited for this purpose because it is untyped, so we would need to filter by obscure attribute names, which is brittle. Since these APIs are 'provided by stack' and don't need to be implemented by inference providers, we introduce a new InferenceProvider class, containing the existing inference protocol, which is implemented by inference providers. The APIs are OpenAI-compliant, with an additional `input_messages` field. ## Test Plan This PR just adds the API and marks them provided_by_stack. Start stack server -> doesn't crash	2025-05-18 21:43:19 -07:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred in recent Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worthy of note can be found under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py`, where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Ben Browning	8dfce2f596	feat: OpenAI Responses API (#1989) # What does this PR do? This provides an initial [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses) implementation. The API is not yet complete, and this is more of a proof of concept to show how we can store responses in our key-value stores and use them to support Responses API concepts like `previous_response_id`. ## Test Plan I've added a new `tests/integration/openai_responses/test_openai_responses.py` as part of test-driven development for this new API. I'm only testing this locally with the remote-vllm provider for now, but it should work with any of our inference providers since the only API it requires out of the inference provider is the `openai_chat_completion` endpoint. ``` VLLM_URL="http://localhost:8000/v1" \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack build --template remote-vllm --image-type venv --run ``` ``` LLAMA_STACK_CONFIG="http://localhost:8321" \ python -m pytest -v \ tests/integration/openai_responses/test_openai_responses.py \ --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-04-28 14:06:00 -07:00
Ashwin Bharambe	b8f1561956	feat: introduce llama4 support (#1877) As title says. Details in README, elsewhere.	2025-04-05 11:53:35 -07:00
Xi Yan	5287b437ae	feat(api): (1/n) datasets api clean up (#1573) ## PR Stack - https://github.com/meta-llama/llama-stack/pull/1573 - https://github.com/meta-llama/llama-stack/pull/1625 - https://github.com/meta-llama/llama-stack/pull/1656 - https://github.com/meta-llama/llama-stack/pull/1657 - https://github.com/meta-llama/llama-stack/pull/1658 - https://github.com/meta-llama/llama-stack/pull/1659 - https://github.com/meta-llama/llama-stack/pull/1660 Client SDK - https://github.com/meta-llama/llama-stack-client-python/pull/203 CI - `1391130488` <img width="1042" alt="image" src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca" /> -- the test_rag_agent_with_attachments is flaky and not related to this PR ## Doc <img width="789" alt="image" src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9" /> ## Client Usage ```python client.datasets.register( source={ "type": "uri", "uri": "lsfs://mydata.jsonl", }, schema="jsonl_messages", # optional dataset_id="my_first_train_data" ) # quick prototype debugging client.datasets.register( data_reference={ "type": "rows", "rows": [ {"messages": [...]}, ], }, schema="jsonl_messages", ) ``` ## Test Plan - CI: `1387805545` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py ``` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py ``` ``` pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ```	2025-03-17 16:55:45 -07:00
Ashwin Bharambe	5547ef953c	feat: enhance OpenAPI spec to include Error types (#1320) # What does this PR do? An API spec must talk about Error handling. This was a pretty glaring omission so far. This PR begins to address it by adding a set of standard error responses we can attach to all our API calls. At a future point, we can add specific error types where necessary (although we should not hurry to do that; it is best done very late.) ## Test Plan Checked that Stainless SDK generation succeeds.	2025-02-28 11:16:12 -08:00
Vladimir Ivić	f7161611c6	feat: adding endpoints for files and uploads (#1070) Summary: Adds spec definitions for file upload operations. This API centers on two high-level operations: * Initiating and managing an upload session * Accessing uploaded file information Usage examples: To start a file upload session: ``` curl -X POST https://localhost:8321/v1/files \ -d '{ "key": "image123.jpg", "bucket": "images", "mime_type": "image/jpg", "size": 12345 }' # Returns { "id": <session_id>, "url": "https://localhost:8321/v1/files/session:<session_id>", "offset": 0, "size": 12345 } ``` To upload file content to an existing session: ``` curl -i -X POST "https://localhost:8321/v1/files/session:<session_id>" \ --data-binary @<path_to_local_file> # Returns { "key": "image123.jpg", "bucket": "images", "mime_type": "image/jpg", "bytes": 12345, "created_at": 1737492240 } # Implementing on server side (Flask example for simplicity): @app.route('/uploads/<upload_id>', methods=['POST']) def upload_content_to_session(upload_id): try: # Get the binary file data from the request body file_data = request.data # Save the file to disk save_path = f"./uploads/{upload_id}" with open(save_path, 'wb') as f: f.write(file_data) return {__uploaded_file_json__}, 200 except Exception as e: return {"error": str(e)}, 500 ``` To read information about an existing upload session: ``` curl -i -X GET "https://localhost:8321/v1/files/session:<session_id>" # Returns { "id": <session_id>, "url": "https://localhost:8321/v1/files/session:<session_id>", "offset": 1024, "size": 12345 } ``` To list buckets: ``` GET /files # Returns { "data": [ {"name": "bucket1"}, {"name": "bucket2"} ] } ``` To list all files in a bucket: ``` GET /files/{bucket} # Returns { "data": [ { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240 }, { "key": "persian_cat.jpg", "mime_type": "image/jpg", "bucket": "cats", "bytes": 39924, "created_at": 1727493440 } ] } ``` To get specific file info: ``` GET /files/{bucket}/{key} # Returns { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240 } ``` To delete a specific file: ``` DELETE /files/{bucket}/{key} # Returns { "key": "shiba.jpg", "bucket": "dogs", "mime_type": "image/jpg", "bytes": 82334, "created_at": 1737492240 } ```	2025-02-20 13:09:00 -08:00
Ashwin Bharambe	314ee09ae3	chore: move all Llama Stack types from llama-models to llama-stack (#1098) llama-models should have extremely minimal cruft. Its sole purpose should be didactic -- show the simplest implementation of the llama models and document the prompt formats, etc. This PR is the complement to https://github.com/meta-llama/llama-models/pull/279 ## Test Plan Ensure all `llama` CLI `model` sub-commands work: ```bash llama model list llama model download --model-id ... llama model prompt-format -m ... ``` Ran tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/ LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/ LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/ ``` Create a fresh venv `uv venv && source .venv/bin/activate` and run `llama stack build --template fireworks --image-type venv` followed by `llama stack run together --image-type venv` <-- the server runs Also checked that the OpenAPI generator can run and there is no change in the generated files as a result. ```bash cd docs/openapi_generator sh run_openapi_generator.sh ```	2025-02-14 09:10:59 -08:00
Xi Yan	8b655e3cd2	fix!: update eval-tasks -> benchmarks (#1032) # What does this PR do? - Update `/eval-tasks` to `/benchmarks` - ⚠️ Remove differentiation between `app` vs. `benchmark` eval task config. Now we only have `BenchmarkConfig`. The overloaded `benchmark` is confusing and does not add any value. Backward compatibility is preserved because the "type" is not used anywhere. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - This change is backward compatible - Run notebook test with ``` pytest -v -s --nbval-lax ./docs/getting_started.ipynb pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` <img width="846" alt="image" src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d" /> [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Co-authored-by: Ben Browning <ben324@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: reidliu <reid201711@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 16:40:58 -08:00
Xi Yan	2fa9e3c941	fix: make backslash work in GET /models/{model_id:path} (#1068)	2025-02-13 08:46:43 -08:00
Ashwin Bharambe	b17277b06a	Fix the OpenAPI HTML	2025-02-04 10:38:49 -08:00
Hardik Shah	a7b929f17e	Sec fixes as raised by bandit (#917) minor fixes to hashlib and jinja	2025-01-31 13:44:26 -08:00
Xi Yan	15dcc4ea5e	openapi gen return type fix for streaming/non-streaming (#910) # What does this PR do? We need to change ```yaml /v1/inference/chat-completion: post: responses: '200': description: >- If stream=False, returns a ChatCompletionResponse with the full completion. If stream=True, returns an SSE event stream of ChatCompletionResponseStreamChunk content: text/event-stream: schema: oneOf: - $ref: '#/components/schemas/ChatCompletionResponse' - $ref: '#/components/schemas/ChatCompletionResponseStreamChunk' ``` into ```yaml /v1/inference/chat-completion: post: responses: '200': description: >- If stream=False, returns a ChatCompletionResponse with the full completion. If stream=True, returns an SSE event stream of ChatCompletionResponseStreamChunk content: text/event-stream: schema: $ref: '#/components/schemas/ChatCompletionResponseStreamChunk' application/json: schema: $ref: '#/components/schemas/ChatCompletionResponse' ``` ## Test Plan Python - tested in SDK sync: https://github.com/meta-llama/llama-stack-client-python/pull/108 Node - tested w/ https://gist.github.com/yanxi0830/b782f4b91e21dcccdfef8898ce55157e (SDK update follow-up) ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-30 18:03:02 -08:00
Ashwin Bharambe	0d96070af9	Update OpenAPI generator to add param and field documentation (#896) We desperately need to document our APIs. This is the basic requirement of having a Spec :) This PR updates the OpenAPI generator so documentation for request parameters and object fields can be properly added to the OpenAPI specs. From there, this should get picked up by Stainless, etc. ## Test Plan: Updated client-sdk (See https://github.com/meta-llama/llama-stack-client-python/pull/104) and then ran: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py ```	2025-01-29 10:04:30 -08:00
Ashwin Bharambe	9f709387e2	Kill X-LlamaStack-{Client-Version, Provider-Data} from OpenAPI spec ClientVersion: We don't need each SDK method to support this parameter because you wouldn't be passing a different client version each time you make an API call. ProviderData: although in this case, you _could_ be passing different API keys depending on which SDK call you make, it makes for a confusing experience. It is best to initialize the LlamaStackClient with all the keys which are then passed in each request.	2025-01-28 13:30:23 -08:00
Ashwin Bharambe	ec3ebb5bcf	Use ruamel.yaml to format the OpenAPI spec (#892) Stainless ends up reformatting the YAML when we paste it in the Studio. We cannot have that happen if we are ever going to partially automate Stainless config updates. Try ruamel.yaml, specifically `block_seq_indent`, to avoid that.	2025-01-28 11:27:40 -08:00
Ashwin Bharambe	b78e6675ea	llama-stack version alpha -> v1	2025-01-15 05:58:09 -08:00
Ashwin Bharambe	ffc6bd4805	Add X-LlamaStack-Client-Version, rename ProviderData -> Provider-Data (#735) Add another header so client SDKs can identify their versions which can be used for immediate detection of possible compatibility issues. A semver mismatch against the wrong server should be immediately flagged and requests should be denied. Also change `X-LlamaStack-ProviderData` to `X-LlamaStack-Provider-Data` since that hyphenation is better.	2025-01-09 11:51:36 -08:00
Xi Yan	d97cfaa9d9	[docs] add openapi spec to docs (#508) # What does this PR do? - modify openapi generator to add coming soon tag for unimplemented api - sphinx-redocs extension for openapi spec to readthedocs page ## Test Plan https://github.com/user-attachments/assets/b4c7eebc-2361-4198-a987-dbfbcff914cf ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-22 17:54:32 -08:00
Ashwin Bharambe	8ed79ad0f3	Fix the pyopenapi generator to avoid potential circular imports	2024-11-18 23:37:52 -08:00
Ashwin Bharambe	0dc7f5fa89	Add version to REST API url (#478) # What does this PR do? Adds a `/alpha/` prefix to all the REST API urls. Also makes them all use hyphens instead of underscores as is more standard practice. (This is based on feedback from our partners.) ## Test Plan The Stack itself does not need updating. However, client SDKs and documentation will need to be updated.	2024-11-18 22:44:14 -08:00
Ashwin Bharambe	bba6edd06b	Fix OpenAPI generation to have text/event-stream for streamable methods	2024-11-14 12:51:38 -08:00
Ashwin Bharambe	ec4fc800cc	[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) This is yet another of those large PRs (hopefully we will have fewer and fewer of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack. Most important bits: * Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis. * We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are: - routing model A to ollama and model B to a remote provider like Together - routing shield A to local impl while shield B to a remote provider like Bedrock - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis * Support for provider specific parameters to be passed from the clients. A client can pass data using the `x_llamastack_provider_data` parameter, which can be type-checked and provided to the Adapter implementations.	2024-09-23 14:22:22 -07:00
Xi Yan	2c1ad10710	move openapi from rfcs->docs	2024-09-18 16:09:17 -07:00

25 commits
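
Commit 0dc7f5fa89 in the log above adds a version prefix to every REST API URL and switches the paths from underscores to hyphens, and b78e6675ea later changes that version from alpha to v1. A minimal sketch of such a rewrite might look like the following. `versioned_route` is a hypothetical helper written for illustration, not a function from llama-stack, and the underscore form of the input path is an assumed example.

```python
def versioned_route(path: str, version: str = "alpha") -> str:
    """Prefix a route with an API version and hyphenate its segments.

    Hypothetical helper for illustration only; not part of llama-stack.
    """
    # Work segment by segment so path parameters can be left untouched.
    segments = [seg for seg in path.strip("/").split("/") if seg]
    rewritten = [
        # Keep placeholders such as {model_id:path} exactly as written.
        seg if seg.startswith("{") and seg.endswith("}") else seg.replace("_", "-")
        for seg in segments
    ]
    return "/" + "/".join([version, *rewritten])


if __name__ == "__main__":
    print(versioned_route("/inference/chat_completion"))
    # -> /alpha/inference/chat-completion
    print(versioned_route("/models/{model_id:path}", version="v1"))
    # -> /v1/models/{model_id:path}
```

Leaving brace-delimited segments alone keeps parameter names such as `model_id` stable, so only the literal parts of a URL change.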