# What does this PR do?

This PR provides the actual implementation of OpenAI-compatible prompts in the Responses API. It is the follow-up PR with the actual implementation after introducing #3942. The need for this functionality was originally raised in #3514.

> Note: https://github.com/llamastack/llama-stack/pull/3514 is split into three separate PRs. This is the third of the three.

Closes #3321

## Test Plan

Manual testing, plus the CI workflow with added unit tests.

Comprehensive manual testing of the new implementation:

**Test prompts with images (containing text) in the Responses API:**

I used this image for testing purposes: [iPhone 17 image](https://github.com/user-attachments/assets/9e2ee821-e394-4bbd-b1c8-d48a3fa315de)

1. Upload an image:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/iphone.jpeg" \
  -F "purpose=assistants"
```

`{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}`

2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.",
    "variables": ["product_name", "description", "product_photo"]
  }'
```

`{"prompt":"You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}`

3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please analyze this product",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62",
      "version": "1",
      "variables": {
        "product_name": {"type": "input_text", "text": "iPhone 17 Pro Max"},
        "product_photo": {"type": "input_image", "file_id": "file-d6d375f238e14f21952cc40246bc8504", "detail": "high"}
      }
    }
  }'
```

`{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"### Product Analysis: iPhone 17 Pro Max\n\n**Quality Assessment:**\n\n- **Display & Design:**\n - The 6.9-inch display is large, ideal for streaming and productivity.\n - Anti-reflective technology and 120Hz refresh rate enhance viewing experience, providing smoother visuals and reducing glare.\n - Titanium frame suggests a premium build, offering durability and a sleek appearance.\n\n- **Performance:**\n - The Apple A19 Pro chip promises significant performance improvements, likely leading to faster processing and efficient multitasking.\n - 12GB RAM is substantial for a smartphone, ensuring smooth operation for demanding apps and games.\n\n- **Camera System:**\n - The triple 48MP camera setup (wide, ultra-wide, telephoto) is designed for versatile photography needs, capturing high-resolution photos and videos.\n - The 24MP front camera will appeal to selfie enthusiasts and content creators needing quality front-facing shots.\n\n- **Connectivity:**\n - Wi-Fi 7 support indicates future-proof wireless capabilities, providing faster and more reliable internet connectivity.\n\n**Target Audience:**\n\n- **Tech Enthusiasts:** Individuals interested in cutting-edge technology and performance.\n- **Content Creators:** Users who need a robust camera system for photo and video production.\n- **Luxury Consumers:** Those who prefer premium materials and top-of-the-line specs.\n- **Professionals:** Users who require efficient multitasking and productivity features.\n\n**Pricing Recommendations:**\n\n- Given the premium specifications, a higher price point is expected. Consider pricing competitively within the high-end smartphone market while justifying cost through unique features like the titanium frame and advanced connectivity options.\n- Positioning around the $1,200 to $1,500 range would align with expectations for top-tier devices, catering to its target audience while ensuring profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of innovative features and premium design, aimed at users seeking high performance and superior aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone 17 Pro Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}`
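For reference, the same image flow can be driven from Python. This is a minimal sketch, not part of this PR: it assumes the server's OpenAI-compatible endpoint at `http://localhost:8321/v1`, the `openai` Python client, a placeholder API key, and the prompt created in step 2 above.

```python
# Sketch only: openai client pointed at the Llama Stack OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # api_key is a placeholder

# Upload the image so it can be referenced by file_id in a prompt variable.
with open("/Users/ianmiller/iphone.jpeg", "rb") as f:
    uploaded = client.files.create(file=f, purpose="assistants")

# Create a response, filling the prompt template variables (prompt ID from step 2).
response = client.responses.create(
    model="openai/gpt-4o",
    input="Please analyze this product",
    store=True,
    prompt={
        "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62",
        "version": "1",
        "variables": {
            "product_name": {"type": "input_text", "text": "iPhone 17 Pro Max"},
            "product_photo": {
                "type": "input_image",
                "file_id": uploaded.id,
                "detail": "high",
            },
        },
    },
)
print(response.output_text)
```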
**Test Prompts with PDF files in Responses API:**

I used this PDF file for testing purposes: [invoicesample.pdf](https://github.com/user-attachments/files/22958943/invoicesample.pdf)

1. Upload PDF:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/invoicesample.pdf" \
  -F "purpose=assistants"
```

`{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}`

2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis",
    "variables": ["invoice_doc"]
  }'
```

`{"prompt":"You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}`

3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please provide a detailed analysis of this invoice",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc",
      "version": "1",
      "variables": {
        "invoice_doc": {"type": "input_file", "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e", "filename": "invoicesample.pdf"}
      }
    }
  }'
```

`{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's a detailed analysis of the invoice provided:\n\n### Seller Information\n- **Business Name:** The invoice features a logo with \"Sunny Farm\" indicating the business identity.\n- **Address:** 123 Somewhere St, Melbourne VIC 3000\n- **Contact Information:** Phone number (03) 1234 5678\n\n### Buyer Information\n- **Name:** Denny Gunawan\n- **Address:** 221 Queen St, Melbourne VIC 3000\n\n### Transaction Details\n- **Invoice Number:** #20130304\n- **Date of Transaction:** Not explicitly mentioned, likely inferred from the invoice number or needs clarification.\n\n### Items Purchased\n1. **Apple**\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal: $5.00\n\n2. **Orange**\n - Price: $1.99/kg\n - Quantity: 2 kg\n - Subtotal: $3.98\n\n3. **Watermelon**\n - Price: $1.69/kg\n - Quantity: 3 kg\n - Subtotal: $5.07\n\n4. **Mango**\n - Price: $9.56/kg\n - Quantity: 2 kg\n - Subtotal: $19.12\n\n5. **Peach**\n - Price: $2.99/kg\n - Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n- **Subtotal for Items:** $36.00\n- **GST (Goods and Services Tax):** 10% of $36.00, which amounts to $3.60\n- **Total Amount Due:** $39.60\n\n### Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor sit amet...\" which is typically used as filler text. This might indicate a section intended for terms, conditions, or additional notes that haven’t been completed.\n\n### Visual and Design Elements\n- The invoice uses a simple and clear layout, featuring the business logo prominently and stating essential information such as contact and transaction details in a structured manner.\n- There is a \"Thank You\" note at the bottom, which adds a professional and courteous touch.\n\n### Considerations\n- Ensure the date of the transaction is clear if there are any future references needed.\n- Replace filler text with relevant terms and conditions or any special instructions pertaining to the transaction.\n\nThis invoice appears standard, representing a small business transaction with clearly itemized products and applicable taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}`
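The PDF case can be exercised the same way from Python. Again a minimal sketch under the same assumptions as the previous snippet (OpenAI-compatible `/v1` endpoint, `openai` client, placeholder API key, file and prompt IDs taken from the steps above), illustrative only:

```python
# Sketch only: input_file prompt variable against the same local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")  # api_key is a placeholder

response = client.responses.create(
    model="openai/gpt-4o",
    input="Please provide a detailed analysis of this invoice",
    store=True,
    prompt={
        "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc",
        "version": "1",
        "variables": {
            "invoice_doc": {
                "type": "input_file",
                "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e",
                "filename": "invoicesample.pdf",
            },
        },
    },
)
print(response.output_text)
```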
**Test simple text Prompt in Responses API:**

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}`

2. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What is the capital of Ireland?",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef",
      "version": "1",
      "variables": {
        "name": {"type": "input_text", "text": "Alice"},
        "company": {"type": "input_text", "text": "Dummy Company"},
        "role": {"type": "input_text", "text": "Geography expert"},
        "tone": {"type": "input_text", "text": "professional and helpful"}
      }
    }
  }'
```

`{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The capital of Ireland is Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy Company","type":"input_text"},"role":{"text":"Geography expert","type":"input_text"},"tone":{"text":"professional and helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}`
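For readers unfamiliar with the template format: each `{{variable}}` placeholder in the stored prompt is filled from the `variables` map before the text is used as instructions for the response (note that `{{name}}` and `{{company}}` appear more than once and are substituted everywhere). The standalone snippet below only illustrates that substitution behaviour with a hypothetical helper; it is not the code added by this PR.

```python
import re


def render_prompt(template: str, variables: dict[str, dict]) -> str:
    """Illustrative only (not the PR's implementation): replace every
    {{var}} placeholder with the corresponding input_text value."""

    def substitute(match: re.Match) -> str:
        value = variables.get(match.group(1), {})
        if value.get("type") == "input_text":
            return value.get("text", "")
        # Non-text variables (images, files) cannot be inlined as plain text;
        # they are carried as separate content parts instead.
        return match.group(0)

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


template = ("Hello {{name}}! You are working at {{company}}. "
            "Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.")
variables = {
    "name": {"type": "input_text", "text": "Alice"},
    "company": {"type": "input_text", "text": "Dummy Company"},
    "role": {"type": "input_text", "text": "Geography expert"},
    "tone": {"type": "input_text", "text": "professional and helpful"},
}
print(render_prompt(template, variables))
# Hello Alice! You are working at Dummy Company. Your role is Geography expert
# at Dummy Company. Remember, Alice, to be professional and helpful.
```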
# Llama Stack
Quick Start | Documentation | Colab Notebook | Discord
## 🚀 One-Line Installer 🚀
To try Llama Stack locally, run:
`curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash`
## Overview

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides:
- Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals.
- Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
- Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
- Multiple developer interfaces like CLI and SDKs for Python, Typescript, iOS, and Android.
- Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
## Llama Stack Benefits
- Flexible Options: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
- Consistent Experience: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
- Robust Ecosystem: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.
By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.
## API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack. Please check out the documentation for the full list.
| API Provider Builder | Environments | Agents | Inference | VectorIO | Safety | Post Training | Eval | DatasetIO |
|---|---|---|---|---|---|---|---|---|
| Meta Reference | Single Node | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SambaNova | Hosted | ✅ | ✅ | |||||
| Cerebras | Hosted | ✅ | ||||||
| Fireworks | Hosted | ✅ | ✅ | ✅ | ||||
| AWS Bedrock | Hosted | ✅ | ✅ | |||||
| Together | Hosted | ✅ | ✅ | ✅ | ||||
| Groq | Hosted | ✅ | ||||||
| Ollama | Single Node | ✅ | ||||||
| TGI | Hosted/Single Node | ✅ | ||||||
| NVIDIA NIM | Hosted/Single Node | ✅ | ✅ | |||||
| ChromaDB | Hosted/Single Node | ✅ | ||||||
| Milvus | Hosted/Single Node | ✅ | ||||||
| Qdrant | Hosted/Single Node | ✅ | ||||||
| Weaviate | Hosted/Single Node | ✅ | ||||||
| SQLite-vec | Single Node | ✅ | ||||||
| PG Vector | Single Node | ✅ | ||||||
| PyTorch ExecuTorch | On-device iOS | ✅ | ✅ | |||||
| vLLM | Single Node | ✅ | ||||||
| OpenAI | Hosted | ✅ | ||||||
| Anthropic | Hosted | ✅ | ||||||
| Gemini | Hosted | ✅ | ||||||
| WatsonX | Hosted | ✅ | ||||||
| HuggingFace | Single Node | ✅ | ✅ | |||||
| TorchTune | Single Node | ✅ | ||||||
| NVIDIA NEMO | Hosted | ✅ | ✅ | ✅ | ✅ | ✅ | ||
| NVIDIA | Hosted | ✅ | ✅ | ✅ |
> **Note:** Additional providers are available through external packages. See the External Providers documentation.
## Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (e.g. Ollama) and seamlessly transition to production (e.g. Fireworks) without changing your application code. Here are some of the distributions we support:
| Distribution | Llama Stack Docker | Start This Distribution |
|---|---|---|
| Starter Distribution | llamastack/distribution-starter | Guide |
| Meta Reference | llamastack/distribution-meta-reference-gpu | Guide |
| PostgreSQL | llamastack/distribution-postgres-demo | |
## Documentation

Please check out our Documentation page for more details.
- CLI references
  - llama (server-side) CLI Reference: Guide for using the `llama` CLI to work with Llama models (download, study prompts) and to build/start a Llama Stack distribution.
  - llama (client-side) CLI Reference: Guide for using the `llama-stack-client` CLI, which allows you to query information about the distribution.
- Getting Started
  - Quick guide to start a Llama Stack server.
  - Jupyter notebook to walk through how to use simple text and vision inference llama_stack_client APIs.
  - The complete Llama Stack lesson Colab notebook of the new Llama 3.2 course on Deeplearning.ai.
  - A Zero-to-Hero Guide that guides you through all the key components of Llama Stack with code samples.
- Contributing
  - Adding a new API Provider: a walk-through of how to add a new API provider.
## Llama Stack Client SDKs
| Language | Client SDK | Package |
|---|---|---|
| Python | llama-stack-client-python | |
| Swift | llama-stack-client-swift | |
| Typescript | llama-stack-client-typescript | |
| Kotlin | llama-stack-client-kotlin | |
Check out our client SDKs for connecting to a Llama Stack server in your preferred language; you can choose from Python, TypeScript, Swift, and Kotlin to quickly build your applications.
You can find more example scripts with client SDKs to talk with the Llama Stack server in our llama-stack-apps repo.
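As a quick illustration, here is a minimal sketch of the Python SDK in use. It assumes a Llama Stack server running locally on port 8321 and the `llama-stack-client` package installed; consult the SDK's own documentation for the authoritative API.

```python
# Minimal sketch, assuming a local Llama Stack server on port 8321
# and `pip install llama-stack-client`.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List the models the distribution currently serves.
for model in client.models.list():
    print(model.identifier)
```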
## 🌟 GitHub Star History
## ✨ Contributors
Thanks to all of our amazing contributors!