Composable building blocks to build Llama Apps https://llama-stack.readthedocs.io
Find a file
Ian Miller 0757d5a917
feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for providing actual implementation of OpenAI
compatible prompts in Responses API. This is the follow up PR with
actual implementation after introducing #3942

The need of this functionality was initiated in #3514.

> Note, https://github.com/llamastack/llama-stack/pull/3514 is divided
on three separate PRs. Current PR is the third of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3321

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Manual testing, CI workflow with added unit tests

Comprehensive manual testing with new implementation:

**Test Prompts with Images with text on them in Responses API:**

I used this image for testing purposes: [iphone 17
image](https://github.com/user-attachments/assets/9e2ee821-e394-4bbd-b1c8-d48a3fa315de)

1. Upload an image:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/iphone.jpeg" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-d6d375f238e14f21952cc40246bc8504","bytes":556241,"created_at":1761750049,"expires_at":1793286049,"filename":"iphone.jpeg","purpose":"assistants"}%`

2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a product analysis expert. Analyze the following product:\n\nProduct Name: {{product_name}}\nDescription: {{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed analysis including quality assessment, target audience, and pricing recommendations.",
    "variables": ["product_name", "description", "product_photo"]
  }'
```

`{"prompt":"You are a product analysis expert. Analyze the following
product:\n\nProduct Name: {{product_name}}\nDescription:
{{description}}\n\nImage: {{product_photo}}\n\nProvide a detailed
analysis including quality assessment, target audience, and pricing
recommendations.","version":1,"prompt_id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":["product_name","description","product_photo"],"is_default":false}%`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please analyze this product",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62",
      "version": "1",
      "variables": {
        "product_name": {
          "type": "input_text",
          "text": "iPhone 17 Pro Max"
        },
         "product_photo": {
          "type": "input_image",
          "file_id": "file-d6d375f238e14f21952cc40246bc8504",
          "detail": "high"
        }
      }
    }
  }'
```


`{"created_at":1761750427,"error":null,"id":"resp_f897f914-e3b8-4783-8223-3ed0d32fcbc6","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"###
Product Analysis: iPhone 17 Pro Max\n\n**Quality Assessment:**\n\n-
**Display & Design:**\n - The 6.9-inch display is large, ideal for
streaming and productivity.\n - Anti-reflective technology and 120Hz
refresh rate enhance viewing experience, providing smoother visuals and
reducing glare.\n - Titanium frame suggests a premium build, offering
durability and a sleek appearance.\n\n- **Performance:**\n - The Apple
A19 Pro chip promises significant performance improvements, likely
leading to faster processing and efficient multitasking.\n - 12GB RAM is
substantial for a smartphone, ensuring smooth operation for demanding
apps and games.\n\n- **Camera System:**\n - The triple 48MP camera setup
(wide, ultra-wide, telephoto) is designed for versatile photography
needs, capturing high-resolution photos and videos.\n - The 24MP front
camera will appeal to selfie enthusiasts and content creators needing
quality front-facing shots.\n\n- **Connectivity:**\n - Wi-Fi 7 support
indicates future-proof wireless capabilities, providing faster and more
reliable internet connectivity.\n\n**Target Audience:**\n\n- **Tech
Enthusiasts:** Individuals interested in cutting-edge technology and
performance.\n- **Content Creators:** Users who need a robust camera
system for photo and video production.\n- **Luxury Consumers:** Those
who prefer premium materials and top-of-the-line specs.\n-
**Professionals:** Users who require efficient multitasking and
productivity features.\n\n**Pricing Recommendations:**\n\n- Given the
premium specifications, a higher price point is expected. Consider
pricing competitively within the high-end smartphone market while
justifying cost through unique features like the titanium frame and
advanced connectivity options.\n- Positioning around the $1,200 to
$1,500 range would align with expectations for top-tier devices,
catering to its target audience while ensuring
profitability.\n\nOverall, the iPhone 17 Pro Max showcases a blend of
innovative features and premium design, aimed at users seeking high
performance and superior
aesthetics.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_66f4d844-4d9e-4102-80fc-eb75b34b6dbd","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_7be2208cb82cdbc35356354dae1f335d1e9b7baeca21ea62","variables":{"product_name":{"text":"iPhone
17 Pro
Max","type":"input_text"},"product_photo":{"detail":"high","type":"input_image","file_id":"file-d6d375f238e14f21952cc40246bc8504","image_url":null}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":830,"output_tokens":394,"total_tokens":1224,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`

**Test Prompts with PDF files in Responses API:**

I used this PDF file for testing purposes:
[invoicesample.pdf](https://github.com/user-attachments/files/22958943/invoicesample.pdf)

1. Upload PDF:

```
curl -X POST http://localhost:8321/v1/files \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/Users/ianmiller/invoicesample.pdf" \
  -F "purpose=assistants"
```


`{"object":"file","id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","bytes":149568,"created_at":1761750730,"expires_at":1793286730,"filename":"invoicesample.pdf","purpose":"assistants"}%`


2. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are an accounting and financial analysis expert. Analyze the following invoice document:\n\nInvoice Document: {{invoice_doc}}\n\nProvide a comprehensive analysis",
    "variables": ["invoice_doc"]
  }'
```

`{"prompt":"You are an accounting and financial analysis expert. Analyze
the following invoice document:\n\nInvoice Document:
{{invoice_doc}}\n\nProvide a comprehensive
analysis","version":1,"prompt_id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":["invoice_doc"],"is_default":false}%`


3. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Please provide a detailed analysis of this invoice",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc",
      "version": "1",
      "variables": {
        "invoice_doc": {
          "type": "input_file",
          "file_id": "file-7fbb1043a4bb468cab60ffe4b8631d8e",
          "filename": "invoicesample.pdf"
        }
      }
    }
  }'
```


`{"created_at":1761750881,"error":null,"id":"resp_da866913-db06-4702-8000-174daed9dbbb","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"Here's
a detailed analysis of the invoice provided:\n\n### Seller
Information\n- **Business Name:** The invoice features a logo with
\"Sunny Farm\" indicating the business identity.\n- **Address:** 123
Somewhere St, Melbourne VIC 3000\n- **Contact Information:** Phone
number (03) 1234 5678\n\n### Buyer Information\n- **Name:** Denny
Gunawan\n- **Address:** 221 Queen St, Melbourne VIC 3000\n\n###
Transaction Details\n- **Invoice Number:** #20130304\n- **Date of
Transaction:** Not explicitly mentioned, likely inferred from the
invoice number or needs clarification.\n\n### Items Purchased\n1.
**Apple**\n - Price: $5.00/kg\n - Quantity: 1 kg\n - Subtotal:
$5.00\n\n2. **Orange**\n - Price: $1.99/kg\n - Quantity: 2 kg\n -
Subtotal: $3.98\n\n3. **Watermelon**\n - Price: $1.69/kg\n - Quantity: 3
kg\n - Subtotal: $5.07\n\n4. **Mango**\n - Price: $9.56/kg\n - Quantity:
2 kg\n - Subtotal: $19.12\n\n5. **Peach**\n - Price: $2.99/kg\n -
Quantity: 1 kg\n - Subtotal: $2.99\n\n### Financial Summary\n-
**Subtotal for Items:** $36.00\n- **GST (Goods and Services Tax):** 10%
of $36.00, which amounts to $3.60\n- **Total Amount Due:** $39.60\n\n###
Notes\n- The invoice includes a placeholder text: \"Lorem ipsum dolor
sit amet...\" which is typically used as filler text. This might
indicate a section intended for terms, conditions, or additional notes
that haven’t been completed.\n\n### Visual and Design Elements\n- The
invoice uses a simple and clear layout, featuring the business logo
prominently and stating essential information such as contact and
transaction details in a structured manner.\n- There is a \"Thank You\"
note at the bottom, which adds a professional and courteous
touch.\n\n### Considerations\n- Ensure the date of the transaction is
clear if there are any future references needed.\n- Replace filler text
with relevant terms and conditions or any special instructions
pertaining to the transaction.\n\nThis invoice appears standard,
representing a small business transaction with clearly itemized products
and applicable
taxes.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_39f3b39e-4684-4444-8e4d-e7395f88c9dc","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_72e2a184a86f32a568b6afb5455dca5c16bf3cc3f80092dc","variables":{"invoice_doc":{"type":"input_file","file_data":null,"file_id":"file-7fbb1043a4bb468cab60ffe4b8631d8e","file_url":null,"filename":"invoicesample.pdf"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":529,"output_tokens":513,"total_tokens":1042,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`

**Test simple text Prompt in Responses API:**

1. Create prompt:

```
 curl -X POST http://localhost:8321/v1/prompts \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":["name","company","role","tone"],"is_default":false}%`

2. Create response:

```
curl -X POST http://localhost:8321/v1/responses \
  -H "Accept: application/json, text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "What is the capital of Ireland?",
    "model": "openai/gpt-4o",
    "store": true,
    "prompt": {
      "id": "pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef",
      "version": "1",
      "variables": {
        "name": {
          "type": "input_text",
          "text": "Alice"
        },
        "company": {
          "type": "input_text",
          "text": "Dummy Company"
        },
        "role": {
          "type": "input_text",
          "text": "Geography expert"
        },
        "tone": {
          "type": "input_text",
          "text": "professional and helpful"
        }
      }
    }
  }'

```


`{"created_at":1761751097,"error":null,"id":"resp_1b037b95-d9ae-4ad0-8e76-d953897ecaef","model":"openai/gpt-4o","object":"response","output":[{"content":[{"text":"The
capital of Ireland is
Dublin.","type":"output_text","annotations":[]}],"role":"assistant","type":"message","id":"msg_8e7c72b6-2aa2-4da6-8e57-da4e12fa3ce2","status":"completed"}],"parallel_tool_calls":false,"previous_response_id":null,"prompt":{"id":"pmpt_f340a3164a4f65d975c774ffe38ea42d15e7ce4a835919ef","variables":{"name":{"text":"Alice","type":"input_text"},"company":{"text":"Dummy
Company","type":"input_text"},"role":{"text":"Geography
expert","type":"input_text"},"tone":{"text":"professional and
helpful","type":"input_text"}},"version":"1"},"status":"completed","temperature":null,"text":{"format":{"type":"text"}},"top_p":null,"tools":[],"truncation":null,"usage":{"input_tokens":47,"output_tokens":7,"total_tokens":54,"input_tokens_details":{"cached_tokens":0},"output_tokens_details":{"reasoning_tokens":0}},"instructions":null}%`
2025-11-19 11:48:11 -08:00
.github fix(ci): prefix stainless branches with fork author (#4187) 2025-11-19 10:09:12 -08:00
benchmarking/k8s-benchmark feat(prompts): attach prompts to storage stores in run configs (#3893) 2025-10-27 11:12:12 -07:00
client-sdks/stainless fix: Pydantic validation error with list-type metadata in vector search (#3797) (#4173) 2025-11-19 10:16:34 -08:00
containers fix(ci): use --no-cache instead of --no-cache-dir (#4081) 2025-11-05 12:14:02 -08:00
docs fix(docs): fix glob vulnerability (#4193) 2025-11-19 11:01:52 -08:00
scripts feat(tests): add TypeScript client integration test support (#4185) 2025-11-19 10:07:53 -08:00
src feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965) 2025-11-19 11:48:11 -08:00
tests feat(responses)!: implement support for OpenAI compatible prompts in Responses API (#3965) 2025-11-19 11:48:11 -08:00
.coveragerc chore: move src/llama_stack/ui to src/llama_stack_ui (#4068) 2025-11-04 15:21:49 -08:00
.dockerignore chore: use dockerfile for building containers (#3839) 2025-10-20 10:23:01 -07:00
.gitattributes chore: mark recordings as generated files (#3816) 2025-10-15 11:06:42 -07:00
.gitignore feat(tests): add TypeScript client integration test support (#4185) 2025-11-19 10:07:53 -08:00
.pre-commit-config.yaml test(api): pre-commit check to ensure API does not import llama_stack (#4160) 2025-11-17 13:23:43 -08:00
CHANGELOG.md docs: Update changelog (#3343) 2025-09-08 10:01:41 +02:00
CODE_OF_CONDUCT.md Initial commit 2024-07-23 08:32:33 -07:00
CONTRIBUTING.md feat(openapi): switch to fastapi-based generator (#3944) 2025-11-14 15:53:53 -08:00
coverage.svg test: Measure and track code coverage (#2636) 2025-07-18 18:08:36 +02:00
LICENSE Update LICENSE (#47) 2024-08-29 07:39:50 -07:00
MANIFEST.in chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00
pyproject.toml refactor(storage): make { kvstore, sqlstore } as llama stack "internal" APIs (#4181) 2025-11-18 13:15:16 -08:00
README.md chore(docs): Remove Llama 4 support details from README (#4178) 2025-11-17 15:17:04 -08:00
SECURITY.md Create SECURITY.md 2024-10-08 13:30:40 -04:00
uv.lock chore: remove pyyaml and starlette duplication in pyproject (#4172) 2025-11-17 12:09:02 -08:00

Llama Stack

PyPI version PyPI - Downloads License Discord Unit Tests Integration Tests

Quick Start | Documentation | Colab Notebook | Discord

🚀 One-Line Installer 🚀

To try Llama Stack locally, run:

curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash

Overview

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals.
  • Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
  • Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
  • Multiple developer interfaces like CLI and SDKs for Python, Typescript, iOS, and Android.
  • Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
Llama Stack

Llama Stack Benefits

  • Flexible Options: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
  • Consistent Experience: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
  • Robust Ecosystem: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.

API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack. Please checkout for full list

API Provider Builder Environments Agents Inference VectorIO Safety Post Training Eval DatasetIO
Meta Reference Single Node
SambaNova Hosted
Cerebras Hosted
Fireworks Hosted
AWS Bedrock Hosted
Together Hosted
Groq Hosted
Ollama Single Node
TGI Hosted/Single Node
NVIDIA NIM Hosted/Single Node
ChromaDB Hosted/Single Node
Milvus Hosted/Single Node
Qdrant Hosted/Single Node
Weaviate Hosted/Single Node
SQLite-vec Single Node
PG Vector Single Node
PyTorch ExecuTorch On-device iOS
vLLM Single Node
OpenAI Hosted
Anthropic Hosted
Gemini Hosted
WatsonX Hosted
HuggingFace Single Node
TorchTune Single Node
NVIDIA NEMO Hosted
NVIDIA Hosted

Note

: Additional providers are available through external packages. See External Providers documentation.

Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support:

Distribution Llama Stack Docker Start This Distribution
Starter Distribution llamastack/distribution-starter Guide
Meta Reference llamastack/distribution-meta-reference-gpu Guide
PostgreSQL llamastack/distribution-postgres-demo

Documentation

Please checkout our Documentation page for more details.

Llama Stack Client SDKs

Language Client SDK Package
Python llama-stack-client-python PyPI version
Swift llama-stack-client-swift Swift Package Index
Typescript llama-stack-client-typescript NPM version
Kotlin llama-stack-client-kotlin Maven version

Check out our client SDKs for connecting to a Llama Stack server in your preferred language, you can choose from python, typescript, swift, and kotlin programming languages to quickly build your applications.

You can find more example scripts with client SDKs to talk with the Llama Stack server in our llama-stack-apps repo.

🌟 GitHub Star History

Star History

Star History Chart

Contributors

Thanks to all of our amazing contributors!