docs: add a section on AI frameworks with common OpenAI API compatibility to AI Application Examples

This change builds off of the existing "Agents vs. OpenAI Responses API" section of "AI Application Examples", explaining how several popular AI frameworks provide some form of OpenAI API compatibility, and how that allows such applications to be deployed on Llama Stack.

This change also
- introduces a simple LangChain/LangGraph example that runs on Llama Stack via its OpenAI-compatible API
- circles back to the Responses API, and introduces a page of external references to examples
- makes it clear that other OpenAI API compatible AI frameworks can be added as the community has time to dive into them.
gabemontero 2025-08-24 20:54:23 -04:00
parent 7394828c7a
commit 20fd5ff54c
7 changed files with 342 additions and 0 deletions


@@ -12,6 +12,7 @@ Here are some key topics that will help you build effective agents:
- **[Agent](agent)**: Understand the components and design patterns of the Llama Stack agent framework.
- **[Agent Execution Loop](agent_execution_loop)**: Understand how agents process information, make decisions, and execute actions in a continuous loop.
- **[Agents vs Responses API](responses_vs_agents)**: Learn the differences between the Agents API and Responses API, and when to use each one.
- **[OpenAI API](more_on_openai_compatibility)**: Learn how Llama Stack's OpenAI API Compatibility also allows for use of other AI frameworks on the platform.
- **[Tools](tools)**: Extend your agents' capabilities by integrating with external tools and APIs.
- **[Evals](evals)**: Evaluate your agents' effectiveness and identify areas for improvement.
- **[Telemetry](telemetry)**: Monitor and analyze your agents' performance and behavior.
@@ -25,6 +26,7 @@ rag
agent
agent_execution_loop
responses_vs_agents
more_on_openai_compatibility
tools
evals
telemetry


@@ -0,0 +1,36 @@
# OpenAI, LangChain, and LangGraph via Llama Stack
One popular AI framework that exposes OpenAI API compatibility is LangChain, with its [OpenAI Provider](https://python.langchain.com/docs/integrations/providers/openai/).
With LangChain's OpenAI API compatibility, and using the Llama Stack OpenAI-compatible endpoint URL (`http://localhost:8321/v1/openai/v1`, for example, if you are running Llama Stack
locally) as the OpenAI API provider, you can run your existing LangChain AI applications in your Llama Stack environment.
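As a quick illustration, here is a minimal sketch (assuming a locally running Llama Stack server started with the starter distribution, and the `ollama/llama3.2:3b-instruct-fp16` model used in the example below):
```python
from langchain_openai import ChatOpenAI

# Point LangChain's OpenAI chat model at the Llama Stack OpenAI-compatible
# endpoint; no real OpenAI API key is needed.
llm = ChatOpenAI(
    model="ollama/llama3.2:3b-instruct-fp16",
    openai_api_key="none",
    openai_api_base="http://localhost:8321/v1/openai/v1",
)
print(llm.invoke("What is 16 plus 9?").content)
```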
There is also LangGraph, an associated but separate extension to the LangChain framework, to consider. While LangChain is excellent for creating
linear sequences of operations (chains), LangGraph allows for more dynamic workflows (graphs) with loops, branching, and persistent state.
This makes LangGraph ideal for sophisticated agent-based systems where the flow of control is not predetermined.
You can use your existing LangChain components in combination with LangGraph components to create more complex,
multi-agent applications.
As this LangChain/LangGraph section of the Llama Stack docs iterates and expands, it will provide a variety of samples, as well as references to third-party sites with samples. The samples will vary both in:
- how complex the application is
- what aspects of Llama Stack are leveraged in conjunction with the application
Local examples:
- **[Starter](langchain_langgraph)**: Explore a simple, graph-based agentic application that exposes a simple tool to add numbers together.
External sites:
- **[Responses](more_on_responses)**: A deeper dive into the newer OpenAI Responses API (vs. the Chat Completions API).
```{toctree}
:hidden:
:maxdepth: 1
langchain_langgraph
more_on_responses
```


@@ -0,0 +1,120 @@
# Example: A multi-node LangGraph Agent Application that registers a simple tool that adds two numbers together
### Setup
#### Activate model
```bash
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```
Note: this blocks the terminal, as `ollama run` drops you into a chat with the model. Use
```bash
/bye
```
to return to your command prompt. To confirm the model is in fact running, you can run
```bash
ollama ps
```
#### Start up Llama Stack
```bash
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```
#### Install dependencies
In order to install LangChain, LangGraph, OpenAI, and their related dependencies, run
```bash
uv pip install langgraph langchain openai langchain_openai langchain_community
```
### Application details
To run the application, from the root of your Llama Stack git repository clone, execute:
```bash
python docs/source/building_applications/langgraph-agent-add.py
```
and you should see this in the output:
```bash
HUMAN: What is 16 plus 9?
AI:
TOOL: 25
```
The sample also adds some debug output that illustrates the use of the OpenAI Chat Completions API, as the `response.response_metadata`
field corresponds to the [Chat Completion object](https://platform.openai.com/docs/api-reference/chat/object).
```bash
LLM returned Chat Completion object: {'token_usage': {'completion_tokens': 23, 'prompt_tokens': 169, 'total_tokens': 192, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'llama3.2:3b-instruct-fp16', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-51307b80-a1a1-4092-b005-21ea9cde29a0', 'service_tier': None, 'finish_reason': 'tool_calls', 'logprobs': None}
```
This is analogous to the object returned by the Llama Stack Client's `chat.completions.create` call.
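For comparison, here is a minimal sketch of that direct call (assuming the `llama-stack-client` package is installed and the server from the setup above is running):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
# The returned object mirrors the OpenAI Chat Completion object logged above.
response = client.chat.completions.create(
    model="ollama/llama3.2:3b-instruct-fp16",
    messages=[{"role": "user", "content": "What is 16 plus 9?"}],
)
print(response)
```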
The example application leverages a series of LangGraph and LangChain APIs. The two key ones are:
1. [ChatOpenAI](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#chatopenai) is the primary LangChain OpenAI-compatible chat API. The standard parameters for this API supply the Llama Stack OpenAI provider endpoint, followed by a model registered with Llama Stack.
2. [StateGraph](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.state.StateGraph) provides the LangGraph API for building the nodes and edges of the graph that define (potential) steps of the agentic workflow.
Additional LangChain APIs are leveraged in order to:
- register or [bind](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.bind_tools) the tool used by the LangGraph agentic workflow
- [process](https://api.python.langchain.com/en/latest/agents/langchain.agents.format_scratchpad.openai_tools.format_to_openai_tool_messages.html) any messages generated by the workflow
- supply [user](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.human.HumanMessage.html) and [tool](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.tool.ToolMessage.html) prompts
Ultimately, this agentic workflow application performs the simple task of adding numbers together.
```{literalinclude} ./langgraph-agent-add.py
:language: python
```
#### Minor application tweak - the OpenAI Responses API
It is very easy to switch from the default OpenAI Chat Completions API to the newer OpenAI Responses API. Simply modify
the `ChatOpenAI` constructor call with the additional `use_responses_api=True` flag:
```python
llm = ChatOpenAI(
model="ollama/llama3.2:3b-instruct-fp16",
openai_api_key="none",
openai_api_base="http://localhost:8321/v1/openai/v1",
use_responses_api=True).bind_tools(tools)
```
For convenience, here is the entire sample with that change to the constructor:
```{literalinclude} ./langgraph-agent-add-via-responses.py
:language: python
```
If you are examining the Llama Stack server logs while running the application, you'll see calls to the `/v1/openai/v1/responses` REST endpoint instead of `/v1/openai/v1/chat/completions`.
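You can also hit that endpoint directly with the `openai` client installed earlier, pointed at the same Llama Stack endpoint (a minimal sketch, reusing the model from the setup above):
```python
from openai import OpenAI

# This issues a request to /v1/openai/v1/responses rather than
# /v1/openai/v1/chat/completions.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
response = client.responses.create(
    model="ollama/llama3.2:3b-instruct-fp16",
    input="What is 16 plus 9?",
)
print(response.model_dump_json(indent=2))
```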
In the sample application's output, the debug statement displaying the response from the LLM will now show that, instead of the Chat Completion object,
the LLM returns a [Response object from the Responses API](https://platform.openai.com/docs/api-reference/responses/object).
```bash
LLM returned Responses object: {'id': 'resp-9dbaa1e1-7ba4-45cd-978e-e84448aee278', 'created_at': 1756140326.0, 'model': 'ollama/llama3.2:3b-instruct-fp16', 'object': 'response', 'status': 'completed', 'model_name': 'ollama/llama3.2:3b-instruct-fp16'}
```
This is analogous to the object returned by the Llama Stack Client's `responses.create` call.
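For comparison, a minimal sketch of that direct call (same `llama-stack-client` assumptions as the `chat.completions.create` sketch earlier):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
# The returned object mirrors the Responses API Response object logged above.
response = client.responses.create(
    model="ollama/llama3.2:3b-instruct-fp16",
    input="What is 16 plus 9?",
)
print(response)
```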
The Responses API is considered the next generation of OpenAI's core agentic API primitive.
For a detailed comparison with the Chat Completions API, along with migration suggestions, visit the [OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
### Comparing Llama Stack Agents and LangGraph Agents
Expressing the agent workflow as a LangGraph `StateGraph` is an alternative approach to the Llama Stack agent execution
loop as discussed in [this prior section](../agent_execution_loop.md).
To summarize some of the key takeaways detailed earlier:
- LangGraph does not offer the easy integration with Llama Stack's API providers, such as the shields / safety mechanisms, that Llama Stack Agents benefit from.
- Llama Stack agents follow a simpler, predefined sequence of steps incorporated in a loop as the standard execution pattern (similar to LangChain), where multiple LLM calls and tool invocations are possible.
- LangGraph execution order is more flexible, with edges allowing for conditional branching along with loops. Each node is an LLM call or tool call. Also, a mutable state is passed between nodes, enabling complex, multi-turn interactions and adaptive behavior, i.e. workflow orchestration.
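To make the contrast concrete, here is a hedged sketch of the same question posed through a Llama Stack Agent (assuming the `Agent` helper from the `llama-stack-client` package; note that the agent's built-in execution loop, rather than a graph definition, decides how many LLM and tool steps run):
```python
from llama_stack_client import Agent, LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# No nodes or edges are declared; the agent execution loop drives the turn.
agent = Agent(
    client,
    model="ollama/llama3.2:3b-instruct-fp16",
    instructions="You are a helpful assistant that can do arithmetic.",
)
session_id = agent.create_session("add-demo")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "What is 16 plus 9?"}],
    session_id=session_id,
    stream=False,
)
print(turn.output_message.content)
```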


@@ -0,0 +1,72 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage, ToolMessage
from langchain.agents import tool
from langchain_openai import ChatOpenAI
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
# --- Tool ---
@tool
def add_numbers(x: int, y: int) -> int:
"""Add two integers together."""
return x + y
tools = [add_numbers]
# --- LLM that supports function-calling ---
llm = ChatOpenAI(
model="ollama/llama3.2:3b-instruct-fp16",
openai_api_key="none",
openai_api_base="http://localhost:8321/v1/openai/v1",
use_responses_api=True
).bind_tools(tools)
# --- Node that runs the agent ---
def agent_node(state):
messages = state["messages"]
if "scratchpad" in state:
messages += format_to_openai_tool_messages(state["scratchpad"])
response = llm.invoke(messages)
print(f"LLM returned Responses object: {response.response_metadata}")
return {
"messages": messages + [response],
"intermediate_step": response,
}
# --- Node that executes tool call ---
def tool_node(state):
tool_call = state["intermediate_step"].tool_calls[0]
result = add_numbers.invoke(tool_call["args"])
return {
"messages": state["messages"] + [
ToolMessage(tool_call_id=tool_call["id"], content=str(result))
]
}
# --- Build LangGraph ---
graph = StateGraph(dict)
graph.add_node("agent", agent_node)
graph.add_node("tool", tool_node)
graph.set_entry_point("agent")
graph.add_edge("agent", "tool")
graph.add_edge("tool", END)
compiled_graph = graph.compile()
# --- Run it ---
initial_state = {
"messages": [HumanMessage(content="What is 16 plus 9?")]
}
final_state = compiled_graph.invoke(initial_state)
# --- Output ---
for msg in final_state["messages"]:
print(f"{msg.type.upper()}: {msg.content}")


@@ -0,0 +1,70 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from langgraph.graph import StateGraph, END
from langchain_core.messages import HumanMessage, ToolMessage
from langchain.agents import tool
from langchain_openai import ChatOpenAI
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
# --- Tool ---
@tool
def add_numbers(x: int, y: int) -> int:
"""Add two integers together."""
return x + y
tools = [add_numbers]
# --- LLM that supports function-calling ---
llm = ChatOpenAI(
model="ollama/llama3.2:3b-instruct-fp16",
openai_api_key="none",
openai_api_base="http://localhost:8321/v1/openai/v1"
).bind_tools(tools)
# --- Node that runs the agent ---
def agent_node(state):
messages = state["messages"]
if "scratchpad" in state:
messages += format_to_openai_tool_messages(state["scratchpad"])
response = llm.invoke(messages)
print(f"LLM returned Chat Completion object: {response.response_metadata}")
return {
"messages": messages + [response],
"intermediate_step": response,
}
# --- Node that executes tool call ---
def tool_node(state):
tool_call = state["intermediate_step"].tool_calls[0]
result = add_numbers.invoke(tool_call["args"])
return {
"messages": state["messages"] + [
ToolMessage(tool_call_id=tool_call["id"], content=str(result))
]
}
# --- Build LangGraph ---
graph = StateGraph(dict)
graph.add_node("agent", agent_node)
graph.add_node("tool", tool_node)
graph.set_entry_point("agent")
graph.add_edge("agent", "tool")
graph.add_edge("tool", END)
compiled_graph = graph.compile()
# --- Run it ---
initial_state = {
"messages": [HumanMessage(content="What is 16 plus 9?")]
}
final_state = compiled_graph.invoke(initial_state)
# --- Output ---
for msg in final_state["messages"]:
print(f"{msg.type.upper()}: {msg.content}")


@@ -0,0 +1,21 @@
# Deep dive references for Llama Stack, OpenAI Responses API, and LangChain/LangGraph
Examples that combine the Llama Stack Client API with the OpenAI Responses API and a wide variety of frameworks, such as the LangChain API,
are rapidly evolving across various code repositories, blogs, and documentation sites.
The scenarios covered at such locations are impossible to enumerate and keep current, but at a minimum they include:
- Simple model inference
- RAG with document search
- Tool calling to MCP servers
- Complex multi-step workflows
Rather than duplicate these Llama Stack Client related examples in this documentation site, this section provides
references to those external sites.
## The AI Alliance
Consider the Responses API Examples detailed [here](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/README.md).


@@ -0,0 +1,21 @@
# More on Llama Stack's OpenAI API Compatibility and other AI Frameworks
Many other agentic frameworks also recognize the value of providing OpenAI API compatibility alongside their framework-specific APIs, similar to the use of the OpenAI Responses API from a Llama Stack Client
instance as described in the previous [Agents vs Responses API](responses_vs_agents) section.
This OpenAI API compatibility becomes a "least common denominator" of sorts, allowing agentic applications written
with these other frameworks to be migrated onto AI infrastructure leveraging Llama Stack. Once on Llama Stack, the application maintainer
can then leverage all the advantages Llama Stack provides, as summarized in the [Core Concepts section](../concepts/index.md).
As the Llama Stack community continues to dive into these different AI frameworks with OpenAI API compatibility, a
variety of documentation sections, examples, and references will be provided. Here is what is currently available:
- **[LangChain/LangGraph](langchain_langgraph/index)**: the LangChain and associated LangGraph AI frameworks.
```{toctree}
:hidden:
:maxdepth: 1
langchain_langgraph/index
```