docs: add an AI frameworks with common OpenAI API compatibility section to AI Application Examples

This change builds on the existing "Agents vs. OpenAI Responses API" section of "AI Application Examples" to explain how several popular AI frameworks provide some form of OpenAI API compatibility, and how that allows such applications to be deployed on Llama Stack. This change also:

- introduces a simple LangChain/LangGraph example that runs on Llama Stack via its OpenAI-compatible API
- circles back to the Responses API and introduces a page of external references to examples
- makes it clear that other OpenAI API compatible AI frameworks can be added as the community has time to dive into them
# Example: A multi-node LangGraph Agent Application that registers a simple tool that adds two numbers together

### Setup

#### Activate model

```bash
ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```
Note: this blocks the terminal, as `ollama run` lets you chat with the model interactively. Use

```bash
/bye
```

to return to your command prompt. To confirm the model is in fact running, you can run

```bash
ollama ps
```
#### Start up Llama Stack

```bash
OLLAMA_URL=http://localhost:11434 uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```
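To confirm the server is up before moving on, you can list the registered models. This sketch assumes the OpenAI-compatible routes used later on this page are also served under the `/v1/openai/v1` prefix:

```bash
# List the models Llama Stack exposes through its OpenAI-compatible endpoint
curl http://localhost:8321/v1/openai/v1/models
```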
#### Install dependencies

To install LangChain, LangGraph, the OpenAI client, and their related dependencies, run

```bash
uv pip install langgraph langchain openai langchain_openai langchain_community
```
### Application details

To run the application, from the root of your Llama Stack git repository clone, execute:

```bash
python docs/source/building_applications/langgraph-agent-add.py
```

You should see this in the output:

```bash
HUMAN: What is 16 plus 9?
AI:
TOOL: 25
```
The sample also adds some debug output that illustrates the use of the OpenAI Chat Completions API, as the `response.response_metadata`
field corresponds to the [Chat Completion Object](https://platform.openai.com/docs/api-reference/chat/object).

```bash
LLM returned Chat Completion object: {'token_usage': {'completion_tokens': 23, 'prompt_tokens': 169, 'total_tokens': 192, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'llama3.2:3b-instruct-fp16', 'system_fingerprint': 'fp_ollama', 'id': 'chatcmpl-51307b80-a1a1-4092-b005-21ea9cde29a0', 'service_tier': None, 'finish_reason': 'tool_calls', 'logprobs': None}
```

This is analogous to the object returned by the Llama Stack Client's `chat.completions.create` call.
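Since the `openai` package was installed above, you can make an equivalent call directly against the same endpoint. This is a minimal sketch, not part of the bundled sample; the base URL, placeholder API key, and model name are taken from the example:

```python
from openai import OpenAI

# Point the standard OpenAI client at Llama Stack's OpenAI-compatible endpoint
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

completion = client.chat.completions.create(
    model="ollama/llama3.2:3b-instruct-fp16",
    messages=[{"role": "user", "content": "What is 16 plus 9?"}],
)

# The returned object mirrors the Chat Completion Object shown in the debug output
print(completion.model, completion.usage)
```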
The example application leverages a series of LangGraph and LangChain APIs. The two key ones are:

1. [ChatOpenAI](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#chatopenai) is the primary LangChain OpenAI-compatible chat API. The standard parameters for this API supply the Llama Stack OpenAI provider endpoint, followed by a model registered with Llama Stack.
2. [StateGraph](https://langchain-ai.github.io/langgraph/reference/graphs/#langgraph.graph.state.StateGraph) provides the LangGraph API for building the nodes and edges of the graph that define the (potential) steps of the agentic workflow.

Additional LangChain APIs are leveraged in order to:

- register or [bind](https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html#langchain_openai.chat_models.base.ChatOpenAI.bind_tools) the tool used by the LangGraph agentic workflow
- [process](https://api.python.langchain.com/en/latest/agents/langchain.agents.format_scratchpad.openai_tools.format_to_openai_tool_messages.html) any messages generated by the workflow
- supply [user](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.human.HumanMessage.html) and [tool](https://python.langchain.com/api_reference/core/messages/langchain_core.messages.tool.ToolMessage.html) prompts
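To see how those pieces fit together, here is a minimal sketch of a tool-calling `StateGraph`. It is not the bundled sample itself; the node names and routing function are illustrative, while the endpoint and model name match the setup above:

```python
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode

@tool
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b

tools = [add]

# Bind the tool to the OpenAI-compatible chat model served by Llama Stack
llm = ChatOpenAI(
    model="ollama/llama3.2:3b-instruct-fp16",
    openai_api_key="none",
    openai_api_base="http://localhost:8321/v1/openai/v1",
).bind_tools(tools)

def call_model(state: MessagesState):
    # One LLM call; the model may answer directly or request the add tool
    return {"messages": [llm.invoke(state["messages"])]}

def route(state: MessagesState):
    # Conditional edge: execute the tool node only if the LLM asked for it
    return "tools" if state["messages"][-1].tool_calls else END

graph = StateGraph(MessagesState)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode(tools))
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", route)
graph.add_edge("tools", "agent")
app = graph.compile()

result = app.invoke({"messages": [HumanMessage(content="What is 16 plus 9?")]})
```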
Ultimately, this agentic workflow application performs the simple task of adding numbers together.

```{literalinclude} ./langgraph-agent-add.py
:language: python
```
#### Minor application tweak - the OpenAI Responses API

It is very easy to switch from the default OpenAI Chat Completions API to the newer OpenAI Responses API. Simply modify
the `ChatOpenAI` instantiation with the additional `use_responses_api=True` flag:

```python
llm = ChatOpenAI(
    model="ollama/llama3.2:3b-instruct-fp16",
    openai_api_key="none",
    openai_api_base="http://localhost:8321/v1/openai/v1",
    use_responses_api=True,
).bind_tools(tools)
```
For convenience, here is the entire sample with that change to the constructor:

```{literalinclude} ./langgraph-agent-add-via-responses.py
:language: python
```

If you examine the Llama Stack server logs while running the application, you'll see requests to the `/v1/openai/v1/responses` REST endpoint instead of the `/v1/openai/v1/chat/completions` REST endpoint.

In the sample application's output, the debug statement displaying the response from the LLM will now show that, instead of the Chat Completion Object,
the LLM returns a [Response Object from the Responses API](https://platform.openai.com/docs/api-reference/responses/object).

```bash
LLM returned Responses object: {'id': 'resp-9dbaa1e1-7ba4-45cd-978e-e84448aee278', 'created_at': 1756140326.0, 'model': 'ollama/llama3.2:3b-instruct-fp16', 'object': 'response', 'status': 'completed', 'model_name': 'ollama/llama3.2:3b-instruct-fp16'}
```

This is analogous to the object returned by the Llama Stack Client's `responses.create` call.
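As in the Chat Completions case, here is a minimal sketch of the equivalent direct call with the `openai` client, assuming a client version recent enough to include the Responses API:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

# The Responses API takes a single `input` rather than a list of chat messages
response = client.responses.create(
    model="ollama/llama3.2:3b-instruct-fp16",
    input="What is 16 plus 9?",
)

# Mirrors the Response Object fields shown in the debug output above
print(response.id, response.status)
```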
The Responses API is considered the next generation of OpenAI's core agentic API primitive.
For a detailed comparison with the Chat Completions API, and suggestions for migrating from it, visit the [OpenAI documentation](https://platform.openai.com/docs/guides/migrate-to-responses).
### Comparing Llama Stack Agents and LangGraph Agents

Expressing the agent workflow as a LangGraph `StateGraph` is an alternative approach to the Llama Stack agent execution
loop discussed in [this prior section](../agent_execution_loop.md).

To summarize some of the key takeaways detailed earlier:

- LangGraph does not offer the easy integration with Llama Stack's API providers, such as the shields / safety mechanisms, that Llama Stack Agents benefit from.
- Llama Stack agents follow a simpler execution pattern: a predefined sequence of steps incorporated in a loop (similar to LangChain), where multiple LLM calls
and tool invocations are possible.
- LangGraph execution order is more flexible, with edges allowing for conditional branching along with loops. Each node is an LLM call
or tool call. Also, a mutable state is passed between nodes, enabling complex, multi-turn interactions and adaptive behavior, i.e. workflow orchestration.