diff --git a/MANIFEST.in b/MANIFEST.in index b0d4e2866..4b76f85fe 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,4 +1,4 @@ include requirements.txt include llama_toolchain/data/*.yaml -include llama_toolchain/distribution/*.sh +include llama_toolchain/core/*.sh include llama_toolchain/cli/scripts/*.sh diff --git a/docs/cli_reference.md b/docs/cli_reference.md index 03b98a57d..626e970ec 100644 --- a/docs/cli_reference.md +++ b/docs/cli_reference.md @@ -2,10 +2,10 @@ The `llama` CLI tool helps you setup and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-toolchain` package. -### Subcommands -1. `download`: `llama` cli tools supports downloading the model from Meta or HuggingFace. +### Subcommands +1. `download`: `llama` cli tools supports downloading the model from Meta or HuggingFace. 2. `model`: Lists available models and their properties. -3. `distribution`: A distribution is a set of REST APIs, this command allows you to manage (list, install, create, configure, start) distributions. You can read more about this [here](https://github.com/meta-llama/llama-stack/blob/main/docs/cli_reference.md#step-3-installing-and-configuring-distributions). +3. `stack`: Allows you to build and run a Llama Stack server. You can read more about this [here](https://github.com/meta-llama/llama-stack/blob/api_updates_1/docs/cli_reference.md#step-3-building-configuring-and-running-llama-stack-servers). ### Sample Usage @@ -13,7 +13,7 @@ The `llama` CLI tool helps you setup and use the Llama toolchain & agentic syste llama --help ```
-usage: llama [-h] {download,model,distribution} ...
+usage: llama [-h] {download,model,stack,api} ...
 
 Welcome to the Llama CLI
 
@@ -21,7 +21,7 @@ options:
   -h, --help            show this help message and exit
 
 subcommands:
-  {download,model,distribution}
+  {download,model,stack,api}
 
## Step 1. Get the models @@ -101,9 +101,9 @@ The `llama model` command helps you explore the model’s interface. ### 2.1 Subcommands 1. `download`: Download the model from different sources. (meta, huggingface) -2. `list`: Lists all the models available for download with hardware requirements to deploy the models. +2. `list`: Lists all the models available for download with hardware requirements to deploy the models. 3. `template`: -4. `describe`: Describes all the properties of the model. +4. `describe`: Describes all the properties of the model. ### 2.2 Sample Usage @@ -236,11 +236,13 @@ These commands can help understand the model interface and how prompts / message **NOTE**: Outputs in terminal are color printed to show special tokens. -## Step 3: Installing and Configuring Distributions +## Step 3: Building, Configuring and Running Llama Stack servers An agentic app has several components including model inference, tool execution and system safety shields. Running all these components is made simpler (we hope!) with Llama Stack Distributions. -A Distribution is simply a collection of REST API providers that are part of the Llama stack. As an example, by running a simple command `llama distribution start`, you can bring up a server serving the following endpoints, among others: +The Llama Stack is a collection of REST APIs. An API is _implemented_ by Provider. An assembly of Providers together provides the implementation for the Stack -- this package is called a Distribution. + +As an example, by running a simple command `llama stack run`, you can bring up a server serving the following endpoints, among others: ``` POST /inference/chat_completion POST /inference/completion @@ -253,103 +255,135 @@ POST /agentic_system/delete The agentic app can now simply point to this server to execute all its needed components. -A distribution’s behavior can be configured by defining a specification or “spec”. This specification lays out the different API “Providers” that constitute this distribution. +Lets build, configure and start a Llama Stack server specified via a "Distribution ID" to understand more ! -Lets install, configure and start a distribution to understand more ! - -Let’s start with listing available distributions +Let’s start with listing available distributions: ``` -llama distribution list +llama stack list-distributions ```
-+--------------+---------------------------------------------+----------------------------------------------------------------------+
-| Spec ID      | ProviderSpecs                               | Description                                                          |
-+--------------+---------------------------------------------+----------------------------------------------------------------------+
-| local        | {                                           | Use code from `llama_toolchain` itself to serve all llama stack APIs |
-|              |   "inference": "meta-reference",            |                                                                      |
-|              |   "safety": "meta-reference",               |                                                                      |
-|              |   "agentic_system": "meta-reference"        |                                                                      |
-|              | }                                           |                                                                      |
-+--------------+---------------------------------------------+----------------------------------------------------------------------+
-| remote       | {                                           | Point to remote services for all llama stack APIs                    |
-|              |   "inference": "inference-remote",          |                                                                      |
-|              |   "safety": "safety-remote",                |                                                                      |
-|              |   "agentic_system": "agentic_system-remote" |                                                                      |
-|              | }                                           |                                                                      |
-+--------------+---------------------------------------------+----------------------------------------------------------------------+
-| local-ollama | {                                           | Like local, but use ollama for running LLM inference                 |
-|              |   "inference": "meta-ollama",               |                                                                      |
-|              |   "safety": "meta-reference",               |                                                                      |
-|              |   "agentic_system": "meta-reference"        |                                                                      |
-|              | }                                           |                                                                      |
-+--------------+---------------------------------------------+----------------------------------------------------------------------+
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
+| Distribution ID                | Providers                             | Description                                                          |
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
+| local                          | {                                     | Use code from `llama_toolchain` itself to serve all llama stack APIs |
+|                                |   "inference": "meta-reference",      |                                                                      |
+|                                |   "memory": "meta-reference-faiss",   |                                                                      |
+|                                |   "safety": "meta-reference",         |                                                                      |
+|                                |   "agentic_system": "meta-reference"  |                                                                      |
+|                                | }                                     |                                                                      |
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
+| remote                         | {                                     | Point to remote services for all llama stack APIs                    |
+|                                |   "inference": "remote",              |                                                                      |
+|                                |   "safety": "remote",                 |                                                                      |
+|                                |   "agentic_system": "remote",         |                                                                      |
+|                                |   "memory": "remote"                  |                                                                      |
+|                                | }                                     |                                                                      |
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
+| local-ollama                   | {                                     | Like local, but use ollama for running LLM inference                 |
+|                                |   "inference": "remote::ollama",      |                                                                      |
+|                                |   "safety": "meta-reference",         |                                                                      |
+|                                |   "agentic_system": "meta-reference", |                                                                      |
+|                                |   "memory": "meta-reference-faiss"    |                                                                      |
+|                                | }                                     |                                                                      |
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
+| local-plus-fireworks-inference | {                                     | Use Fireworks.ai for running LLM inference                           |
+|                                |   "inference": "remote::fireworks",   |                                                                      |
+|                                |   "safety": "meta-reference",         |                                                                      |
+|                                |   "agentic_system": "meta-reference", |                                                                      |
+|                                |   "memory": "meta-reference-faiss"    |                                                                      |
+|                                | }                                     |                                                                      |
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
+| local-plus-together-inference  | {                                     | Use Together.ai for running LLM inference                            |
+|                                |   "inference": "remote::together",    |                                                                      |
+|                                |   "safety": "meta-reference",         |                                                                      |
+|                                |   "agentic_system": "meta-reference", |                                                                      |
+|                                |   "memory": "meta-reference-faiss"    |                                                                      |
+|                                | }                                     |                                                                      |
++--------------------------------+---------------------------------------+----------------------------------------------------------------------+
 
-As you can see above, each “spec” details the “providers” that make up that spec. For eg. The `local` spec uses the “meta-reference” provider for inference while the `local-ollama` spec relies on a different provider ( ollama ) for inference.
+As you can see above, each “distribution” details the “providers” it is composed of. For example, `local` uses the “meta-reference” provider for inference while `local-ollama` relies on a different provider (Ollama) for inference. Similarly, you can use Fireworks or Together.AI for running inference as well.
-Lets install the fully local implementation of the llama-stack – named `local` above.
+To install a distribution, we run a simple command providing 2 inputs:
+- **Distribution ID** of the distribution that we want to install (as obtained from the `list-distributions` command)
+- A **Name** for the specific build and configuration of this distribution.
-To install a distro, we run a simple command providing 2 inputs –
-- **Spec Id** of the distribution that we want to install ( as obtained from the list command )
-- A **Name** by which this installation will be known locally.
+Let's imagine you are working with an 8B-Instruct model. The following command will build a package (in the form of a Conda environment) _and_ configure it. As part of the configuration, you will be asked for some inputs (model_id, max_seq_len, etc.). Since we are working with an 8B model, we will name our build `8b-instruct` to help us remember the config.
 ```
-llama distribution install --spec local --name local_llama_8b
+llama stack build local --name 8b-instruct
 ```
-This will create a new conda environment (name can be passed optionally) and install dependencies (via pip) as required by the distro.
-
-Once it runs successfully , you should see some outputs in the form
+Once it runs successfully, you should see output of the form:
 ```
-llama distribution install --spec local --name local_llama_8b
-```
-
+$ llama stack build local --name 8b-instruct
+....
+....
 Successfully installed cfgv-3.4.0 distlib-0.3.8 identify-2.6.0 libcst-1.4.0 llama_toolchain-0.0.2 moreorless-0.4.0 nodeenv-1.9.1 pre-commit-3.8.0 stdlibs-2024.5.15 toml-0.10.2 tomlkit-0.13.0 trailrunner-1.4.0 ufmt-2.7.0 usort-1.0.8 virtualenv-20.26.3
 
-Distribution `local_llama_8b` (with spec local) has been installed successfully!
-
+Successfully setup conda environment. Configuring build...
-Next step is to configure the distribution that you just installed. We provide a simple CLI tool to enable simple configuration.
-This command will walk you through the configuration process.
-It will ask for some details like model name, paths to models, etc.
+...
+...
-**NOTE**: You will have to download the models if not done already. Follow instructions here on how to download using the llama cli
-```
-llama distribution configure --name local_llama_8b
+YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
 ```
-Here is an example output of how the cli will guide you to fill the configuration:
-
-Configuring API surface: inference
+You can re-configure this distribution by running:
+```
+llama stack configure local --name 8b-instruct
+```
+
+Here is an example run of how the CLI will guide you through the configuration:
+```
+$ llama stack configure local --name 8b-instruct
+
+Configuring API: inference (meta-reference)
 Enter value for model (required): Meta-Llama3.1-8B-Instruct
 Enter value for quantization (optional):
 Enter value for torch_seed (optional):
 Enter value for max_seq_len (required): 4096
 Enter value for max_batch_size (default: 1): 1
-Configuring API surface: safety
-Do you want to configure llama_guard_shield? (y/n): n
-Do you want to configure prompt_guard_shield? (y/n): n
-Configuring API surface: agentic_system
+Configuring API: safety (meta-reference)
+Do you want to configure llama_guard_shield? (y/n): y
+Entering sub-configuration for llama_guard_shield:
+Enter value for model (required): Llama-Guard-3-8B
+Enter value for excluded_categories (required): []
+Enter value for disable_input_check (default: False):
+Enter value for disable_output_check (default: False):
+Do you want to configure prompt_guard_shield? (y/n): y
+Entering sub-configuration for prompt_guard_shield:
+Enter value for model (required): Prompt-Guard-86M
+...
+...
+YAML configuration has been written to ~/.llama/builds/local/conda/8b-instruct.yaml
+```
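+
+For illustration, the written file captures roughly the values you just entered. Below is a minimal sketch -- the key names are assumptions inferred from the prompts above rather than the exact schema the CLI emits, so treat the generated file as the source of truth:
+```
+# Hypothetical sketch of ~/.llama/builds/local/conda/8b-instruct.yaml;
+# key names are assumed from the prompts above, not the real schema.
+inference:
+  model: Meta-Llama3.1-8B-Instruct
+  quantization: null
+  torch_seed: null
+  max_seq_len: 4096
+  max_batch_size: 1
+safety:
+  llama_guard_shield:
+    model: Llama-Guard-3-8B
+    excluded_categories: []
+    disable_input_check: false
+    disable_output_check: false
+  prompt_guard_shield:
+    model: Prompt-Guard-86M
+```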
 
-YAML configuration has been written to ~/.llama/distributions/local0/config.yaml
-
- -As you can see, we did basic configuration above and configured inference to run on model Meta-Llama3.1-8B-Instruct ( obtained from the llama model list command ). -For this initial setup we did not set up safety. +As you can see, we did basic configuration above and configured: +- inference to run on model `Meta-Llama3.1-8B-Instruct` (obtained from `llama model list`) +- Llama Guard safety shield with model `Llama-Guard-3-8B` +- Prompt Guard safety shield with model `Prompt-Guard-86M` For how these configurations are stored as yaml, checkout the file printed at the end of the configuration. -## Step 4: Starting a Distribution and Testing it +Note that all configurations as well as models are stored in `~/.llama` -Now let’s start the distribution using the cli. -``` -llama distribution start --name local_llama_8b --port 5000 -``` -You should see the distribution start and print the APIs that it is supporting: +## Step 4: Starting a Llama Stack Distribution and Testing it + +Now let’s start Llama Stack server. + +You need the YAML configuration file which was written out at the end by the `llama stack build` step. + +``` +llama stack run local --name 8b-instruct --port 5000 +``` +You should see the Stack server start and print the APIs that it is supporting, + +``` +$ llama stack run local --name 8b-instruct --port 5000 -
 > initializing model parallel with size 1
 > initializing ddp with size 1
 > initializing pipeline with size 1
@@ -376,15 +410,23 @@ INFO:     Started server process [453333]
 INFO:     Waiting for application startup.
 INFO:     Application startup complete.
 INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
-
- -Lets test with a client - ``` -cd /path/to/llama-toolchain -conda activate # ( Eg. local_llama_8b in above example ) -python -m llama_toolchain.inference.client localhost 5000 + +> [!NOTE] +> Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`. + +> [!IMPORTANT] +> The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines. + +This server is running a Llama model locally. + +Lets test with a client. +``` +cd /path/to/llama-stack +conda activate # any environment containing the llama-toolchain pip package will work + +python -m llama_toolchain.inference.client localhost 5000 ``` This will run the chat completion client and query the distribution’s /inference/chat_completion API. diff --git a/llama_toolchain/agentic_system/api/__init__.py b/llama_toolchain/agentic_system/api/__init__.py index 4cefa053f..a7e55ba91 100644 --- a/llama_toolchain/agentic_system/api/__init__.py +++ b/llama_toolchain/agentic_system/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa -from .endpoints import * # noqa +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/agentic_system/api/api.py b/llama_toolchain/agentic_system/api/api.py new file mode 100644 index 000000000..e3f417918 --- /dev/null +++ b/llama_toolchain/agentic_system/api/api.py @@ -0,0 +1,413 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from datetime import datetime +from enum import Enum +from typing import Any, Dict, List, Literal, Optional, Protocol, Union + +from llama_models.schema_utils import json_schema_type, webmethod + +from pydantic import BaseModel, ConfigDict, Field +from typing_extensions import Annotated + +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_toolchain.common.deployment_types import * # noqa: F403 +from llama_toolchain.inference.api import * # noqa: F403 +from llama_toolchain.safety.api import * # noqa: F403 +from llama_toolchain.memory.api import * # noqa: F403 + + +@json_schema_type +class Attachment(BaseModel): + content: InterleavedTextMedia | URL + mime_type: str + + +class AgenticSystemTool(Enum): + brave_search = "brave_search" + wolfram_alpha = "wolfram_alpha" + photogen = "photogen" + code_interpreter = "code_interpreter" + + function_call = "function_call" + memory = "memory" + + +class ToolDefinitionCommon(BaseModel): + input_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) + output_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) + + +@json_schema_type +class BraveSearchToolDefinition(ToolDefinitionCommon): + type: Literal[AgenticSystemTool.brave_search.value] = ( + AgenticSystemTool.brave_search.value + ) + remote_execution: Optional[RestAPIExecutionConfig] = None + + +@json_schema_type +class WolframAlphaToolDefinition(ToolDefinitionCommon): + type: Literal[AgenticSystemTool.wolfram_alpha.value] = ( + AgenticSystemTool.wolfram_alpha.value + ) + remote_execution: Optional[RestAPIExecutionConfig] = None + + +@json_schema_type +class PhotogenToolDefinition(ToolDefinitionCommon): + type: Literal[AgenticSystemTool.photogen.value] = AgenticSystemTool.photogen.value + remote_execution: Optional[RestAPIExecutionConfig] = None + + 
+@json_schema_type +class CodeInterpreterToolDefinition(ToolDefinitionCommon): + type: Literal[AgenticSystemTool.code_interpreter.value] = ( + AgenticSystemTool.code_interpreter.value + ) + enable_inline_code_execution: bool = True + remote_execution: Optional[RestAPIExecutionConfig] = None + + +@json_schema_type +class FunctionCallToolDefinition(ToolDefinitionCommon): + type: Literal[AgenticSystemTool.function_call.value] = ( + AgenticSystemTool.function_call.value + ) + function_name: str + description: str + parameters: Dict[str, ToolParamDefinition] + remote_execution: Optional[RestAPIExecutionConfig] = None + + +class _MemoryBankConfigCommon(BaseModel): + bank_id: str + + +class AgenticSystemVectorMemoryBankConfig(_MemoryBankConfigCommon): + type: Literal[MemoryBankType.vector.value] = MemoryBankType.vector.value + + +class AgenticSystemKeyValueMemoryBankConfig(_MemoryBankConfigCommon): + type: Literal[MemoryBankType.keyvalue.value] = MemoryBankType.keyvalue.value + keys: List[str] # what keys to focus on + + +class AgenticSystemKeywordMemoryBankConfig(_MemoryBankConfigCommon): + type: Literal[MemoryBankType.keyword.value] = MemoryBankType.keyword.value + + +class AgenticSystemGraphMemoryBankConfig(_MemoryBankConfigCommon): + type: Literal[MemoryBankType.graph.value] = MemoryBankType.graph.value + entities: List[str] # what entities to focus on + + +MemoryBankConfig = Annotated[ + Union[ + AgenticSystemVectorMemoryBankConfig, + AgenticSystemKeyValueMemoryBankConfig, + AgenticSystemKeywordMemoryBankConfig, + AgenticSystemGraphMemoryBankConfig, + ], + Field(discriminator="type"), +] + + +@json_schema_type +class MemoryToolDefinition(ToolDefinitionCommon): + type: Literal[AgenticSystemTool.memory.value] = AgenticSystemTool.memory.value + memory_bank_configs: List[MemoryBankConfig] = Field(default_factory=list) + max_tokens_in_context: int = 4096 + max_chunks: int = 10 + + +AgenticSystemToolDefinition = Annotated[ + Union[ + BraveSearchToolDefinition, + WolframAlphaToolDefinition, + PhotogenToolDefinition, + CodeInterpreterToolDefinition, + FunctionCallToolDefinition, + MemoryToolDefinition, + ], + Field(discriminator="type"), +] + + +class StepCommon(BaseModel): + turn_id: str + step_id: str + started_at: Optional[datetime] = None + completed_at: Optional[datetime] = None + + +class StepType(Enum): + inference = "inference" + tool_execution = "tool_execution" + shield_call = "shield_call" + memory_retrieval = "memory_retrieval" + + +@json_schema_type +class InferenceStep(StepCommon): + model_config = ConfigDict(protected_namespaces=()) + + step_type: Literal[StepType.inference.value] = StepType.inference.value + model_response: CompletionMessage + + +@json_schema_type +class ToolExecutionStep(StepCommon): + step_type: Literal[StepType.tool_execution.value] = StepType.tool_execution.value + tool_calls: List[ToolCall] + tool_responses: List[ToolResponse] + + +@json_schema_type +class ShieldCallStep(StepCommon): + step_type: Literal[StepType.shield_call.value] = StepType.shield_call.value + response: ShieldResponse + + +@json_schema_type +class MemoryRetrievalStep(StepCommon): + step_type: Literal[StepType.memory_retrieval.value] = ( + StepType.memory_retrieval.value + ) + memory_bank_ids: List[str] + inserted_context: InterleavedTextMedia + + +Step = Annotated[ + Union[ + InferenceStep, + ToolExecutionStep, + ShieldCallStep, + MemoryRetrievalStep, + ], + Field(discriminator="step_type"), +] + + +@json_schema_type +class Turn(BaseModel): + """A single turn in an interaction with an Agentic 
System.""" + + turn_id: str + session_id: str + input_messages: List[ + Union[ + UserMessage, + ToolResponseMessage, + ] + ] + steps: List[Step] + output_message: CompletionMessage + output_attachments: List[Attachment] = Field(default_factory=list) + + started_at: datetime + completed_at: Optional[datetime] = None + + +@json_schema_type +class Session(BaseModel): + """A single session of an interaction with an Agentic System.""" + + session_id: str + session_name: str + turns: List[Turn] + started_at: datetime + + memory_bank: Optional[MemoryBank] = None + + +class AgentConfigCommon(BaseModel): + sampling_params: Optional[SamplingParams] = SamplingParams() + + input_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) + output_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) + + tools: Optional[List[AgenticSystemToolDefinition]] = Field(default_factory=list) + tool_choice: Optional[ToolChoice] = Field(default=ToolChoice.auto) + tool_prompt_format: Optional[ToolPromptFormat] = Field( + default=ToolPromptFormat.json + ) + + +@json_schema_type +class AgentConfig(AgentConfigCommon): + model: str + instructions: str + + +class AgentConfigOverridablePerTurn(AgentConfigCommon): + instructions: Optional[str] = None + + +class AgenticSystemTurnResponseEventType(Enum): + step_start = "step_start" + step_complete = "step_complete" + step_progress = "step_progress" + + turn_start = "turn_start" + turn_complete = "turn_complete" + + +@json_schema_type +class AgenticSystemTurnResponseStepStartPayload(BaseModel): + event_type: Literal[AgenticSystemTurnResponseEventType.step_start.value] = ( + AgenticSystemTurnResponseEventType.step_start.value + ) + step_type: StepType + step_id: str + metadata: Optional[Dict[str, Any]] = Field(default_factory=dict) + + +@json_schema_type +class AgenticSystemTurnResponseStepCompletePayload(BaseModel): + event_type: Literal[AgenticSystemTurnResponseEventType.step_complete.value] = ( + AgenticSystemTurnResponseEventType.step_complete.value + ) + step_type: StepType + step_details: Step + + +@json_schema_type +class AgenticSystemTurnResponseStepProgressPayload(BaseModel): + model_config = ConfigDict(protected_namespaces=()) + + event_type: Literal[AgenticSystemTurnResponseEventType.step_progress.value] = ( + AgenticSystemTurnResponseEventType.step_progress.value + ) + step_type: StepType + step_id: str + + model_response_text_delta: Optional[str] = None + tool_call_delta: Optional[ToolCallDelta] = None + tool_response_text_delta: Optional[str] = None + + +@json_schema_type +class AgenticSystemTurnResponseTurnStartPayload(BaseModel): + event_type: Literal[AgenticSystemTurnResponseEventType.turn_start.value] = ( + AgenticSystemTurnResponseEventType.turn_start.value + ) + turn_id: str + + +@json_schema_type +class AgenticSystemTurnResponseTurnCompletePayload(BaseModel): + event_type: Literal[AgenticSystemTurnResponseEventType.turn_complete.value] = ( + AgenticSystemTurnResponseEventType.turn_complete.value + ) + turn: Turn + + +@json_schema_type +class AgenticSystemTurnResponseEvent(BaseModel): + """Streamed agent execution response.""" + + payload: Annotated[ + Union[ + AgenticSystemTurnResponseStepStartPayload, + AgenticSystemTurnResponseStepProgressPayload, + AgenticSystemTurnResponseStepCompletePayload, + AgenticSystemTurnResponseTurnStartPayload, + AgenticSystemTurnResponseTurnCompletePayload, + ], + Field(discriminator="event_type"), + ] + + +@json_schema_type +class AgenticSystemCreateResponse(BaseModel): + agent_id: str + + 
+@json_schema_type +class AgenticSystemSessionCreateResponse(BaseModel): + session_id: str + + +@json_schema_type +class AgenticSystemTurnCreateRequest(AgentConfigOverridablePerTurn): + agent_id: str + session_id: str + + # TODO: figure out how we can simplify this and make why + # ToolResponseMessage needs to be here (it is function call + # execution from outside the system) + messages: List[ + Union[ + UserMessage, + ToolResponseMessage, + ] + ] + attachments: Optional[List[Attachment]] = None + + stream: Optional[bool] = False + + +@json_schema_type +class AgenticSystemTurnResponseStreamChunk(BaseModel): + event: AgenticSystemTurnResponseEvent + + +@json_schema_type +class AgenticSystemStepResponse(BaseModel): + step: Step + + +class AgenticSystem(Protocol): + @webmethod(route="/agentic_system/create") + async def create_agentic_system( + self, + agent_config: AgentConfig, + ) -> AgenticSystemCreateResponse: ... + + @webmethod(route="/agentic_system/turn/create") + async def create_agentic_system_turn( + self, + request: AgenticSystemTurnCreateRequest, + ) -> AgenticSystemTurnResponseStreamChunk: ... + + @webmethod(route="/agentic_system/turn/get") + async def get_agentic_system_turn( + self, + agent_id: str, + turn_id: str, + ) -> Turn: ... + + @webmethod(route="/agentic_system/step/get") + async def get_agentic_system_step( + self, agent_id: str, turn_id: str, step_id: str + ) -> AgenticSystemStepResponse: ... + + @webmethod(route="/agentic_system/session/create") + async def create_agentic_system_session( + self, + agent_id: str, + session_name: str, + ) -> AgenticSystemSessionCreateResponse: ... + + @webmethod(route="/agentic_system/session/get") + async def get_agentic_system_session( + self, + agent_id: str, + session_id: str, + turn_ids: Optional[List[str]] = None, + ) -> Session: ... + + @webmethod(route="/agentic_system/session/delete") + async def delete_agentic_system_session( + self, agent_id: str, session_id: str + ) -> None: ... + + @webmethod(route="/agentic_system/delete") + async def delete_agentic_system( + self, + agent_id: str, + ) -> None: ... diff --git a/llama_toolchain/agentic_system/api/datatypes.py b/llama_toolchain/agentic_system/api/datatypes.py deleted file mode 100644 index 648aed698..000000000 --- a/llama_toolchain/agentic_system/api/datatypes.py +++ /dev/null @@ -1,234 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
- -from datetime import datetime -from enum import Enum -from typing import Any, Dict, List, Literal, Optional, Union - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel, ConfigDict, Field -from typing_extensions import Annotated - -from llama_toolchain.common.deployment_types import * # noqa: F403 -from llama_toolchain.inference.api import * # noqa: F403 -from llama_toolchain.safety.api.datatypes import * # noqa: F403 -from llama_toolchain.memory.api.datatypes import * # noqa: F403 - - -@json_schema_type -class AgenticSystemToolDefinition(ToolDefinition): - execution_config: Optional[RestAPIExecutionConfig] = None - input_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) - output_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) - - -class StepCommon(BaseModel): - turn_id: str - step_id: str - started_at: Optional[datetime] = None - completed_at: Optional[datetime] = None - - -class StepType(Enum): - inference = "inference" - tool_execution = "tool_execution" - shield_call = "shield_call" - memory_retrieval = "memory_retrieval" - - -@json_schema_type -class InferenceStep(StepCommon): - model_config = ConfigDict(protected_namespaces=()) - - step_type: Literal[StepType.inference.value] = StepType.inference.value - model_response: CompletionMessage - - -@json_schema_type -class ToolExecutionStep(StepCommon): - step_type: Literal[StepType.tool_execution.value] = StepType.tool_execution.value - tool_calls: List[ToolCall] - tool_responses: List[ToolResponse] - - -@json_schema_type -class ShieldCallStep(StepCommon): - step_type: Literal[StepType.shield_call.value] = StepType.shield_call.value - response: ShieldResponse - - -@json_schema_type -class MemoryRetrievalStep(StepCommon): - step_type: Literal[StepType.memory_retrieval.value] = ( - StepType.memory_retrieval.value - ) - memory_bank_ids: List[str] - documents: List[MemoryBankDocument] - scores: List[float] - - -Step = Annotated[ - Union[ - InferenceStep, - ToolExecutionStep, - ShieldCallStep, - MemoryRetrievalStep, - ], - Field(discriminator="step_type"), -] - - -@json_schema_type -class Turn(BaseModel): - """A single turn in an interaction with an Agentic System.""" - - turn_id: str - session_id: str - input_messages: List[ - Union[ - UserMessage, - ToolResponseMessage, - ] - ] - steps: List[Step] - output_message: CompletionMessage - started_at: datetime - completed_at: Optional[datetime] = None - - -@json_schema_type -class Session(BaseModel): - """A single session of an interaction with an Agentic System.""" - - session_id: str - session_name: str - turns: List[Turn] - started_at: datetime - - -@json_schema_type -class ToolPromptFormat(Enum): - """This Enum refers to the prompt format for calling zero shot tools - - `json` -- - Refers to the json format for calling tools. - The json format takes the form like - { - "type": "function", - "function" : { - "name": "function_name", - "description": "function_description", - "parameters": {...} - } - } - - `function_tag` -- - This is an example of how you could define - your own user defined format for making tool calls. 
- The function_tag format looks like this, - (parameters) - - The detailed prompts for each of these formats are defined in `system_prompt.py` - """ - - json = "json" - function_tag = "function_tag" - - -@json_schema_type -class AgenticSystemInstanceConfig(BaseModel): - instructions: str - sampling_params: Optional[SamplingParams] = SamplingParams() - # zero-shot or built-in tool configurations as input to the model - available_tools: Optional[List[AgenticSystemToolDefinition]] = Field( - default_factory=list - ) - - input_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) - output_shields: Optional[List[ShieldDefinition]] = Field(default_factory=list) - - # if you completely want to replace the messages prefixed by the system, - # this is debug only - debug_prefix_messages: Optional[List[Message]] = Field(default_factory=list) - tool_prompt_format: Optional[ToolPromptFormat] = Field( - default=ToolPromptFormat.json - ) - - -class AgenticSystemTurnResponseEventType(Enum): - step_start = "step_start" - step_complete = "step_complete" - step_progress = "step_progress" - - turn_start = "turn_start" - turn_complete = "turn_complete" - - -@json_schema_type -class AgenticSystemTurnResponseStepStartPayload(BaseModel): - event_type: Literal[AgenticSystemTurnResponseEventType.step_start.value] = ( - AgenticSystemTurnResponseEventType.step_start.value - ) - step_type: StepType - step_id: str - metadata: Optional[Dict[str, Any]] = Field(default_factory=dict) - - -@json_schema_type -class AgenticSystemTurnResponseStepCompletePayload(BaseModel): - event_type: Literal[AgenticSystemTurnResponseEventType.step_complete.value] = ( - AgenticSystemTurnResponseEventType.step_complete.value - ) - step_type: StepType - step_details: Step - - -@json_schema_type -class AgenticSystemTurnResponseStepProgressPayload(BaseModel): - model_config = ConfigDict(protected_namespaces=()) - - event_type: Literal[AgenticSystemTurnResponseEventType.step_progress.value] = ( - AgenticSystemTurnResponseEventType.step_progress.value - ) - step_type: StepType - step_id: str - - model_response_text_delta: Optional[str] = None - tool_call_delta: Optional[ToolCallDelta] = None - tool_response_text_delta: Optional[str] = None - - -@json_schema_type -class AgenticSystemTurnResponseTurnStartPayload(BaseModel): - event_type: Literal[AgenticSystemTurnResponseEventType.turn_start.value] = ( - AgenticSystemTurnResponseEventType.turn_start.value - ) - turn_id: str - - -@json_schema_type -class AgenticSystemTurnResponseTurnCompletePayload(BaseModel): - event_type: Literal[AgenticSystemTurnResponseEventType.turn_complete.value] = ( - AgenticSystemTurnResponseEventType.turn_complete.value - ) - turn: Turn - - -@json_schema_type -class AgenticSystemTurnResponseEvent(BaseModel): - """Streamed agent execution response.""" - - payload: Annotated[ - Union[ - AgenticSystemTurnResponseStepStartPayload, - AgenticSystemTurnResponseStepProgressPayload, - AgenticSystemTurnResponseStepCompletePayload, - AgenticSystemTurnResponseTurnStartPayload, - AgenticSystemTurnResponseTurnCompletePayload, - ], - Field(discriminator="event_type"), - ] diff --git a/llama_toolchain/agentic_system/api/endpoints.py b/llama_toolchain/agentic_system/api/endpoints.py deleted file mode 100644 index 06a7323ea..000000000 --- a/llama_toolchain/agentic_system/api/endpoints.py +++ /dev/null @@ -1,127 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. 
-# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from .datatypes import * # noqa: F403 -from typing import Protocol - -# this dependency is annoying and we need a forked up version anyway -from llama_models.schema_utils import json_schema_type, webmethod - - -@json_schema_type -class AgenticSystemCreateRequest(BaseModel): - model: str - instance_config: AgenticSystemInstanceConfig - - -@json_schema_type -class AgenticSystemCreateResponse(BaseModel): - system_id: str - - -@json_schema_type -class AgenticSystemSessionCreateRequest(BaseModel): - system_id: str - session_name: str - - -@json_schema_type -class AgenticSystemSessionCreateResponse(BaseModel): - session_id: str - - -@json_schema_type -# what's the URI? -class AgenticSystemTurnCreateRequest(BaseModel): - system_id: str - session_id: str - - messages: List[ - Union[ - UserMessage, - ToolResponseMessage, - ] - ] - - stream: Optional[bool] = False - override_config: Optional[AgenticSystemInstanceConfig] = None - - -@json_schema_type -class AgenticSystemTurnResponseStreamChunk(BaseModel): - event: AgenticSystemTurnResponseEvent - - -@json_schema_type -class AgenticSystemStepResponse(BaseModel): - step: Step - - -class AgenticSystem(Protocol): - @webmethod(route="/agentic_system/create") - async def create_agentic_system( - self, - request: AgenticSystemCreateRequest, - ) -> AgenticSystemCreateResponse: ... - - @webmethod(route="/agentic_system/turn/create") - async def create_agentic_system_turn( - self, - request: AgenticSystemTurnCreateRequest, - ) -> AgenticSystemTurnResponseStreamChunk: ... - - @webmethod(route="/agentic_system/turn/get") - async def get_agentic_system_turn( - self, - agent_id: str, - turn_id: str, - ) -> Turn: ... - - @webmethod(route="/agentic_system/step/get") - async def get_agentic_system_step( - self, agent_id: str, turn_id: str, step_id: str - ) -> AgenticSystemStepResponse: ... - - @webmethod(route="/agentic_system/session/create") - async def create_agentic_system_session( - self, - request: AgenticSystemSessionCreateRequest, - ) -> AgenticSystemSessionCreateResponse: ... - - @webmethod(route="/agentic_system/memory_bank/attach") - async def attach_memory_bank_to_agentic_system( - self, - agent_id: str, - session_id: str, - memory_bank_ids: List[str], - ) -> None: ... - - @webmethod(route="/agentic_system/memory_bank/detach") - async def detach_memory_bank_from_agentic_system( - self, - agent_id: str, - session_id: str, - memory_bank_ids: List[str], - ) -> None: ... - - @webmethod(route="/agentic_system/session/get") - async def get_agentic_system_session( - self, - agent_id: str, - session_id: str, - turn_ids: Optional[List[str]] = None, - ) -> Session: ... - - @webmethod(route="/agentic_system/session/delete") - async def delete_agentic_system_session( - self, agent_id: str, session_id: str - ) -> None: ... - - @webmethod(route="/agentic_system/delete") - async def delete_agentic_system( - self, - agent_id: str, - ) -> None: ... 
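For orientation: the request-wrapper classes deleted above (`AgenticSystemCreateRequest`, `AgenticSystemSessionCreateRequest`) are replaced by calls that take an `AgentConfig` plus plain parameters, as defined in the new `api.py`. The following is a minimal, illustrative sketch of driving the refactored surface; it mirrors the `_run_agent` helper added to `client.py` below and is not part of the patch itself.

```python
# Illustrative sketch only -- mirrors the new client flow shown in this diff.
import asyncio

from llama_models.llama3.api.datatypes import SamplingParams, UserMessage
from llama_toolchain.agentic_system.api import (
    AgentConfig,
    AgenticSystemTurnCreateRequest,
)
from llama_toolchain.agentic_system.client import AgenticSystemClient
from llama_toolchain.agentic_system.event_logger import EventLogger


async def demo() -> None:
    api = AgenticSystemClient("http://localhost:5000")

    # Creation now takes an AgentConfig directly instead of a request wrapper.
    agent = await api.create_agentic_system(
        AgentConfig(
            model="Meta-Llama3.1-8B-Instruct",
            instructions="You are a helpful assistant",
            sampling_params=SamplingParams(temperature=1.0, top_p=0.9),
        )
    )
    session = await api.create_agentic_system_session(
        agent_id=agent.agent_id,
        session_name="demo_session",
    )

    # Turns still stream AgenticSystemTurnResponseStreamChunk events.
    iterator = api.create_agentic_system_turn(
        AgenticSystemTurnCreateRequest(
            agent_id=agent.agent_id,
            session_id=session.session_id,
            messages=[UserMessage(content="Who are you?")],
            stream=True,
        )
    )
    async for _event, log in EventLogger().log(iterator):
        if log is not None:
            log.print()


if __name__ == "__main__":
    asyncio.run(demo())
```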
diff --git a/llama_toolchain/agentic_system/client.py b/llama_toolchain/agentic_system/client.py index 56428c425..fadb78182 100644 --- a/llama_toolchain/agentic_system/client.py +++ b/llama_toolchain/agentic_system/client.py @@ -6,38 +6,28 @@ import asyncio import json - from typing import AsyncGenerator import fire import httpx -from llama_models.llama3.api.datatypes import ( - BuiltinTool, - SamplingParams, - ToolParamDefinition, - UserMessage, -) +from pydantic import BaseModel from termcolor import cprint -from llama_toolchain.agentic_system.event_logger import EventLogger -from .api import ( - AgenticSystem, - AgenticSystemCreateRequest, - AgenticSystemCreateResponse, - AgenticSystemInstanceConfig, - AgenticSystemSessionCreateRequest, - AgenticSystemSessionCreateResponse, - AgenticSystemToolDefinition, - AgenticSystemTurnCreateRequest, - AgenticSystemTurnResponseStreamChunk, - ToolPromptFormat, -) +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_toolchain.core.datatypes import RemoteProviderConfig + +from .api import * # noqa: F403 +from .event_logger import EventLogger -async def get_client_impl(base_url: str): - return AgenticSystemClient(base_url) +async def get_client_impl(config: RemoteProviderConfig, _deps): + return AgenticSystemClient(config.url) + + +def encodable_dict(d: BaseModel): + return json.loads(d.json()) class AgenticSystemClient(AgenticSystem): @@ -45,12 +35,14 @@ class AgenticSystemClient(AgenticSystem): self.base_url = base_url async def create_agentic_system( - self, request: AgenticSystemCreateRequest + self, agent_config: AgentConfig ) -> AgenticSystemCreateResponse: async with httpx.AsyncClient() as client: response = await client.post( f"{self.base_url}/agentic_system/create", - data=request.json(), + json={ + "agent_config": encodable_dict(agent_config), + }, headers={"Content-Type": "application/json"}, ) response.raise_for_status() @@ -58,12 +50,16 @@ class AgenticSystemClient(AgenticSystem): async def create_agentic_system_session( self, - request: AgenticSystemSessionCreateRequest, + agent_id: str, + session_name: str, ) -> AgenticSystemSessionCreateResponse: async with httpx.AsyncClient() as client: response = await client.post( f"{self.base_url}/agentic_system/session/create", - data=request.json(), + json={ + "agent_id": agent_id, + "session_name": session_name, + }, headers={"Content-Type": "application/json"}, ) response.raise_for_status() @@ -77,7 +73,9 @@ class AgenticSystemClient(AgenticSystem): async with client.stream( "POST", f"{self.base_url}/agentic_system/turn/create", - data=request.json(), + json={ + "request": encodable_dict(request), + }, headers={"Content-Type": "application/json"}, timeout=20, ) as response: @@ -85,6 +83,10 @@ class AgenticSystemClient(AgenticSystem): if line.startswith("data:"): data = line[len("data: ") :] try: + if "error" in data: + cprint(data, "red") + continue + yield AgenticSystemTurnResponseStreamChunk( **json.loads(data) ) @@ -93,24 +95,52 @@ class AgenticSystemClient(AgenticSystem): print(f"Error with parsing or validation: {e}") +async def _run_agent(api, tool_definitions, user_prompts, attachments=None): + agent_config = AgentConfig( + model="Meta-Llama3.1-8B-Instruct", + instructions="You are a helpful assistant", + sampling_params=SamplingParams(temperature=1.0, top_p=0.9), + tools=tool_definitions, + tool_choice=ToolChoice.auto, + tool_prompt_format=ToolPromptFormat.function_tag, + ) + + create_response = await api.create_agentic_system(agent_config) + session_response = await 
api.create_agentic_system_session( + agent_id=create_response.agent_id, + session_name="test_session", + ) + + for content in user_prompts: + cprint(f"User> {content}", color="white", attrs=["bold"]) + iterator = api.create_agentic_system_turn( + AgenticSystemTurnCreateRequest( + agent_id=create_response.agent_id, + session_id=session_response.session_id, + messages=[ + UserMessage(content=content), + ], + attachments=attachments, + stream=True, + ) + ) + + async for event, log in EventLogger().log(iterator): + if log is not None: + log.print() + + async def run_main(host: str, port: int): - # client to test remote impl of agentic system api = AgenticSystemClient(f"http://{host}:{port}") tool_definitions = [ - AgenticSystemToolDefinition( - tool_name=BuiltinTool.brave_search, - ), - AgenticSystemToolDefinition( - tool_name=BuiltinTool.wolfram_alpha, - ), - AgenticSystemToolDefinition( - tool_name=BuiltinTool.code_interpreter, - ), + BraveSearchToolDefinition(), + WolframAlphaToolDefinition(), + CodeInterpreterToolDefinition(), ] tool_definitions += [ - AgenticSystemToolDefinition( - tool_name="get_boiling_point", + FunctionCallToolDefinition( + function_name="get_boiling_point", description="Get the boiling point of a imaginary liquids (eg. polyjuice)", parameters={ "liquid_name": ToolParamDefinition( @@ -127,30 +157,6 @@ async def run_main(host: str, port: int): ), ] - create_request = AgenticSystemCreateRequest( - model="Meta-Llama3.1-8B-Instruct", - instance_config=AgenticSystemInstanceConfig( - instructions="You are a helpful assistant", - sampling_params=SamplingParams(), - available_tools=tool_definitions, - input_shields=[], - output_shields=[], - debug_prefix_messages=[], - tool_prompt_format=ToolPromptFormat.json, - ), - ) - - create_response = await api.create_agentic_system(create_request) - print(create_response) - - session_response = await api.create_agentic_system_session( - AgenticSystemSessionCreateRequest( - system_id=create_response.system_id, - session_name="test_session", - ) - ) - print(session_response) - user_prompts = [ "Who are you?", "what is the 100th prime number?", @@ -158,26 +164,51 @@ async def run_main(host: str, port: int): "Write code to check if a number is prime. Use that to check if 7 is prime", "What is the boiling point of polyjuicepotion ?", ] - for content in user_prompts: - cprint(f"User> {content}", color="blue") - iterator = api.create_agentic_system_turn( - AgenticSystemTurnCreateRequest( - system_id=create_response.system_id, - session_id=session_response.session_id, - messages=[ - UserMessage(content=content), - ], - stream=True, - ) + await _run_agent(api, tool_definitions, user_prompts) + + +async def run_rag(host: str, port: int): + api = AgenticSystemClient(f"http://{host}:{port}") + + urls = [ + "memory_optimizations.rst", + "chat.rst", + "llama3.rst", + "datasets.rst", + "qat_finetune.rst", + "lora_finetune.rst", + ] + attachments = [ + Attachment( + content=URL( + uri=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}" + ), + mime_type="text/plain", ) + for i, url in enumerate(urls) + ] - async for event, log in EventLogger().log(iterator): - if log is not None: - log.print() + # Alternatively, you can pre-populate the memory bank with documents for example, + # using `llama_toolchain.memory.client`. Then you can grab the bank_id + # from the output of that run. 
+ tool_definitions = [ + MemoryToolDefinition( + max_tokens_in_context=2048, + memory_bank_configs=[], + ), + ] + + user_prompts = [ + "How do I use Lora?", + "Tell me briefly about llama3 and torchtune", + ] + + await _run_agent(api, tool_definitions, user_prompts, attachments) -def main(host: str, port: int): - asyncio.run(run_main(host, port)) +def main(host: str, port: int, rag: bool = False): + fn = run_rag if rag else run_main + asyncio.run(fn(host, port)) if __name__ == "__main__": diff --git a/llama_toolchain/agentic_system/event_logger.py b/llama_toolchain/agentic_system/event_logger.py index 22d961a10..3d15ee239 100644 --- a/llama_toolchain/agentic_system/event_logger.py +++ b/llama_toolchain/agentic_system/event_logger.py @@ -6,7 +6,7 @@ from typing import Optional -from llama_models.llama3.api.datatypes import ToolResponseMessage +from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_models.llama3.api.tool_utils import ToolUtils from termcolor import cprint @@ -44,7 +44,12 @@ EventType = AgenticSystemTurnResponseEventType class EventLogger: - async def log(self, event_generator, stream=True): + async def log( + self, + event_generator, + stream=True, + tool_prompt_format: ToolPromptFormat = ToolPromptFormat.json, + ): previous_event_type = None previous_step_type = None @@ -132,7 +137,9 @@ class EventLogger: if event_type == EventType.step_complete.value: response = event.payload.step_details.model_response if response.tool_calls: - content = ToolUtils.encode_tool_call(response.tool_calls[0]) + content = ToolUtils.encode_tool_call( + response.tool_calls[0], tool_prompt_format + ) else: content = response.content yield event, LogEvent( @@ -162,5 +169,19 @@ class EventLogger: color="green", ) + if ( + step_type == StepType.memory_retrieval + and event_type == EventType.step_complete.value + ): + details = event.payload.step_details + content = interleaved_text_media_as_str(details.inserted_context) + content = content[:200] + "..." if len(content) > 200 else content + + yield event, LogEvent( + role=step_type, + content=f"Retrieved context from banks: {details.memory_bank_ids}.\n====\n{content}\n>", + color="cyan", + ) + preivous_event_type = event_type previous_step_type = step_type diff --git a/llama_toolchain/agentic_system/execute_with_custom_tools.py b/llama_toolchain/agentic_system/execute_with_custom_tools.py new file mode 100644 index 000000000..e8038bc20 --- /dev/null +++ b/llama_toolchain/agentic_system/execute_with_custom_tools.py @@ -0,0 +1,96 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +from typing import AsyncGenerator, List + +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_toolchain.agentic_system.api import * # noqa: F403 +from llama_toolchain.memory.api import * # noqa: F403 +from llama_toolchain.safety.api import * # noqa: F403 + +from llama_toolchain.agentic_system.api import ( + AgenticSystemTurnResponseEventType as EventType, +) +from llama_toolchain.tools.custom.datatypes import CustomTool + + +class AgentWithCustomToolExecutor: + def __init__( + self, + api: AgenticSystem, + agent_id: str, + session_id: str, + agent_config: AgentConfig, + custom_tools: List[CustomTool], + ): + self.api = api + self.agent_id = agent_id + self.session_id = session_id + self.agent_config = agent_config + self.custom_tools = custom_tools + + async def execute_turn( + self, + messages: List[Message], + attachments: Optional[List[Attachment]] = None, + max_iters: int = 5, + stream: bool = True, + ) -> AsyncGenerator: + tools_dict = {t.get_name(): t for t in self.custom_tools} + + current_messages = messages.copy() + n_iter = 0 + while n_iter < max_iters: + n_iter += 1 + + request = AgenticSystemTurnCreateRequest( + agent_id=self.agent_id, + session_id=self.session_id, + messages=current_messages, + attachments=attachments, + stream=stream, + ) + + turn = None + async for chunk in self.api.create_agentic_system_turn(request): + if chunk.event.payload.event_type != EventType.turn_complete.value: + yield chunk + else: + turn = chunk.event.payload.turn + + message = turn.output_message + if len(message.tool_calls) == 0: + yield chunk + return + + if message.stop_reason == StopReason.out_of_tokens: + yield chunk + return + + tool_call = message.tool_calls[0] + if tool_call.tool_name not in tools_dict: + m = ToolResponseMessage( + call_id=tool_call.call_id, + tool_name=tool_call.tool_name, + content=f"Unknown tool `{tool_call.tool_name}` was called. Try again with something else", + ) + next_message = m + else: + tool = tools_dict[tool_call.tool_name] + result_messages = await execute_custom_tool(tool, message) + next_message = result_messages[0] + + yield next_message + current_messages = [next_message] + + +async def execute_custom_tool(tool: CustomTool, message: Message) -> List[Message]: + result_messages = await tool.run([message]) + assert ( + len(result_messages) == 1 + ), f"Expected single message, got {len(result_messages)}" + + return result_messages diff --git a/llama_toolchain/agentic_system/meta_reference/__init__.py b/llama_toolchain/agentic_system/meta_reference/__init__.py index 22b1f788a..b49cc4c84 100644 --- a/llama_toolchain/agentic_system/meta_reference/__init__.py +++ b/llama_toolchain/agentic_system/meta_reference/__init__.py @@ -4,5 +4,27 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. 
-from .agentic_system import get_provider_impl # noqa -from .config import AgenticSystemConfig # noqa +from typing import Dict + +from llama_toolchain.core.datatypes import Api, ProviderSpec + +from .config import MetaReferenceImplConfig + + +async def get_provider_impl( + config: MetaReferenceImplConfig, deps: Dict[Api, ProviderSpec] +): + from .agentic_system import MetaReferenceAgenticSystemImpl + + assert isinstance( + config, MetaReferenceImplConfig + ), f"Unexpected config type: {type(config)}" + + impl = MetaReferenceAgenticSystemImpl( + config, + deps[Api.inference], + deps[Api.memory], + deps[Api.safety], + ) + await impl.initialize() + return impl diff --git a/llama_toolchain/agentic_system/meta_reference/agent_instance.py b/llama_toolchain/agentic_system/meta_reference/agent_instance.py index 5be9f8bb6..ed3145b1e 100644 --- a/llama_toolchain/agentic_system/meta_reference/agent_instance.py +++ b/llama_toolchain/agentic_system/meta_reference/agent_instance.py @@ -4,111 +4,111 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. - +import asyncio import copy +import os +import secrets +import shutil +import string +import tempfile import uuid from datetime import datetime -from typing import AsyncGenerator, List, Optional +from typing import AsyncGenerator, List, Tuple +from urllib.parse import urlparse + +import httpx from termcolor import cprint -from llama_toolchain.agentic_system.api.datatypes import ( - AgenticSystemInstanceConfig, - AgenticSystemTurnResponseEvent, - AgenticSystemTurnResponseEventType, - AgenticSystemTurnResponseStepCompletePayload, - AgenticSystemTurnResponseStepProgressPayload, - AgenticSystemTurnResponseStepStartPayload, - AgenticSystemTurnResponseTurnCompletePayload, - AgenticSystemTurnResponseTurnStartPayload, - InferenceStep, - Session, - ShieldCallStep, - StepType, - ToolExecutionStep, - ToolPromptFormat, - Turn, -) +from llama_toolchain.agentic_system.api import * # noqa: F403 +from llama_toolchain.inference.api import * # noqa: F403 +from llama_toolchain.memory.api import * # noqa: F403 +from llama_toolchain.safety.api import * # noqa: F403 -from llama_toolchain.inference.api import ChatCompletionRequest, Inference - -from llama_toolchain.inference.api.datatypes import ( - Attachment, - BuiltinTool, - ChatCompletionResponseEventType, - CompletionMessage, - Message, - Role, - SamplingParams, - StopReason, - ToolCallDelta, - ToolCallParseStatus, - ToolDefinition, - ToolResponse, - ToolResponseMessage, - URL, +from llama_toolchain.tools.base import BaseTool +from llama_toolchain.tools.builtin import ( + interpret_content_as_attachment, + SingleMessageBuiltinTool, ) -from llama_toolchain.safety.api import Safety -from llama_toolchain.safety.api.datatypes import ( - BuiltinShield, - ShieldDefinition, - ShieldResponse, -) -from llama_toolchain.agentic_system.api.endpoints import * # noqa from .safety import SafetyException, ShieldRunnerMixin -from .system_prompt import get_agentic_prefix_messages -from .tools.base import BaseTool -from .tools.builtin import SingleMessageBuiltinTool -class AgentInstance(ShieldRunnerMixin): +def make_random_string(length: int = 8): + return "".join( + secrets.choice(string.ascii_letters + string.digits) for _ in range(length) + ) + + +class ChatAgent(ShieldRunnerMixin): def __init__( self, - system_id: int, - instance_config: AgenticSystemInstanceConfig, - model: str, + agent_config: AgentConfig, inference_api: Inference, + memory_api: Memory, safety_api: 
Safety, builtin_tools: List[SingleMessageBuiltinTool], - custom_tool_definitions: List[ToolDefinition], - input_shields: List[ShieldDefinition], - output_shields: List[ShieldDefinition], max_infer_iters: int = 10, - prefix_messages: Optional[List[Message]] = None, - tool_prompt_format: Optional[ToolPromptFormat] = ToolPromptFormat.json, ): - self.system_id = system_id - self.instance_config = instance_config - - self.model = model + self.agent_config = agent_config self.inference_api = inference_api + self.memory_api = memory_api self.safety_api = safety_api - if prefix_messages is not None and len(prefix_messages) > 0: - self.prefix_messages = prefix_messages - else: - self.prefix_messages = get_agentic_prefix_messages( - builtin_tools, - custom_tool_definitions, - tool_prompt_format, - ) - - for m in self.prefix_messages: - print(m.content) - self.max_infer_iters = max_infer_iters self.tools_dict = {t.get_name(): t for t in builtin_tools} + self.tempdir = tempfile.mkdtemp() self.sessions = {} ShieldRunnerMixin.__init__( self, safety_api, - input_shields=input_shields, - output_shields=output_shields, + input_shields=agent_config.input_shields, + output_shields=agent_config.output_shields, ) + def __del__(self): + shutil.rmtree(self.tempdir) + + def turn_to_messages(self, turn: Turn) -> List[Message]: + messages = [] + + # We do not want to keep adding RAG context to the input messages + # May be this should be a parameter of the agentic instance + # that can define its behavior in a custom way + for m in turn.input_messages: + msg = m.copy() + if isinstance(msg, UserMessage): + msg.context = None + messages.append(msg) + + # messages.extend(turn.input_messages) + for step in turn.steps: + if step.step_type == StepType.inference.value: + messages.append(step.model_response) + elif step.step_type == StepType.tool_execution.value: + for response in step.tool_responses: + messages.append( + ToolResponseMessage( + call_id=response.call_id, + tool_name=response.tool_name, + content=response.content, + ) + ) + elif step.step_type == StepType.shield_call.value: + response = step.response + if response.is_violation: + # CompletionMessage itself in the ShieldResponse + messages.append( + CompletionMessage( + content=response.violation_return_message, + stop_reason=StopReason.end_of_turn, + ) + ) + # print_dialog(messages) + return messages + def create_session(self, name: str) -> Session: session_id = str(uuid.uuid4()) session = Session( @@ -131,32 +131,7 @@ class AgentInstance(ShieldRunnerMixin): messages = [] for i, turn in enumerate(session.turns): - # print(f"turn {i}") - # print_dialog(turn.input_messages) - messages.extend(turn.input_messages) - for step in turn.steps: - if step.step_type == StepType.inference.value: - messages.append(step.model_response) - elif step.step_type == StepType.tool_execution.value: - for response in step.tool_responses: - messages.append( - ToolResponseMessage( - call_id=response.call_id, - tool_name=response.tool_name, - content=response.content, - ) - ) - elif step.step_type == StepType.shield_call.value: - response = step.response - if response.is_violation: - # TODO: Properly persist the - # CompletionMessage itself in the ShieldResponse - messages.append( - CompletionMessage( - content=response.violation_return_message, - stop_reason=StopReason.end_of_turn, - ) - ) + messages.extend(self.turn_to_messages(turn)) messages.extend(request.messages) @@ -164,7 +139,6 @@ class AgentInstance(ShieldRunnerMixin): # print_dialog(messages) turn_id = str(uuid.uuid4()) - 
params = self.instance_config.sampling_params start_time = datetime.now() yield AgenticSystemTurnResponseStreamChunk( event=AgenticSystemTurnResponseEvent( @@ -177,12 +151,12 @@ class AgentInstance(ShieldRunnerMixin): steps = [] output_message = None async for chunk in self.run( + session=session, turn_id=turn_id, input_messages=messages, - temperature=params.temperature, - top_p=params.top_p, + attachments=request.attachments or [], + sampling_params=self.agent_config.sampling_params, stream=request.stream, - max_gen_len=params.max_tokens, ): if isinstance(chunk, CompletionMessage): cprint( @@ -227,6 +201,53 @@ class AgentInstance(ShieldRunnerMixin): ) yield chunk + async def run( + self, + session: Session, + turn_id: str, + input_messages: List[Message], + attachments: List[Attachment], + sampling_params: SamplingParams, + stream: bool = False, + ) -> AsyncGenerator: + # Doing async generators makes downstream code much simpler and everything amenable to + # streaming. However, it also makes things complicated here because AsyncGenerators cannot + # return a "final value" for the `yield from` statement. we simulate that by yielding a + # final boolean (to see whether an exception happened) and then explicitly testing for it. + + async for res in self.run_shields_wrapper( + turn_id, input_messages, self.input_shields, "user-input" + ): + if isinstance(res, bool): + return + else: + yield res + + async for res in self._run( + session, turn_id, input_messages, attachments, sampling_params, stream + ): + if isinstance(res, bool): + return + elif isinstance(res, CompletionMessage): + final_response = res + break + else: + yield res + + assert final_response is not None + # for output shields run on the full input and output combination + messages = input_messages + [final_response] + + async for res in self.run_shields_wrapper( + turn_id, messages, self.output_shields, "assistant-output" + ): + if isinstance(res, bool): + return + else: + yield res + + yield final_response + async def run_shields_wrapper( self, turn_id: str, @@ -288,65 +309,62 @@ class AgentInstance(ShieldRunnerMixin): ) ) - async def run( - self, - turn_id: str, - input_messages: List[Message], - temperature: float, - top_p: float, - stream: bool = False, - max_gen_len: Optional[int] = None, - ) -> AsyncGenerator: - # Doing async generators makes downstream code much simpler and everything amenable to - # stremaing. However, it also makes things complicated here because AsyncGenerators cannot - # return a "final value" for the `yield from` statement. we simulate that by yielding a - # final boolean (to see whether an exception happened) and then explicitly testing for it. 
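The new `run()` above works around the fact that async generators cannot return a value: it streams intermediate event chunks and finally yields the `CompletionMessage` itself. A minimal consumer-side sketch (not part of this diff; the agent, session and turn id are assumed to exist already) might look like:

```python
from llama_toolchain.agentic_system.api import *  # noqa: F403
from llama_toolchain.inference.api import *  # noqa: F403


async def consume_turn(agent, session, turn_id, messages):
    # Hypothetical consumer: print every streamed event chunk, then stop once the
    # generator yields the final CompletionMessage (the simulated "return value").
    final = None
    async for item in agent.run(
        session=session,
        turn_id=turn_id,
        input_messages=messages,
        attachments=[],
        sampling_params=SamplingParams(),
    ):
        if isinstance(item, CompletionMessage):
            final = item
            break
        print(item)  # e.g. AgenticSystemTurnResponseStreamChunk events
    return final
```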
- - async for res in self.run_shields_wrapper( - turn_id, input_messages, self.input_shields, "user-input" - ): - if isinstance(res, bool): - return - else: - yield res - - async for res in self._run( - turn_id, input_messages, temperature, top_p, stream, max_gen_len - ): - if isinstance(res, bool): - return - elif isinstance(res, CompletionMessage): - final_response = res - break - else: - yield res - - assert final_response is not None - # for output shields run on the full input and output combination - messages = input_messages + [final_response] - - async for res in self.run_shields_wrapper( - turn_id, messages, self.output_shields, "assistant-output" - ): - if isinstance(res, bool): - return - else: - yield res - - yield final_response - async def _run( self, + session: Session, turn_id: str, input_messages: List[Message], - temperature: float, - top_p: float, + attachments: List[Attachment], + sampling_params: SamplingParams, stream: bool = False, - max_gen_len: Optional[int] = None, ) -> AsyncGenerator: - input_messages = preprocess_dialog(input_messages, self.prefix_messages) + enabled_tools = set(t.type for t in self.agent_config.tools) + need_rag_context = await self._should_retrieve_context( + input_messages, attachments + ) + if need_rag_context: + step_id = str(uuid.uuid4()) + yield AgenticSystemTurnResponseStreamChunk( + event=AgenticSystemTurnResponseEvent( + payload=AgenticSystemTurnResponseStepStartPayload( + step_type=StepType.memory_retrieval.value, + step_id=step_id, + ) + ) + ) - attachments = [] + # TODO: find older context from the session and either replace it + # or append with a sliding window. this is really a very simplistic implementation + rag_context, bank_ids = await self._retrieve_context( + session, input_messages, attachments + ) + + step_id = str(uuid.uuid4()) + yield AgenticSystemTurnResponseStreamChunk( + event=AgenticSystemTurnResponseEvent( + payload=AgenticSystemTurnResponseStepCompletePayload( + step_type=StepType.memory_retrieval.value, + step_id=step_id, + step_details=MemoryRetrievalStep( + turn_id=turn_id, + step_id=step_id, + memory_bank_ids=bank_ids, + inserted_context=rag_context or "", + ), + ) + ) + ) + + if rag_context: + last_message = input_messages[-1] + last_message.context = "\n".join(rag_context) + + elif attachments and AgenticSystemTool.code_interpreter.value in enabled_tools: + urls = [a.content for a in attachments if isinstance(a.content, URL)] + msg = await attachment_message(self.tempdir, urls) + input_messages.append(msg) + + output_attachments = [] n_iter = 0 while True: @@ -369,17 +387,13 @@ class AgentInstance(ShieldRunnerMixin): ) ) - # where are the available tools? 
req = ChatCompletionRequest( - model=self.model, + model=self.agent_config.model, messages=input_messages, - available_tools=self.instance_config.available_tools, + tools=self._get_tools(), + tool_prompt_format=self.agent_config.tool_prompt_format, stream=True, - sampling_params=SamplingParams( - temperature=temperature, - top_p=top_p, - max_tokens=max_gen_len, - ), + sampling_params=sampling_params, ) tool_calls = [] @@ -464,7 +478,8 @@ class AgentInstance(ShieldRunnerMixin): if len(message.tool_calls) == 0: if stop_reason == StopReason.end_of_turn: - if len(attachments) > 0: + # TODO: UPDATE RETURN TYPE TO SEND A TUPLE OF (MESSAGE, ATTACHMENTS) + if len(output_attachments) > 0: if isinstance(message.content, list): message.content += attachments else: @@ -572,63 +587,175 @@ class AgentInstance(ShieldRunnerMixin): yield False return - if isinstance(result_message.content, Attachment): + if out_attachment := interpret_content_as_attachment( + result_message.content + ): # NOTE: when we push this message back to the model, the model may ignore the # attached file path etc. since the model is trained to only provide a user message # with the summary. We keep all generated attachments and then attach them to final message - attachments.append(result_message.content) - elif isinstance(result_message.content, list) or isinstance( - result_message.content, tuple - ): - for c in result_message.content: - if isinstance(c, Attachment): - attachments.append(c) + output_attachments.append(out_attachment) input_messages = input_messages + [message, result_message] n_iter += 1 + async def _ensure_memory_bank(self, session: Session) -> MemoryBank: + if session.memory_bank is None: + session.memory_bank = await self.memory_api.create_memory_bank( + name=f"memory_bank_{session.session_id}", + config=VectorMemoryBankConfig( + embedding_model="sentence-transformer/all-MiniLM-L6-v2", + chunk_size_in_tokens=512, + ), + ) -def attachment_message(url: URL) -> ToolResponseMessage: - uri = url.uri - assert uri.startswith("file://") - filepath = uri[len("file://") :] + return session.memory_bank + + async def _should_retrieve_context( + self, messages: List[Message], attachments: List[Attachment] + ) -> bool: + enabled_tools = set(t.type for t in self.agent_config.tools) + if attachments: + if ( + AgenticSystemTool.code_interpreter.value in enabled_tools + and self.agent_config.tool_choice == ToolChoice.required + ): + return False + else: + return True + + return AgenticSystemTool.memory.value in enabled_tools + + def _memory_tool_definition(self) -> Optional[MemoryToolDefinition]: + for t in self.agent_config.tools: + if t.type == AgenticSystemTool.memory.value: + return t + + return None + + async def _retrieve_context( + self, session: Session, messages: List[Message], attachments: List[Attachment] + ) -> Tuple[List[str], List[int]]: # (rag_context, bank_ids) + bank_ids = [] + + memory = self._memory_tool_definition() + assert memory is not None, "Memory tool not configured" + bank_ids.extend(c.bank_id for c in memory.memory_bank_configs) + + if attachments: + bank = await self._ensure_memory_bank(session) + bank_ids.append(bank.bank_id) + + documents = [ + MemoryBankDocument( + document_id=str(uuid.uuid4()), + content=a.content, + mime_type=a.mime_type, + metadata={}, + ) + for a in attachments + ] + await self.memory_api.insert_documents(bank.bank_id, documents) + elif session.memory_bank: + bank_ids.append(session.memory_bank.bank_id) + + if not bank_ids: + # this can happen if the per-session memory bank 
is not yet populated + # (i.e., no prior turns uploaded an Attachment) + return None, [] + + query = " ".join(m.content for m in messages) + tasks = [ + self.memory_api.query_documents( + bank_id=bank_id, + query=query, + params={ + "max_chunks": 5, + }, + ) + for bank_id in bank_ids + ] + results: List[QueryDocumentsResponse] = await asyncio.gather(*tasks) + chunks = [c for r in results for c in r.chunks] + scores = [s for r in results for s in r.scores] + + # sort by score + chunks, scores = zip( + *sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True) + ) + if not chunks: + return None, bank_ids + + tokens = 0 + picked = [] + for c in chunks[: memory.max_chunks]: + tokens += c.token_count + if tokens > memory.max_tokens_in_context: + cprint( + f"Using {len(picked)} chunks; reached max tokens in context: {tokens}", + "red", + ) + break + picked.append(f"id:{c.document_id}; content:{c.content}") + + return [ + "Here are the retrieved documents for relevant context:\n=== START-RETRIEVED-CONTEXT ===\n", + *picked, + "\n=== END-RETRIEVED-CONTEXT ===\n", + ], bank_ids + + def _get_tools(self) -> List[ToolDefinition]: + ret = [] + for t in self.agent_config.tools: + if isinstance(t, BraveSearchToolDefinition): + ret.append(ToolDefinition(tool_name=BuiltinTool.brave_search)) + elif isinstance(t, WolframAlphaToolDefinition): + ret.append(ToolDefinition(tool_name=BuiltinTool.wolfram_alpha)) + elif isinstance(t, PhotogenToolDefinition): + ret.append(ToolDefinition(tool_name=BuiltinTool.photogen)) + elif isinstance(t, CodeInterpreterToolDefinition): + ret.append(ToolDefinition(tool_name=BuiltinTool.code_interpreter)) + elif isinstance(t, FunctionCallToolDefinition): + ret.append( + ToolDefinition( + tool_name=t.function_name, + description=t.description, + parameters=t.parameters, + ) + ) + return ret + + +async def attachment_message(tempdir: str, urls: List[URL]) -> ToolResponseMessage: + content = [] + + for url in urls: + uri = url.uri + if uri.startswith("file://"): + filepath = uri[len("file://") :] + elif uri.startswith("http"): + path = urlparse(uri).path + basename = os.path.basename(path) + filepath = f"{tempdir}/{make_random_string() + basename}" + print(f"Downloading {url} -> {filepath}") + + async with httpx.AsyncClient() as client: + r = await client.get(uri) + resp = r.text + with open(filepath, "w") as fp: + fp.write(resp) + else: + raise ValueError(f"Unsupported URL {url}") + + content.append(f'# There is a file accessible to you at "{filepath}"\n') return ToolResponseMessage( call_id="", tool_name=BuiltinTool.code_interpreter, - content=f'# There is a file accessible to you at "{filepath}"', + content=content, ) -def preprocess_dialog( - messages: List[Message], prefix_messages: List[Message] -) -> List[Message]: - """ - Preprocesses the dialog by removing the system message and - adding the system message to the beginning of the dialog. - """ - ret = prefix_messages.copy() - - for m in messages: - if m.role == Role.system.value: - continue - - # NOTE: the ideal behavior is to use `file_path = ...` but that - # means we need to have stateful execution o f code which we currently - # do not have. 
- if isinstance(m.content, Attachment): - ret.append(attachment_message(m.content.url)) - elif isinstance(m.content, list): - for c in m.content: - if isinstance(c, Attachment): - ret.append(attachment_message(c.url)) - - ret.append(m) - - return ret - - async def execute_tool_call_maybe( tools_dict: Dict[str, BaseTool], messages: List[CompletionMessage] ) -> List[ToolResponseMessage]: diff --git a/llama_toolchain/agentic_system/meta_reference/agentic_system.py b/llama_toolchain/agentic_system/meta_reference/agentic_system.py index 5252e7515..4fa2aa584 100644 --- a/llama_toolchain/agentic_system/meta_reference/agentic_system.py +++ b/llama_toolchain/agentic_system/meta_reference/agentic_system.py @@ -8,62 +8,42 @@ import logging import os import uuid -from typing import AsyncGenerator, Dict +from typing import AsyncGenerator -from llama_toolchain.distribution.datatypes import Api, ProviderSpec from llama_toolchain.inference.api import Inference -from llama_toolchain.inference.api.datatypes import BuiltinTool +from llama_toolchain.memory.api import Memory from llama_toolchain.safety.api import Safety -from llama_toolchain.agentic_system.api.endpoints import * # noqa -from llama_toolchain.agentic_system.api import ( - AgenticSystem, - AgenticSystemCreateRequest, - AgenticSystemCreateResponse, - AgenticSystemSessionCreateRequest, - AgenticSystemSessionCreateResponse, - AgenticSystemTurnCreateRequest, -) - -from .agent_instance import AgentInstance - -from .config import AgenticSystemConfig - -from .tools.builtin import ( +from llama_toolchain.agentic_system.api import * # noqa: F403 +from llama_toolchain.tools.builtin import ( BraveSearchTool, CodeInterpreterTool, PhotogenTool, WolframAlphaTool, ) -from .tools.safety import with_safety +from llama_toolchain.tools.safety import with_safety + +from .agent_instance import ChatAgent +from .config import MetaReferenceImplConfig logger = logging.getLogger() logger.setLevel(logging.INFO) -async def get_provider_impl(config: AgenticSystemConfig, deps: Dict[Api, ProviderSpec]): - assert isinstance( - config, AgenticSystemConfig - ), f"Unexpected config type: {type(config)}" - - impl = MetaReferenceAgenticSystemImpl( - config, - deps[Api.inference], - deps[Api.safety], - ) - await impl.initialize() - return impl - - AGENT_INSTANCES_BY_ID = {} class MetaReferenceAgenticSystemImpl(AgenticSystem): def __init__( - self, config: AgenticSystemConfig, inference_api: Inference, safety_api: Safety + self, + config: MetaReferenceImplConfig, + inference_api: Inference, + memory_api: Memory, + safety_api: Safety, ): self.config = config self.inference_api = inference_api + self.memory_api = memory_api self.safety_api = safety_api async def initialize(self) -> None: @@ -71,69 +51,61 @@ class MetaReferenceAgenticSystemImpl(AgenticSystem): async def create_agentic_system( self, - request: AgenticSystemCreateRequest, + agent_config: AgentConfig, ) -> AgenticSystemCreateResponse: - system_id = str(uuid.uuid4()) + agent_id = str(uuid.uuid4()) builtin_tools = [] - custom_tool_definitions = [] - cfg = request.instance_config - for dfn in cfg.available_tools: - if isinstance(dfn.tool_name, BuiltinTool): - if dfn.tool_name == BuiltinTool.wolfram_alpha: - key = self.config.wolfram_api_key - if not key: - raise ValueError("Wolfram API key not defined in config") - tool = WolframAlphaTool(key) - elif dfn.tool_name == BuiltinTool.brave_search: - key = self.config.brave_search_api_key - if not key: - raise ValueError("Brave API key not defined in config") - tool = 
BraveSearchTool(key) - elif dfn.tool_name == BuiltinTool.code_interpreter: - tool = CodeInterpreterTool() - elif dfn.tool_name == BuiltinTool.photogen: - tool = PhotogenTool( - dump_dir="/tmp/photogen_dump_" + os.environ["USER"], - ) - else: - raise ValueError(f"Unknown builtin tool: {dfn.tool_name}") - - builtin_tools.append( - with_safety( - tool, self.safety_api, dfn.input_shields, dfn.output_shields - ) + for tool_defn in agent_config.tools: + if isinstance(tool_defn, WolframAlphaToolDefinition): + key = self.config.wolfram_api_key + if not key: + raise ValueError("Wolfram API key not defined in config") + tool = WolframAlphaTool(key) + elif isinstance(tool_defn, BraveSearchToolDefinition): + key = self.config.brave_search_api_key + if not key: + raise ValueError("Brave API key not defined in config") + tool = BraveSearchTool(key) + elif isinstance(tool_defn, CodeInterpreterToolDefinition): + tool = CodeInterpreterTool() + elif isinstance(tool_defn, PhotogenToolDefinition): + tool = PhotogenTool( + dump_dir="/tmp/photogen_dump_" + os.environ["USER"], ) else: - custom_tool_definitions.append(dfn) + continue - AGENT_INSTANCES_BY_ID[system_id] = AgentInstance( - system_id=system_id, - instance_config=request.instance_config, - model=request.model, + builtin_tools.append( + with_safety( + tool, + self.safety_api, + tool_defn.input_shields, + tool_defn.output_shields, + ) + ) + + AGENT_INSTANCES_BY_ID[agent_id] = ChatAgent( + agent_config=agent_config, inference_api=self.inference_api, - builtin_tools=builtin_tools, - custom_tool_definitions=custom_tool_definitions, safety_api=self.safety_api, - input_shields=cfg.input_shields, - output_shields=cfg.output_shields, - prefix_messages=cfg.debug_prefix_messages, - tool_prompt_format=cfg.tool_prompt_format, + memory_api=self.memory_api, + builtin_tools=builtin_tools, ) return AgenticSystemCreateResponse( - system_id=system_id, + agent_id=agent_id, ) async def create_agentic_system_session( self, - request: AgenticSystemSessionCreateRequest, + agent_id: str, + session_name: str, ) -> AgenticSystemSessionCreateResponse: - system_id = request.system_id - assert system_id in AGENT_INSTANCES_BY_ID, f"System {system_id} not found" - agent = AGENT_INSTANCES_BY_ID[system_id] + assert agent_id in AGENT_INSTANCES_BY_ID, f"System {agent_id} not found" + agent = AGENT_INSTANCES_BY_ID[agent_id] - session = agent.create_session(request.session_name) + session = agent.create_session(session_name) return AgenticSystemSessionCreateResponse( session_id=session.session_id, ) @@ -142,9 +114,9 @@ class MetaReferenceAgenticSystemImpl(AgenticSystem): self, request: AgenticSystemTurnCreateRequest, ) -> AsyncGenerator: - system_id = request.system_id - assert system_id in AGENT_INSTANCES_BY_ID, f"System {system_id} not found" - agent = AGENT_INSTANCES_BY_ID[system_id] + agent_id = request.agent_id + assert agent_id in AGENT_INSTANCES_BY_ID, f"System {agent_id} not found" + agent = AGENT_INSTANCES_BY_ID[agent_id] assert ( request.session_id in agent.sessions diff --git a/llama_toolchain/agentic_system/meta_reference/config.py b/llama_toolchain/agentic_system/meta_reference/config.py index cff22d03d..367ab17a5 100644 --- a/llama_toolchain/agentic_system/meta_reference/config.py +++ b/llama_toolchain/agentic_system/meta_reference/config.py @@ -9,6 +9,6 @@ from typing import Optional from pydantic import BaseModel -class AgenticSystemConfig(BaseModel): +class MetaReferenceImplConfig(BaseModel): brave_search_api_key: Optional[str] = None wolfram_api_key: Optional[str] = None 
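With the provider entrypoint now living in the package `__init__.py` (see the new `get_provider_impl` at the top of this diff), the meta-reference agentic system is built from a `MetaReferenceImplConfig` plus its resolved API dependencies. A rough wiring sketch, assuming the inference, memory and safety implementations were instantiated elsewhere:

```python
from llama_toolchain.agentic_system.meta_reference import (
    get_provider_impl,
    MetaReferenceImplConfig,
)
from llama_toolchain.core.datatypes import Api


async def build_agentic_system(inference_impl, memory_impl, safety_impl):
    # Both API keys are optional; they are only needed when the Brave search or
    # WolframAlpha tools are enabled in an agent's config.
    config = MetaReferenceImplConfig()
    deps = {
        Api.inference: inference_impl,
        Api.memory: memory_impl,
        Api.safety: safety_impl,
    }
    # Note: the signature types deps as Dict[Api, ProviderSpec], but the values are
    # handed straight to MetaReferenceAgenticSystemImpl as live API objects.
    return await get_provider_impl(config, deps)
```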
diff --git a/llama_toolchain/agentic_system/meta_reference/safety.py b/llama_toolchain/agentic_system/meta_reference/safety.py index 683ae622d..4bbb1f2f1 100644 --- a/llama_toolchain/agentic_system/meta_reference/safety.py +++ b/llama_toolchain/agentic_system/meta_reference/safety.py @@ -9,12 +9,13 @@ from typing import List from llama_models.llama3.api.datatypes import Message, Role, UserMessage from termcolor import cprint -from llama_toolchain.safety.api.datatypes import ( +from llama_toolchain.safety.api import ( OnViolationAction, + RunShieldRequest, + Safety, ShieldDefinition, ShieldResponse, ) -from llama_toolchain.safety.api.endpoints import RunShieldRequest, Safety class SafetyException(Exception): # noqa: N818 diff --git a/llama_toolchain/agentic_system/meta_reference/system_prompt.py b/llama_toolchain/agentic_system/meta_reference/system_prompt.py deleted file mode 100644 index 9db3218c1..000000000 --- a/llama_toolchain/agentic_system/meta_reference/system_prompt.py +++ /dev/null @@ -1,180 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -import json -import textwrap -from datetime import datetime -from typing import List - -from llama_toolchain.agentic_system.api.datatypes import ToolPromptFormat - -from llama_toolchain.inference.api import ( - BuiltinTool, - Message, - SystemMessage, - ToolDefinition, - UserMessage, -) - -from .tools.builtin import SingleMessageBuiltinTool - - -def get_agentic_prefix_messages( - builtin_tools: List[SingleMessageBuiltinTool], - custom_tools: List[ToolDefinition], - tool_prompt_format: ToolPromptFormat, -) -> List[Message]: - messages = [] - content = "" - if builtin_tools: - content += "Environment: ipython\n" - - tool_str = ", ".join( - [ - t.get_name() - for t in builtin_tools - if t.get_name() != BuiltinTool.code_interpreter.value - ] - ) - if tool_str: - content += f"Tools: {tool_str}" - - current_date = datetime.now() - formatted_date = current_date.strftime("%d %B %Y") - date_str = f""" -Cutting Knowledge Date: December 2023 -Today Date: {formatted_date}\n""" - content += date_str - messages.append(SystemMessage(content=content)) - - if custom_tools: - if tool_prompt_format == ToolPromptFormat.function_tag: - text = prompt_for_function_tag(custom_tools) - messages.append(UserMessage(content=text)) - elif tool_prompt_format == ToolPromptFormat.json: - text = prompt_for_json(custom_tools) - messages.append(UserMessage(content=text)) - else: - raise NotImplementedError( - f"Tool prompt format {tool_prompt_format} is not supported" - ) - else: - messages.append(SystemMessage(content=content)) - - return messages - - -def prompt_for_json(custom_tools: List[ToolDefinition]) -> str: - tool_defs = "\n".join( - translate_custom_tool_definition_to_json(t) for t in custom_tools - ) - content = textwrap.dedent( - """ - Answer the user's question by making use of the following functions if needed. - If none of the function can be used, please say so. - Here is a list of functions in JSON format: - {tool_defs} - - Return function calls in JSON format. 
- """ - ) - content = content.lstrip("\n").format(tool_defs=tool_defs) - return content - - -def prompt_for_function_tag(custom_tools: List[ToolDefinition]) -> str: - custom_tool_params = "" - for t in custom_tools: - custom_tool_params += get_instruction_string(t) + "\n" - custom_tool_params += get_parameters_string(t) + "\n\n" - - content = f""" -You have access to the following functions: - -{custom_tool_params} -Think very carefully before calling functions. -If you choose to call a function ONLY reply in the following format with no prefix or suffix: - -{{"example_name": "example_value"}} - -Reminder: -- If looking for real time information use relevant functions before falling back to brave_search -- Function calls MUST follow the specified format, start with -- Required parameters MUST be specified -- Only call one function at a time -- Put the entire function call reply on one line -""" - return content - - -def get_instruction_string(custom_tool_definition) -> str: - return f"Use the function '{custom_tool_definition.tool_name}' to '{custom_tool_definition.description}'" - - -def get_parameters_string(custom_tool_definition) -> str: - return json.dumps( - { - "name": custom_tool_definition.tool_name, - "description": custom_tool_definition.description, - "parameters": { - name: definition.__dict__ - for name, definition in custom_tool_definition.parameters.items() - }, - } - ) - - -def translate_custom_tool_definition_to_json(tool_def): - """Translates ToolDefinition to json as expected by model - eg. output for a function - { - "type": "function", - "function": { - "name": "conv_int", - "description": "Convert serialized fract24 integer into int value.", - "parameters": { - "type": "object", - "properties": [ - { - "data": { - "type": "object", - "description": "" - } - } - ], - "required": ["data"] - } - } - } - """ - assert isinstance(tool_def.tool_name, str) - func_def = {"type": "function", "function": {}} - func_def["function"]["name"] = tool_def.tool_name - func_def["function"]["description"] = tool_def.description or "" - if tool_def.parameters: - required = [] - properties = [] - for p_name, p_def in tool_def.parameters.items(): - properties.append( - { - p_name: { - # TODO: see if this should not always be object - "type": "object", - "description": p_def.description or "", - } - } - ) - if p_def.required: - required.append(p_name) - func_def["function"]["parameters"] = { - "type": "object", - "properties": properties, - "required": required, - } - else: - func_def["function"]["parameters"] = {} - - return json.dumps(func_def, indent=4) diff --git a/llama_toolchain/agentic_system/providers.py b/llama_toolchain/agentic_system/providers.py index 463c2976e..a722d9400 100644 --- a/llama_toolchain/agentic_system/providers.py +++ b/llama_toolchain/agentic_system/providers.py @@ -6,7 +6,7 @@ from typing import List -from llama_toolchain.distribution.datatypes import Api, InlineProviderSpec, ProviderSpec +from llama_toolchain.core.datatypes import Api, InlineProviderSpec, ProviderSpec def available_agentic_system_providers() -> List[ProviderSpec]: @@ -16,15 +16,19 @@ def available_agentic_system_providers() -> List[ProviderSpec]: provider_id="meta-reference", pip_packages=[ "codeshield", + "matplotlib", "pillow", + "pandas", + "scikit-learn", "torch", "transformers", ], module="llama_toolchain.agentic_system.meta_reference", - config_class="llama_toolchain.agentic_system.meta_reference.AgenticSystemConfig", + 
config_class="llama_toolchain.agentic_system.meta_reference.MetaReferenceImplConfig", api_dependencies=[ Api.inference, Api.safety, + Api.memory, ], ), ] diff --git a/llama_toolchain/agentic_system/tools/custom/execute.py b/llama_toolchain/agentic_system/tools/custom/execute.py deleted file mode 100644 index 4729d35a7..000000000 --- a/llama_toolchain/agentic_system/tools/custom/execute.py +++ /dev/null @@ -1,83 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from typing import Any, AsyncGenerator, List - -from llama_models.llama3.api.datatypes import StopReason, ToolResponseMessage - -from llama_toolchain.agentic_system.api import ( - AgenticSystem, - AgenticSystemTurnCreateRequest, - AgenticSystemTurnResponseEventType as EventType, -) - -from llama_toolchain.inference.api import Message - - -async def execute_with_custom_tools( - system: AgenticSystem, - system_id: str, - session_id: str, - messages: List[Message], - custom_tools: List[Any], - max_iters: int = 5, - stream: bool = True, -) -> AsyncGenerator: - # first create a session, or do you keep a persistent session? - tools_dict = {t.get_name(): t for t in custom_tools} - - current_messages = messages.copy() - n_iter = 0 - while n_iter < max_iters: - n_iter += 1 - - request = AgenticSystemTurnCreateRequest( - system_id=system_id, - session_id=session_id, - messages=current_messages, - stream=stream, - ) - - turn = None - async for chunk in system.create_agentic_system_turn(request): - if chunk.event.payload.event_type != EventType.turn_complete.value: - yield chunk - else: - turn = chunk.event.payload.turn - - message = turn.output_message - if len(message.tool_calls) == 0: - yield chunk - return - - if message.stop_reason == StopReason.out_of_tokens: - yield chunk - return - - tool_call = message.tool_calls[0] - if tool_call.tool_name not in tools_dict: - m = ToolResponseMessage( - call_id=tool_call.call_id, - tool_name=tool_call.tool_name, - content=f"Unknown tool `{tool_call.tool_name}` was called. Try again with something else", - ) - next_message = m - else: - tool = tools_dict[tool_call.tool_name] - result_messages = await execute_custom_tool(tool, message) - next_message = result_messages[0] - - yield next_message - current_messages = [next_message] - - -async def execute_custom_tool(tool: Any, message: Message) -> List[Message]: - result_messages = await tool.run([message]) - assert ( - len(result_messages) == 1 - ), f"Expected single message, got {len(result_messages)}" - - return result_messages diff --git a/llama_toolchain/agentic_system/utils.py b/llama_toolchain/agentic_system/utils.py deleted file mode 100644 index 9613b45df..000000000 --- a/llama_toolchain/agentic_system/utils.py +++ /dev/null @@ -1,122 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
- -import uuid -from typing import Any, List, Optional - -from llama_models.llama3.api.datatypes import BuiltinTool, Message, SamplingParams - -from llama_toolchain.agentic_system.api import ( - AgenticSystemCreateRequest, - AgenticSystemInstanceConfig, - AgenticSystemSessionCreateRequest, - AgenticSystemToolDefinition, -) -from llama_toolchain.agentic_system.api.datatypes import ToolPromptFormat -from llama_toolchain.agentic_system.client import AgenticSystemClient - -from llama_toolchain.agentic_system.tools.custom.execute import ( - execute_with_custom_tools, -) -from llama_toolchain.safety.api.datatypes import BuiltinShield, ShieldDefinition - - -# TODO: this should move back to the llama-agentic-system repo - - -class AgenticSystemClientWrapper: - def __init__(self, api, system_id, custom_tools): - self.api = api - self.system_id = system_id - self.custom_tools = custom_tools - self.session_id = None - - async def create_session(self, name: str = None): - if name is None: - name = f"Session-{uuid.uuid4()}" - - response = await self.api.create_agentic_system_session( - AgenticSystemSessionCreateRequest( - system_id=self.system_id, - session_name=name, - ) - ) - self.session_id = response.session_id - return self.session_id - - async def run(self, messages: List[Message], stream: bool = True): - async for chunk in execute_with_custom_tools( - self.api, - self.system_id, - self.session_id, - messages, - self.custom_tools, - stream=stream, - ): - yield chunk - - -async def get_agent_system_instance( - host: str, - port: int, - custom_tools: Optional[List[Any]] = None, - disable_safety: bool = False, - model: str = "Meta-Llama3.1-8B-Instruct", - tool_prompt_format: ToolPromptFormat = ToolPromptFormat.json, -) -> AgenticSystemClientWrapper: - custom_tools = custom_tools or [] - - api = AgenticSystemClient(base_url=f"http://{host}:{port}") - - tool_definitions = [ - AgenticSystemToolDefinition( - tool_name=BuiltinTool.brave_search, - ), - AgenticSystemToolDefinition( - tool_name=BuiltinTool.wolfram_alpha, - ), - AgenticSystemToolDefinition( - tool_name=BuiltinTool.photogen, - ), - AgenticSystemToolDefinition( - tool_name=BuiltinTool.code_interpreter, - ), - ] + [t.get_tool_definition() for t in custom_tools] - - if not disable_safety: - for t in tool_definitions: - t.input_shields = [ShieldDefinition(shield_type=BuiltinShield.llama_guard)] - t.output_shields = [ - ShieldDefinition(shield_type=BuiltinShield.llama_guard), - ShieldDefinition(shield_type=BuiltinShield.injection_shield), - ] - - create_request = AgenticSystemCreateRequest( - model=model, - instance_config=AgenticSystemInstanceConfig( - instructions="You are a helpful assistant", - available_tools=tool_definitions, - input_shields=( - [] - if disable_safety - else [ - ShieldDefinition(shield_type=BuiltinShield.llama_guard), - ShieldDefinition(shield_type=BuiltinShield.jailbreak_shield), - ] - ), - output_shields=( - [] - if disable_safety - else [ - ShieldDefinition(shield_type=BuiltinShield.llama_guard), - ] - ), - sampling_params=SamplingParams(), - tool_prompt_format=tool_prompt_format, - ), - ) - create_response = await api.create_agentic_system(create_request) - return AgenticSystemClientWrapper(api, create_response.system_id, custom_tools) diff --git a/llama_toolchain/agentic_system/meta_reference/tools/__init__.py b/llama_toolchain/batch_inference/__init__.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/__init__.py rename to llama_toolchain/batch_inference/__init__.py diff --git 
a/llama_toolchain/cli/distribution/__init__.py b/llama_toolchain/batch_inference/api/__init__.py similarity index 79% rename from llama_toolchain/cli/distribution/__init__.py rename to llama_toolchain/batch_inference/api/__init__.py index 81278f253..a7e55ba91 100644 --- a/llama_toolchain/cli/distribution/__init__.py +++ b/llama_toolchain/batch_inference/api/__init__.py @@ -4,4 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .distribution import DistributionParser # noqa +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/batch_inference/api/api.py b/llama_toolchain/batch_inference/api/api.py new file mode 100644 index 000000000..a02815388 --- /dev/null +++ b/llama_toolchain/batch_inference/api/api.py @@ -0,0 +1,61 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from typing import List, Optional, Protocol + +from llama_models.schema_utils import json_schema_type, webmethod + +from pydantic import BaseModel, Field + +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_toolchain.inference.api import * # noqa: F403 + + +@json_schema_type +class BatchCompletionRequest(BaseModel): + model: str + content_batch: List[InterleavedTextMedia] + sampling_params: Optional[SamplingParams] = SamplingParams() + logprobs: Optional[LogProbConfig] = None + + +@json_schema_type +class BatchCompletionResponse(BaseModel): + completion_message_batch: List[CompletionMessage] + + +@json_schema_type +class BatchChatCompletionRequest(BaseModel): + model: str + messages_batch: List[List[Message]] + sampling_params: Optional[SamplingParams] = SamplingParams() + + # zero-shot tool definitions as input to the model + tools: Optional[List[ToolDefinition]] = Field(default_factory=list) + tool_choice: Optional[ToolChoice] = Field(default=ToolChoice.auto) + tool_prompt_format: Optional[ToolPromptFormat] = Field( + default=ToolPromptFormat.json + ) + logprobs: Optional[LogProbConfig] = None + + +@json_schema_type +class BatchChatCompletionResponse(BaseModel): + completion_message_batch: List[CompletionMessage] + + +class BatchInference(Protocol): + @webmethod(route="/batch_inference/completion") + async def batch_completion( + self, + request: BatchCompletionRequest, + ) -> BatchCompletionResponse: ... + + @webmethod(route="/batch_inference/chat_completion") + async def batch_chat_completion( + self, + request: BatchChatCompletionRequest, + ) -> BatchChatCompletionResponse: ... diff --git a/llama_toolchain/cli/distribution/configure.py b/llama_toolchain/cli/distribution/configure.py deleted file mode 100644 index 94764fce3..000000000 --- a/llama_toolchain/cli/distribution/configure.py +++ /dev/null @@ -1,106 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
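For reference, the new `batch_inference` API added above is a plain Protocol over pydantic request/response models. Constructing a request for it might look like the following sketch (the model name and messages are placeholders, not taken from this diff):

```python
from llama_models.llama3.api.datatypes import UserMessage

from llama_toolchain.batch_inference.api import BatchChatCompletionRequest

request = BatchChatCompletionRequest(
    model="Meta-Llama3.1-8B-Instruct",
    messages_batch=[
        [UserMessage(content="Summarize the Llama Stack in one sentence.")],
        [UserMessage(content="List three uses for a code interpreter tool.")],
    ],
)
# A BatchInference provider would answer with a BatchChatCompletionResponse whose
# completion_message_batch holds one CompletionMessage per input dialog.
```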
- -import argparse -import json -import shlex - -import yaml - -from llama_toolchain.cli.subcommand import Subcommand -from llama_toolchain.common.config_dirs import DISTRIBS_BASE_DIR -from termcolor import cprint - - -class DistributionConfigure(Subcommand): - """Llama cli for configuring llama toolchain configs""" - - def __init__(self, subparsers: argparse._SubParsersAction): - super().__init__() - self.parser = subparsers.add_parser( - "configure", - prog="llama distribution configure", - description="configure a llama stack distribution", - formatter_class=argparse.RawTextHelpFormatter, - ) - self._add_arguments() - self.parser.set_defaults(func=self._run_distribution_configure_cmd) - - def _add_arguments(self): - self.parser.add_argument( - "--name", - type=str, - help="Name of the distribution to configure", - required=True, - ) - - def _run_distribution_configure_cmd(self, args: argparse.Namespace) -> None: - from llama_toolchain.distribution.datatypes import DistributionConfig - from llama_toolchain.distribution.registry import resolve_distribution_spec - - config_file = DISTRIBS_BASE_DIR / args.name / "config.yaml" - if not config_file.exists(): - self.parser.error( - f"Could not find {config_file}. Please run `llama distribution install` first" - ) - return - - # we need to find the spec from the name - with open(config_file, "r") as f: - config = DistributionConfig(**yaml.safe_load(f)) - - dist = resolve_distribution_spec(config.spec) - if dist is None: - raise ValueError(f"Could not find any registered spec `{config.spec}`") - - configure_llama_distribution(dist, config) - - -def configure_llama_distribution(dist: "Distribution", config: "DistributionConfig"): - from llama_toolchain.common.exec import run_command - from llama_toolchain.common.prompt_for_config import prompt_for_config - from llama_toolchain.common.serialize import EnumEncoder - from llama_toolchain.distribution.dynamic import instantiate_class_type - - python_exe = run_command(shlex.split("which python")) - # simple check - conda_env = config.conda_env - if conda_env not in python_exe: - raise ValueError( - f"Please re-run configure by activating the `{conda_env}` conda environment" - ) - - if config.providers: - cprint( - f"Configuration already exists for {config.name}. Will overwrite...", - "yellow", - attrs=["bold"], - ) - - for api, provider_spec in dist.provider_specs.items(): - cprint(f"Configuring API surface: {api.value}", "white", attrs=["bold"]) - config_type = instantiate_class_type(provider_spec.config_class) - provider_config = prompt_for_config( - config_type, - ( - config_type(**config.providers[api.value]) - if api.value in config.providers - else None - ), - ) - print("") - - config.providers[api.value] = { - "provider_id": provider_spec.provider_id, - **provider_config.dict(), - } - - config_path = DISTRIBS_BASE_DIR / config.name / "config.yaml" - with open(config_path, "w") as fp: - dist_config = json.loads(json.dumps(config.dict(), cls=EnumEncoder)) - fp.write(yaml.dump(dist_config, sort_keys=False)) - - print(f"YAML configuration has been written to {config_path}") diff --git a/llama_toolchain/cli/distribution/create.py b/llama_toolchain/cli/distribution/create.py deleted file mode 100644 index f4b6d3f20..000000000 --- a/llama_toolchain/cli/distribution/create.py +++ /dev/null @@ -1,43 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. 
-# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -import argparse - -from llama_toolchain.cli.subcommand import Subcommand - - -class DistributionCreate(Subcommand): - def __init__(self, subparsers: argparse._SubParsersAction): - super().__init__() - self.parser = subparsers.add_parser( - "create", - prog="llama distribution create", - description="create a Llama stack distribution", - formatter_class=argparse.RawTextHelpFormatter, - ) - self._add_arguments() - self.parser.set_defaults(func=self._run_distribution_create_cmd) - - def _add_arguments(self): - self.parser.add_argument( - "--name", - type=str, - help="Name of the distribution to create", - required=True, - ) - # for each Api the user wants to support, we should - # get the list of available providers, ask which one the user - # wants to pick and then ask for their configuration. - - def _run_distribution_create_cmd(self, args: argparse.Namespace) -> None: - from llama_toolchain.distribution.registry import resolve_distribution_spec - - dist = resolve_distribution_spec(args.name) - if dist is not None: - self.parser.error(f"Distribution with name {args.name} already exists") - return - - raise NotImplementedError() diff --git a/llama_toolchain/cli/distribution/distribution.py b/llama_toolchain/cli/distribution/distribution.py deleted file mode 100644 index 641f360e9..000000000 --- a/llama_toolchain/cli/distribution/distribution.py +++ /dev/null @@ -1,34 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -import argparse - -from llama_toolchain.cli.subcommand import Subcommand - -from .configure import DistributionConfigure -from .create import DistributionCreate -from .install import DistributionInstall -from .list import DistributionList -from .start import DistributionStart - - -class DistributionParser(Subcommand): - def __init__(self, subparsers: argparse._SubParsersAction): - super().__init__() - self.parser = subparsers.add_parser( - "distribution", - prog="llama distribution", - description="Operate on llama stack distributions", - ) - - subparsers = self.parser.add_subparsers(title="distribution_subcommands") - - # Add sub-commands - DistributionList.create(subparsers) - DistributionInstall.create(subparsers) - DistributionCreate.create(subparsers) - DistributionConfigure.create(subparsers) - DistributionStart.create(subparsers) diff --git a/llama_toolchain/cli/distribution/install.py b/llama_toolchain/cli/distribution/install.py deleted file mode 100644 index cd75effc3..000000000 --- a/llama_toolchain/cli/distribution/install.py +++ /dev/null @@ -1,111 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
- -import argparse -import os - -import pkg_resources -import yaml - -from llama_toolchain.cli.subcommand import Subcommand -from llama_toolchain.common.config_dirs import DISTRIBS_BASE_DIR - -from termcolor import cprint - - -class DistributionInstall(Subcommand): - """Llama cli for configuring llama toolchain configs""" - - def __init__(self, subparsers: argparse._SubParsersAction): - super().__init__() - self.parser = subparsers.add_parser( - "install", - prog="llama distribution install", - description="Install a llama stack distribution", - formatter_class=argparse.RawTextHelpFormatter, - ) - self._add_arguments() - self.parser.set_defaults(func=self._run_distribution_install_cmd) - - def _add_arguments(self): - from llama_toolchain.distribution.registry import available_distribution_specs - - self.parser.add_argument( - "--spec", - type=str, - help="Distribution spec to install (try local-ollama)", - required=True, - choices=[d.spec_id for d in available_distribution_specs()], - ) - self.parser.add_argument( - "--name", - type=str, - help="What should the installation be called locally?", - required=True, - ) - self.parser.add_argument( - "--conda-env", - type=str, - help="conda env in which this distribution will run (default = distribution name)", - ) - - def _run_distribution_install_cmd(self, args: argparse.Namespace) -> None: - from llama_toolchain.common.exec import run_with_pty - from llama_toolchain.distribution.datatypes import DistributionConfig - from llama_toolchain.distribution.distribution import distribution_dependencies - from llama_toolchain.distribution.registry import resolve_distribution_spec - - os.makedirs(DISTRIBS_BASE_DIR, exist_ok=True) - script = pkg_resources.resource_filename( - "llama_toolchain", - "distribution/install_distribution.sh", - ) - - dist = resolve_distribution_spec(args.spec) - if dist is None: - self.parser.error(f"Could not find distribution {args.spec}") - return - - distrib_dir = DISTRIBS_BASE_DIR / args.name - os.makedirs(distrib_dir, exist_ok=True) - - deps = distribution_dependencies(dist) - if not args.conda_env: - print(f"Using {args.name} as the Conda environment for this distribution") - - conda_env = args.conda_env or args.name - - config_file = distrib_dir / "config.yaml" - if config_file.exists(): - c = DistributionConfig(**yaml.safe_load(config_file.read_text())) - if c.spec != dist.spec_id: - self.parser.error( - f"already installed distribution with `spec={c.spec}` does not match provided spec `{args.spec}`" - ) - return - if c.conda_env != conda_env: - self.parser.error( - f"already installed distribution has `conda_env={c.conda_env}` different from provided conda env `{conda_env}`" - ) - return - else: - with open(config_file, "w") as f: - c = DistributionConfig( - spec=dist.spec_id, - name=args.name, - conda_env=conda_env, - ) - f.write(yaml.dump(c.dict(), sort_keys=False)) - - return_code = run_with_pty([script, conda_env, args.name, " ".join(deps)]) - - assert return_code == 0, cprint( - f"Failed to install distribution {dist.spec_id}", color="red" - ) - cprint( - f"Distribution `{args.name}` (with spec {dist.spec_id}) has been installed successfully!", - color="green", - ) diff --git a/llama_toolchain/cli/distribution/start.py b/llama_toolchain/cli/distribution/start.py deleted file mode 100644 index b854c79dc..000000000 --- a/llama_toolchain/cli/distribution/start.py +++ /dev/null @@ -1,81 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. 
-# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -import argparse - -import pkg_resources -import yaml - -from llama_toolchain.cli.subcommand import Subcommand -from llama_toolchain.common.config_dirs import DISTRIBS_BASE_DIR - - -class DistributionStart(Subcommand): - def __init__(self, subparsers: argparse._SubParsersAction): - super().__init__() - self.parser = subparsers.add_parser( - "start", - prog="llama distribution start", - description="""start the server for a Llama stack distribution. you should have already installed and configured the distribution""", - formatter_class=argparse.RawTextHelpFormatter, - ) - self._add_arguments() - self.parser.set_defaults(func=self._run_distribution_start_cmd) - - def _add_arguments(self): - self.parser.add_argument( - "--name", - type=str, - help="Name of the distribution to start", - required=True, - ) - self.parser.add_argument( - "--port", - type=int, - help="Port to run the server on. Defaults to 5000", - default=5000, - ) - self.parser.add_argument( - "--disable-ipv6", - action="store_true", - help="Disable IPv6 support", - default=False, - ) - - def _run_distribution_start_cmd(self, args: argparse.Namespace) -> None: - from llama_toolchain.common.exec import run_with_pty - from llama_toolchain.distribution.registry import resolve_distribution_spec - - config_file = DISTRIBS_BASE_DIR / args.name / "config.yaml" - if not config_file.exists(): - self.parser.error( - f"Could not find {config_file}. Please run `llama distribution install` first" - ) - return - - # we need to find the spec from the name - with open(config_file, "r") as f: - config = yaml.safe_load(f) - - dist = resolve_distribution_spec(config["spec"]) - if dist is None: - raise ValueError(f"Could not find any registered spec `{config['spec']}`") - - conda_env = config["conda_env"] - if not conda_env: - raise ValueError( - f"Could not find Conda environment for distribution `{args.name}`" - ) - - script = pkg_resources.resource_filename( - "llama_toolchain", - "distribution/start_distribution.sh", - ) - args = [script, conda_env, config_file, "--port", str(args.port)] + ( - ["--disable-ipv6"] if args.disable_ipv6 else [] - ) - - run_with_pty(args) diff --git a/llama_toolchain/cli/llama.py b/llama_toolchain/cli/llama.py index 5ff11ae84..9a5530c0c 100644 --- a/llama_toolchain/cli/llama.py +++ b/llama_toolchain/cli/llama.py @@ -6,9 +6,9 @@ import argparse -from .distribution import DistributionParser from .download import Download from .model import ModelParser +from .stack import StackParser class LlamaCLIParser: @@ -29,7 +29,7 @@ class LlamaCLIParser: # Add sub-commands Download.create(subparsers) ModelParser.create(subparsers) - DistributionParser.create(subparsers) + StackParser.create(subparsers) # Import sub-commands from agentic_system if they exist try: diff --git a/llama_toolchain/cli/model/template.py b/llama_toolchain/cli/model/template.py index 1915e87d3..2776d9703 100644 --- a/llama_toolchain/cli/model/template.py +++ b/llama_toolchain/cli/model/template.py @@ -32,6 +32,16 @@ class ModelTemplate(Subcommand): self._add_arguments() self.parser.set_defaults(func=self._run_model_template_cmd) + def _prompt_type(self, value): + from llama_models.llama3.api.datatypes import ToolPromptFormat + + try: + return ToolPromptFormat(value.lower()) + except ValueError: + raise argparse.ArgumentTypeError( + f"{value} is not a valid ToolPromptFormat. 
Choose from {', '.join(t.value for t in ToolPromptFormat)}" + ) from None + def _add_arguments(self): self.parser.add_argument( "-m", @@ -46,6 +56,18 @@ class ModelTemplate(Subcommand): help="Usecase template name (system_message, user_message, assistant_message, tool_message)...", required=False, ) + self.parser.add_argument( + "--format", + type=str, + help="ToolPromptFormat (json or function_tag). This flag is used to print the template in a specific formats.", + required=False, + default="json", + ) + self.parser.add_argument( + "--raw", + action="store_true", + help="If set to true, don't pretty-print into a table. Useful to copy-paste.", + ) def _run_model_template_cmd(self, args: argparse.Namespace) -> None: from llama_models.llama3.api.interface import ( @@ -56,22 +78,32 @@ class ModelTemplate(Subcommand): from llama_toolchain.cli.table import print_table if args.name: - template, tokens_info = render_jinja_template(args.name) + tool_prompt_format = self._prompt_type(args.format) + template, tokens_info = render_jinja_template(args.name, tool_prompt_format) rendered = "" for tok, is_special in tokens_info: if is_special: rendered += colored(tok, "yellow", attrs=["bold"]) else: rendered += tok - rendered += "\n" - print_table( - [ - ("Name", colored(template.template_name, "white", attrs=["bold"])), - ("Template", rendered), - ("Notes", template.notes), - ], - separate_rows=True, - ) + + if not args.raw: + rendered = rendered.replace("\n", "↵\n") + print_table( + [ + ( + "Name", + colored(template.template_name, "white", attrs=["bold"]), + ), + ("Template", rendered), + ("Notes", template.notes), + ], + separate_rows=True, + ) + else: + print("Template: ", template.template_name) + print("=" * 40) + print(rendered) else: templates = list_jinja_templates() headers = ["Role", "Template Name"] diff --git a/llama_toolchain/inference/ollama/__init__.py b/llama_toolchain/cli/stack/__init__.py similarity index 68% rename from llama_toolchain/inference/ollama/__init__.py rename to llama_toolchain/cli/stack/__init__.py index 40d79618a..fd2cbcdff 100644 --- a/llama_toolchain/inference/ollama/__init__.py +++ b/llama_toolchain/cli/stack/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .config import OllamaImplConfig # noqa -from .ollama import get_provider_impl # noqa +from .stack import StackParser # noqa diff --git a/llama_toolchain/cli/stack/build.py b/llama_toolchain/cli/stack/build.py new file mode 100644 index 000000000..c81a6d350 --- /dev/null +++ b/llama_toolchain/cli/stack/build.py @@ -0,0 +1,133 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
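The `llama model template` changes above add `--format` and `--raw` flags; assuming a template name such as `system_message` (the help text lists the available ones), usage would look roughly like:

```
llama model template --name system_message --format function_tag
llama model template --name system_message --format json --raw
```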
+ +import argparse + +from llama_toolchain.cli.subcommand import Subcommand +from llama_toolchain.core.datatypes import * # noqa: F403 + + +def parse_api_provider_tuples( + tuples: str, parser: argparse.ArgumentParser +) -> Dict[str, ProviderSpec]: + from llama_toolchain.core.distribution import api_providers + + all_providers = api_providers() + + deps = {} + for dep in tuples.split(","): + dep = dep.strip() + if not dep: + continue + api_str, provider = dep.split("=") + api = Api(api_str) + + provider = provider.strip() + if provider not in all_providers[api]: + parser.error(f"Provider `{provider}` is not available for API `{api}`") + return + deps[api] = all_providers[api][provider] + + return deps + + +class StackBuild(Subcommand): + def __init__(self, subparsers: argparse._SubParsersAction): + super().__init__() + self.parser = subparsers.add_parser( + "build", + prog="llama stack build", + description="Build a Llama stack container", + formatter_class=argparse.RawTextHelpFormatter, + ) + self._add_arguments() + self.parser.set_defaults(func=self._run_stack_build_command) + + def _add_arguments(self): + from llama_toolchain.core.distribution_registry import available_distribution_specs + from llama_toolchain.core.package import ( + BuildType, + ) + + allowed_ids = [d.distribution_id for d in available_distribution_specs()] + self.parser.add_argument( + "distribution", + type=str, + help="Distribution to build (either \"adhoc\" OR one of: {})".format(allowed_ids), + ) + self.parser.add_argument( + "api_providers", + nargs='?', + help="Comma separated list of (api=provider) tuples", + ) + + self.parser.add_argument( + "--name", + type=str, + help="Name of the build target (image, conda env)", + required=True, + ) + self.parser.add_argument( + "--type", + type=str, + default="conda_env", + choices=[v.value for v in BuildType], + ) + + def _run_stack_build_command(self, args: argparse.Namespace) -> None: + from llama_toolchain.core.distribution_registry import resolve_distribution_spec + from llama_toolchain.core.package import ( + ApiInput, + BuildType, + build_package, + ) + + api_inputs = [] + if args.distribution == "adhoc": + if not args.api_providers: + self.parser.error("You must specify API providers with (api=provider,...) 
for building an adhoc distribution") + return + + parsed = parse_api_provider_tuples(args.api_providers, self.parser) + for api, provider_spec in parsed.items(): + for dep in provider_spec.api_dependencies: + if dep not in parsed: + self.parser.error(f"API {api} needs dependency {dep} provided also") + return + + api_inputs.append( + ApiInput( + api=api, + provider=provider_spec.provider_id, + ) + ) + docker_image = None + else: + if args.api_providers: + self.parser.error("You cannot specify API providers for pre-registered distributions") + return + + dist = resolve_distribution_spec(args.distribution) + if dist is None: + self.parser.error(f"Could not find distribution {args.distribution}") + return + + for api, provider_id in dist.providers.items(): + api_inputs.append( + ApiInput( + api=api, + provider=provider_id, + ) + ) + docker_image = dist.docker_image + + build_package( + api_inputs, + build_type=BuildType(args.type), + name=args.name, + distribution_id=args.distribution, + docker_image=docker_image, + ) diff --git a/llama_toolchain/cli/stack/configure.py b/llama_toolchain/cli/stack/configure.py new file mode 100644 index 000000000..510601523 --- /dev/null +++ b/llama_toolchain/cli/stack/configure.py @@ -0,0 +1,106 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import argparse +import json +from pathlib import Path + +import yaml +from termcolor import cprint + +from llama_toolchain.cli.subcommand import Subcommand +from llama_toolchain.common.config_dirs import BUILDS_BASE_DIR +from llama_toolchain.core.datatypes import * # noqa: F403 + + +class StackConfigure(Subcommand): + """Llama cli for configuring llama toolchain configs""" + + def __init__(self, subparsers: argparse._SubParsersAction): + super().__init__() + self.parser = subparsers.add_parser( + "configure", + prog="llama stack configure", + description="configure a llama stack distribution", + formatter_class=argparse.RawTextHelpFormatter, + ) + self._add_arguments() + self.parser.set_defaults(func=self._run_stack_configure_cmd) + + def _add_arguments(self): + from llama_toolchain.core.distribution_registry import ( + available_distribution_specs, + ) + from llama_toolchain.core.package import BuildType + + allowed_ids = [d.distribution_id for d in available_distribution_specs()] + self.parser.add_argument( + "distribution", + type=str, + choices=allowed_ids, + help="Distribution (one of: {})".format(allowed_ids), + ) + self.parser.add_argument( + "--name", + type=str, + help="Name of the build", + required=True, + ) + self.parser.add_argument( + "--type", + type=str, + default="conda_env", + choices=[v.value for v in BuildType], + ) + + def _run_stack_configure_cmd(self, args: argparse.Namespace) -> None: + from llama_toolchain.core.package import BuildType + + build_type = BuildType(args.type) + name = args.name + config_file = ( + BUILDS_BASE_DIR + / args.distribution + / build_type.descriptor() + / f"{name}.yaml" + ) + if not config_file.exists(): + self.parser.error( + f"Could not find {config_file}. 
Please run `llama stack build` first" + ) + return + + configure_llama_distribution(config_file) + + +def configure_llama_distribution(config_file: Path) -> None: + from llama_toolchain.common.serialize import EnumEncoder + from llama_toolchain.core.configure import configure_api_providers + from llama_toolchain.core.distribution_registry import resolve_distribution_spec + + with open(config_file, "r") as f: + config = PackageConfig(**yaml.safe_load(f)) + + dist = resolve_distribution_spec(config.distribution_id) + if dist is None: + raise ValueError( + f"Could not find any registered distribution `{config.distribution_id}`" + ) + + if config.providers: + cprint( + f"Configuration already exists for {config.distribution_id}. Will overwrite...", + "yellow", + attrs=["bold"], + ) + + config.providers = configure_api_providers(config.providers) + + with open(config_file, "w") as fp: + to_write = json.loads(json.dumps(config.dict(), cls=EnumEncoder)) + fp.write(yaml.dump(to_write, sort_keys=False)) + + print(f"YAML configuration has been written to {config_file}") diff --git a/llama_toolchain/cli/distribution/list.py b/llama_toolchain/cli/stack/list.py similarity index 73% rename from llama_toolchain/cli/distribution/list.py rename to llama_toolchain/cli/stack/list.py index e214490ef..cbd7610f5 100644 --- a/llama_toolchain/cli/distribution/list.py +++ b/llama_toolchain/cli/stack/list.py @@ -10,13 +10,13 @@ import json from llama_toolchain.cli.subcommand import Subcommand -class DistributionList(Subcommand): +class StackList(Subcommand): def __init__(self, subparsers: argparse._SubParsersAction): super().__init__() self.parser = subparsers.add_parser( - "list", - prog="llama distribution list", - description="Show available llama stack distributions", + "list-distributions", + prog="llama stack list-distributions", + description="Show available Llama Stack Distributions", formatter_class=argparse.RawTextHelpFormatter, ) self._add_arguments() @@ -27,21 +27,23 @@ class DistributionList(Subcommand): def _run_distribution_list_cmd(self, args: argparse.Namespace) -> None: from llama_toolchain.cli.table import print_table - from llama_toolchain.distribution.registry import available_distribution_specs + from llama_toolchain.core.distribution_registry import ( + available_distribution_specs, + ) # eventually, this should query a registry at llama.meta.com/llamastack/distributions headers = [ - "Spec ID", - "ProviderSpecs", + "Distribution ID", + "Providers", "Description", ] rows = [] for spec in available_distribution_specs(): - providers = {k.value: v.provider_id for k, v in spec.provider_specs.items()} + providers = {k.value: v for k, v in spec.providers.items()} rows.append( [ - spec.spec_id, + spec.distribution_id, json.dumps(providers, indent=2), spec.description, ] diff --git a/llama_toolchain/cli/stack/run.py b/llama_toolchain/cli/stack/run.py new file mode 100644 index 000000000..68853db35 --- /dev/null +++ b/llama_toolchain/cli/stack/run.py @@ -0,0 +1,106 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
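For context on the write-back at the end of `configure_llama_distribution`: the config is JSON-encoded first (so `Enum` members and, after the `serialize.py` change further down, `datetime` values collapse to plain strings) and only then dumped as YAML. A standalone sketch of that round-trip, with a toy dict standing in for the real `PackageConfig` (PyYAML assumed available, as the toolchain already depends on it):

```python
import json
from datetime import datetime
from enum import Enum

import yaml


class EnumEncoder(json.JSONEncoder):
    # same idea as llama_toolchain.common.serialize.EnumEncoder after this change
    def default(self, obj):
        if isinstance(obj, Enum):
            return obj.value
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)


class Api(Enum):
    inference = "inference"
    safety = "safety"


toy_config = {
    "built_at": datetime.now(),
    "package_name": "local",
    "distribution_id": "local",
    "providers": {Api.inference.value: {"provider_id": "meta-reference"}},
}

# JSON round-trip first, then YAML -- the same to_write pattern used above
to_write = json.loads(json.dumps(toy_config, cls=EnumEncoder))
print(yaml.dump(to_write, sort_keys=False))
```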
+ +import argparse + +from pathlib import Path + +import pkg_resources +import yaml + +from llama_toolchain.cli.subcommand import Subcommand +from llama_toolchain.core.datatypes import * # noqa: F403 +from llama_toolchain.common.config_dirs import BUILDS_BASE_DIR + + +class StackRun(Subcommand): + def __init__(self, subparsers: argparse._SubParsersAction): + super().__init__() + self.parser = subparsers.add_parser( + "run", + prog="llama stack run", + description="""start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.""", + formatter_class=argparse.RawTextHelpFormatter, + ) + self._add_arguments() + self.parser.set_defaults(func=self._run_stack_run_cmd) + + def _add_arguments(self): + from llama_toolchain.core.package import BuildType + + self.parser.add_argument( + "distribution", + type=str, + help="Distribution whose build you want to start", + ) + self.parser.add_argument( + "--name", + type=str, + help="Name of the build you want to start", + required=True, + ) + self.parser.add_argument( + "--type", + type=str, + default="conda_env", + choices=[v.value for v in BuildType], + ) + self.parser.add_argument( + "--port", + type=int, + help="Port to run the server on. Defaults to 5000", + default=5000, + ) + self.parser.add_argument( + "--disable-ipv6", + action="store_true", + help="Disable IPv6 support", + default=False, + ) + + def _run_stack_run_cmd(self, args: argparse.Namespace) -> None: + from llama_toolchain.common.exec import run_with_pty + from llama_toolchain.core.package import BuildType + + build_type = BuildType(args.type) + build_dir = BUILDS_BASE_DIR / args.distribution / build_type.descriptor() + path = build_dir / f"{args.name}.yaml" + + config_file = Path(path) + + if not config_file.exists(): + self.parser.error( + f"File {str(config_file)} does not exist. Did you run `llama stack build`?" + ) + return + + with open(config_file, "r") as f: + config = PackageConfig(**yaml.safe_load(f)) + + if not config.distribution_id: + raise ValueError("Build config appears to be corrupt.") + + if config.docker_image: + script = pkg_resources.resource_filename( + "llama_toolchain", + "core/start_container.sh", + ) + run_args = [script, config.docker_image] + else: + script = pkg_resources.resource_filename( + "llama_toolchain", + "core/start_conda_env.sh", + ) + run_args = [ + script, + config.conda_env, + ] + + run_args.extend([str(config_file), str(args.port)]) + if args.disable_ipv6: + run_args.append("--disable-ipv6") + + run_with_pty(run_args) diff --git a/llama_toolchain/cli/stack/stack.py b/llama_toolchain/cli/stack/stack.py new file mode 100644 index 000000000..cba31e08d --- /dev/null +++ b/llama_toolchain/cli/stack/stack.py @@ -0,0 +1,32 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
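To make the lookup in `_run_stack_run_cmd` above concrete: `llama stack run` only consumes YAML files that `llama stack build` previously wrote under the builds directory. A rough sketch of the path it resolves; the `~/.llama/builds` location comes from the `config_dirs.py` change later in this diff, the descriptor mapping from `BuildType.descriptor()`, and the example names are made up:

```python
from pathlib import Path

# assumed to match BUILDS_BASE_DIR in llama_toolchain/common/config_dirs.py
BUILDS_BASE_DIR = Path.home() / ".llama" / "builds"


def build_config_path(distribution: str, name: str, build_type: str = "conda_env") -> Path:
    # BuildType.descriptor(): "conda_env" -> "conda", "container" -> "docker"
    descriptor = "docker" if build_type == "container" else "conda"
    return BUILDS_BASE_DIR / distribution / descriptor / f"{name}.yaml"


# e.g. `llama stack run local --name 8b-instruct` would look for roughly:
print(build_config_path("local", "8b-instruct"))
# ~/.llama/builds/local/conda/8b-instruct.yaml
```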
+ +import argparse + +from llama_toolchain.cli.subcommand import Subcommand + +from .build import StackBuild +from .configure import StackConfigure +from .list import StackList +from .run import StackRun + + +class StackParser(Subcommand): + def __init__(self, subparsers: argparse._SubParsersAction): + super().__init__() + self.parser = subparsers.add_parser( + "stack", + prog="llama stack", + description="Operations for the Llama Stack / Distributions", + ) + + subparsers = self.parser.add_subparsers(title="stack_subcommands") + + # Add sub-commands + StackBuild.create(subparsers) + StackConfigure.create(subparsers) + StackList.create(subparsers) + StackRun.create(subparsers) diff --git a/llama_toolchain/common/config_dirs.py b/llama_toolchain/common/config_dirs.py index e625234ab..adf3876a3 100644 --- a/llama_toolchain/common/config_dirs.py +++ b/llama_toolchain/common/config_dirs.py @@ -13,3 +13,5 @@ LLAMA_STACK_CONFIG_DIR = Path(os.path.expanduser("~/.llama/")) DISTRIBS_BASE_DIR = LLAMA_STACK_CONFIG_DIR / "distributions" DEFAULT_CHECKPOINT_DIR = LLAMA_STACK_CONFIG_DIR / "checkpoints" + +BUILDS_BASE_DIR = LLAMA_STACK_CONFIG_DIR / "builds" diff --git a/llama_toolchain/common/serialize.py b/llama_toolchain/common/serialize.py index 813851fe9..667902beb 100644 --- a/llama_toolchain/common/serialize.py +++ b/llama_toolchain/common/serialize.py @@ -5,6 +5,7 @@ # the root directory of this source tree. import json +from datetime import datetime from enum import Enum @@ -12,4 +13,6 @@ class EnumEncoder(json.JSONEncoder): def default(self, obj): if isinstance(obj, Enum): return obj.value + elif isinstance(obj, datetime): + return obj.isoformat() return super().default(obj) diff --git a/llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/__init__.py b/llama_toolchain/core/__init__.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/__init__.py rename to llama_toolchain/core/__init__.py diff --git a/llama_toolchain/distribution/install_distribution.sh b/llama_toolchain/core/build_conda_env.sh similarity index 50% rename from llama_toolchain/distribution/install_distribution.sh rename to llama_toolchain/core/build_conda_env.sh index 7cb343cfb..0a3eaf20a 100755 --- a/llama_toolchain/distribution/install_distribution.sh +++ b/llama_toolchain/core/build_conda_env.sh @@ -10,20 +10,36 @@ LLAMA_MODELS_DIR=${LLAMA_MODELS_DIR:-} LLAMA_TOOLCHAIN_DIR=${LLAMA_TOOLCHAIN_DIR:-} TEST_PYPI_VERSION=${TEST_PYPI_VERSION:-} +if [ -n "$LLAMA_TOOLCHAIN_DIR" ]; then + echo "Using llama-toolchain-dir=$LLAMA_TOOLCHAIN_DIR" +fi +if [ -n "$LLAMA_MODELS_DIR" ]; then + echo "Using llama-models-dir=$LLAMA_MODELS_DIR" +fi + set -euo pipefail +if [ "$#" -ne 3 ]; then + echo "Usage: $0 " >&2 + echo "Example: $0 mybuild 'numpy pandas scipy'" >&2 + exit 1 +fi + +distribution_id="$1" +build_name="$2" +env_name="llamastack-$build_name" +pip_dependencies="$3" + # Define color codes RED='\033[0;31m' GREEN='\033[0;32m' NC='\033[0m' # No Color -error_handler() { - echo "Error occurred in script at line: ${1}" >&2 - exit 1 -} +# this is set if we actually create a new conda in which case we need to clean up +ENVNAME="" -# Set up the error trap -trap 'error_handler ${LINENO}' ERR +SCRIPT_DIR=$(dirname "$(readlink -f "$0")") +source "$SCRIPT_DIR/common.sh" ensure_conda_env_python310() { local env_name="$1" @@ -32,26 +48,29 @@ ensure_conda_env_python310() { # Check if conda command is available if ! 
command -v conda &>/dev/null; then - echo -e "${RED}Error: conda command not found. Is Conda installed and in your PATH?${NC}" >&2 + printf "${RED}Error: conda command not found. Is Conda installed and in your PATH?${NC}" >&2 exit 1 fi # Check if the environment exists if conda env list | grep -q "^${env_name} "; then - echo "Conda environment '${env_name}' exists. Checking Python version..." + printf "Conda environment '${env_name}' exists. Checking Python version...\n" # Check Python version in the environment current_version=$(conda run -n "${env_name}" python --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2) if [ "$current_version" = "$python_version" ]; then - echo "Environment '${env_name}' already has Python ${python_version}. No action needed." + printf "Environment '${env_name}' already has Python ${python_version}. No action needed.\n" else - echo "Updating environment '${env_name}' to Python ${python_version}..." + printf "Updating environment '${env_name}' to Python ${python_version}...\n" conda install -n "${env_name}" python="${python_version}" -y fi else - echo "Conda environment '${env_name}' does not exist. Creating with Python ${python_version}..." + printf "Conda environment '${env_name}' does not exist. Creating with Python ${python_version}...\n" conda create -n "${env_name}" python="${python_version}" -y + + ENVNAME="${env_name}" + # setup_cleanup_handlers fi eval "$(conda shell.bash hook)" @@ -65,48 +84,45 @@ ensure_conda_env_python310() { # Re-installing llama-toolchain in the new conda environment if [ -n "$LLAMA_TOOLCHAIN_DIR" ]; then if [ ! -d "$LLAMA_TOOLCHAIN_DIR" ]; then - echo -e "${RED}Warning: LLAMA_TOOLCHAIN_DIR is set but directory does not exist: $LLAMA_TOOLCHAIN_DIR${NC}" >&2 + printf "${RED}Warning: LLAMA_TOOLCHAIN_DIR is set but directory does not exist: $LLAMA_TOOLCHAIN_DIR${NC}\n" >&2 exit 1 fi - echo "Installing from LLAMA_TOOLCHAIN_DIR: $LLAMA_TOOLCHAIN_DIR" - pip install -e "$LLAMA_TOOLCHAIN_DIR" + printf "Installing from LLAMA_TOOLCHAIN_DIR: $LLAMA_TOOLCHAIN_DIR\n" + pip install --no-cache-dir -e "$LLAMA_TOOLCHAIN_DIR" else - pip install llama-toolchain + pip install --no-cache-dir llama-toolchain fi if [ -n "$LLAMA_MODELS_DIR" ]; then if [ ! -d "$LLAMA_MODELS_DIR" ]; then - echo -e "${RED}Warning: LLAMA_MODELS_DIR is set but directory does not exist: $LLAMA_MODELS_DIR${NC}" >&2 + printf "${RED}Warning: LLAMA_MODELS_DIR is set but directory does not exist: $LLAMA_MODELS_DIR${NC}\n" >&2 exit 1 fi - echo "Installing from LLAMA_MODELS_DIR: $LLAMA_MODELS_DIR" + printf "Installing from LLAMA_MODELS_DIR: $LLAMA_MODELS_DIR\n" pip uninstall -y llama-models - pip install -e "$LLAMA_MODELS_DIR" + pip install --no-cache-dir -e "$LLAMA_MODELS_DIR" fi # Install pip dependencies if [ -n "$pip_dependencies" ]; then - echo "Installing pip dependencies: $pip_dependencies" + printf "Installing pip dependencies: $pip_dependencies\n" pip install $pip_dependencies fi fi } -if [ "$#" -ne 3 ]; then - echo "Usage: $0 " >&2 - echo "Example: $0 my_env local-llama-8b 'numpy pandas scipy'" >&2 - exit 1 -fi - -env_name="$1" -distribution_name="$2" -pip_dependencies="$3" - ensure_conda_env_python310 "$env_name" "$pip_dependencies" -echo -e "${GREEN}Successfully setup distribution environment. Configuring...${NC}" +printf "${GREEN}Successfully setup conda environment. 
Configuring build...${NC}\n" -which python3 -python3 -m llama_toolchain.cli.llama distribution configure --name "$distribution_name" +if [ "$distribution_id" = "adhoc" ]; then + subcommand="api" + target="" +else + subcommand="stack" + target="$distribution_id" +fi + +$CONDA_PREFIX/bin/python3 -m llama_toolchain.cli.llama $subcommand configure $target --name "$build_name" --type conda_env diff --git a/llama_toolchain/core/build_container.sh b/llama_toolchain/core/build_container.sh new file mode 100755 index 000000000..5b05f1132 --- /dev/null +++ b/llama_toolchain/core/build_container.sh @@ -0,0 +1,120 @@ +#!/bin/bash + +LLAMA_MODELS_DIR=${LLAMA_MODELS_DIR:-} +LLAMA_TOOLCHAIN_DIR=${LLAMA_TOOLCHAIN_DIR:-} +TEST_PYPI_VERSION=${TEST_PYPI_VERSION:-} + +if [ "$#" -ne 4 ]; then + echo "Usage: $0 + echo "Example: $0 distribution_id my-fastapi-app python:3.9-slim 'fastapi uvicorn' + exit 1 +fi + +distribution_id=$1 +build_name="$2" +image_name="llamastack-$build_name" +docker_base=$3 +pip_dependencies=$4 + +# Define color codes +RED='\033[0;31m' +GREEN='\033[0;32m' +NC='\033[0m' # No Color + +set -euo pipefail + +SCRIPT_DIR=$(dirname "$(readlink -f "$0")") +REPO_DIR=$(dirname $(dirname "$SCRIPT_DIR")) + +TEMP_DIR=$(mktemp -d) + +add_to_docker() { + local input + output_file="$TEMP_DIR/Dockerfile" + if [ -t 0 ]; then + printf '%s\n' "$1" >>"$output_file" + else + # If stdin is not a terminal, read from it (heredoc) + cat >>"$output_file" + fi +} + +add_to_docker <&2 + exit 1 + fi + add_to_docker "RUN pip install $toolchain_mount" +else + add_to_docker "RUN pip install llama-toolchain" +fi + +if [ -n "$LLAMA_MODELS_DIR" ]; then + if [ ! -d "$LLAMA_MODELS_DIR" ]; then + echo "${RED}Warning: LLAMA_MODELS_DIR is set but directory does not exist: $LLAMA_MODELS_DIR${NC}" >&2 + exit 1 + fi + + add_to_docker < None: + all_providers = api_providers() + + provider_configs = {} + for api_str, stub_config in existing_configs.items(): + api = Api(api_str) + providers = all_providers[api] + provider_id = stub_config["provider_id"] + if provider_id not in providers: + raise ValueError( + f"Unknown provider `{provider_id}` is not available for API `{api_str}`" + ) + + provider_spec = providers[provider_id] + cprint(f"Configuring API: {api_str} ({provider_id})", "white", attrs=["bold"]) + config_type = instantiate_class_type(provider_spec.config_class) + + try: + existing_provider_config = config_type(**stub_config) + except Exception: + existing_provider_config = None + + provider_config = prompt_for_config( + config_type, + existing_provider_config, + ) + print("") + + provider_configs[api_str] = { + "provider_id": provider_id, + **provider_config.dict(), + } + + return provider_configs diff --git a/llama_toolchain/core/datatypes.py b/llama_toolchain/core/datatypes.py new file mode 100644 index 000000000..cbdda51d4 --- /dev/null +++ b/llama_toolchain/core/datatypes.py @@ -0,0 +1,190 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
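`configure_api_providers` above turns each provider's `config_class` string into a real class before prompting for its fields. The helper it calls lives in `core/dynamic.py` (further down in this diff); a minimal approximation of that lookup is sketched here, using a stdlib class purely so the snippet runs without the toolchain installed:

```python
import importlib


def instantiate_class_type(fully_qualified_name: str):
    # approximation of llama_toolchain.core.dynamic.instantiate_class_type:
    # "package.module.ClassName" -> the class object itself
    module_name, class_name = fully_qualified_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# configure_api_providers would pass something like
# "llama_toolchain.core.datatypes.RemoteProviderConfig"; a stdlib name is
# used here only to keep the sketch self-contained.
print(instantiate_class_type("collections.OrderedDict"))
```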
+ +from datetime import datetime +from enum import Enum +from typing import Any, Dict, List, Optional + +from llama_models.schema_utils import json_schema_type + +from pydantic import BaseModel, Field, validator + + +@json_schema_type +class Api(Enum): + inference = "inference" + safety = "safety" + agentic_system = "agentic_system" + memory = "memory" + + +@json_schema_type +class ApiEndpoint(BaseModel): + route: str + method: str + name: str + + +@json_schema_type +class ProviderSpec(BaseModel): + api: Api + provider_id: str + config_class: str = Field( + ..., + description="Fully-qualified classname of the config for this provider", + ) + api_dependencies: List[Api] = Field( + default_factory=list, + description="Higher-level API surfaces may depend on other providers to provide their functionality", + ) + + +@json_schema_type +class AdapterSpec(BaseModel): + adapter_id: str = Field( + ..., + description="Unique identifier for this adapter", + ) + module: str = Field( + ..., + description=""" +Fully-qualified name of the module to import. The module is expected to have: + + - `get_adapter_impl(config, deps)`: returns the adapter implementation +""", + ) + pip_packages: List[str] = Field( + default_factory=list, + description="The pip dependencies needed for this implementation", + ) + config_class: Optional[str] = Field( + default=None, + description="Fully-qualified classname of the config for this provider", + ) + + +@json_schema_type +class InlineProviderSpec(ProviderSpec): + pip_packages: List[str] = Field( + default_factory=list, + description="The pip dependencies needed for this implementation", + ) + docker_image: Optional[str] = Field( + default=None, + description=""" +The docker image to use for this implementation. If one is provided, pip_packages will be ignored. +If a provider depends on other providers, the dependencies MUST NOT specify a docker image. +""", + ) + module: str = Field( + ..., + description=""" +Fully-qualified name of the module to import. The module is expected to have: + + - `get_provider_impl(config, deps)`: returns the local implementation +""", + ) + + +class RemoteProviderConfig(BaseModel): + url: str = Field(..., description="The URL for the provider") + + @validator("url") + @classmethod + def validate_url(cls, url: str) -> str: + if not url.startswith("http"): + raise ValueError(f"URL must start with http: {url}") + return url.rstrip("/") + + +def remote_provider_id(adapter_id: str) -> str: + return f"remote::{adapter_id}" + + +@json_schema_type +class RemoteProviderSpec(ProviderSpec): + adapter: Optional[AdapterSpec] = Field( + default=None, + description=""" +If some code is needed to convert the remote responses into Llama Stack compatible +API responses, specify the adapter here. 
If not specified, it indicates the remote +as being "Llama Stack compatible" +""", + ) + + @property + def docker_image(self) -> Optional[str]: + return None + + @property + def module(self) -> str: + if self.adapter: + return self.adapter.module + return f"llama_toolchain.{self.api.value}.client" + + @property + def pip_packages(self) -> List[str]: + if self.adapter: + return self.adapter.pip_packages + return [] + + +# Can avoid this by using Pydantic computed_field +def remote_provider_spec( + api: Api, adapter: Optional[AdapterSpec] = None +) -> RemoteProviderSpec: + config_class = ( + adapter.config_class + if adapter and adapter.config_class + else "llama_toolchain.core.datatypes.RemoteProviderConfig" + ) + provider_id = remote_provider_id(adapter.adapter_id) if adapter else "remote" + + return RemoteProviderSpec( + api=api, provider_id=provider_id, config_class=config_class, adapter=adapter + ) + + +@json_schema_type +class DistributionSpec(BaseModel): + distribution_id: str + description: str + + docker_image: Optional[str] = None + providers: Dict[Api, str] = Field( + default_factory=dict, + description="Provider IDs for each of the APIs provided by this distribution", + ) + + +@json_schema_type +class PackageConfig(BaseModel): + built_at: datetime + + package_name: str = Field( + ..., + description=""" +Reference to the distribution this package refers to. For unregistered (adhoc) packages, +this could be just a hash +""", + ) + distribution_id: Optional[str] = None + + docker_image: Optional[str] = Field( + default=None, + description="Reference to the docker image if this package refers to a container", + ) + conda_env: Optional[str] = Field( + default=None, + description="Reference to the conda environment if this package refers to a conda environment", + ) + providers: Dict[str, Any] = Field( + default_factory=dict, + description=""" +Provider configurations for each of the APIs provided by this package. This includes configurations for +the dependencies of these providers as well. +""", + ) diff --git a/llama_toolchain/distribution/distribution.py b/llama_toolchain/core/distribution.py similarity index 71% rename from llama_toolchain/distribution/distribution.py rename to llama_toolchain/core/distribution.py index f96d0cac6..4c50189c0 100644 --- a/llama_toolchain/distribution/distribution.py +++ b/llama_toolchain/core/distribution.py @@ -7,11 +7,13 @@ import inspect from typing import Dict, List -from llama_toolchain.agentic_system.api.endpoints import AgenticSystem +from llama_toolchain.agentic_system.api import AgenticSystem from llama_toolchain.agentic_system.providers import available_agentic_system_providers -from llama_toolchain.inference.api.endpoints import Inference +from llama_toolchain.inference.api import Inference from llama_toolchain.inference.providers import available_inference_providers -from llama_toolchain.safety.api.endpoints import Safety +from llama_toolchain.memory.api import Memory +from llama_toolchain.memory.providers import available_memory_providers +from llama_toolchain.safety.api import Safety from llama_toolchain.safety.providers import available_safety_providers from .datatypes import ( @@ -20,6 +22,7 @@ from .datatypes import ( DistributionSpec, InlineProviderSpec, ProviderSpec, + remote_provider_spec, ) # These are the dependencies needed by the distribution server. 
@@ -40,6 +43,10 @@ def distribution_dependencies(distribution: DistributionSpec) -> List[str]: ] + SERVER_DEPENDENCIES +def stack_apis() -> List[Api]: + return [Api.inference, Api.safety, Api.agentic_system, Api.memory] + + def api_endpoints() -> Dict[Api, List[ApiEndpoint]]: apis = {} @@ -47,6 +54,7 @@ def api_endpoints() -> Dict[Api, List[ApiEndpoint]]: Api.inference: Inference, Api.safety: Safety, Api.agentic_system: AgenticSystem, + Api.memory: Memory, } for api, protocol in protocols.items(): @@ -60,9 +68,13 @@ def api_endpoints() -> Dict[Api, List[ApiEndpoint]]: webmethod = method.__webmethod__ route = webmethod.route - # use `post` for all methods right now until we fix up the `webmethod` openapi - # annotation and write our own openapi generator - endpoints.append(ApiEndpoint(route=route, method="post", name=name)) + if webmethod.method == "GET": + method = "get" + elif webmethod.method == "DELETE": + method = "delete" + else: + method = "post" + endpoints.append(ApiEndpoint(route=route, method=method, name=name)) apis[api] = endpoints @@ -78,8 +90,12 @@ def api_providers() -> Dict[Api, Dict[str, ProviderSpec]]: a.provider_id: a for a in available_agentic_system_providers() } - return { + ret = { Api.inference: inference_providers_by_id, Api.safety: safety_providers_by_id, Api.agentic_system: agentic_system_providers_by_id, + Api.memory: {a.provider_id: a for a in available_memory_providers()}, } + for k, v in ret.items(): + v["remote"] = remote_provider_spec(k) + return ret diff --git a/llama_toolchain/core/distribution_registry.py b/llama_toolchain/core/distribution_registry.py new file mode 100644 index 000000000..e134fdab6 --- /dev/null +++ b/llama_toolchain/core/distribution_registry.py @@ -0,0 +1,69 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
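The tail of `api_providers()` above is easy to miss: after collecting the inline providers it appends a generic `remote` provider spec for every API, which is exactly what the pre-registered `remote` distribution in the registry below (`providers={x: "remote" for x in Api}`) resolves against. A quick way to observe this, assuming the package from this change is installed:

```python
from llama_toolchain.core.distribution import api_providers, stack_apis

providers = api_providers()
for api in stack_apis():
    # every API should now list a "remote" entry alongside its inline providers
    print(api.value, sorted(providers[api].keys()))
```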
+ +from functools import lru_cache +from typing import List, Optional + +from .datatypes import * # noqa: F403 + + +@lru_cache() +def available_distribution_specs() -> List[DistributionSpec]: + return [ + DistributionSpec( + distribution_id="local", + description="Use code from `llama_toolchain` itself to serve all llama stack APIs", + providers={ + Api.inference: "meta-reference", + Api.memory: "meta-reference-faiss", + Api.safety: "meta-reference", + Api.agentic_system: "meta-reference", + }, + ), + DistributionSpec( + distribution_id="remote", + description="Point to remote services for all llama stack APIs", + providers={x: "remote" for x in Api}, + ), + DistributionSpec( + distribution_id="local-ollama", + description="Like local, but use ollama for running LLM inference", + providers={ + Api.inference: remote_provider_id("ollama"), + Api.safety: "meta-reference", + Api.agentic_system: "meta-reference", + Api.memory: "meta-reference-faiss", + }, + ), + DistributionSpec( + distribution_id="local-plus-fireworks-inference", + description="Use Fireworks.ai for running LLM inference", + providers={ + Api.inference: remote_provider_id("fireworks"), + Api.safety: "meta-reference", + Api.agentic_system: "meta-reference", + Api.memory: "meta-reference-faiss", + }, + ), + DistributionSpec( + distribution_id="local-plus-together-inference", + description="Use Together.ai for running LLM inference", + providers={ + Api.inference: remote_provider_id("together"), + Api.safety: "meta-reference", + Api.agentic_system: "meta-reference", + Api.memory: "meta-reference-faiss", + }, + ), + ] + + +@lru_cache() +def resolve_distribution_spec(distribution_id: str) -> Optional[DistributionSpec]: + for spec in available_distribution_specs(): + if spec.distribution_id == distribution_id: + return spec + return None diff --git a/llama_toolchain/distribution/dynamic.py b/llama_toolchain/core/dynamic.py similarity index 62% rename from llama_toolchain/distribution/dynamic.py rename to llama_toolchain/core/dynamic.py index 20fa038bf..adb9b5dac 100644 --- a/llama_toolchain/distribution/dynamic.py +++ b/llama_toolchain/core/dynamic.py @@ -8,7 +8,7 @@ import asyncio import importlib from typing import Any, Dict -from .datatypes import InlineProviderSpec, ProviderSpec, RemoteProviderSpec +from .datatypes import ProviderSpec, RemoteProviderSpec def instantiate_class_type(fully_qualified_name): @@ -19,18 +19,24 @@ def instantiate_class_type(fully_qualified_name): # returns a class implementing the protocol corresponding to the Api def instantiate_provider( - provider_spec: InlineProviderSpec, + provider_spec: ProviderSpec, provider_config: Dict[str, Any], deps: Dict[str, ProviderSpec], ): module = importlib.import_module(provider_spec.module) config_type = instantiate_class_type(provider_spec.config_class) + if isinstance(provider_spec, RemoteProviderSpec): + if provider_spec.adapter: + method = "get_adapter_impl" + else: + method = "get_client_impl" + else: + method = "get_provider_impl" + config = config_type(**provider_config) - return asyncio.run(module.get_provider_impl(config, deps)) - - -def instantiate_client(provider_spec: RemoteProviderSpec, base_url: str): - module = importlib.import_module(provider_spec.module) - - return asyncio.run(module.get_client_impl(base_url)) + fn = getattr(module, method) + impl = asyncio.run(fn(config, deps)) + impl.__provider_spec__ = provider_spec + impl.__provider_config__ = config + return impl diff --git a/llama_toolchain/core/package.py b/llama_toolchain/core/package.py new 
file mode 100644 index 000000000..72bd93152 --- /dev/null +++ b/llama_toolchain/core/package.py @@ -0,0 +1,149 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import json +import os +from datetime import datetime +from enum import Enum +from typing import List, Optional + +import pkg_resources +import yaml +from pydantic import BaseModel + +from termcolor import cprint + +from llama_toolchain.common.config_dirs import BUILDS_BASE_DIR +from llama_toolchain.common.exec import run_with_pty +from llama_toolchain.common.serialize import EnumEncoder + +from llama_toolchain.core.datatypes import * # noqa: F403 +from llama_toolchain.core.distribution import api_providers, SERVER_DEPENDENCIES + + +class BuildType(Enum): + container = "container" + conda_env = "conda_env" + + def descriptor(self) -> str: + return "docker" if self == self.container else "conda" + + +class Dependencies(BaseModel): + pip_packages: List[str] + docker_image: Optional[str] = None + + +class ApiInput(BaseModel): + api: Api + provider: str + + +def build_package( + api_inputs: List[ApiInput], + build_type: BuildType, + name: str, + distribution_id: Optional[str] = None, + docker_image: Optional[str] = None, +): + if not distribution_id: + distribution_id = "adhoc" + + build_dir = BUILDS_BASE_DIR / distribution_id / build_type.descriptor() + os.makedirs(build_dir, exist_ok=True) + + package_name = name.replace("::", "-") + package_file = build_dir / f"{package_name}.yaml" + + all_providers = api_providers() + + package_deps = Dependencies( + docker_image=docker_image or "python:3.10-slim", + pip_packages=SERVER_DEPENDENCIES, + ) + + stub_config = {} + for api_input in api_inputs: + api = api_input.api + providers_for_api = all_providers[api] + if api_input.provider not in providers_for_api: + raise ValueError( + f"Provider `{api_input.provider}` is not available for API `{api}`" + ) + + provider = providers_for_api[api_input.provider] + package_deps.pip_packages.extend(provider.pip_packages) + if provider.docker_image: + raise ValueError("A stack's dependencies cannot have a docker image") + + stub_config[api.value] = {"provider_id": api_input.provider} + + if package_file.exists(): + cprint( + f"Build `{package_name}` exists; will reconfigure", + color="yellow", + ) + c = PackageConfig(**yaml.safe_load(package_file.read_text())) + for api_str, new_config in stub_config.items(): + if api_str not in c.providers: + c.providers[api_str] = new_config + else: + existing_config = c.providers[api_str] + if existing_config["provider_id"] != new_config["provider_id"]: + cprint( + f"Provider `{api_str}` has changed from `{existing_config}` to `{new_config}`", + color="yellow", + ) + c.providers[api_str] = new_config + else: + c = PackageConfig( + built_at=datetime.now(), + package_name=package_name, + providers=stub_config, + ) + + c.distribution_id = distribution_id + c.docker_image = package_name if build_type == BuildType.container else None + c.conda_env = package_name if build_type == BuildType.conda_env else None + + with open(package_file, "w") as f: + to_write = json.loads(json.dumps(c.dict(), cls=EnumEncoder)) + f.write(yaml.dump(to_write, sort_keys=False)) + + if build_type == BuildType.container: + script = pkg_resources.resource_filename( + "llama_toolchain", "core/build_container.sh" + ) + args = [ + script, + distribution_id, + package_name, + 
package_deps.docker_image, + " ".join(package_deps.pip_packages), + ] + else: + script = pkg_resources.resource_filename( + "llama_toolchain", "core/build_conda_env.sh" + ) + args = [ + script, + distribution_id, + package_name, + " ".join(package_deps.pip_packages), + ] + + return_code = run_with_pty(args) + if return_code != 0: + cprint( + f"Failed to build target {package_name} with return code {return_code}", + color="red", + ) + return + + cprint( + f"Target `{package_name}` built with configuration at {str(package_file)}", + color="green", + ) diff --git a/llama_toolchain/distribution/server.py b/llama_toolchain/core/server.py similarity index 75% rename from llama_toolchain/distribution/server.py rename to llama_toolchain/core/server.py index 8707fa9ed..4de84b726 100644 --- a/llama_toolchain/distribution/server.py +++ b/llama_toolchain/core/server.py @@ -5,8 +5,10 @@ # the root directory of this source tree. import asyncio +import inspect import json import signal +import traceback from collections.abc import ( AsyncGenerator as AsyncGeneratorABC, AsyncIterator as AsyncIteratorABC, @@ -28,18 +30,17 @@ import fire import httpx import yaml -from fastapi import FastAPI, HTTPException, Request, Response +from fastapi import Body, FastAPI, HTTPException, Request, Response from fastapi.exceptions import RequestValidationError from fastapi.responses import JSONResponse, StreamingResponse from fastapi.routing import APIRoute from pydantic import BaseModel, ValidationError from termcolor import cprint +from typing_extensions import Annotated -from .datatypes import Api, DistributionSpec, ProviderSpec, RemoteProviderSpec -from .distribution import api_endpoints -from .dynamic import instantiate_client, instantiate_provider - -from .registry import resolve_distribution_spec +from .datatypes import Api, InlineProviderSpec, ProviderSpec, RemoteProviderSpec +from .distribution import api_endpoints, api_providers +from .dynamic import instantiate_provider def is_async_iterator_type(typ): @@ -66,6 +67,7 @@ def create_sse_event(data: Any) -> str: async def global_exception_handler(request: Request, exc: Exception): + traceback.print_exception(exc) http_exc = translate_exception(exc) return JSONResponse( @@ -155,9 +157,8 @@ def create_dynamic_passthrough( return endpoint -def create_dynamic_typed_route(func: Any): +def create_dynamic_typed_route(func: Any, method: str): hints = get_type_hints(func) - request_model = next(iter(hints.values())) response_model = hints["return"] # NOTE: I think it is better to just add a method within each Api @@ -168,7 +169,7 @@ def create_dynamic_typed_route(func: Any): if is_streaming: - async def endpoint(request: request_model): + async def endpoint(**kwargs): async def sse_generator(event_gen): try: async for item in event_gen: @@ -178,10 +179,7 @@ def create_dynamic_typed_route(func: Any): print("Generator cancelled") await event_gen.aclose() except Exception as e: - print(e) - import traceback - - traceback.print_exc() + traceback.print_exception(e) yield create_sse_event( { "error": { @@ -191,25 +189,38 @@ def create_dynamic_typed_route(func: Any): ) return StreamingResponse( - sse_generator(func(request)), media_type="text/event-stream" + sse_generator(func(**kwargs)), media_type="text/event-stream" ) else: - async def endpoint(request: request_model): + async def endpoint(**kwargs): try: return ( - await func(request) + await func(**kwargs) if asyncio.iscoroutinefunction(func) - else func(request) + else func(**kwargs) ) except Exception as e: - print(e) - 
import traceback - - traceback.print_exc() + traceback.print_exception(e) raise translate_exception(e) from e + sig = inspect.signature(func) + if method == "post": + # make sure every parameter is annotated with Body() so FASTAPI doesn't + # do anything too intelligent and ask for some parameters in the query + # and some in the body + endpoint.__signature__ = sig.replace( + parameters=[ + param.replace( + annotation=Annotated[param.annotation, Body(..., embed=True)] + ) + for param in sig.parameters.values() + ] + ) + else: + endpoint.__signature__ = sig + return endpoint @@ -219,10 +230,9 @@ def topological_sort(providers: List[ProviderSpec]) -> List[ProviderSpec]: def dfs(a: ProviderSpec, visited: Set[Api], stack: List[Api]): visited.add(a.api) - if not isinstance(a, RemoteProviderSpec): - for api in a.api_dependencies: - if api not in visited: - dfs(by_id[api], visited, stack) + for api in a.api_dependencies: + if api not in visited: + dfs(by_id[api], visited, stack) stack.append(a.api) @@ -236,9 +246,11 @@ def topological_sort(providers: List[ProviderSpec]) -> List[ProviderSpec]: return [by_id[x] for x in stack] -def resolve_impls(dist: DistributionSpec, config: Dict[str, Any]) -> Dict[Api, Any]: +def resolve_impls( + provider_specs: Dict[str, ProviderSpec], config: Dict[str, Any] +) -> Dict[Api, Any]: provider_configs = config["providers"] - provider_specs = topological_sort(dist.provider_specs.values()) + provider_specs = topological_sort(provider_specs.values()) impls = {} for provider_spec in provider_specs: @@ -248,15 +260,13 @@ def resolve_impls(dist: DistributionSpec, config: Dict[str, Any]) -> Dict[Api, A f"Could not find provider_spec config for {api}. Please add it to the config" ) - provider_config = provider_configs[api.value] - if isinstance(provider_spec, RemoteProviderSpec): - impls[api] = instantiate_client( - provider_spec, provider_config["base_url"].rstrip("/") - ) - else: + if isinstance(provider_spec, InlineProviderSpec): deps = {api: impls[api] for api in provider_spec.api_dependencies} - impl = instantiate_provider(provider_spec, provider_config, deps) - impls[api] = impl + else: + deps = {} + provider_config = provider_configs[api.value] + impl = instantiate_provider(provider_spec, provider_config, deps) + impls[api] = impl return impls @@ -265,24 +275,36 @@ def main(yaml_config: str, port: int = 5000, disable_ipv6: bool = False): with open(yaml_config, "r") as fp: config = yaml.safe_load(fp) - spec = config["spec"] - dist = resolve_distribution_spec(spec) - if dist is None: - raise ValueError(f"Could not find distribution specification `{spec}`") - app = FastAPI() all_endpoints = api_endpoints() - impls = resolve_impls(dist, config) + all_providers = api_providers() - for provider_spec in dist.provider_specs.values(): + provider_specs = {} + for api_str, provider_config in config["providers"].items(): + api = Api(api_str) + providers = all_providers[api] + provider_id = provider_config["provider_id"] + if provider_id not in providers: + raise ValueError( + f"Unknown provider `{provider_id}` is not available for API `{api}`" + ) + + provider_specs[api] = providers[provider_id] + + impls = resolve_impls(provider_specs, config) + + for provider_spec in provider_specs.values(): api = provider_spec.api endpoints = all_endpoints[api] impl = impls[api] - if isinstance(provider_spec, RemoteProviderSpec): + if ( + isinstance(provider_spec, RemoteProviderSpec) + and provider_spec.adapter is None + ): for endpoint in endpoints: - url = impl.base_url + endpoint.route + 
url = impl.__provider_config__.url.rstrip("/") + endpoint.route getattr(app, endpoint.method)(endpoint.route)( create_dynamic_passthrough(url) ) @@ -296,7 +318,7 @@ def main(yaml_config: str, port: int = 5000, disable_ipv6: bool = False): impl_method = getattr(impl, endpoint.name) getattr(app, endpoint.method)(endpoint.route, response_model=None)( - create_dynamic_typed_route(impl_method) + create_dynamic_typed_route(impl_method, endpoint.method) ) for route in app.routes: @@ -307,6 +329,7 @@ def main(yaml_config: str, port: int = 5000, disable_ipv6: bool = False): attrs=["bold"], ) + app.exception_handler(RequestValidationError)(global_exception_handler) app.exception_handler(Exception)(global_exception_handler) signal.signal(signal.SIGINT, handle_sigint) diff --git a/llama_toolchain/distribution/start_distribution.sh b/llama_toolchain/core/start_conda_env.sh similarity index 61% rename from llama_toolchain/distribution/start_distribution.sh rename to llama_toolchain/core/start_conda_env.sh index 271919676..120dda006 100755 --- a/llama_toolchain/distribution/start_distribution.sh +++ b/llama_toolchain/core/start_conda_env.sh @@ -8,7 +8,6 @@ set -euo pipefail -# Define color codes RED='\033[0;31m' NC='\033[0m' # No Color @@ -17,20 +16,27 @@ error_handler() { exit 1 } -# Set up the error trap trap 'error_handler ${LINENO}' ERR -if [ $# -lt 2 ]; then - echo "Usage: $0 " - exit 1 +if [ $# -lt 3 ]; then + echo "Usage: $0 " + exit 1 fi +build_name="$1" +env_name="llamastack-$build_name" +shift -env_name="$1" +yaml_config="$1" +shift + +port="$1" shift eval "$(conda shell.bash hook)" conda deactivate && conda activate "$env_name" -python_interp=$(conda run -n "$env_name" which python) -$python_interp -m llama_toolchain.distribution.server "$@" +$CONDA_PREFIX/bin/python \ + -m llama_toolchain.core.server \ + --yaml_config "$yaml_config" \ + --port "$port" "$@" diff --git a/llama_toolchain/core/start_container.sh b/llama_toolchain/core/start_container.sh new file mode 100755 index 000000000..676bcedcf --- /dev/null +++ b/llama_toolchain/core/start_container.sh @@ -0,0 +1,43 @@ +#!/bin/bash + +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +set -euo pipefail + +RED='\033[0;31m' +NC='\033[0m' # No Color + +error_handler() { + echo "Error occurred in script at line: ${1}" >&2 + exit 1 +} + +trap 'error_handler ${LINENO}' ERR + +if [ $# -lt 3 ]; then + echo "Usage: $0 " + exit 1 +fi + +build_name="$1" +docker_image="llamastack-$build_name" +shift + +yaml_config="$1" +shift + +port="$1" +shift + +set -x +podman run -it \ + -p $port:$port \ + -v "$yaml_config:/app/config.yaml" \ + $docker_image \ + python -m llama_toolchain.core.server \ + --yaml_config /app/config.yaml \ + --port $port "$@" diff --git a/llama_toolchain/dataset/api/__init__.py b/llama_toolchain/dataset/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/dataset/api/__init__.py +++ b/llama_toolchain/dataset/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. 
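A standalone sketch of the signature-rewriting trick introduced in `create_dynamic_typed_route` above: every parameter of the wrapped API method is annotated with `Body(..., embed=True)` so FastAPI pulls all arguments from the JSON body instead of splitting them between query string and body. The route and parameter names below are invented; FastAPI and `typing_extensions` are assumed installed, as the server already requires them:

```python
import inspect

from fastapi import Body, FastAPI
from typing_extensions import Annotated

app = FastAPI()


async def chat_completion(model: str, stream: bool):
    return {"model": model, "stream": stream}


def as_embedded_body_route(func):
    async def endpoint(**kwargs):
        return await func(**kwargs)

    sig = inspect.signature(func)
    # same idea as the POST branch in create_dynamic_typed_route
    endpoint.__signature__ = sig.replace(
        parameters=[
            p.replace(annotation=Annotated[p.annotation, Body(..., embed=True)])
            for p in sig.parameters.values()
        ]
    )
    return endpoint


app.post("/inference/chat_completion")(as_embedded_body_route(chat_completion))
# a request body of {"model": "...", "stream": true} now maps onto the two
# keyword arguments of chat_completion
```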
-from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/dataset/api/endpoints.py b/llama_toolchain/dataset/api/api.py similarity index 58% rename from llama_toolchain/dataset/api/endpoints.py rename to llama_toolchain/dataset/api/api.py index 6a88f4b7a..c22fc01b0 100644 --- a/llama_toolchain/dataset/api/endpoints.py +++ b/llama_toolchain/dataset/api/api.py @@ -4,13 +4,34 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from typing import Protocol +from enum import Enum +from typing import Any, Dict, Optional, Protocol + +from llama_models.llama3.api.datatypes import URL from llama_models.schema_utils import json_schema_type, webmethod from pydantic import BaseModel -from .datatypes import * # noqa: F403 + +@json_schema_type +class TrainEvalDatasetColumnType(Enum): + dialog = "dialog" + text = "text" + media = "media" + number = "number" + json = "json" + + +@json_schema_type +class TrainEvalDataset(BaseModel): + """Dataset to be used for training or evaluating language models.""" + + # TODO(ashwin): figure out if we need to add an enum for a "dataset type" + + columns: Dict[str, TrainEvalDatasetColumnType] + content_url: URL + metadata: Optional[Dict[str, Any]] = None @json_schema_type diff --git a/llama_toolchain/dataset/api/datatypes.py b/llama_toolchain/dataset/api/datatypes.py deleted file mode 100644 index 32109b37c..000000000 --- a/llama_toolchain/dataset/api/datatypes.py +++ /dev/null @@ -1,34 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from enum import Enum -from typing import Any, Dict, Optional - -from llama_models.llama3.api.datatypes import URL - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel - - -@json_schema_type -class TrainEvalDatasetColumnType(Enum): - dialog = "dialog" - text = "text" - media = "media" - number = "number" - json = "json" - - -@json_schema_type -class TrainEvalDataset(BaseModel): - """Dataset to be used for training or evaluating language models.""" - - # TODO(ashwin): figure out if we need to add an enum for a "dataset type" - - columns: Dict[str, TrainEvalDatasetColumnType] - content_url: URL - metadata: Optional[Dict[str, Any]] = None diff --git a/llama_toolchain/distribution/datatypes.py b/llama_toolchain/distribution/datatypes.py deleted file mode 100644 index 480024223..000000000 --- a/llama_toolchain/distribution/datatypes.py +++ /dev/null @@ -1,106 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
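Since the dataset datatypes now live in `dataset/api/api.py` (above) rather than a separate `datatypes.py`, here is a short usage sketch of the re-exported `TrainEvalDataset`. It assumes the package from this change is installed; the `URL(uri=...)` constructor and the example URL are assumptions for illustration:

```python
from llama_models.llama3.api.datatypes import URL
from llama_toolchain.dataset.api import TrainEvalDataset, TrainEvalDatasetColumnType

dataset = TrainEvalDataset(
    columns={
        "dialog": TrainEvalDatasetColumnType.dialog,
        "label": TrainEvalDatasetColumnType.text,
    },
    content_url=URL(uri="https://example.com/train.jsonl"),  # illustrative URL
)
print(dataset.columns["dialog"])  # TrainEvalDatasetColumnType.dialog
```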
- -from enum import Enum -from typing import Any, Dict, List, Optional - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel, Field, validator - - -@json_schema_type -class Api(Enum): - inference = "inference" - safety = "safety" - agentic_system = "agentic_system" - - -@json_schema_type -class ApiEndpoint(BaseModel): - route: str - method: str - name: str - - -@json_schema_type -class ProviderSpec(BaseModel): - api: Api - provider_id: str - config_class: str = Field( - ..., - description="Fully-qualified classname of the config for this provider", - ) - - -@json_schema_type -class InlineProviderSpec(ProviderSpec): - pip_packages: List[str] = Field( - default_factory=list, - description="The pip dependencies needed for this implementation", - ) - module: str = Field( - ..., - description=""" -Fully-qualified name of the module to import. The module is expected to have: - - - `get_provider_impl(config, deps)`: returns the local implementation -""", - ) - api_dependencies: List[Api] = Field( - default_factory=list, - description="Higher-level API surfaces may depend on other providers to provide their functionality", - ) - - -class RemoteProviderConfig(BaseModel): - base_url: str = Field(..., description="The base URL for the llama stack provider") - api_key: Optional[str] = Field( - ..., description="API key, if needed, for the provider" - ) - - @validator("base_url") - @classmethod - def validate_base_url(cls, base_url: str) -> str: - if not base_url.startswith("http"): - raise ValueError(f"URL must start with http: {base_url}") - return base_url - - -@json_schema_type -class RemoteProviderSpec(ProviderSpec): - module: str = Field( - ..., - description=""" -Fully-qualified name of the module to import. The module is expected to have: - - `get_client_impl(base_url)`: returns a client which can be used to call the remote implementation -""", - ) - config_class: str = "llama_toolchain.distribution.datatypes.RemoteProviderConfig" - - -@json_schema_type -class DistributionSpec(BaseModel): - spec_id: str - description: str - - provider_specs: Dict[Api, ProviderSpec] = Field( - default_factory=dict, - description="Provider specifications for each of the APIs provided by this distribution", - ) - - -@json_schema_type -class DistributionConfig(BaseModel): - """References to a installed / configured DistributionSpec""" - - name: str - spec: str - conda_env: str - providers: Dict[str, Any] = Field( - default_factory=dict, - description="Provider configurations for each of the APIs provided by this distribution", - ) diff --git a/llama_toolchain/distribution/registry.py b/llama_toolchain/distribution/registry.py deleted file mode 100644 index 466b02f7b..000000000 --- a/llama_toolchain/distribution/registry.py +++ /dev/null @@ -1,79 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
- -from functools import lru_cache -from typing import List, Optional - -from .datatypes import Api, DistributionSpec, RemoteProviderSpec -from .distribution import api_providers - - -def client_module(api: Api) -> str: - return f"llama_toolchain.{api.value}.client" - - -def remote_spec(api: Api) -> RemoteProviderSpec: - return RemoteProviderSpec( - api=api, - provider_id=f"{api.value}-remote", - module=client_module(api), - ) - - -@lru_cache() -def available_distribution_specs() -> List[DistributionSpec]: - providers = api_providers() - return [ - DistributionSpec( - spec_id="local", - description="Use code from `llama_toolchain` itself to serve all llama stack APIs", - provider_specs={ - Api.inference: providers[Api.inference]["meta-reference"], - Api.safety: providers[Api.safety]["meta-reference"], - Api.agentic_system: providers[Api.agentic_system]["meta-reference"], - }, - ), - DistributionSpec( - spec_id="remote", - description="Point to remote services for all llama stack APIs", - provider_specs={x: remote_spec(x) for x in providers}, - ), - DistributionSpec( - spec_id="local-ollama", - description="Like local, but use ollama for running LLM inference", - provider_specs={ - Api.inference: providers[Api.inference]["meta-ollama"], - Api.safety: providers[Api.safety]["meta-reference"], - Api.agentic_system: providers[Api.agentic_system]["meta-reference"], - }, - ), - DistributionSpec( - spec_id="remote-fireworks", - description="Use Fireworks.ai for running LLM inference", - provider_specs={ - Api.inference: providers[Api.inference]["fireworks"], - Api.safety: providers[Api.safety]["meta-reference"], - Api.agentic_system: providers[Api.agentic_system]["meta-reference"], - }, - ), - DistributionSpec( - spec_id="remote-together", - description="Use Together.ai for running LLM inference", - provider_specs={ - Api.inference: providers[Api.inference]["together"], - Api.safety: providers[Api.safety]["meta-reference"], - Api.agentic_system: providers[Api.agentic_system]["meta-reference"], - }, - ), - ] - - -@lru_cache() -def resolve_distribution_spec(spec_id: str) -> Optional[DistributionSpec]: - for spec in available_distribution_specs(): - if spec.spec_id == spec_id: - return spec - return None diff --git a/llama_toolchain/evaluations/api/__init__.py b/llama_toolchain/evaluations/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/evaluations/api/__init__.py +++ b/llama_toolchain/evaluations/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/evaluations/api/endpoints.py b/llama_toolchain/evaluations/api/api.py similarity index 85% rename from llama_toolchain/evaluations/api/endpoints.py rename to llama_toolchain/evaluations/api/api.py index 25fb570f7..b8f3fa825 100644 --- a/llama_toolchain/evaluations/api/endpoints.py +++ b/llama_toolchain/evaluations/api/api.py @@ -4,6 +4,7 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. 
+from enum import Enum from typing import List, Protocol from llama_models.schema_utils import webmethod @@ -11,11 +12,34 @@ from llama_models.schema_utils import webmethod from pydantic import BaseModel from llama_models.llama3.api.datatypes import * # noqa: F403 -from .datatypes import * # noqa: F403 -from llama_toolchain.dataset.api.datatypes import * # noqa: F403 +from llama_toolchain.dataset.api import * # noqa: F403 from llama_toolchain.common.training_types import * # noqa: F403 +class TextGenerationMetric(Enum): + perplexity = "perplexity" + rouge = "rouge" + bleu = "bleu" + + +class QuestionAnsweringMetric(Enum): + em = "em" + f1 = "f1" + + +class SummarizationMetric(Enum): + rouge = "rouge" + bleu = "bleu" + + +class EvaluationJob(BaseModel): + job_uuid: str + + +class EvaluationJobLogStream(BaseModel): + job_uuid: str + + class EvaluateTaskRequestCommon(BaseModel): job_uuid: str dataset: TrainEvalDataset diff --git a/llama_toolchain/evaluations/api/datatypes.py b/llama_toolchain/evaluations/api/datatypes.py deleted file mode 100644 index 0ba284e9d..000000000 --- a/llama_toolchain/evaluations/api/datatypes.py +++ /dev/null @@ -1,33 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from enum import Enum - -from pydantic import BaseModel - - -class TextGenerationMetric(Enum): - perplexity = "perplexity" - rouge = "rouge" - bleu = "bleu" - - -class QuestionAnsweringMetric(Enum): - em = "em" - f1 = "f1" - - -class SummarizationMetric(Enum): - rouge = "rouge" - bleu = "bleu" - - -class EvaluationJob(BaseModel): - job_uuid: str - - -class EvaluationJobLogStream(BaseModel): - job_uuid: str diff --git a/llama_toolchain/agentic_system/tools/__init__.py b/llama_toolchain/inference/adapters/__init__.py similarity index 100% rename from llama_toolchain/agentic_system/tools/__init__.py rename to llama_toolchain/inference/adapters/__init__.py diff --git a/llama_toolchain/inference/adapters/fireworks/__init__.py b/llama_toolchain/inference/adapters/fireworks/__init__.py new file mode 100644 index 000000000..6de34833f --- /dev/null +++ b/llama_toolchain/inference/adapters/fireworks/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +from .config import FireworksImplConfig + + +async def get_adapter_impl(config: FireworksImplConfig, _deps) -> Inference: + from .fireworks import FireworksInferenceAdapter + + assert isinstance( + config, FireworksImplConfig + ), f"Unexpected config type: {type(config)}" + impl = FireworksInferenceAdapter(config) + await impl.initialize() + return impl diff --git a/llama_toolchain/inference/fireworks/config.py b/llama_toolchain/inference/adapters/fireworks/config.py similarity index 100% rename from llama_toolchain/inference/fireworks/config.py rename to llama_toolchain/inference/adapters/fireworks/config.py diff --git a/llama_toolchain/inference/fireworks/fireworks.py b/llama_toolchain/inference/adapters/fireworks/fireworks.py similarity index 93% rename from llama_toolchain/inference/fireworks/fireworks.py rename to llama_toolchain/inference/adapters/fireworks/fireworks.py index 2e08cc042..c9d6e38fd 100644 --- a/llama_toolchain/inference/fireworks/fireworks.py +++ b/llama_toolchain/inference/adapters/fireworks/fireworks.py @@ -5,9 +5,9 @@ # the root directory of this source tree. import uuid -from typing import AsyncGenerator, Dict +from typing import AsyncGenerator -import httpx +from fireworks.client import Fireworks from llama_models.llama3.api.datatypes import ( BuiltinTool, @@ -18,20 +18,8 @@ from llama_models.llama3.api.datatypes import ( ) from llama_models.llama3.api.tool_utils import ToolUtils from llama_models.sku_list import resolve_model -from fireworks.client import Fireworks -from llama_toolchain.distribution.datatypes import Api, ProviderSpec -from llama_toolchain.inference.api import ( - ChatCompletionRequest, - ChatCompletionResponse, - ChatCompletionResponseEvent, - ChatCompletionResponseEventType, - ChatCompletionResponseStreamChunk, - CompletionRequest, - Inference, - ToolCallDelta, - ToolCallParseStatus, -) +from llama_toolchain.inference.api import * # noqa: F403 from .config import FireworksImplConfig @@ -42,18 +30,7 @@ FIREWORKS_SUPPORTED_MODELS = { } -async def get_provider_impl( - config: FireworksImplConfig, _deps: Dict[Api, ProviderSpec] -) -> Inference: - assert isinstance( - config, FireworksImplConfig - ), f"Unexpected config type: {type(config)}" - impl = FireworksInference(config) - await impl.initialize() - return impl - - -class FireworksInference(Inference): +class FireworksInferenceAdapter(Inference): def __init__(self, config: FireworksImplConfig) -> None: self.config = config diff --git a/llama_toolchain/inference/adapters/ollama/__init__.py b/llama_toolchain/inference/adapters/ollama/__init__.py new file mode 100644 index 000000000..8369a00a5 --- /dev/null +++ b/llama_toolchain/inference/adapters/ollama/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
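The fireworks adapter above and the ollama adapter below both follow the convention set up by `AdapterSpec`: the package exposes an async `get_adapter_impl(config, deps)` factory, and `instantiate_provider` in `core/dynamic.py` (earlier in this diff) selects that factory for remote specs carrying an adapter, `get_client_impl` for bare remotes, and `get_provider_impl` for inline providers. A condensed restatement of that dispatch, for reference rather than as the actual implementation:

```python
from llama_toolchain.core.datatypes import ProviderSpec, RemoteProviderSpec


def factory_name(spec: ProviderSpec) -> str:
    # mirrors the branching added to llama_toolchain.core.dynamic.instantiate_provider
    if isinstance(spec, RemoteProviderSpec):
        return "get_adapter_impl" if spec.adapter else "get_client_impl"
    return "get_provider_impl"
```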
+ +from llama_toolchain.core.datatypes import RemoteProviderConfig + + +async def get_adapter_impl(config: RemoteProviderConfig, _deps): + from .ollama import OllamaInferenceAdapter + + impl = OllamaInferenceAdapter(config.url) + await impl.initialize() + return impl diff --git a/llama_toolchain/inference/ollama/ollama.py b/llama_toolchain/inference/adapters/ollama/ollama.py similarity index 69% rename from llama_toolchain/inference/ollama/ollama.py rename to llama_toolchain/inference/adapters/ollama/ollama.py index 8901d5c02..375257ea9 100644 --- a/llama_toolchain/inference/ollama/ollama.py +++ b/llama_toolchain/inference/adapters/ollama/ollama.py @@ -4,63 +4,37 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -import uuid -from typing import AsyncGenerator, Dict +from typing import AsyncGenerator import httpx -from llama_models.llama3.api.datatypes import ( - BuiltinTool, - CompletionMessage, - Message, - StopReason, - ToolCall, -) -from llama_models.llama3.api.tool_utils import ToolUtils +from llama_models.llama3.api.chat_format import ChatFormat +from llama_models.llama3.api.datatypes import Message, StopReason +from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.sku_list import resolve_model from ollama import AsyncClient -from llama_toolchain.distribution.datatypes import Api, ProviderSpec -from llama_toolchain.inference.api import ( - ChatCompletionRequest, - ChatCompletionResponse, - ChatCompletionResponseEvent, - ChatCompletionResponseEventType, - ChatCompletionResponseStreamChunk, - CompletionRequest, - Inference, - ToolCallDelta, - ToolCallParseStatus, -) - -from .config import OllamaImplConfig +from llama_toolchain.inference.api import * # noqa: F403 +from llama_toolchain.inference.prepare_messages import prepare_messages # TODO: Eventually this will move to the llama cli model list command # mapping of Model SKUs to ollama models OLLAMA_SUPPORTED_SKUS = { + # "Meta-Llama3.1-8B-Instruct": "llama3.1", "Meta-Llama3.1-8B-Instruct": "llama3.1:8b-instruct-fp16", "Meta-Llama3.1-70B-Instruct": "llama3.1:70b-instruct-fp16", } -async def get_provider_impl( - config: OllamaImplConfig, _deps: Dict[Api, ProviderSpec] -) -> Inference: - assert isinstance( - config, OllamaImplConfig - ), f"Unexpected config type: {type(config)}" - impl = OllamaInference(config) - await impl.initialize() - return impl - - -class OllamaInference(Inference): - def __init__(self, config: OllamaImplConfig) -> None: - self.config = config +class OllamaInferenceAdapter(Inference): + def __init__(self, url: str) -> None: + self.url = url + tokenizer = Tokenizer.get_instance() + self.formatter = ChatFormat(tokenizer) @property def client(self) -> AsyncClient: - return AsyncClient(host=self.config.url) + return AsyncClient(host=self.url) async def initialize(self) -> None: try: @@ -111,6 +85,7 @@ class OllamaInference(Inference): return options async def chat_completion(self, request: ChatCompletionRequest) -> AsyncGenerator: + messages = prepare_messages(request) # accumulate sampling params and other options to pass to ollama options = self.get_ollama_chat_options(request) ollama_model = self.resolve_ollama_model(request.model) @@ -132,7 +107,7 @@ class OllamaInference(Inference): if not request.stream: r = await self.client.chat( model=ollama_model, - messages=self._messages_to_ollama_messages(request.messages), + messages=self._messages_to_ollama_messages(messages), stream=False, options=options, ) @@ -143,9 +118,8 @@ 
class OllamaInference(Inference): elif r["done_reason"] == "length": stop_reason = StopReason.out_of_tokens - completion_message = decode_assistant_message_from_content( - r["message"]["content"], - stop_reason, + completion_message = self.formatter.decode_assistant_message_from_content( + r["message"]["content"], stop_reason ) yield ChatCompletionResponse( completion_message=completion_message, @@ -160,7 +134,7 @@ class OllamaInference(Inference): ) stream = await self.client.chat( model=ollama_model, - messages=self._messages_to_ollama_messages(request.messages), + messages=self._messages_to_ollama_messages(messages), stream=True, options=options, ) @@ -228,7 +202,9 @@ class OllamaInference(Inference): ) # parse tool calls and report errors - message = decode_assistant_message_from_content(buffer, stop_reason) + message = self.formatter.decode_assistant_message_from_content( + buffer, stop_reason + ) parsed_tool_calls = len(message.tool_calls) > 0 if ipython and not parsed_tool_calls: yield ChatCompletionResponseStreamChunk( @@ -261,70 +237,3 @@ class OllamaInference(Inference): stop_reason=stop_reason, ) ) - - -# TODO: Consolidate this with impl in llama-models -def decode_assistant_message_from_content( - content: str, - stop_reason: StopReason, -) -> CompletionMessage: - ipython = content.startswith("<|python_tag|>") - if ipython: - content = content[len("<|python_tag|>") :] - - if content.endswith("<|eot_id|>"): - content = content[: -len("<|eot_id|>")] - stop_reason = StopReason.end_of_turn - elif content.endswith("<|eom_id|>"): - content = content[: -len("<|eom_id|>")] - stop_reason = StopReason.end_of_message - - tool_name = None - tool_arguments = {} - - custom_tool_info = ToolUtils.maybe_extract_custom_tool_call(content) - if custom_tool_info is not None: - tool_name, tool_arguments = custom_tool_info - # Sometimes when agent has custom tools alongside builin tools - # Agent responds for builtin tool calls in the format of the custom tools - # This code tries to handle that case - if tool_name in BuiltinTool.__members__: - tool_name = BuiltinTool[tool_name] - tool_arguments = { - "query": list(tool_arguments.values())[0], - } - else: - builtin_tool_info = ToolUtils.maybe_extract_builtin_tool_call(content) - if builtin_tool_info is not None: - tool_name, query = builtin_tool_info - tool_arguments = { - "query": query, - } - if tool_name in BuiltinTool.__members__: - tool_name = BuiltinTool[tool_name] - elif ipython: - tool_name = BuiltinTool.code_interpreter - tool_arguments = { - "code": content, - } - - tool_calls = [] - if tool_name is not None and tool_arguments is not None: - call_id = str(uuid.uuid4()) - tool_calls.append( - ToolCall( - call_id=call_id, - tool_name=tool_name, - arguments=tool_arguments, - ) - ) - content = "" - - if stop_reason is None: - stop_reason = StopReason.out_of_tokens - - return CompletionMessage( - content=content, - stop_reason=stop_reason, - tool_calls=tool_calls, - ) diff --git a/llama_toolchain/inference/adapters/together/__init__.py b/llama_toolchain/inference/adapters/together/__init__.py new file mode 100644 index 000000000..ad8bc2ac1 --- /dev/null +++ b/llama_toolchain/inference/adapters/together/__init__.py @@ -0,0 +1,18 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +from .config import TogetherImplConfig + + +async def get_adapter_impl(config: TogetherImplConfig, _deps) -> Inference: + from .together import TogetherInferenceAdapter + + assert isinstance( + config, TogetherImplConfig + ), f"Unexpected config type: {type(config)}" + impl = TogetherInferenceAdapter(config) + await impl.initialize() + return impl diff --git a/llama_toolchain/inference/together/config.py b/llama_toolchain/inference/adapters/together/config.py similarity index 100% rename from llama_toolchain/inference/together/config.py rename to llama_toolchain/inference/adapters/together/config.py diff --git a/llama_toolchain/inference/together/together.py b/llama_toolchain/inference/adapters/together/together.py similarity index 93% rename from llama_toolchain/inference/together/together.py rename to llama_toolchain/inference/adapters/together/together.py index e7ccf623e..b8f63df65 100644 --- a/llama_toolchain/inference/together/together.py +++ b/llama_toolchain/inference/adapters/together/together.py @@ -5,7 +5,7 @@ # the root directory of this source tree. import uuid -from typing import AsyncGenerator, Dict +from typing import AsyncGenerator from llama_models.llama3.api.datatypes import ( BuiltinTool, @@ -18,18 +18,7 @@ from llama_models.llama3.api.tool_utils import ToolUtils from llama_models.sku_list import resolve_model from together import Together -from llama_toolchain.distribution.datatypes import Api, ProviderSpec -from llama_toolchain.inference.api import ( - ChatCompletionRequest, - ChatCompletionResponse, - ChatCompletionResponseEvent, - ChatCompletionResponseEventType, - ChatCompletionResponseStreamChunk, - CompletionRequest, - Inference, - ToolCallDelta, - ToolCallParseStatus, -) +from llama_toolchain.inference.api import * # noqa: F403 from .config import TogetherImplConfig @@ -40,18 +29,7 @@ TOGETHER_SUPPORTED_MODELS = { } -async def get_provider_impl( - config: TogetherImplConfig, _deps: Dict[Api, ProviderSpec] -) -> Inference: - assert isinstance( - config, TogetherImplConfig - ), f"Unexpected config type: {type(config)}" - impl = TogetherInference(config) - await impl.initialize() - return impl - - -class TogetherInference(Inference): +class TogetherInferenceAdapter(Inference): def __init__(self, config: TogetherImplConfig) -> None: self.config = config diff --git a/llama_toolchain/inference/api/__init__.py b/llama_toolchain/inference/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/inference/api/__init__.py +++ b/llama_toolchain/inference/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/inference/api/endpoints.py b/llama_toolchain/inference/api/api.py similarity index 51% rename from llama_toolchain/inference/api/endpoints.py rename to llama_toolchain/inference/api/api.py index ef1c7b159..7298cb27b 100644 --- a/llama_toolchain/inference/api/endpoints.py +++ b/llama_toolchain/inference/api/api.py @@ -4,17 +4,79 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. 
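For illustration only (not part of the diff): the Fireworks, Ollama and Together providers above all follow the same adapter pattern, exposing an async `get_adapter_impl` factory that validates its config, builds the adapter and initializes it. Below is a minimal sketch of invoking the Ollama factory directly; the `RemoteProviderConfig` constructor arguments beyond `url` (which the diff reads) are assumed, and in practice the stack resolves adapters through the provider registry rather than by calling the factory by hand.

```
# Sketch only: direct use of the adapter factory introduced in this diff.
# Assumes a local Ollama server is running (initialize() pings it).
import asyncio

from llama_toolchain.core.datatypes import RemoteProviderConfig
from llama_toolchain.inference.adapters.ollama import get_adapter_impl


async def main() -> None:
    config = RemoteProviderConfig(url="http://localhost:11434")  # assumed default URL
    inference = await get_adapter_impl(config, None)
    print(type(inference).__name__)  # OllamaInferenceAdapter


if __name__ == "__main__":
    asyncio.run(main())
```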
-from .datatypes import * # noqa: F403 -from typing import Optional, Protocol +from enum import Enum -# this dependency is annoying and we need a forked up version anyway -from llama_models.schema_utils import webmethod +from typing import List, Literal, Optional, Protocol, Union + +from llama_models.schema_utils import json_schema_type, webmethod + +from pydantic import BaseModel, Field +from typing_extensions import Annotated + +from llama_models.llama3.api.datatypes import * # noqa: F403 + + +class LogProbConfig(BaseModel): + top_k: Optional[int] = 0 + + +@json_schema_type +class QuantizationType(Enum): + bf16 = "bf16" + fp8 = "fp8" + + +@json_schema_type +class Fp8QuantizationConfig(BaseModel): + type: Literal[QuantizationType.fp8.value] = QuantizationType.fp8.value + + +@json_schema_type +class Bf16QuantizationConfig(BaseModel): + type: Literal[QuantizationType.bf16.value] = QuantizationType.bf16.value + + +QuantizationConfig = Annotated[ + Union[Bf16QuantizationConfig, Fp8QuantizationConfig], + Field(discriminator="type"), +] + + +@json_schema_type +class ChatCompletionResponseEventType(Enum): + start = "start" + complete = "complete" + progress = "progress" + + +@json_schema_type +class ToolCallParseStatus(Enum): + started = "started" + in_progress = "in_progress" + failure = "failure" + success = "success" + + +@json_schema_type +class ToolCallDelta(BaseModel): + content: Union[str, ToolCall] + parse_status: ToolCallParseStatus + + +@json_schema_type +class ChatCompletionResponseEvent(BaseModel): + """Chat completion response event.""" + + event_type: ChatCompletionResponseEventType + delta: Union[str, ToolCallDelta] + logprobs: Optional[List[TokenLogProbs]] = None + stop_reason: Optional[StopReason] = None @json_schema_type class CompletionRequest(BaseModel): model: str - content: InterleavedTextAttachment + content: InterleavedTextMedia sampling_params: Optional[SamplingParams] = SamplingParams() stream: Optional[bool] = False @@ -39,7 +101,7 @@ class CompletionResponseStreamChunk(BaseModel): @json_schema_type class BatchCompletionRequest(BaseModel): model: str - content_batch: List[InterleavedTextAttachment] + content_batch: List[InterleavedTextMedia] sampling_params: Optional[SamplingParams] = SamplingParams() logprobs: Optional[LogProbConfig] = None @@ -56,7 +118,11 @@ class ChatCompletionRequest(BaseModel): sampling_params: Optional[SamplingParams] = SamplingParams() # zero-shot tool definitions as input to the model - available_tools: Optional[List[ToolDefinition]] = Field(default_factory=list) + tools: Optional[List[ToolDefinition]] = Field(default_factory=list) + tool_choice: Optional[ToolChoice] = Field(default=ToolChoice.auto) + tool_prompt_format: Optional[ToolPromptFormat] = Field( + default=ToolPromptFormat.json + ) stream: Optional[bool] = False logprobs: Optional[LogProbConfig] = None @@ -82,8 +148,11 @@ class BatchChatCompletionRequest(BaseModel): sampling_params: Optional[SamplingParams] = SamplingParams() # zero-shot tool definitions as input to the model - available_tools: Optional[List[ToolDefinition]] = Field(default_factory=list) - + tools: Optional[List[ToolDefinition]] = Field(default_factory=list) + tool_choice: Optional[ToolChoice] = Field(default=ToolChoice.auto) + tool_prompt_format: Optional[ToolPromptFormat] = Field( + default=ToolPromptFormat.json + ) logprobs: Optional[LogProbConfig] = None @@ -92,6 +161,11 @@ class BatchChatCompletionResponse(BaseModel): completion_message_batch: List[CompletionMessage] +@json_schema_type +class 
EmbeddingsResponse(BaseModel): + embeddings: List[List[float]] + + class Inference(Protocol): @webmethod(route="/inference/completion") async def completion( @@ -105,14 +179,9 @@ class Inference(Protocol): request: ChatCompletionRequest, ) -> Union[ChatCompletionResponse, ChatCompletionResponseStreamChunk]: ... - @webmethod(route="/inference/batch_completion") - async def batch_completion( + @webmethod(route="/inference/embeddings") + async def embeddings( self, - request: BatchCompletionRequest, - ) -> BatchCompletionResponse: ... - - @webmethod(route="/inference/batch_chat_completion") - async def batch_chat_completion( - self, - request: BatchChatCompletionRequest, - ) -> BatchChatCompletionResponse: ... + model: str, + contents: List[InterleavedTextMedia], + ) -> EmbeddingsResponse: ... diff --git a/llama_toolchain/inference/api/datatypes.py b/llama_toolchain/inference/api/datatypes.py deleted file mode 100644 index 571ecc3ea..000000000 --- a/llama_toolchain/inference/api/datatypes.py +++ /dev/null @@ -1,72 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from enum import Enum -from typing import List, Literal, Optional, Union - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel, Field -from typing_extensions import Annotated - -from llama_models.llama3.api.datatypes import * # noqa: F403 - - -class LogProbConfig(BaseModel): - top_k: Optional[int] = 0 - - -@json_schema_type -class QuantizationType(Enum): - bf16 = "bf16" - fp8 = "fp8" - - -@json_schema_type -class Fp8QuantizationConfig(BaseModel): - type: Literal[QuantizationType.fp8.value] = QuantizationType.fp8.value - - -@json_schema_type -class Bf16QuantizationConfig(BaseModel): - type: Literal[QuantizationType.bf16.value] = QuantizationType.bf16.value - - -QuantizationConfig = Annotated[ - Union[Bf16QuantizationConfig, Fp8QuantizationConfig], - Field(discriminator="type"), -] - - -@json_schema_type -class ChatCompletionResponseEventType(Enum): - start = "start" - complete = "complete" - progress = "progress" - - -@json_schema_type -class ToolCallParseStatus(Enum): - started = "started" - in_progress = "in_progress" - failure = "failure" - success = "success" - - -@json_schema_type -class ToolCallDelta(BaseModel): - content: Union[str, ToolCall] - parse_status: ToolCallParseStatus - - -@json_schema_type -class ChatCompletionResponseEvent(BaseModel): - """Chat completion response event.""" - - event_type: ChatCompletionResponseEventType - delta: Union[str, ToolCallDelta] - logprobs: Optional[List[TokenLogProbs]] = None - stop_reason: Optional[StopReason] = None diff --git a/llama_toolchain/inference/client.py b/llama_toolchain/inference/client.py index aa84f906d..5ba9314bc 100644 --- a/llama_toolchain/inference/client.py +++ b/llama_toolchain/inference/client.py @@ -6,12 +6,15 @@ import asyncio import json -from typing import AsyncGenerator +from typing import Any, AsyncGenerator import fire import httpx +from pydantic import BaseModel from termcolor import cprint +from llama_toolchain.core.datatypes import RemoteProviderConfig + from .api import ( ChatCompletionRequest, ChatCompletionResponse, @@ -23,13 +26,16 @@ from .api import ( from .event_logger import EventLogger -async def get_client_impl(base_url: str): - return InferenceClient(base_url) +async def get_client_impl(config: RemoteProviderConfig, _deps: Any) -> Inference: + 
return InferenceClient(config.url) + + +def encodable_dict(d: BaseModel): + return json.loads(d.json()) class InferenceClient(Inference): def __init__(self, base_url: str): - print(f"Initializing client for {base_url}") self.base_url = base_url async def initialize(self) -> None: @@ -46,7 +52,9 @@ class InferenceClient(Inference): async with client.stream( "POST", f"{self.base_url}/inference/chat_completion", - data=request.json(), + json={ + "request": encodable_dict(request), + }, headers={"Content-Type": "application/json"}, timeout=20, ) as response: diff --git a/llama_toolchain/inference/meta_reference/__init__.py b/llama_toolchain/inference/meta_reference/__init__.py index 87a08816e..64d315e79 100644 --- a/llama_toolchain/inference/meta_reference/__init__.py +++ b/llama_toolchain/inference/meta_reference/__init__.py @@ -5,4 +5,15 @@ # the root directory of this source tree. from .config import MetaReferenceImplConfig # noqa -from .inference import get_provider_impl # noqa + + +async def get_provider_impl(config: MetaReferenceImplConfig, _deps): + from .inference import MetaReferenceInferenceImpl + + assert isinstance( + config, MetaReferenceImplConfig + ), f"Unexpected config type: {type(config)}" + + impl = MetaReferenceInferenceImpl(config) + await impl.initialize() + return impl diff --git a/llama_toolchain/inference/meta_reference/config.py b/llama_toolchain/inference/meta_reference/config.py index f85934118..d2e601680 100644 --- a/llama_toolchain/inference/meta_reference/config.py +++ b/llama_toolchain/inference/meta_reference/config.py @@ -11,10 +11,10 @@ from llama_models.datatypes import ModelFamily from llama_models.schema_utils import json_schema_type from llama_models.sku_list import all_registered_models -from llama_toolchain.inference.api import QuantizationConfig - from pydantic import BaseModel, Field, field_validator +from llama_toolchain.inference.api import QuantizationConfig + @json_schema_type class MetaReferenceImplConfig(BaseModel): diff --git a/llama_toolchain/inference/meta_reference/generation.py b/llama_toolchain/inference/meta_reference/generation.py index 058874702..1329f8699 100644 --- a/llama_toolchain/inference/meta_reference/generation.py +++ b/llama_toolchain/inference/meta_reference/generation.py @@ -24,7 +24,7 @@ from fairscale.nn.model_parallel.initialize import ( ) from llama_models.llama3.api.args import ModelArgs from llama_models.llama3.api.chat_format import ChatFormat, ModelInput -from llama_models.llama3.api.datatypes import Message +from llama_models.llama3.api.datatypes import Message, ToolPromptFormat from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.reference_impl.model import Transformer from llama_models.sku_list import resolve_model @@ -279,6 +279,7 @@ class Llama: top_p: float = 0.9, max_gen_len: Optional[int] = None, logprobs: bool = False, + tool_prompt_format: ToolPromptFormat = ToolPromptFormat.json, ) -> Generator: if ( max_gen_len is None @@ -288,7 +289,10 @@ class Llama: max_gen_len = self.model.params.max_seq_len - 1 yield from self.generate( - model_input=self.formatter.encode_dialog_prompt(messages), + model_input=self.formatter.encode_dialog_prompt( + messages, + tool_prompt_format, + ), max_gen_len=max_gen_len, temperature=temperature, top_p=top_p, diff --git a/llama_toolchain/inference/meta_reference/inference.py b/llama_toolchain/inference/meta_reference/inference.py index 84caf1ecf..187d5baae 100644 --- a/llama_toolchain/inference/meta_reference/inference.py +++ 
b/llama_toolchain/inference/meta_reference/inference.py @@ -6,12 +6,11 @@ import asyncio -from typing import AsyncIterator, Dict, Union +from typing import AsyncIterator, Union from llama_models.llama3.api.datatypes import StopReason from llama_models.sku_list import resolve_model -from llama_toolchain.distribution.datatypes import Api, ProviderSpec from llama_toolchain.inference.api import ( ChatCompletionRequest, ChatCompletionResponse, @@ -22,23 +21,11 @@ from llama_toolchain.inference.api import ( ToolCallDelta, ToolCallParseStatus, ) - +from llama_toolchain.inference.prepare_messages import prepare_messages from .config import MetaReferenceImplConfig from .model_parallel import LlamaModelParallelGenerator -async def get_provider_impl( - config: MetaReferenceImplConfig, _deps: Dict[Api, ProviderSpec] -): - assert isinstance( - config, MetaReferenceImplConfig - ), f"Unexpected config type: {type(config)}" - - impl = MetaReferenceInferenceImpl(config) - await impl.initialize() - return impl - - # there's a single model parallel process running serving the model. for now, # we don't support multiple concurrent requests to this process. SEMAPHORE = asyncio.Semaphore(1) @@ -67,6 +54,7 @@ class MetaReferenceInferenceImpl(Inference): ) -> AsyncIterator[ Union[ChatCompletionResponseStreamChunk, ChatCompletionResponse] ]: + messages = prepare_messages(request) model = resolve_model(request.model) if model is None: raise RuntimeError( @@ -98,11 +86,12 @@ class MetaReferenceInferenceImpl(Inference): ipython = False for token_result in self.generator.chat_completion( - messages=request.messages, + messages=messages, temperature=request.sampling_params.temperature, top_p=request.sampling_params.top_p, max_gen_len=request.sampling_params.max_tokens, logprobs=request.logprobs, + tool_prompt_format=request.tool_prompt_format, ): buffer += token_result.text tokens.append(token_result.token) diff --git a/llama_toolchain/inference/meta_reference/model_parallel.py b/llama_toolchain/inference/meta_reference/model_parallel.py index 3de4a6381..b5d81287b 100644 --- a/llama_toolchain/inference/meta_reference/model_parallel.py +++ b/llama_toolchain/inference/meta_reference/model_parallel.py @@ -11,7 +11,7 @@ from functools import partial from typing import Generator, List, Optional from llama_models.llama3.api.chat_format import ChatFormat -from llama_models.llama3.api.datatypes import Message +from llama_models.llama3.api.datatypes import Message, ToolPromptFormat from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.sku_list import resolve_model @@ -27,6 +27,7 @@ class InferenceArgs: top_p: float max_gen_len: int logprobs: bool + tool_prompt_format: ToolPromptFormat class ModelRunner: @@ -41,6 +42,7 @@ class ModelRunner: task.top_p, task.max_gen_len, task.logprobs, + task.tool_prompt_format, ) @@ -93,6 +95,7 @@ class LlamaModelParallelGenerator: top_p: float = 0.9, max_gen_len: Optional[int] = None, logprobs: bool = False, + tool_prompt_format: ToolPromptFormat = ToolPromptFormat.json, ) -> Generator: req_obj = InferenceArgs( messages=deepcopy(messages), @@ -100,6 +103,7 @@ class LlamaModelParallelGenerator: top_p=top_p, max_gen_len=max_gen_len, logprobs=logprobs, + tool_prompt_format=tool_prompt_format, ) gen = self.group.run_inference(req_obj) diff --git a/llama_toolchain/inference/prepare_messages.py b/llama_toolchain/inference/prepare_messages.py new file mode 100644 index 000000000..92e94f8d2 --- /dev/null +++ b/llama_toolchain/inference/prepare_messages.py @@ -0,0 +1,84 @@ +# 
Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_toolchain.inference.api import * # noqa: F403 +from llama_models.llama3.prompt_templates import ( + BuiltinToolGenerator, + FunctionTagCustomToolGenerator, + JsonCustomToolGenerator, + SystemDefaultGenerator, +) + + +def prepare_messages(request: ChatCompletionRequest) -> List[Message]: + + assert request.tool_choice == ToolChoice.auto, "Only `ToolChoice.auto` supported" + + existing_messages = request.messages + existing_system_message = None + if existing_messages[0].role == Role.system.value: + existing_system_message = existing_messages.pop(0) + + assert ( + existing_messages[0].role != Role.system.value + ), "Should only have 1 system message" + + messages = [] + + default_gen = SystemDefaultGenerator() + default_template = default_gen.gen() + + sys_content = "" + + tool_template = None + if request.tools: + tool_gen = BuiltinToolGenerator() + tool_template = tool_gen.gen(request.tools) + + sys_content += tool_template.render() + sys_content += "\n" + + sys_content += default_template.render() + + if existing_system_message: + # TODO: this fn is needed in many places + def _process(c): + if isinstance(c, str): + return c + else: + return "" + + sys_content += "\n" + + if isinstance(existing_system_message.content, str): + sys_content += _process(existing_system_message.content) + elif isinstance(existing_system_message.content, list): + sys_content += "\n".join( + [_process(c) for c in existing_system_message.content] + ) + + messages.append(SystemMessage(content=sys_content)) + + has_custom_tools = any(isinstance(dfn.tool_name, str) for dfn in request.tools) + if has_custom_tools: + if request.tool_prompt_format == ToolPromptFormat.json: + tool_gen = JsonCustomToolGenerator() + elif request.tool_prompt_format == ToolPromptFormat.function_tag: + tool_gen = FunctionTagCustomToolGenerator() + else: + raise ValueError( + f"Non supported ToolPromptFormat {request.tool_prompt_format}" + ) + + custom_tools = [t for t in request.tools if isinstance(t.tool_name, str)] + custom_template = tool_gen.gen(custom_tools) + messages.append(UserMessage(content=custom_template.render())) + + # Add back existing messages from the request + messages += existing_messages + + return messages diff --git a/llama_toolchain/inference/providers.py b/llama_toolchain/inference/providers.py index 6a8b97e81..832e3e1a2 100644 --- a/llama_toolchain/inference/providers.py +++ b/llama_toolchain/inference/providers.py @@ -6,7 +6,7 @@ from typing import List -from llama_toolchain.distribution.datatypes import Api, InlineProviderSpec, ProviderSpec +from llama_toolchain.core.datatypes import * # noqa: F403 def available_inference_providers() -> List[ProviderSpec]: @@ -27,14 +27,13 @@ def available_inference_providers() -> List[ProviderSpec]: module="llama_toolchain.inference.meta_reference", config_class="llama_toolchain.inference.meta_reference.MetaReferenceImplConfig", ), - InlineProviderSpec( + remote_provider_spec( api=Api.inference, - provider_id="meta-ollama", - pip_packages=[ - "ollama", - ], - module="llama_toolchain.inference.ollama", - config_class="llama_toolchain.inference.ollama.OllamaImplConfig", + adapter=AdapterSpec( + adapter_id="ollama", + pip_packages=["ollama"], + module="llama_toolchain.inference.adapters.ollama", + ), ), 
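For illustration only (not part of the diff): the new `prepare_messages` helper above folds the request's builtin and custom tool definitions into system/user messages before generation, honoring the `tool_prompt_format` field added to `ChatCompletionRequest`. A hedged sketch of calling it directly follows; the `ToolDefinition` keyword arguments other than `tool_name` are assumptions, and providers normally call this function internally from `chat_completion`.

```
# Sketch only: exercising prepare_messages from this diff with a custom tool.
from llama_models.llama3.api.datatypes import (
    ToolDefinition,
    ToolParamDefinition,
    ToolPromptFormat,
    UserMessage,
)

from llama_toolchain.inference.api import ChatCompletionRequest
from llama_toolchain.inference.prepare_messages import prepare_messages

request = ChatCompletionRequest(
    model="Meta-Llama3.1-8B-Instruct",
    messages=[UserMessage(content="What is the weather in Menlo Park?")],
    tools=[
        ToolDefinition(
            tool_name="get_weather",  # custom (string-named) tool
            description="Get the weather for a city",  # assumed field
            parameters={
                "city": ToolParamDefinition(  # assumed field names
                    param_type="str",
                    description="City name",
                    required=True,
                ),
            },
        ),
    ],
    tool_prompt_format=ToolPromptFormat.json,
)

messages = prepare_messages(request)
for m in messages:
    print(m.role)  # e.g. system, user (custom-tool prompt), user (original question)
```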
InlineProviderSpec( api=Api.inference, diff --git a/llama_toolchain/inference/quantization/loader.py b/llama_toolchain/inference/quantization/loader.py index 3645344aa..54827dce9 100644 --- a/llama_toolchain/inference/quantization/loader.py +++ b/llama_toolchain/inference/quantization/loader.py @@ -14,12 +14,12 @@ import torch from fairscale.nn.model_parallel.mappings import reduce_from_model_parallel_region from llama_models.llama3.api.model import Transformer, TransformerBlock +from llama_toolchain.inference.api import QuantizationType from llama_toolchain.inference.api.config import ( CheckpointQuantizationFormat, MetaReferenceImplConfig, ) -from llama_toolchain.inference.api.datatypes import QuantizationType from termcolor import cprint from torch import Tensor diff --git a/llama_toolchain/memory/api/__init__.py b/llama_toolchain/memory/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/memory/api/__init__.py +++ b/llama_toolchain/memory/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/memory/api/api.py b/llama_toolchain/memory/api/api.py new file mode 100644 index 000000000..70c7aa7ec --- /dev/null +++ b/llama_toolchain/memory/api/api.py @@ -0,0 +1,157 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +from typing import List, Optional, Protocol + +from llama_models.schema_utils import json_schema_type, webmethod + +from pydantic import BaseModel, Field +from typing_extensions import Annotated + +from llama_models.llama3.api.datatypes import * # noqa: F403 + + +@json_schema_type +class MemoryBankDocument(BaseModel): + document_id: str + content: InterleavedTextMedia | URL + mime_type: str + metadata: Dict[str, Any] = Field(default_factory=dict) + + +@json_schema_type +class MemoryBankType(Enum): + vector = "vector" + keyvalue = "keyvalue" + keyword = "keyword" + graph = "graph" + + +class VectorMemoryBankConfig(BaseModel): + type: Literal[MemoryBankType.vector.value] = MemoryBankType.vector.value + embedding_model: str + chunk_size_in_tokens: int + overlap_size_in_tokens: Optional[int] = None + + +class KeyValueMemoryBankConfig(BaseModel): + type: Literal[MemoryBankType.keyvalue.value] = MemoryBankType.keyvalue.value + + +class KeywordMemoryBankConfig(BaseModel): + type: Literal[MemoryBankType.keyword.value] = MemoryBankType.keyword.value + + +class GraphMemoryBankConfig(BaseModel): + type: Literal[MemoryBankType.graph.value] = MemoryBankType.graph.value + + +MemoryBankConfig = Annotated[ + Union[ + VectorMemoryBankConfig, + KeyValueMemoryBankConfig, + KeywordMemoryBankConfig, + GraphMemoryBankConfig, + ], + Field(discriminator="type"), +] + + +class Chunk(BaseModel): + content: InterleavedTextMedia + token_count: int + document_id: str + + +@json_schema_type +class QueryDocumentsResponse(BaseModel): + chunks: List[Chunk] + scores: List[float] + + +@json_schema_type +class QueryAPI(Protocol): + @webmethod(route="/query_documents") + def query_documents( + self, + query: InterleavedTextMedia, + params: Optional[Dict[str, Any]] = None, + ) -> QueryDocumentsResponse: ... + + +@json_schema_type +class MemoryBank(BaseModel): + bank_id: str + name: str + config: MemoryBankConfig + # if there's a pre-existing (reachable-from-distribution) store which supports QueryAPI + url: Optional[URL] = None + + +class Memory(Protocol): + @webmethod(route="/memory_banks/create") + async def create_memory_bank( + self, + name: str, + config: MemoryBankConfig, + url: Optional[URL] = None, + ) -> MemoryBank: ... + + @webmethod(route="/memory_banks/list", method="GET") + async def list_memory_banks(self) -> List[MemoryBank]: ... + + @webmethod(route="/memory_banks/get", method="GET") + async def get_memory_bank(self, bank_id: str) -> Optional[MemoryBank]: ... + + @webmethod(route="/memory_banks/drop", method="DELETE") + async def drop_memory_bank( + self, + bank_id: str, + ) -> str: ... + + # this will just block now until documents are inserted, but it should + # probably return a Job instance which can be polled for completion + @webmethod(route="/memory_bank/insert") + async def insert_documents( + self, + bank_id: str, + documents: List[MemoryBankDocument], + ttl_seconds: Optional[int] = None, + ) -> None: ... + + @webmethod(route="/memory_bank/update") + async def update_documents( + self, + bank_id: str, + documents: List[MemoryBankDocument], + ) -> None: ... + + @webmethod(route="/memory_bank/query") + async def query_documents( + self, + bank_id: str, + query: InterleavedTextMedia, + params: Optional[Dict[str, Any]] = None, + ) -> QueryDocumentsResponse: ... + + @webmethod(route="/memory_bank/documents/get", method="GET") + async def get_documents( + self, + bank_id: str, + document_ids: List[str], + ) -> List[MemoryBankDocument]: ... 
+ + @webmethod(route="/memory_bank/documents/delete", method="DELETE") + async def delete_documents( + self, + bank_id: str, + document_ids: List[str], + ) -> None: ... diff --git a/llama_toolchain/memory/api/datatypes.py b/llama_toolchain/memory/api/datatypes.py deleted file mode 100644 index 878090c46..000000000 --- a/llama_toolchain/memory/api/datatypes.py +++ /dev/null @@ -1,25 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from typing import Any, Dict - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel - - -@json_schema_type -class MemoryBank(BaseModel): - memory_bank_id: str - memory_bank_name: str - - -@json_schema_type -class MemoryBankDocument(BaseModel): - document_id: str - content: bytes - metadata: Dict[str, Any] - mime_type: str diff --git a/llama_toolchain/memory/api/endpoints.py b/llama_toolchain/memory/api/endpoints.py deleted file mode 100644 index d8ac0e90c..000000000 --- a/llama_toolchain/memory/api/endpoints.py +++ /dev/null @@ -1,61 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from typing import List, Protocol - -from llama_models.schema_utils import webmethod - -from .datatypes import * # noqa: F403 - - -class MemoryBanks(Protocol): - @webmethod(route="/memory_banks/create") - def create_memory_bank( - self, - bank_id: str, - bank_name: str, - documents: List[MemoryBankDocument], - ) -> None: ... - - @webmethod(route="/memory_banks/list") - def get_memory_banks(self) -> List[MemoryBank]: ... - - @webmethod(route="/memory_banks/get") - def get_memory_bank(self, bank_id: str) -> List[MemoryBank]: ... - - @webmethod(route="/memory_banks/drop") - def delete_memory_bank( - self, - bank_id: str, - ) -> str: ... - - @webmethod(route="/memory_bank/insert") - def insert_memory_documents( - self, - bank_id: str, - documents: List[MemoryBankDocument], - ) -> None: ... - - @webmethod(route="/memory_bank/update") - def update_memory_documents( - self, - bank_id: str, - documents: List[MemoryBankDocument], - ) -> None: ... - - @webmethod(route="/memory_bank/get") - def get_memory_documents( - self, - bank_id: str, - document_uuids: List[str], - ) -> List[MemoryBankDocument]: ... - - @webmethod(route="/memory_bank/delete") - def delete_memory_documents( - self, - bank_id: str, - document_uuids: List[str], - ) -> List[str]: ... diff --git a/llama_toolchain/memory/client.py b/llama_toolchain/memory/client.py new file mode 100644 index 000000000..4401276fa --- /dev/null +++ b/llama_toolchain/memory/client.py @@ -0,0 +1,181 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
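For illustration only (not part of the diff): `MemoryBankConfig` in the new memory API above is a union discriminated on its `type` field, so an untyped dict (for example from a run config) can be parsed into the right config class. A minimal sketch, assuming pydantic v1's `parse_obj_as`, which matches the `.dict()`/`.json()` usage elsewhere in this diff.

```
# Sketch only: parsing a plain dict into the discriminated MemoryBankConfig union.
from pydantic import parse_obj_as

from llama_toolchain.memory.api import MemoryBankConfig, VectorMemoryBankConfig

raw = {
    "type": "vector",
    "embedding_model": "all-MiniLM-L6-v2",  # assumed model name
    "chunk_size_in_tokens": 512,
    "overlap_size_in_tokens": 64,
}

config = parse_obj_as(MemoryBankConfig, raw)
assert isinstance(config, VectorMemoryBankConfig)
print(config.embedding_model, config.chunk_size_in_tokens)
```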
+ +import asyncio + +from typing import Any, Dict, List, Optional + +import fire +import httpx + +from llama_toolchain.core.datatypes import RemoteProviderConfig + +from .api import * # noqa: F403 + + +async def get_client_impl(config: RemoteProviderConfig, _deps: Any) -> Memory: + return MemoryClient(config.url) + + +class MemoryClient(Memory): + def __init__(self, base_url: str): + self.base_url = base_url + + async def initialize(self) -> None: + pass + + async def shutdown(self) -> None: + pass + + async def get_memory_bank(self, bank_id: str) -> Optional[MemoryBank]: + async with httpx.AsyncClient() as client: + r = await client.get( + f"{self.base_url}/memory_banks/get", + params={ + "bank_id": bank_id, + }, + headers={"Content-Type": "application/json"}, + timeout=20, + ) + r.raise_for_status() + d = r.json() + if not d: + return None + return MemoryBank(**d) + + async def create_memory_bank( + self, + name: str, + config: MemoryBankConfig, + url: Optional[URL] = None, + ) -> MemoryBank: + async with httpx.AsyncClient() as client: + r = await client.post( + f"{self.base_url}/memory_banks/create", + json={ + "name": name, + "config": config.dict(), + "url": url, + }, + headers={"Content-Type": "application/json"}, + timeout=20, + ) + r.raise_for_status() + d = r.json() + if not d: + return None + return MemoryBank(**d) + + async def insert_documents( + self, + bank_id: str, + documents: List[MemoryBankDocument], + ) -> None: + async with httpx.AsyncClient() as client: + r = await client.post( + f"{self.base_url}/memory_bank/insert", + json={ + "bank_id": bank_id, + "documents": [d.dict() for d in documents], + }, + headers={"Content-Type": "application/json"}, + timeout=20, + ) + r.raise_for_status() + + async def query_documents( + self, + bank_id: str, + query: InterleavedTextMedia, + params: Optional[Dict[str, Any]] = None, + ) -> QueryDocumentsResponse: + async with httpx.AsyncClient() as client: + r = await client.post( + f"{self.base_url}/memory_bank/query", + json={ + "bank_id": bank_id, + "query": query, + "params": params, + }, + headers={"Content-Type": "application/json"}, + timeout=20, + ) + r.raise_for_status() + return QueryDocumentsResponse(**r.json()) + + +async def run_main(host: str, port: int, stream: bool): + client = MemoryClient(f"http://{host}:{port}") + + # create a memory bank + bank = await client.create_memory_bank( + name="test_bank", + config=VectorMemoryBankConfig( + bank_id="test_bank", + embedding_model="dragon-roberta-query-2", + chunk_size_in_tokens=512, + overlap_size_in_tokens=64, + ), + ) + print(bank) + + retrieved_bank = await client.get_memory_bank(bank.bank_id) + assert retrieved_bank is not None + assert retrieved_bank.config.embedding_model == "dragon-roberta-query-2" + + urls = [ + "memory_optimizations.rst", + "chat.rst", + "llama3.rst", + "datasets.rst", + "qat_finetune.rst", + "lora_finetune.rst", + ] + documents = [ + MemoryBankDocument( + document_id=f"num-{i}", + content=URL( + uri=f"https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/{url}" + ), + mime_type="text/plain", + ) + for i, url in enumerate(urls) + ] + + # insert some documents + await client.insert_documents( + bank_id=bank.bank_id, + documents=documents, + ) + + # query the documents + response = await client.query_documents( + bank_id=bank.bank_id, + query=[ + "How do I use Lora?", + ], + ) + for chunk, score in zip(response.chunks, response.scores): + print(f"Score: {score}") + print(f"Chunk:\n========\n{chunk}\n========\n") + + response = 
await client.query_documents( + bank_id=bank.bank_id, + query=[ + "Tell me more about llama3 and torchtune", + ], + ) + for chunk, score in zip(response.chunks, response.scores): + print(f"Score: {score}") + print(f"Chunk:\n========\n{chunk}\n========\n") + + +def main(host: str, port: int, stream: bool = True): + asyncio.run(run_main(host, port, stream)) + + +if __name__ == "__main__": + fire.Fire(main) diff --git a/llama_toolchain/agentic_system/tools/custom/__init__.py b/llama_toolchain/memory/meta_reference/__init__.py similarity index 100% rename from llama_toolchain/agentic_system/tools/custom/__init__.py rename to llama_toolchain/memory/meta_reference/__init__.py diff --git a/llama_toolchain/memory/meta_reference/faiss/__init__.py b/llama_toolchain/memory/meta_reference/faiss/__init__.py new file mode 100644 index 000000000..16c383be3 --- /dev/null +++ b/llama_toolchain/memory/meta_reference/faiss/__init__.py @@ -0,0 +1,19 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from .config import FaissImplConfig + + +async def get_provider_impl(config: FaissImplConfig, _deps): + from .faiss import FaissMemoryImpl + + assert isinstance( + config, FaissImplConfig + ), f"Unexpected config type: {type(config)}" + + impl = FaissMemoryImpl(config) + await impl.initialize() + return impl diff --git a/llama_toolchain/inference/ollama/config.py b/llama_toolchain/memory/meta_reference/faiss/config.py similarity index 58% rename from llama_toolchain/inference/ollama/config.py rename to llama_toolchain/memory/meta_reference/faiss/config.py index 10d109822..b1c94c889 100644 --- a/llama_toolchain/inference/ollama/config.py +++ b/llama_toolchain/memory/meta_reference/faiss/config.py @@ -5,12 +5,9 @@ # the root directory of this source tree. from llama_models.schema_utils import json_schema_type -from pydantic import BaseModel, Field + +from pydantic import BaseModel @json_schema_type -class OllamaImplConfig(BaseModel): - url: str = Field( - default="http://localhost:11434", - description="The URL for the ollama server", - ) +class FaissImplConfig(BaseModel): ... diff --git a/llama_toolchain/memory/meta_reference/faiss/faiss.py b/llama_toolchain/memory/meta_reference/faiss/faiss.py new file mode 100644 index 000000000..422674939 --- /dev/null +++ b/llama_toolchain/memory/meta_reference/faiss/faiss.py @@ -0,0 +1,194 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
+ +import uuid +from dataclasses import dataclass, field +from typing import Any, Dict, List, Optional, Tuple + +import faiss +import httpx +import numpy as np + +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_models.llama3.api.tokenizer import Tokenizer + +from llama_toolchain.memory.api import * # noqa: F403 +from .config import FaissImplConfig + + +async def content_from_doc(doc: MemoryBankDocument) -> str: + if isinstance(doc.content, URL): + async with httpx.AsyncClient() as client: + r = await client.get(doc.content.uri) + return r.text + + return interleaved_text_media_as_str(doc.content) + + +def make_overlapped_chunks( + text: str, window_len: int, overlap_len: int +) -> List[Tuple[str, int]]: + tokenizer = Tokenizer.get_instance() + tokens = tokenizer.encode(text, bos=False, eos=False) + + chunks = [] + for i in range(0, len(tokens), window_len - overlap_len): + toks = tokens[i : i + window_len] + chunk = tokenizer.decode(toks) + chunks.append((chunk, len(toks))) + + return chunks + + +@dataclass +class BankState: + bank: MemoryBank + index: Optional[faiss.IndexFlatL2] = None + doc_by_id: Dict[str, MemoryBankDocument] = field(default_factory=dict) + id_by_index: Dict[int, str] = field(default_factory=dict) + chunk_by_index: Dict[int, str] = field(default_factory=dict) + + async def insert_documents( + self, + model: "SentenceTransformer", + documents: List[MemoryBankDocument], + ) -> None: + tokenizer = Tokenizer.get_instance() + chunk_size = self.bank.config.chunk_size_in_tokens + + for doc in documents: + indexlen = len(self.id_by_index) + self.doc_by_id[doc.document_id] = doc + + content = await content_from_doc(doc) + chunks = make_overlapped_chunks( + content, + self.bank.config.chunk_size_in_tokens, + self.bank.config.overlap_size_in_tokens + or (self.bank.config.chunk_size_in_tokens // 4), + ) + embeddings = model.encode([x[0] for x in chunks]).astype(np.float32) + await self._ensure_index(embeddings.shape[1]) + + self.index.add(embeddings) + for i, chunk in enumerate(chunks): + self.chunk_by_index[indexlen + i] = Chunk( + content=chunk[0], + token_count=chunk[1], + document_id=doc.document_id, + ) + print(f"Adding chunk #{indexlen + i} tokens={chunk[1]}") + self.id_by_index[indexlen + i] = doc.document_id + + async def query_documents( + self, + model: "SentenceTransformer", + query: InterleavedTextMedia, + params: Optional[Dict[str, Any]] = None, + ) -> QueryDocumentsResponse: + if params is None: + params = {} + k = params.get("max_chunks", 3) + + def _process(c) -> str: + if isinstance(c, str): + return c + else: + return "" + + if isinstance(query, list): + query_str = " ".join([_process(c) for c in query]) + else: + query_str = _process(query) + + query_vector = model.encode([query_str])[0] + distances, indices = self.index.search( + query_vector.reshape(1, -1).astype(np.float32), k + ) + + chunks = [] + scores = [] + for d, i in zip(distances[0], indices[0]): + if i < 0: + continue + chunks.append(self.chunk_by_index[int(i)]) + scores.append(1.0 / float(d)) + + return QueryDocumentsResponse(chunks=chunks, scores=scores) + + async def _ensure_index(self, dimension: int) -> faiss.IndexFlatL2: + if self.index is None: + self.index = faiss.IndexFlatL2(dimension) + return self.index + + +class FaissMemoryImpl(Memory): + def __init__(self, config: FaissImplConfig) -> None: + self.config = config + self.model = None + self.states = {} + + async def initialize(self) -> None: ... + + async def shutdown(self) -> None: ... 
+ + async def create_memory_bank( + self, + name: str, + config: MemoryBankConfig, + url: Optional[URL] = None, + ) -> MemoryBank: + assert url is None, "URL is not supported for this implementation" + assert ( + config.type == MemoryBankType.vector.value + ), f"Only vector banks are supported {config.type}" + + bank_id = str(uuid.uuid4()) + bank = MemoryBank( + bank_id=bank_id, + name=name, + config=config, + url=url, + ) + state = BankState(bank=bank) + self.states[bank_id] = state + return bank + + async def get_memory_bank(self, bank_id: str) -> Optional[MemoryBank]: + if bank_id not in self.states: + return None + return self.states[bank_id].bank + + async def insert_documents( + self, + bank_id: str, + documents: List[MemoryBankDocument], + ttl_seconds: Optional[int] = None, + ) -> None: + assert bank_id in self.states, f"Bank {bank_id} not found" + state = self.states[bank_id] + + await state.insert_documents(self.get_model(), documents) + + async def query_documents( + self, + bank_id: str, + query: InterleavedTextMedia, + params: Optional[Dict[str, Any]] = None, + ) -> QueryDocumentsResponse: + assert bank_id in self.states, f"Bank {bank_id} not found" + state = self.states[bank_id] + + return await state.query_documents(self.get_model(), query, params) + + def get_model(self) -> "SentenceTransformer": + from sentence_transformers import SentenceTransformer + + if self.model is None: + print("Loading sentence transformer") + self.model = SentenceTransformer("all-MiniLM-L6-v2") + + return self.model diff --git a/llama_toolchain/memory/providers.py b/llama_toolchain/memory/providers.py new file mode 100644 index 000000000..f8675c344 --- /dev/null +++ b/llama_toolchain/memory/providers.py @@ -0,0 +1,25 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from typing import List + +from llama_toolchain.core.datatypes import Api, InlineProviderSpec, ProviderSpec + + +def available_memory_providers() -> List[ProviderSpec]: + return [ + InlineProviderSpec( + api=Api.memory, + provider_id="meta-reference-faiss", + pip_packages=[ + "blobfile", + "faiss-cpu", + "sentence-transformers", + ], + module="llama_toolchain.memory.meta_reference.faiss", + config_class="llama_toolchain.memory.meta_reference.faiss.FaissImplConfig", + ), + ] diff --git a/llama_toolchain/observability/api/__init__.py b/llama_toolchain/observability/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/observability/api/__init__.py +++ b/llama_toolchain/observability/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/observability/api/endpoints.py b/llama_toolchain/observability/api/api.py similarity index 70% rename from llama_toolchain/observability/api/endpoints.py rename to llama_toolchain/observability/api/api.py index 3f993ac2d..86a5cc703 100644 --- a/llama_toolchain/observability/api/endpoints.py +++ b/llama_toolchain/observability/api/api.py @@ -5,12 +5,79 @@ # the root directory of this source tree. 
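For illustration only (not part of the diff): `make_overlapped_chunks` in the FAISS implementation above slides a fixed window over the token stream with a step of `window_len - overlap_len`, so consecutive chunks share `overlap_len` tokens. The same arithmetic is shown below over plain words, so the sketch runs without the Llama tokenizer.

```
# Sketch only: the sliding-window arithmetic used by make_overlapped_chunks,
# applied to words instead of tokenizer tokens so it is self-contained.
from typing import List, Tuple


def overlapped_word_chunks(text: str, window_len: int, overlap_len: int) -> List[Tuple[str, int]]:
    words = text.split()
    chunks = []
    # step by (window - overlap) so consecutive chunks share overlap_len words
    for i in range(0, len(words), window_len - overlap_len):
        piece = words[i : i + window_len]
        chunks.append((" ".join(piece), len(piece)))
    return chunks


print(overlapped_word_chunks("a b c d e f g h", window_len=4, overlap_len=2))
# [('a b c d', 4), ('c d e f', 4), ('e f g h', 4), ('g h', 2)]
```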
from datetime import datetime -from typing import Any, Dict, List, Optional, Protocol +from enum import Enum + +from typing import Any, Dict, List, Optional, Protocol, Union from llama_models.schema_utils import json_schema_type, webmethod + from pydantic import BaseModel -from llama_models.llama3.api.datatypes import * # noqa: F403 -from .datatypes import * # noqa: F403 + + +@json_schema_type +class ExperimentStatus(Enum): + NOT_STARTED = "not_started" + RUNNING = "running" + COMPLETED = "completed" + FAILED = "failed" + + +@json_schema_type +class Experiment(BaseModel): + id: str + name: str + status: ExperimentStatus + created_at: datetime + updated_at: datetime + metadata: Dict[str, Any] + + +@json_schema_type +class Run(BaseModel): + id: str + experiment_id: str + status: str + started_at: datetime + ended_at: Optional[datetime] + metadata: Dict[str, Any] + + +@json_schema_type +class Metric(BaseModel): + name: str + value: Union[float, int, str, bool] + timestamp: datetime + run_id: str + + +@json_schema_type +class Log(BaseModel): + message: str + level: str + timestamp: datetime + additional_info: Dict[str, Any] + + +@json_schema_type +class ArtifactType(Enum): + MODEL = "model" + DATASET = "dataset" + CHECKPOINT = "checkpoint" + PLOT = "plot" + METRIC = "metric" + CONFIG = "config" + CODE = "code" + OTHER = "other" + + +@json_schema_type +class Artifact(BaseModel): + id: str + name: str + type: ArtifactType + size: int + created_at: datetime + metadata: Dict[str, Any] @json_schema_type diff --git a/llama_toolchain/observability/api/datatypes.py b/llama_toolchain/observability/api/datatypes.py deleted file mode 100644 index 42f95b64c..000000000 --- a/llama_toolchain/observability/api/datatypes.py +++ /dev/null @@ -1,80 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
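For illustration only (not part of the diff): the observability types above (Experiment, Run, Metric, Log, Artifact) are plain pydantic records. A short sketch of constructing an experiment and a metric with the fields shown in the diff:

```
# Sketch only: building two of the observability records defined above.
from datetime import datetime, timezone

from llama_toolchain.observability.api import Experiment, ExperimentStatus, Metric

now = datetime.now(timezone.utc)

experiment = Experiment(
    id="exp-001",
    name="toolcall-eval",  # example values
    status=ExperimentStatus.RUNNING,
    created_at=now,
    updated_at=now,
    metadata={"owner": "example"},
)

metric = Metric(
    name="accuracy",
    value=0.87,
    timestamp=now,
    run_id="run-001",
)

print(experiment.status.value, metric.value)
```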
- -from datetime import datetime -from enum import Enum - -from typing import Any, Dict, Optional, Union - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel - - -@json_schema_type -class ExperimentStatus(Enum): - NOT_STARTED = "not_started" - RUNNING = "running" - COMPLETED = "completed" - FAILED = "failed" - - -@json_schema_type -class Experiment(BaseModel): - id: str - name: str - status: ExperimentStatus - created_at: datetime - updated_at: datetime - metadata: Dict[str, Any] - - -@json_schema_type -class Run(BaseModel): - id: str - experiment_id: str - status: str - started_at: datetime - ended_at: Optional[datetime] - metadata: Dict[str, Any] - - -@json_schema_type -class Metric(BaseModel): - name: str - value: Union[float, int, str, bool] - timestamp: datetime - run_id: str - - -@json_schema_type -class Log(BaseModel): - message: str - level: str - timestamp: datetime - additional_info: Dict[str, Any] - - -@json_schema_type -class ArtifactType(Enum): - MODEL = "model" - DATASET = "dataset" - CHECKPOINT = "checkpoint" - PLOT = "plot" - METRIC = "metric" - CONFIG = "config" - CODE = "code" - OTHER = "other" - - -@json_schema_type -class Artifact(BaseModel): - id: str - name: str - type: ArtifactType - size: int - created_at: datetime - metadata: Dict[str, Any] diff --git a/llama_toolchain/post_training/api/__init__.py b/llama_toolchain/post_training/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/post_training/api/__init__.py +++ b/llama_toolchain/post_training/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/post_training/api/endpoints.py b/llama_toolchain/post_training/api/api.py similarity index 68% rename from llama_toolchain/post_training/api/endpoints.py rename to llama_toolchain/post_training/api/api.py index f0536ee4c..447a729fb 100644 --- a/llama_toolchain/post_training/api/endpoints.py +++ b/llama_toolchain/post_training/api/api.py @@ -5,6 +5,7 @@ # the root directory of this source tree. 
from datetime import datetime +from enum import Enum from typing import Any, Dict, List, Optional, Protocol @@ -13,9 +14,90 @@ from llama_models.schema_utils import json_schema_type, webmethod from pydantic import BaseModel, Field from llama_models.llama3.api.datatypes import * # noqa: F403 -from llama_toolchain.dataset.api.datatypes import * # noqa: F403 +from llama_toolchain.dataset.api import * # noqa: F403 from llama_toolchain.common.training_types import * # noqa: F403 -from .datatypes import * # noqa: F403 + + +class OptimizerType(Enum): + adam = "adam" + adamw = "adamw" + sgd = "sgd" + + +@json_schema_type +class OptimizerConfig(BaseModel): + optimizer_type: OptimizerType + lr: float + lr_min: float + weight_decay: float + + +@json_schema_type +class TrainingConfig(BaseModel): + n_epochs: int + batch_size: int + shuffle: bool + n_iters: int + + enable_activation_checkpointing: bool + memory_efficient_fsdp_wrap: bool + fsdp_cpu_offload: bool + + +@json_schema_type +class FinetuningAlgorithm(Enum): + full = "full" + lora = "lora" + qlora = "qlora" + dora = "dora" + + +@json_schema_type +class LoraFinetuningConfig(BaseModel): + lora_attn_modules: List[str] + apply_lora_to_mlp: bool + apply_lora_to_output: bool + rank: int + alpha: int + + +@json_schema_type +class QLoraFinetuningConfig(LoraFinetuningConfig): + pass + + +@json_schema_type +class DoraFinetuningConfig(LoraFinetuningConfig): + pass + + +@json_schema_type +class PostTrainingJobLogStream(BaseModel): + """Stream of logs from a finetuning job.""" + + job_uuid: str + log_lines: List[str] + + +@json_schema_type +class PostTrainingJobStatus(Enum): + running = "running" + completed = "completed" + failed = "failed" + scheduled = "scheduled" + + +@json_schema_type +class RLHFAlgorithm(Enum): + dpo = "dpo" + + +@json_schema_type +class DPOAlignmentConfig(BaseModel): + reward_scale: float + reward_clip: float + epsilon: float + gamma: float @json_schema_type diff --git a/llama_toolchain/post_training/api/datatypes.py b/llama_toolchain/post_training/api/datatypes.py deleted file mode 100644 index 45a259f03..000000000 --- a/llama_toolchain/post_training/api/datatypes.py +++ /dev/null @@ -1,94 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. 
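For illustration only (not part of the diff): the post-training API above now carries its finetuning, optimizer and training configuration inline. A hedged sketch of assembling a LoRA setup from the fields shown; how these objects are attached to a fine-tuning request is outside this excerpt.

```
# Sketch only: constructing the post-training configs defined above.
from llama_toolchain.post_training.api import (
    LoraFinetuningConfig,
    OptimizerConfig,
    OptimizerType,
    TrainingConfig,
)

algorithm_config = LoraFinetuningConfig(
    lora_attn_modules=["q_proj", "v_proj"],  # assumed module names
    apply_lora_to_mlp=False,
    apply_lora_to_output=False,
    rank=8,
    alpha=16,
)

optimizer = OptimizerConfig(
    optimizer_type=OptimizerType.adamw,
    lr=1e-4,
    lr_min=1e-6,
    weight_decay=0.01,
)

training = TrainingConfig(
    n_epochs=1,
    batch_size=4,
    shuffle=True,
    n_iters=1000,
    enable_activation_checkpointing=True,
    memory_efficient_fsdp_wrap=True,
    fsdp_cpu_offload=False,
)

print(algorithm_config.rank, optimizer.lr, training.n_epochs)
```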
- -from enum import Enum -from typing import List - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel - - -class OptimizerType(Enum): - adam = "adam" - adamw = "adamw" - sgd = "sgd" - - -@json_schema_type -class OptimizerConfig(BaseModel): - optimizer_type: OptimizerType - lr: float - lr_min: float - weight_decay: float - - -@json_schema_type -class TrainingConfig(BaseModel): - n_epochs: int - batch_size: int - shuffle: bool - n_iters: int - - enable_activation_checkpointing: bool - memory_efficient_fsdp_wrap: bool - fsdp_cpu_offload: bool - - -@json_schema_type -class FinetuningAlgorithm(Enum): - full = "full" - lora = "lora" - qlora = "qlora" - dora = "dora" - - -@json_schema_type -class LoraFinetuningConfig(BaseModel): - lora_attn_modules: List[str] - apply_lora_to_mlp: bool - apply_lora_to_output: bool - rank: int - alpha: int - - -@json_schema_type -class QLoraFinetuningConfig(LoraFinetuningConfig): - pass - - -@json_schema_type -class DoraFinetuningConfig(LoraFinetuningConfig): - pass - - -@json_schema_type -class PostTrainingJobLogStream(BaseModel): - """Stream of logs from a finetuning job.""" - - job_uuid: str - log_lines: List[str] - - -@json_schema_type -class PostTrainingJobStatus(Enum): - running = "running" - completed = "completed" - failed = "failed" - scheduled = "scheduled" - - -@json_schema_type -class RLHFAlgorithm(Enum): - dpo = "dpo" - - -@json_schema_type -class DPOAlignmentConfig(BaseModel): - reward_scale: float - reward_clip: float - epsilon: float - gamma: float diff --git a/llama_toolchain/reward_scoring/api/__init__.py b/llama_toolchain/reward_scoring/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/reward_scoring/api/__init__.py +++ b/llama_toolchain/reward_scoring/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/reward_scoring/api/endpoints.py b/llama_toolchain/reward_scoring/api/api.py similarity index 63% rename from llama_toolchain/reward_scoring/api/endpoints.py rename to llama_toolchain/reward_scoring/api/api.py index 657e7b325..c91931f09 100644 --- a/llama_toolchain/reward_scoring/api/endpoints.py +++ b/llama_toolchain/reward_scoring/api/api.py @@ -5,9 +5,30 @@ # the root directory of this source tree. from typing import List, Protocol, Union -from .datatypes import * # noqa: F403 -from llama_models.schema_utils import webmethod +from llama_models.schema_utils import json_schema_type, webmethod + +from pydantic import BaseModel + +from llama_models.llama3.api.datatypes import * # noqa: F403 + + +@json_schema_type +class ScoredMessage(BaseModel): + message: Message + score: float + + +@json_schema_type +class DialogGenerations(BaseModel): + dialog: List[Message] + sampled_generations: List[Message] + + +@json_schema_type +class ScoredDialogGenerations(BaseModel): + dialog: List[Message] + scored_generations: List[ScoredMessage] @json_schema_type diff --git a/llama_toolchain/reward_scoring/api/datatypes.py b/llama_toolchain/reward_scoring/api/datatypes.py deleted file mode 100644 index 2ce698d47..000000000 --- a/llama_toolchain/reward_scoring/api/datatypes.py +++ /dev/null @@ -1,31 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. 
-# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from typing import List - -from llama_models.schema_utils import json_schema_type - -from pydantic import BaseModel - -from llama_models.llama3.api.datatypes import * # noqa: F403 - - -@json_schema_type -class ScoredMessage(BaseModel): - message: Message - score: float - - -@json_schema_type -class DialogGenerations(BaseModel): - dialog: List[Message] - sampled_generations: List[Message] - - -@json_schema_type -class ScoredDialogGenerations(BaseModel): - dialog: List[Message] - scored_generations: List[ScoredMessage] diff --git a/llama_toolchain/safety/api/__init__.py b/llama_toolchain/safety/api/__init__.py index 4cefa053f..a7e55ba91 100644 --- a/llama_toolchain/safety/api/__init__.py +++ b/llama_toolchain/safety/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa -from .endpoints import * # noqa +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/safety/api/datatypes.py b/llama_toolchain/safety/api/api.py similarity index 75% rename from llama_toolchain/safety/api/datatypes.py rename to llama_toolchain/safety/api/api.py index 5deecc2b3..96682d172 100644 --- a/llama_toolchain/safety/api/datatypes.py +++ b/llama_toolchain/safety/api/api.py @@ -5,13 +5,12 @@ # the root directory of this source tree. from enum import Enum -from typing import Dict, Optional, Union - -from llama_models.llama3.api.datatypes import ToolParamDefinition -from llama_models.schema_utils import json_schema_type +from typing import Dict, List, Optional, Protocol, Union +from llama_models.schema_utils import json_schema_type, webmethod from pydantic import BaseModel, validator +from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_toolchain.common.deployment_types import RestAPIExecutionConfig @@ -70,3 +69,22 @@ class ShieldResponse(BaseModel): except ValueError: return v return v + + +@json_schema_type +class RunShieldRequest(BaseModel): + messages: List[Message] + shields: List[ShieldDefinition] + + +@json_schema_type +class RunShieldResponse(BaseModel): + responses: List[ShieldResponse] + + +class Safety(Protocol): + @webmethod(route="/safety/run_shields") + async def run_shields( + self, + request: RunShieldRequest, + ) -> RunShieldResponse: ... diff --git a/llama_toolchain/safety/api/endpoints.py b/llama_toolchain/safety/api/endpoints.py deleted file mode 100644 index a282a7968..000000000 --- a/llama_toolchain/safety/api/endpoints.py +++ /dev/null @@ -1,32 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from .datatypes import * # noqa: F403 -from typing import List, Protocol - -from llama_models.llama3.api.datatypes import Message - -# this dependency is annoying and we need a forked up version anyway -from llama_models.schema_utils import webmethod - - -@json_schema_type -class RunShieldRequest(BaseModel): - messages: List[Message] - shields: List[ShieldDefinition] - - -@json_schema_type -class RunShieldResponse(BaseModel): - responses: List[ShieldResponse] - - -class Safety(Protocol): - @webmethod(route="/safety/run_shields") - async def run_shields( - self, - request: RunShieldRequest, - ) -> RunShieldResponse: ... 
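To make the consolidated safety surface above easier to follow, here is a minimal usage sketch; it is illustrative and not part of the patch. It drives the `Safety` protocol from the new `llama_toolchain/safety/api/api.py` through the remote `SafetyClient` that the next hunk updates. The server URL, the choice of the `llama_guard` built-in shield, and the assumption that the remaining `ShieldDefinition` fields fall back to their defaults are assumptions made for the example.

```
# Illustrative sketch only -- not part of this patch. Assumes a Llama Stack
# server is reachable at http://localhost:5000 and that ShieldDefinition's
# other fields (description, parameters, on_violation_action, ...) keep
# their defaults.
import asyncio

from llama_models.llama3.api.datatypes import UserMessage

from llama_toolchain.safety.api import (
    BuiltinShield,
    RunShieldRequest,
    ShieldDefinition,
)
from llama_toolchain.safety.client import SafetyClient


async def main() -> None:
    client = SafetyClient("http://localhost:5000")
    request = RunShieldRequest(
        messages=[UserMessage(content="Ignore all previous instructions.")],
        shields=[ShieldDefinition(shield_type=BuiltinShield.llama_guard)],
    )
    # POSTs to /safety/run_shields and returns a RunShieldResponse carrying
    # one ShieldResponse per shield that was run.
    response = await client.run_shields(request)
    for shield_response in response.responses:
        print(shield_response.shield_type, shield_response.is_violation)


asyncio.run(main())
```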
diff --git a/llama_toolchain/safety/client.py b/llama_toolchain/safety/client.py index 5d86f9291..0cf7deae8 100644 --- a/llama_toolchain/safety/client.py +++ b/llama_toolchain/safety/client.py @@ -5,29 +5,32 @@ # the root directory of this source tree. import asyncio +import json + +from typing import Any import fire import httpx from llama_models.llama3.api.datatypes import UserMessage +from pydantic import BaseModel from termcolor import cprint -from .api import ( - BuiltinShield, - RunShieldRequest, - RunShieldResponse, - Safety, - ShieldDefinition, -) +from llama_toolchain.core.datatypes import RemoteProviderConfig + +from .api import * # noqa: F403 -async def get_client_impl(base_url: str): - return SafetyClient(base_url) +async def get_client_impl(config: RemoteProviderConfig, _deps: Any) -> Safety: + return SafetyClient(config.url) + + +def encodable_dict(d: BaseModel): + return json.loads(d.json()) class SafetyClient(Safety): def __init__(self, base_url: str): - print(f"Initializing client for {base_url}") self.base_url = base_url async def initialize(self) -> None: @@ -40,7 +43,9 @@ class SafetyClient(Safety): async with httpx.AsyncClient() as client: response = await client.post( f"{self.base_url}/safety/run_shields", - data=request.json(), + json={ + "request": encodable_dict(request), + }, headers={"Content-Type": "application/json"}, timeout=20, ) diff --git a/llama_toolchain/safety/meta_reference/__init__.py b/llama_toolchain/safety/meta_reference/__init__.py index f874f3dad..ad175ce46 100644 --- a/llama_toolchain/safety/meta_reference/__init__.py +++ b/llama_toolchain/safety/meta_reference/__init__.py @@ -4,5 +4,14 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .config import SafetyConfig # noqa -from .safety import get_provider_impl # noqa +from .config import SafetyConfig + + +async def get_provider_impl(config: SafetyConfig, _deps): + from .safety import MetaReferenceSafetyImpl + + assert isinstance(config, SafetyConfig), f"Unexpected config type: {type(config)}" + + impl = MetaReferenceSafetyImpl(config) + await impl.initialize() + return impl diff --git a/llama_toolchain/safety/meta_reference/safety.py b/llama_toolchain/safety/meta_reference/safety.py index 8f63b14f2..e71ac09a2 100644 --- a/llama_toolchain/safety/meta_reference/safety.py +++ b/llama_toolchain/safety/meta_reference/safety.py @@ -5,12 +5,10 @@ # the root directory of this source tree. 
import asyncio -from typing import Dict from llama_models.sku_list import resolve_model from llama_toolchain.common.model_utils import model_local_dir -from llama_toolchain.distribution.datatypes import Api, ProviderSpec from llama_toolchain.safety.api import * # noqa from .config import SafetyConfig @@ -25,14 +23,6 @@ from .shields import ( ) -async def get_provider_impl(config: SafetyConfig, _deps: Dict[Api, ProviderSpec]): - assert isinstance(config, SafetyConfig), f"Unexpected config type: {type(config)}" - - impl = MetaReferenceSafetyImpl(config) - await impl.initialize() - return impl - - def resolve_and_get_path(model_name: str) -> str: model = resolve_model(model_name) assert model is not None, f"Could not resolve model {model_name}" diff --git a/llama_toolchain/safety/meta_reference/shields/base.py b/llama_toolchain/safety/meta_reference/shields/base.py index 245373b13..ed939212d 100644 --- a/llama_toolchain/safety/meta_reference/shields/base.py +++ b/llama_toolchain/safety/meta_reference/shields/base.py @@ -5,10 +5,10 @@ # the root directory of this source tree. from abc import ABC, abstractmethod -from typing import List, Union +from typing import List -from llama_models.llama3.api.datatypes import Attachment, Message -from llama_toolchain.safety.api.datatypes import * # noqa: F403 +from llama_models.llama3.api.datatypes import interleaved_text_media_as_str, Message +from llama_toolchain.safety.api import * # noqa: F403 CANNED_RESPONSE_TEXT = "I can't answer that. Can I help with something else?" @@ -30,18 +30,7 @@ class ShieldBase(ABC): def message_content_as_str(message: Message) -> str: - def _to_str(content: Union[str, Attachment]) -> str: - if isinstance(content, str): - return content - elif isinstance(content, Attachment): - return f"File: {str(content.url)}" - else: - raise - - if isinstance(message.content, list) or isinstance(message.content, tuple): - return "\n".join([_to_str(c) for c in message.content]) - else: - return _to_str(message.content) + return interleaved_text_media_as_str(message.content) # For shields that operate on simple strings diff --git a/llama_toolchain/safety/meta_reference/shields/code_scanner.py b/llama_toolchain/safety/meta_reference/shields/code_scanner.py index f78260ff1..564d15a53 100644 --- a/llama_toolchain/safety/meta_reference/shields/code_scanner.py +++ b/llama_toolchain/safety/meta_reference/shields/code_scanner.py @@ -8,7 +8,7 @@ from codeshield.cs import CodeShield from termcolor import cprint from .base import ShieldResponse, TextShield -from llama_toolchain.safety.api.datatypes import * # noqa: F403 +from llama_toolchain.safety.api import * # noqa: F403 class CodeScannerShield(TextShield): diff --git a/llama_toolchain/safety/meta_reference/shields/llama_guard.py b/llama_toolchain/safety/meta_reference/shields/llama_guard.py index a78b8127d..fe04baa00 100644 --- a/llama_toolchain/safety/meta_reference/shields/llama_guard.py +++ b/llama_toolchain/safety/meta_reference/shields/llama_guard.py @@ -14,7 +14,7 @@ from llama_models.llama3.api.datatypes import Message, Role from transformers import AutoModelForCausalLM, AutoTokenizer from .base import CANNED_RESPONSE_TEXT, OnViolationAction, ShieldBase, ShieldResponse -from llama_toolchain.safety.api.datatypes import * # noqa: F403 +from llama_toolchain.safety.api import * # noqa: F403 SAFE_RESPONSE = "safe" _INSTANCE = None diff --git a/llama_toolchain/safety/meta_reference/shields/prompt_guard.py b/llama_toolchain/safety/meta_reference/shields/prompt_guard.py index b9f5dd5a5..a1097a6f7 
100644 --- a/llama_toolchain/safety/meta_reference/shields/prompt_guard.py +++ b/llama_toolchain/safety/meta_reference/shields/prompt_guard.py @@ -14,7 +14,7 @@ from termcolor import cprint from transformers import AutoModelForSequenceClassification, AutoTokenizer from .base import message_content_as_str, OnViolationAction, ShieldResponse, TextShield -from llama_toolchain.safety.api.datatypes import * # noqa: F403 +from llama_toolchain.safety.api import * # noqa: F403 class PromptGuardShield(TextShield): diff --git a/llama_toolchain/safety/providers.py b/llama_toolchain/safety/providers.py index 40691e376..dfacf3f67 100644 --- a/llama_toolchain/safety/providers.py +++ b/llama_toolchain/safety/providers.py @@ -6,7 +6,7 @@ from typing import List -from llama_toolchain.distribution.datatypes import Api, InlineProviderSpec, ProviderSpec +from llama_toolchain.core.datatypes import Api, InlineProviderSpec, ProviderSpec def available_safety_providers() -> List[ProviderSpec]: diff --git a/llama_toolchain/stack.py b/llama_toolchain/stack.py index 88a54976c..6ec05896d 100644 --- a/llama_toolchain/stack.py +++ b/llama_toolchain/stack.py @@ -9,6 +9,7 @@ from llama_toolchain.agentic_system.api import * # noqa: F403 from llama_toolchain.dataset.api import * # noqa: F403 from llama_toolchain.evaluations.api import * # noqa: F403 from llama_toolchain.inference.api import * # noqa: F403 +from llama_toolchain.batch_inference.api import * # noqa: F403 from llama_toolchain.memory.api import * # noqa: F403 from llama_toolchain.observability.api import * # noqa: F403 from llama_toolchain.post_training.api import * # noqa: F403 @@ -18,13 +19,14 @@ from llama_toolchain.synthetic_data_generation.api import * # noqa: F403 class LlamaStack( Inference, + BatchInference, AgenticSystem, RewardScoring, SyntheticDataGeneration, Datasets, Observability, PostTraining, - MemoryBanks, + Memory, Evaluations, ): pass diff --git a/llama_toolchain/synthetic_data_generation/api/__init__.py b/llama_toolchain/synthetic_data_generation/api/__init__.py index 647bd4a5f..a7e55ba91 100644 --- a/llama_toolchain/synthetic_data_generation/api/__init__.py +++ b/llama_toolchain/synthetic_data_generation/api/__init__.py @@ -4,5 +4,4 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. -from .datatypes import * # noqa: F401 F403 -from .endpoints import * # noqa: F401 F403 +from .api import * # noqa: F401 F403 diff --git a/llama_toolchain/synthetic_data_generation/api/endpoints.py b/llama_toolchain/synthetic_data_generation/api/api.py similarity index 80% rename from llama_toolchain/synthetic_data_generation/api/endpoints.py rename to llama_toolchain/synthetic_data_generation/api/api.py index d6b9c83d5..44b8327a9 100644 --- a/llama_toolchain/synthetic_data_generation/api/endpoints.py +++ b/llama_toolchain/synthetic_data_generation/api/api.py @@ -4,6 +4,8 @@ # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. 
+from enum import Enum + from typing import Any, Dict, List, Optional, Protocol from llama_models.schema_utils import json_schema_type, webmethod @@ -11,8 +13,18 @@ from llama_models.schema_utils import json_schema_type, webmethod from pydantic import BaseModel from llama_models.llama3.api.datatypes import * # noqa: F403 -from llama_toolchain.reward_scoring.api.datatypes import * # noqa: F403 -from .datatypes import * # noqa: F403 +from llama_toolchain.reward_scoring.api import * # noqa: F403 + + +class FilteringFunction(Enum): + """The type of filtering function.""" + + none = "none" + random = "random" + top_k = "top_k" + top_p = "top_p" + top_k_top_p = "top_k_top_p" + sigmoid = "sigmoid" @json_schema_type diff --git a/llama_toolchain/synthetic_data_generation/api/datatypes.py b/llama_toolchain/synthetic_data_generation/api/datatypes.py deleted file mode 100644 index 1cef6653b..000000000 --- a/llama_toolchain/synthetic_data_generation/api/datatypes.py +++ /dev/null @@ -1,18 +0,0 @@ -# Copyright (c) Meta Platforms, Inc. and affiliates. -# All rights reserved. -# -# This source code is licensed under the terms described in the LICENSE file in -# the root directory of this source tree. - -from enum import Enum - - -class FilteringFunction(Enum): - """The type of filtering function.""" - - none = "none" - random = "random" - top_k = "top_k" - top_p = "top_p" - top_k_top_p = "top_k_top_p" - sigmoid = "sigmoid" diff --git a/llama_toolchain/distribution/__init__.py b/llama_toolchain/tools/__init__.py similarity index 100% rename from llama_toolchain/distribution/__init__.py rename to llama_toolchain/tools/__init__.py diff --git a/llama_toolchain/agentic_system/meta_reference/tools/base.py b/llama_toolchain/tools/base.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/base.py rename to llama_toolchain/tools/base.py diff --git a/llama_toolchain/agentic_system/meta_reference/tools/builtin.py b/llama_toolchain/tools/builtin.py similarity index 97% rename from llama_toolchain/agentic_system/meta_reference/tools/builtin.py rename to llama_toolchain/tools/builtin.py index c13af125f..f2ddeefa7 100644 --- a/llama_toolchain/agentic_system/meta_reference/tools/builtin.py +++ b/llama_toolchain/tools/builtin.py @@ -22,6 +22,7 @@ from .ipython_tool.code_execution import ( ) from llama_toolchain.inference.api import * # noqa: F403 +from llama_toolchain.agentic_system.api import * # noqa: F403 from .base import BaseTool @@ -32,7 +33,7 @@ def interpret_content_as_attachment(content: str) -> Optional[Attachment]: snippet = match.group(1) data = json.loads(snippet) return Attachment( - url=URL(uri="file://" + data["filepath"]), mime_type=data["mimetype"] + content=URL(uri="file://" + data["filepath"]), mime_type=data["mimetype"] ) return None @@ -55,9 +56,6 @@ class SingleMessageBuiltinTool(BaseTool): tool_name=tool_call.tool_name, content=response, ) - if attachment := interpret_content_as_attachment(response): - message.content = attachment - return [message] @abstractmethod @@ -316,7 +314,4 @@ class CodeInterpreterTool(BaseTool): tool_name=tool_call.tool_name, content="\n".join(pieces), ) - if attachment := interpret_content_as_attachment(res["stdout"]): - message.content = attachment - return [message] diff --git a/llama_toolchain/inference/together/__init__.py b/llama_toolchain/tools/custom/__init__.py similarity index 67% rename from llama_toolchain/inference/together/__init__.py rename to llama_toolchain/tools/custom/__init__.py index 5be75efcc..756f351d8 100644 --- 
a/llama_toolchain/inference/together/__init__.py +++ b/llama_toolchain/tools/custom/__init__.py @@ -3,6 +3,3 @@ # # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. - -from .config import TogetherImplConfig # noqa -from .together import get_provider_impl # noqa diff --git a/llama_toolchain/agentic_system/tools/custom/datatypes.py b/llama_toolchain/tools/custom/datatypes.py similarity index 86% rename from llama_toolchain/agentic_system/tools/custom/datatypes.py rename to llama_toolchain/tools/custom/datatypes.py index 174b55241..05b142d6f 100644 --- a/llama_toolchain/agentic_system/tools/custom/datatypes.py +++ b/llama_toolchain/tools/custom/datatypes.py @@ -12,11 +12,6 @@ from typing import Dict, List from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_toolchain.agentic_system.api import * # noqa: F403 -# TODO: this is symptomatic of us needing to pull more tooling related utilities -from llama_toolchain.agentic_system.meta_reference.tools.builtin import ( - interpret_content_as_attachment, -) - class CustomTool: """ @@ -59,9 +54,9 @@ class CustomTool: } ) - def get_tool_definition(self) -> AgenticSystemToolDefinition: - return AgenticSystemToolDefinition( - tool_name=self.get_name(), + def get_tool_definition(self) -> FunctionCallToolDefinition: + return FunctionCallToolDefinition( + function_name=self.get_name(), description=self.get_description(), parameters=self.get_params_definition(), ) @@ -96,9 +91,6 @@ class SingleMessageCustomTool(CustomTool): tool_name=tool_call.tool_name, content=response_str, ) - if attachment := interpret_content_as_attachment(response_str): - message.content = attachment - return [message] @abstractmethod diff --git a/llama_toolchain/inference/fireworks/__init__.py b/llama_toolchain/tools/ipython_tool/__init__.py similarity index 67% rename from llama_toolchain/inference/fireworks/__init__.py rename to llama_toolchain/tools/ipython_tool/__init__.py index baeb758ad..756f351d8 100644 --- a/llama_toolchain/inference/fireworks/__init__.py +++ b/llama_toolchain/tools/ipython_tool/__init__.py @@ -3,6 +3,3 @@ # # This source code is licensed under the terms described in the LICENSE file in # the root directory of this source tree. 
- -from .config import FireworksImplConfig # noqa -from .fireworks import get_provider_impl # noqa diff --git a/llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/code_env_prefix.py b/llama_toolchain/tools/ipython_tool/code_env_prefix.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/code_env_prefix.py rename to llama_toolchain/tools/ipython_tool/code_env_prefix.py diff --git a/llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/code_execution.py b/llama_toolchain/tools/ipython_tool/code_execution.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/code_execution.py rename to llama_toolchain/tools/ipython_tool/code_execution.py diff --git a/llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/matplotlib_custom_backend.py b/llama_toolchain/tools/ipython_tool/matplotlib_custom_backend.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/matplotlib_custom_backend.py rename to llama_toolchain/tools/ipython_tool/matplotlib_custom_backend.py diff --git a/llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/utils.py b/llama_toolchain/tools/ipython_tool/utils.py similarity index 100% rename from llama_toolchain/agentic_system/meta_reference/tools/ipython_tool/utils.py rename to llama_toolchain/tools/ipython_tool/utils.py diff --git a/llama_toolchain/agentic_system/meta_reference/tools/safety.py b/llama_toolchain/tools/safety.py similarity index 93% rename from llama_toolchain/agentic_system/meta_reference/tools/safety.py rename to llama_toolchain/tools/safety.py index aab67801d..24051af8a 100644 --- a/llama_toolchain/agentic_system/meta_reference/tools/safety.py +++ b/llama_toolchain/tools/safety.py @@ -9,8 +9,7 @@ from typing import List from llama_toolchain.agentic_system.meta_reference.safety import ShieldRunnerMixin from llama_toolchain.inference.api import Message -from llama_toolchain.safety.api.datatypes import ShieldDefinition -from llama_toolchain.safety.api.endpoints import Safety +from llama_toolchain.safety.api import Safety, ShieldDefinition from .builtin import BaseTool diff --git a/requirements.txt b/requirements.txt index 4b6693550..bf61af71b 100644 --- a/requirements.txt +++ b/requirements.txt @@ -5,3 +5,4 @@ huggingface-hub llama-models pydantic requests +termcolor diff --git a/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.html b/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.html index f8dab9ec3..46594bbed 100644 --- a/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.html +++ b/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.html @@ -21,7 +21,7 @@ "info": { "title": "[DRAFT] Llama Stack Specification", "version": "0.0.1", - "description": "This is the specification of the llama stack that provides\n a set of endpoints and their corresponding interfaces that are tailored to\n best leverage Llama Models. The specification is still in draft and subject to change.\n Generated at 2024-08-21 14:16:38.313950" + "description": "This is the specification of the llama stack that provides\n a set of endpoints and their corresponding interfaces that are tailored to\n best leverage Llama Models. 
The specification is still in draft and subject to change.\n Generated at 2024-09-03 21:42:33.579455" }, "servers": [ { @@ -29,50 +29,7 @@ } ], "paths": { - "/agentic_system/memory_bank/attach": { - "post": { - "responses": { - "200": { - "description": "OK" - } - }, - "tags": [ - "AgenticSystem" - ], - "parameters": [ - { - "name": "agent_id", - "in": "query", - "required": true, - "schema": { - "type": "string" - } - }, - { - "name": "session_id", - "in": "query", - "required": true, - "schema": { - "type": "string" - } - } - ], - "requestBody": { - "content": { - "application/json": { - "schema": { - "type": "array", - "items": { - "type": "string" - } - } - } - }, - "required": true - } - } - }, - "/inference/batch_chat_completion": { + "/batch_inference/chat_completion": { "post": { "responses": { "200": { @@ -87,7 +44,7 @@ } }, "tags": [ - "Inference" + "BatchInference" ], "parameters": [], "requestBody": { @@ -102,7 +59,7 @@ } } }, - "/inference/batch_completion": { + "/batch_inference/completion": { "post": { "responses": { "200": { @@ -117,7 +74,7 @@ } }, "tags": [ - "Inference" + "BatchInference" ], "parameters": [], "requestBody": { @@ -258,7 +215,7 @@ "content": { "application/json": { "schema": { - "$ref": "#/components/schemas/AgenticSystemCreateRequest" + "$ref": "#/components/schemas/AgentConfig" } } }, @@ -267,7 +224,7 @@ } }, "/agentic_system/session/create": { - "post": { + "get": { "responses": { "200": { "description": "OK", @@ -283,17 +240,24 @@ "tags": [ "AgenticSystem" ], - "parameters": [], - "requestBody": { - "content": { - "application/json": { - "schema": { - "$ref": "#/components/schemas/AgenticSystemSessionCreateRequest" - } + "parameters": [ + { + "name": "agent_id", + "in": "query", + "required": true, + "schema": { + "type": "string" } }, - "required": true - } + { + "name": "session_name", + "in": "query", + "required": true, + "schema": { + "type": "string" + } + } + ] } }, "/agentic_system/turn/create": { @@ -302,7 +266,7 @@ "200": { "description": "OK", "content": { - "application/json": { + "text/event-stream": { "schema": { "$ref": "#/components/schemas/AgenticSystemTurnResponseStreamChunk" } @@ -383,23 +347,22 @@ "post": { "responses": { "200": { - "description": "OK" + "description": "OK", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/MemoryBank" + } + } + } } }, "tags": [ - "MemoryBanks" + "Memory" ], "parameters": [ { - "name": "bank_id", - "in": "query", - "required": true, - "schema": { - "type": "string" - } - }, - { - "name": "bank_name", + "name": "name", "in": "query", "required": true, "schema": { @@ -411,10 +374,7 @@ "content": { "application/json": { "schema": { - "type": "array", - "items": { - "$ref": "#/components/schemas/MemoryBankDocument" - } + "$ref": "#/components/schemas/CreateMemoryBankRequest" } } }, @@ -526,51 +486,15 @@ ] } }, - "/memory_banks/drop": { - "delete": { - "responses": { - "200": { - "description": "OK", - "content": { - "application/json": { - "schema": { - "type": "string" - } - } - } - } - }, - "tags": [ - "MemoryBanks" - ], - "parameters": [ - { - "name": "bank_id", - "in": "query", - "required": true, - "schema": { - "type": "string" - } - } - ] - } - }, - "/memory_bank/delete": { + "/memory_bank/documents/delete": { "post": { "responses": { "200": { - "description": "OK", - "content": { - "application/jsonl": { - "schema": { - "type": "string" - } - } - } + "description": "OK" } }, "tags": [ - "MemoryBanks" + "Memory" ], "parameters": [ { @@ -597,27 +521,55 @@ } } }, - 
"/agentic_system/memory_bank/detach": { - "post": { + "/memory_banks/drop": { + "delete": { "responses": { "200": { - "description": "OK" + "description": "OK", + "content": { + "application/json": { + "schema": { + "type": "string" + } + } + } } }, "tags": [ - "AgenticSystem" + "Memory" ], "parameters": [ { - "name": "agent_id", + "name": "bank_id", "in": "query", "required": true, "schema": { "type": "string" } - }, + } + ] + } + }, + "/inference/embeddings": { + "post": { + "responses": { + "200": { + "description": "OK", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/EmbeddingsResponse" + } + } + } + } + }, + "tags": [ + "Inference" + ], + "parameters": [ { - "name": "session_id", + "name": "model", "in": "query", "required": true, "schema": { @@ -631,7 +583,17 @@ "schema": { "type": "array", "items": { - "type": "string" + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] } } } @@ -927,6 +889,48 @@ ] } }, + "/memory_bank/documents/get": { + "post": { + "responses": { + "200": { + "description": "OK", + "content": { + "application/jsonl": { + "schema": { + "$ref": "#/components/schemas/MemoryBankDocument" + } + } + } + } + }, + "tags": [ + "Memory" + ], + "parameters": [ + { + "name": "bank_id", + "in": "query", + "required": true, + "schema": { + "type": "string" + } + } + ], + "requestBody": { + "content": { + "application/json": { + "schema": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "required": true + } + } + }, "/evaluate/job/artifacts": { "get": { "responses": { @@ -1099,16 +1103,23 @@ "200": { "description": "OK", "content": { - "application/jsonl": { + "application/json": { "schema": { - "$ref": "#/components/schemas/MemoryBank" + "oneOf": [ + { + "$ref": "#/components/schemas/MemoryBank" + }, + { + "type": "null" + } + ] } } } } }, "tags": [ - "MemoryBanks" + "Memory" ], "parameters": [ { @@ -1122,68 +1133,6 @@ ] } }, - "/memory_banks/list": { - "get": { - "responses": { - "200": { - "description": "OK", - "content": { - "application/jsonl": { - "schema": { - "$ref": "#/components/schemas/MemoryBank" - } - } - } - } - }, - "tags": [ - "MemoryBanks" - ], - "parameters": [] - } - }, - "/memory_bank/get": { - "post": { - "responses": { - "200": { - "description": "OK", - "content": { - "application/jsonl": { - "schema": { - "$ref": "#/components/schemas/MemoryBankDocument" - } - } - } - } - }, - "tags": [ - "MemoryBanks" - ], - "parameters": [ - { - "name": "bank_id", - "in": "query", - "required": true, - "schema": { - "type": "string" - } - } - ], - "requestBody": { - "content": { - "application/json": { - "schema": { - "type": "array", - "items": { - "type": "string" - } - } - } - }, - "required": true - } - } - }, "/runs/metrics": { "get": { "responses": { @@ -1328,7 +1277,7 @@ } }, "tags": [ - "MemoryBanks" + "Memory" ], "parameters": [ { @@ -1338,6 +1287,14 @@ "schema": { "type": "string" } + }, + { + "name": "ttl_seconds", + "in": "query", + "required": false, + "schema": { + "type": "integer" + } } ], "requestBody": { @@ -1404,6 +1361,26 @@ "parameters": [] } }, + "/memory_banks/list": { + "get": { + "responses": { + "200": { + "description": "OK", + "content": { + "application/jsonl": { + "schema": { + "$ref": "#/components/schemas/MemoryBank" + } + } + } + } + }, + "tags": [ + "Memory" + ], + "parameters": [] + } + }, "/logging/log_messages": { "post": { "responses": { @@ -1480,6 +1457,45 @@ } } }, + "/memory_bank/query": { + "post": { + 
"responses": { + "200": { + "description": "OK", + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/QueryDocumentsResponse" + } + } + } + } + }, + "tags": [ + "Memory" + ], + "parameters": [ + { + "name": "bank_id", + "in": "query", + "required": true, + "schema": { + "type": "string" + } + } + ], + "requestBody": { + "content": { + "application/json": { + "schema": { + "$ref": "#/components/schemas/QueryDocumentsRequest" + } + } + }, + "required": true + } + } + }, "/reward_scoring/score": { "post": { "responses": { @@ -1570,6 +1586,41 @@ } } }, + "/memory_bank/update": { + "post": { + "responses": { + "200": { + "description": "OK" + } + }, + "tags": [ + "Memory" + ], + "parameters": [ + { + "name": "bank_id", + "in": "query", + "required": true, + "schema": { + "type": "string" + } + } + ], + "requestBody": { + "content": { + "application/json": { + "schema": { + "type": "array", + "items": { + "$ref": "#/components/schemas/MemoryBankDocument" + } + } + } + }, + "required": true + } + } + }, "/experiments/update": { "post": { "responses": { @@ -1600,41 +1651,6 @@ } } }, - "/memory_bank/update": { - "post": { - "responses": { - "200": { - "description": "OK" - } - }, - "tags": [ - "MemoryBanks" - ], - "parameters": [ - { - "name": "bank_id", - "in": "query", - "required": true, - "schema": { - "type": "string" - } - } - ], - "requestBody": { - "content": { - "application/json": { - "schema": { - "type": "array", - "items": { - "$ref": "#/components/schemas/MemoryBankDocument" - } - } - } - }, - "required": true - } - } - }, "/runs/update": { "post": { "responses": { @@ -1699,22 +1715,6 @@ "jsonSchemaDialect": "https://json-schema.org/draft/2020-12/schema", "components": { "schemas": { - "Attachment": { - "type": "object", - "properties": { - "url": { - "$ref": "#/components/schemas/URL" - }, - "mime_type": { - "type": "string" - } - }, - "additionalProperties": false, - "required": [ - "url", - "mime_type" - ] - }, "BatchChatCompletionRequest": { "type": "object", "properties": { @@ -1746,12 +1746,18 @@ "sampling_params": { "$ref": "#/components/schemas/SamplingParams" }, - "available_tools": { + "tools": { "type": "array", "items": { "$ref": "#/components/schemas/ToolDefinition" } }, + "tool_choice": { + "$ref": "#/components/schemas/ToolChoice" + }, + "tool_prompt_format": { + "$ref": "#/components/schemas/ToolPromptFormat" + }, "logprobs": { "type": "object", "properties": { @@ -1789,20 +1795,10 @@ { "type": "string" }, - { - "$ref": "#/components/schemas/Attachment" - }, { "type": "array", "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] + "type": "string" } } ] @@ -1880,20 +1876,10 @@ { "type": "string" }, - { - "$ref": "#/components/schemas/Attachment" - }, { "type": "array", "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] + "type": "string" } } ] @@ -1995,6 +1981,13 @@ "arguments" ] }, + "ToolChoice": { + "type": "string", + "enum": [ + "auto", + "required" + ] + }, "ToolDefinition": { "type": "object", "properties": { @@ -2041,6 +2034,15 @@ "param_type" ] }, + "ToolPromptFormat": { + "type": "string", + "enum": [ + "json", + "function_tag" + ], + "title": "This Enum refers to the prompt format for calling custom / zero shot tools", + "description": "`json` --\n Refers to the json format for calling tools.\n The json format takes the form like\n {\n \"type\": \"function\",\n \"function\" : {\n \"name\": \"function_name\",\n 
\"description\": \"function_description\",\n \"parameters\": {...}\n }\n }\n\n`function_tag` --\n This is an example of how you could define\n your own user defined format for making tool calls.\n The function_tag format looks like this,\n (parameters)\n\nThe detailed prompts for each of these formats are added to llama cli" + }, "ToolResponseMessage": { "type": "object", "properties": { @@ -2066,20 +2068,10 @@ { "type": "string" }, - { - "$ref": "#/components/schemas/Attachment" - }, { "type": "array", "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] + "type": "string" } } ] @@ -2093,11 +2085,6 @@ "content" ] }, - "URL": { - "type": "string", - "format": "uri", - "pattern": "^(https?://|file://|data:)" - }, "UserMessage": { "type": "object", "properties": { @@ -2111,19 +2098,22 @@ "type": "string" }, { - "$ref": "#/components/schemas/Attachment" + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "context": { + "oneOf": [ + { + "type": "string" }, { "type": "array", "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] + "type": "string" } } ] @@ -2163,20 +2153,10 @@ { "type": "string" }, - { - "$ref": "#/components/schemas/Attachment" - }, { "type": "array", "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] + "type": "string" } } ] @@ -2244,12 +2224,18 @@ "sampling_params": { "$ref": "#/components/schemas/SamplingParams" }, - "available_tools": { + "tools": { "type": "array", "items": { "$ref": "#/components/schemas/ToolDefinition" } }, + "tool_choice": { + "$ref": "#/components/schemas/ToolChoice" + }, + "tool_prompt_format": { + "$ref": "#/components/schemas/ToolPromptFormat" + }, "stream": { "type": "boolean" }, @@ -2381,20 +2367,10 @@ { "type": "string" }, - { - "$ref": "#/components/schemas/Attachment" - }, { "type": "array", "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] + "type": "string" } } ] @@ -2443,102 +2419,71 @@ ], "title": "streamed completion response." 
}, - "AgenticSystemCreateRequest": { + "AgentConfig": { "type": "object", "properties": { + "sampling_params": { + "$ref": "#/components/schemas/SamplingParams" + }, + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "tools": { + "type": "array", + "items": { + "oneOf": [ + { + "$ref": "#/components/schemas/BraveSearchToolDefinition" + }, + { + "$ref": "#/components/schemas/WolframAlphaToolDefinition" + }, + { + "$ref": "#/components/schemas/PhotogenToolDefinition" + }, + { + "$ref": "#/components/schemas/CodeInterpreterToolDefinition" + }, + { + "$ref": "#/components/schemas/FunctionCallToolDefinition" + }, + { + "$ref": "#/components/schemas/MemoryToolDefinition" + } + ] + } + }, + "tool_choice": { + "$ref": "#/components/schemas/ToolChoice" + }, + "tool_prompt_format": { + "$ref": "#/components/schemas/ToolPromptFormat" + }, "model": { "type": "string" }, - "instance_config": { - "$ref": "#/components/schemas/AgenticSystemInstanceConfig" + "instructions": { + "type": "string" } }, "additionalProperties": false, "required": [ "model", - "instance_config" - ] - }, - "AgenticSystemInstanceConfig": { - "type": "object", - "properties": { - "instructions": { - "type": "string" - }, - "sampling_params": { - "$ref": "#/components/schemas/SamplingParams" - }, - "available_tools": { - "type": "array", - "items": { - "$ref": "#/components/schemas/AgenticSystemToolDefinition" - } - }, - "input_shields": { - "type": "array", - "items": { - "$ref": "#/components/schemas/ShieldDefinition" - } - }, - "output_shields": { - "type": "array", - "items": { - "$ref": "#/components/schemas/ShieldDefinition" - } - }, - "debug_prefix_messages": { - "type": "array", - "items": { - "oneOf": [ - { - "$ref": "#/components/schemas/UserMessage" - }, - { - "$ref": "#/components/schemas/SystemMessage" - }, - { - "$ref": "#/components/schemas/ToolResponseMessage" - }, - { - "$ref": "#/components/schemas/CompletionMessage" - } - ] - } - }, - "tool_prompt_format": { - "$ref": "#/components/schemas/ToolPromptFormat" - } - }, - "additionalProperties": false, - "required": [ "instructions" ] }, - "AgenticSystemToolDefinition": { + "BraveSearchToolDefinition": { "type": "object", "properties": { - "tool_name": { - "oneOf": [ - { - "$ref": "#/components/schemas/BuiltinTool" - }, - { - "type": "string" - } - ] - }, - "description": { - "type": "string" - }, - "parameters": { - "type": "object", - "additionalProperties": { - "$ref": "#/components/schemas/ToolParamDefinition" - } - }, - "execution_config": { - "$ref": "#/components/schemas/RestAPIExecutionConfig" - }, "input_shields": { "type": "array", "items": { @@ -2550,11 +2495,18 @@ "items": { "$ref": "#/components/schemas/ShieldDefinition" } + }, + "type": { + "type": "string", + "const": "brave_search" + }, + "remote_execution": { + "$ref": "#/components/schemas/RestAPIExecutionConfig" } }, "additionalProperties": false, "required": [ - "tool_name" + "type" ] }, "BuiltinShield": { @@ -2567,6 +2519,204 @@ "jailbreak_shield" ] }, + "CodeInterpreterToolDefinition": { + "type": "object", + "properties": { + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "type": { + "type": "string", + "const": "code_interpreter" + }, + 
"enable_inline_code_execution": { + "type": "boolean" + }, + "remote_execution": { + "$ref": "#/components/schemas/RestAPIExecutionConfig" + } + }, + "additionalProperties": false, + "required": [ + "type", + "enable_inline_code_execution" + ] + }, + "FunctionCallToolDefinition": { + "type": "object", + "properties": { + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "type": { + "type": "string", + "const": "function_call" + }, + "function_name": { + "type": "string" + }, + "description": { + "type": "string" + }, + "parameters": { + "type": "object", + "additionalProperties": { + "$ref": "#/components/schemas/ToolParamDefinition" + } + }, + "remote_execution": { + "$ref": "#/components/schemas/RestAPIExecutionConfig" + } + }, + "additionalProperties": false, + "required": [ + "type", + "function_name", + "description", + "parameters" + ] + }, + "MemoryToolDefinition": { + "type": "object", + "properties": { + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "type": { + "type": "string", + "const": "memory" + }, + "memory_bank_configs": { + "type": "array", + "items": { + "oneOf": [ + { + "type": "object", + "properties": { + "bank_id": { + "type": "string" + }, + "type": { + "type": "string", + "const": "vector" + } + }, + "additionalProperties": false, + "required": [ + "bank_id", + "type" + ] + }, + { + "type": "object", + "properties": { + "bank_id": { + "type": "string" + }, + "type": { + "type": "string", + "const": "keyvalue" + }, + "keys": { + "type": "array", + "items": { + "type": "string" + } + } + }, + "additionalProperties": false, + "required": [ + "bank_id", + "type", + "keys" + ] + }, + { + "type": "object", + "properties": { + "bank_id": { + "type": "string" + }, + "type": { + "type": "string", + "const": "keyword" + } + }, + "additionalProperties": false, + "required": [ + "bank_id", + "type" + ] + }, + { + "type": "object", + "properties": { + "bank_id": { + "type": "string" + }, + "type": { + "type": "string", + "const": "graph" + }, + "entities": { + "type": "array", + "items": { + "type": "string" + } + } + }, + "additionalProperties": false, + "required": [ + "bank_id", + "type", + "entities" + ] + } + ] + } + }, + "max_tokens_in_context": { + "type": "integer" + }, + "max_chunks": { + "type": "integer" + } + }, + "additionalProperties": false, + "required": [ + "type", + "memory_bank_configs", + "max_tokens_in_context", + "max_chunks" + ] + }, "OnViolationAction": { "type": "integer", "enum": [ @@ -2575,6 +2725,34 @@ 2 ] }, + "PhotogenToolDefinition": { + "type": "object", + "properties": { + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "type": { + "type": "string", + "const": "photogen" + }, + "remote_execution": { + "$ref": "#/components/schemas/RestAPIExecutionConfig" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + }, "RestAPIExecutionConfig": { "type": "object", "properties": { @@ -2653,41 +2831,49 @@ "on_violation_action" ] }, - "ToolPromptFormat": { + "URL": { "type": "string", - "enum": [ - "json", - 
"function_tag" - ], - "title": "This Enum refers to the prompt format for calling zero shot tools", - "description": "`json` --\n Refers to the json format for calling tools.\n The json format takes the form like\n {\n \"type\": \"function\",\n \"function\" : {\n \"name\": \"function_name\",\n \"description\": \"function_description\",\n \"parameters\": {...}\n }\n }\n\n`function_tag` --\n This is an example of how you could define\n your own user defined format for making tool calls.\n The function_tag format looks like this,\n (parameters)\n\nThe detailed prompts for each of these formats are defined in `system_prompt.py`" + "format": "uri", + "pattern": "^(https?://|file://|data:)" + }, + "WolframAlphaToolDefinition": { + "type": "object", + "properties": { + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "type": { + "type": "string", + "const": "wolfram_alpha" + }, + "remote_execution": { + "$ref": "#/components/schemas/RestAPIExecutionConfig" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] }, "AgenticSystemCreateResponse": { "type": "object", "properties": { - "system_id": { + "agent_id": { "type": "string" } }, "additionalProperties": false, "required": [ - "system_id" - ] - }, - "AgenticSystemSessionCreateRequest": { - "type": "object", - "properties": { - "system_id": { - "type": "string" - }, - "session_name": { - "type": "string" - } - }, - "additionalProperties": false, - "required": [ - "system_id", - "session_name" + "agent_id" ] }, "AgenticSystemSessionCreateResponse": { @@ -2705,7 +2891,56 @@ "AgenticSystemTurnCreateRequest": { "type": "object", "properties": { - "system_id": { + "sampling_params": { + "$ref": "#/components/schemas/SamplingParams" + }, + "input_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "output_shields": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ShieldDefinition" + } + }, + "tools": { + "type": "array", + "items": { + "oneOf": [ + { + "$ref": "#/components/schemas/BraveSearchToolDefinition" + }, + { + "$ref": "#/components/schemas/WolframAlphaToolDefinition" + }, + { + "$ref": "#/components/schemas/PhotogenToolDefinition" + }, + { + "$ref": "#/components/schemas/CodeInterpreterToolDefinition" + }, + { + "$ref": "#/components/schemas/FunctionCallToolDefinition" + }, + { + "$ref": "#/components/schemas/MemoryToolDefinition" + } + ] + } + }, + "tool_choice": { + "$ref": "#/components/schemas/ToolChoice" + }, + "tool_prompt_format": { + "$ref": "#/components/schemas/ToolPromptFormat" + }, + "instructions": { + "type": "string" + }, + "agent_id": { "type": "string" }, "session_id": { @@ -2724,22 +2959,550 @@ ] } }, + "attachments": { + "type": "array", + "items": { + "$ref": "#/components/schemas/Attachment" + } + }, "stream": { "type": "boolean" - }, - "override_config": { - "$ref": "#/components/schemas/AgenticSystemInstanceConfig" } }, "additionalProperties": false, "required": [ - "system_id", + "agent_id", "session_id", "messages" ] }, + "Attachment": { + "type": "object", + "properties": { + "content": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + }, + { + "$ref": "#/components/schemas/URL" + } + ] + }, + "mime_type": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "content", + 
"mime_type" + ] + }, + "AgenticSystemTurnResponseEvent": { + "type": "object", + "properties": { + "payload": { + "oneOf": [ + { + "$ref": "#/components/schemas/AgenticSystemTurnResponseStepStartPayload" + }, + { + "$ref": "#/components/schemas/AgenticSystemTurnResponseStepProgressPayload" + }, + { + "$ref": "#/components/schemas/AgenticSystemTurnResponseStepCompletePayload" + }, + { + "$ref": "#/components/schemas/AgenticSystemTurnResponseTurnStartPayload" + }, + { + "$ref": "#/components/schemas/AgenticSystemTurnResponseTurnCompletePayload" + } + ] + } + }, + "additionalProperties": false, + "required": [ + "payload" + ], + "title": "Streamed agent execution response." + }, + "AgenticSystemTurnResponseStepCompletePayload": { + "type": "object", + "properties": { + "event_type": { + "type": "string", + "const": "step_complete" + }, + "step_type": { + "type": "string", + "enum": [ + "inference", + "tool_execution", + "shield_call", + "memory_retrieval" + ] + }, + "step_details": { + "oneOf": [ + { + "$ref": "#/components/schemas/InferenceStep" + }, + { + "$ref": "#/components/schemas/ToolExecutionStep" + }, + { + "$ref": "#/components/schemas/ShieldCallStep" + }, + { + "$ref": "#/components/schemas/MemoryRetrievalStep" + } + ] + } + }, + "additionalProperties": false, + "required": [ + "event_type", + "step_type", + "step_details" + ] + }, + "AgenticSystemTurnResponseStepProgressPayload": { + "type": "object", + "properties": { + "event_type": { + "type": "string", + "const": "step_progress" + }, + "step_type": { + "type": "string", + "enum": [ + "inference", + "tool_execution", + "shield_call", + "memory_retrieval" + ] + }, + "step_id": { + "type": "string" + }, + "model_response_text_delta": { + "type": "string" + }, + "tool_call_delta": { + "$ref": "#/components/schemas/ToolCallDelta" + }, + "tool_response_text_delta": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "event_type", + "step_type", + "step_id" + ] + }, + "AgenticSystemTurnResponseStepStartPayload": { + "type": "object", + "properties": { + "event_type": { + "type": "string", + "const": "step_start" + }, + "step_type": { + "type": "string", + "enum": [ + "inference", + "tool_execution", + "shield_call", + "memory_retrieval" + ] + }, + "step_id": { + "type": "string" + }, + "metadata": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + } + }, + "additionalProperties": false, + "required": [ + "event_type", + "step_type", + "step_id" + ] + }, "AgenticSystemTurnResponseStreamChunk": { - "description": "Server side event (SSE) stream of these events" + "type": "object", + "properties": { + "event": { + "$ref": "#/components/schemas/AgenticSystemTurnResponseEvent" + } + }, + "additionalProperties": false, + "required": [ + "event" + ] + }, + "AgenticSystemTurnResponseTurnCompletePayload": { + "type": "object", + "properties": { + "event_type": { + "type": "string", + "const": "turn_complete" + }, + "turn": { + "$ref": "#/components/schemas/Turn" + } + }, + "additionalProperties": false, + "required": [ + "event_type", + "turn" + ] + }, + "AgenticSystemTurnResponseTurnStartPayload": { + "type": "object", + "properties": { + "event_type": { + "type": "string", + "const": "turn_start" + }, + "turn_id": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "event_type", + "turn_id" + ] + }, 
+ "InferenceStep": { + "type": "object", + "properties": { + "turn_id": { + "type": "string" + }, + "step_id": { + "type": "string" + }, + "started_at": { + "type": "string", + "format": "date-time" + }, + "completed_at": { + "type": "string", + "format": "date-time" + }, + "step_type": { + "type": "string", + "const": "inference" + }, + "model_response": { + "$ref": "#/components/schemas/CompletionMessage" + } + }, + "additionalProperties": false, + "required": [ + "turn_id", + "step_id", + "step_type", + "model_response" + ] + }, + "MemoryRetrievalStep": { + "type": "object", + "properties": { + "turn_id": { + "type": "string" + }, + "step_id": { + "type": "string" + }, + "started_at": { + "type": "string", + "format": "date-time" + }, + "completed_at": { + "type": "string", + "format": "date-time" + }, + "step_type": { + "type": "string", + "const": "memory_retrieval" + }, + "memory_bank_ids": { + "type": "array", + "items": { + "type": "string" + } + }, + "inserted_context": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + } + }, + "additionalProperties": false, + "required": [ + "turn_id", + "step_id", + "step_type", + "memory_bank_ids", + "inserted_context" + ] + }, + "ShieldCallStep": { + "type": "object", + "properties": { + "turn_id": { + "type": "string" + }, + "step_id": { + "type": "string" + }, + "started_at": { + "type": "string", + "format": "date-time" + }, + "completed_at": { + "type": "string", + "format": "date-time" + }, + "step_type": { + "type": "string", + "const": "shield_call" + }, + "response": { + "$ref": "#/components/schemas/ShieldResponse" + } + }, + "additionalProperties": false, + "required": [ + "turn_id", + "step_id", + "step_type", + "response" + ] + }, + "ShieldResponse": { + "type": "object", + "properties": { + "shield_type": { + "oneOf": [ + { + "$ref": "#/components/schemas/BuiltinShield" + }, + { + "type": "string" + } + ] + }, + "is_violation": { + "type": "boolean" + }, + "violation_type": { + "type": "string" + }, + "violation_return_message": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "shield_type", + "is_violation" + ] + }, + "ToolExecutionStep": { + "type": "object", + "properties": { + "turn_id": { + "type": "string" + }, + "step_id": { + "type": "string" + }, + "started_at": { + "type": "string", + "format": "date-time" + }, + "completed_at": { + "type": "string", + "format": "date-time" + }, + "step_type": { + "type": "string", + "const": "tool_execution" + }, + "tool_calls": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ToolCall" + } + }, + "tool_responses": { + "type": "array", + "items": { + "$ref": "#/components/schemas/ToolResponse" + } + } + }, + "additionalProperties": false, + "required": [ + "turn_id", + "step_id", + "step_type", + "tool_calls", + "tool_responses" + ] + }, + "ToolResponse": { + "type": "object", + "properties": { + "call_id": { + "type": "string" + }, + "tool_name": { + "oneOf": [ + { + "$ref": "#/components/schemas/BuiltinTool" + }, + { + "type": "string" + } + ] + }, + "content": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + } + }, + "additionalProperties": false, + "required": [ + "call_id", + "tool_name", + "content" + ] + }, + "Turn": { + "type": "object", + "properties": { + "turn_id": { + "type": "string" + }, + "session_id": { + "type": "string" + }, + "input_messages": { + "type": "array", + "items": { + "oneOf": [ + { + 
"$ref": "#/components/schemas/UserMessage" + }, + { + "$ref": "#/components/schemas/ToolResponseMessage" + } + ] + } + }, + "steps": { + "type": "array", + "items": { + "oneOf": [ + { + "$ref": "#/components/schemas/InferenceStep" + }, + { + "$ref": "#/components/schemas/ToolExecutionStep" + }, + { + "$ref": "#/components/schemas/ShieldCallStep" + }, + { + "$ref": "#/components/schemas/MemoryRetrievalStep" + } + ] + } + }, + "output_message": { + "$ref": "#/components/schemas/CompletionMessage" + }, + "output_attachments": { + "type": "array", + "items": { + "$ref": "#/components/schemas/Attachment" + } + }, + "started_at": { + "type": "string", + "format": "date-time" + }, + "completed_at": { + "type": "string", + "format": "date-time" + } + }, + "additionalProperties": false, + "required": [ + "turn_id", + "session_id", + "input_messages", + "steps", + "output_message", + "output_attachments", + "started_at" + ], + "title": "A single turn in an interaction with an Agentic System." }, "CreateDatasetRequest": { "type": "object", @@ -2915,51 +3678,170 @@ "failed" ] }, - "MemoryBankDocument": { + "CreateMemoryBankRequest": { "type": "object", "properties": { - "document_id": { - "type": "string" + "config": { + "oneOf": [ + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "vector" + }, + "embedding_model": { + "type": "string" + }, + "chunk_size_in_tokens": { + "type": "integer" + }, + "overlap_size_in_tokens": { + "type": "integer" + } + }, + "additionalProperties": false, + "required": [ + "type", + "embedding_model", + "chunk_size_in_tokens" + ] + }, + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "keyvalue" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + }, + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "keyword" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + }, + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "graph" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + } + ] }, - "content": { - "type": "string", - "contentEncoding": "base64" - }, - "metadata": { - "type": "object", - "additionalProperties": { - "oneOf": [ - { - "type": "null" - }, - { - "type": "boolean" - }, - { - "type": "number" - }, - { - "type": "string" - }, - { - "type": "array" - }, - { - "type": "object" - } - ] - } - }, - "mime_type": { - "type": "string" + "url": { + "$ref": "#/components/schemas/URL" } }, "additionalProperties": false, "required": [ - "document_id", - "content", - "metadata", - "mime_type" + "config" + ] + }, + "MemoryBank": { + "type": "object", + "properties": { + "bank_id": { + "type": "string" + }, + "name": { + "type": "string" + }, + "config": { + "oneOf": [ + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "vector" + }, + "embedding_model": { + "type": "string" + }, + "chunk_size_in_tokens": { + "type": "integer" + }, + "overlap_size_in_tokens": { + "type": "integer" + } + }, + "additionalProperties": false, + "required": [ + "type", + "embedding_model", + "chunk_size_in_tokens" + ] + }, + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "keyvalue" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + }, + { + "type": "object", + "properties": { + "type": { + "type": "string", + "const": "keyword" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + }, + { + "type": 
"object", + "properties": { + "type": { + "type": "string", + "const": "graph" + } + }, + "additionalProperties": false, + "required": [ + "type" + ] + } + ] + }, + "url": { + "$ref": "#/components/schemas/URL" + } + }, + "additionalProperties": false, + "required": [ + "bank_id", + "name", + "config" ] }, "CreateRunRequest": { @@ -3054,6 +3936,24 @@ "metadata" ] }, + "EmbeddingsResponse": { + "type": "object", + "properties": { + "embeddings": { + "type": "array", + "items": { + "type": "array", + "items": { + "type": "number" + } + } + } + }, + "additionalProperties": false, + "required": [ + "embeddings" + ] + }, "Checkpoint": { "description": "Checkpoint created during training runs" }, @@ -3178,89 +4078,6 @@ ], "title": "Request to evaluate text generation." }, - "InferenceStep": { - "type": "object", - "properties": { - "turn_id": { - "type": "string" - }, - "step_id": { - "type": "string" - }, - "started_at": { - "type": "string", - "format": "date-time" - }, - "completed_at": { - "type": "string", - "format": "date-time" - }, - "step_type": { - "type": "string", - "const": "inference" - }, - "model_response": { - "$ref": "#/components/schemas/CompletionMessage" - } - }, - "additionalProperties": false, - "required": [ - "turn_id", - "step_id", - "step_type", - "model_response" - ] - }, - "MemoryRetrievalStep": { - "type": "object", - "properties": { - "turn_id": { - "type": "string" - }, - "step_id": { - "type": "string" - }, - "started_at": { - "type": "string", - "format": "date-time" - }, - "completed_at": { - "type": "string", - "format": "date-time" - }, - "step_type": { - "type": "string", - "const": "memory_retrieval" - }, - "memory_bank_ids": { - "type": "array", - "items": { - "type": "string" - } - }, - "documents": { - "type": "array", - "items": { - "$ref": "#/components/schemas/MemoryBankDocument" - } - }, - "scores": { - "type": "array", - "items": { - "type": "number" - } - } - }, - "additionalProperties": false, - "required": [ - "turn_id", - "step_id", - "step_type", - "memory_bank_ids", - "documents", - "scores" - ] - }, "Session": { "type": "object", "properties": { @@ -3279,6 +4096,9 @@ "started_at": { "type": "string", "format": "date-time" + }, + "memory_bank": { + "$ref": "#/components/schemas/MemoryBank" } }, "additionalProperties": false, @@ -3290,222 +4110,6 @@ ], "title": "A single session of an interaction with an Agentic System." 
}, - "ShieldCallStep": { - "type": "object", - "properties": { - "turn_id": { - "type": "string" - }, - "step_id": { - "type": "string" - }, - "started_at": { - "type": "string", - "format": "date-time" - }, - "completed_at": { - "type": "string", - "format": "date-time" - }, - "step_type": { - "type": "string", - "const": "shield_call" - }, - "response": { - "$ref": "#/components/schemas/ShieldResponse" - } - }, - "additionalProperties": false, - "required": [ - "turn_id", - "step_id", - "step_type", - "response" - ] - }, - "ShieldResponse": { - "type": "object", - "properties": { - "shield_type": { - "oneOf": [ - { - "$ref": "#/components/schemas/BuiltinShield" - }, - { - "type": "string" - } - ] - }, - "is_violation": { - "type": "boolean" - }, - "violation_type": { - "type": "string" - }, - "violation_return_message": { - "type": "string" - } - }, - "additionalProperties": false, - "required": [ - "shield_type", - "is_violation" - ] - }, - "ToolExecutionStep": { - "type": "object", - "properties": { - "turn_id": { - "type": "string" - }, - "step_id": { - "type": "string" - }, - "started_at": { - "type": "string", - "format": "date-time" - }, - "completed_at": { - "type": "string", - "format": "date-time" - }, - "step_type": { - "type": "string", - "const": "tool_execution" - }, - "tool_calls": { - "type": "array", - "items": { - "$ref": "#/components/schemas/ToolCall" - } - }, - "tool_responses": { - "type": "array", - "items": { - "$ref": "#/components/schemas/ToolResponse" - } - } - }, - "additionalProperties": false, - "required": [ - "turn_id", - "step_id", - "step_type", - "tool_calls", - "tool_responses" - ] - }, - "ToolResponse": { - "type": "object", - "properties": { - "call_id": { - "type": "string" - }, - "tool_name": { - "oneOf": [ - { - "$ref": "#/components/schemas/BuiltinTool" - }, - { - "type": "string" - } - ] - }, - "content": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - }, - { - "type": "array", - "items": { - "oneOf": [ - { - "type": "string" - }, - { - "$ref": "#/components/schemas/Attachment" - } - ] - } - } - ] - } - }, - "additionalProperties": false, - "required": [ - "call_id", - "tool_name", - "content" - ] - }, - "Turn": { - "type": "object", - "properties": { - "turn_id": { - "type": "string" - }, - "session_id": { - "type": "string" - }, - "input_messages": { - "type": "array", - "items": { - "oneOf": [ - { - "$ref": "#/components/schemas/UserMessage" - }, - { - "$ref": "#/components/schemas/ToolResponseMessage" - } - ] - } - }, - "steps": { - "type": "array", - "items": { - "oneOf": [ - { - "$ref": "#/components/schemas/InferenceStep" - }, - { - "$ref": "#/components/schemas/ToolExecutionStep" - }, - { - "$ref": "#/components/schemas/ShieldCallStep" - }, - { - "$ref": "#/components/schemas/MemoryRetrievalStep" - } - ] - } - }, - "output_message": { - "$ref": "#/components/schemas/CompletionMessage" - }, - "started_at": { - "type": "string", - "format": "date-time" - }, - "completed_at": { - "type": "string", - "format": "date-time" - } - }, - "additionalProperties": false, - "required": [ - "turn_id", - "session_id", - "input_messages", - "steps", - "output_message", - "started_at" - ], - "title": "A single turn in an interaction with an Agentic System." 
- }, "AgenticSystemStepResponse": { "type": "object", "properties": { @@ -3599,6 +4203,65 @@ "other" ] }, + "MemoryBankDocument": { + "type": "object", + "properties": { + "document_id": { + "type": "string" + }, + "content": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + }, + { + "$ref": "#/components/schemas/URL" + } + ] + }, + "mime_type": { + "type": "string" + }, + "metadata": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + } + }, + "additionalProperties": false, + "required": [ + "document_id", + "content", + "mime_type", + "metadata" + ] + }, "EvaluationJobArtifactsResponse": { "type": "object", "properties": { @@ -3720,22 +4383,6 @@ "additional_info" ] }, - "MemoryBank": { - "type": "object", - "properties": { - "memory_bank_id": { - "type": "string" - }, - "memory_bank_name": { - "type": "string" - } - }, - "additionalProperties": false, - "required": [ - "memory_bank_id", - "memory_bank_name" - ] - }, "Metric": { "type": "object", "properties": { @@ -4121,6 +4768,102 @@ "fsdp_cpu_offload" ] }, + "QueryDocumentsRequest": { + "type": "object", + "properties": { + "query": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "params": { + "type": "object", + "additionalProperties": { + "oneOf": [ + { + "type": "null" + }, + { + "type": "boolean" + }, + { + "type": "number" + }, + { + "type": "string" + }, + { + "type": "array" + }, + { + "type": "object" + } + ] + } + } + }, + "additionalProperties": false, + "required": [ + "query" + ] + }, + "QueryDocumentsResponse": { + "type": "object", + "properties": { + "chunks": { + "type": "array", + "items": { + "type": "object", + "properties": { + "content": { + "oneOf": [ + { + "type": "string" + }, + { + "type": "array", + "items": { + "type": "string" + } + } + ] + }, + "token_count": { + "type": "integer" + }, + "document_id": { + "type": "string" + } + }, + "additionalProperties": false, + "required": [ + "content", + "token_count", + "document_id" + ] + } + }, + "scores": { + "type": "array", + "items": { + "type": "number" + } + } + }, + "additionalProperties": false, + "required": [ + "chunks", + "scores" + ] + }, "DialogGenerations": { "type": "object", "properties": { @@ -4703,36 +5446,35 @@ } ], "tags": [ - { - "name": "RewardScoring" - }, - { - "name": "Datasets" - }, - { - "name": "Observability" - }, { "name": "AgenticSystem" }, { - "name": "Inference" - }, - { - "name": "Evaluations" + "name": "Memory" }, { "name": "SyntheticDataGeneration" }, + { + "name": "Inference" + }, + { + "name": "Observability" + }, + { + "name": "BatchInference" + }, + { + "name": "Datasets" + }, + { + "name": "RewardScoring" + }, { "name": "PostTraining" }, { - "name": "MemoryBanks" - }, - { - "name": "Attachment", - "description": "" + "name": "Evaluations" }, { "name": "BatchChatCompletionRequest", @@ -4766,6 +5508,10 @@ "name": "ToolCall", "description": "" }, + { + "name": "ToolChoice", + "description": "" + }, { "name": "ToolDefinition", "description": "" @@ -4775,12 +5521,12 @@ "description": "" }, { - "name": "ToolResponseMessage", - "description": "" + "name": "ToolPromptFormat", + "description": "This Enum refers to the prompt format for calling custom / zero shot tools\n\n`json` --\n Refers to the json format for calling tools.\n 
The json format takes the form like\n {\n \"type\": \"function\",\n \"function\" : {\n \"name\": \"function_name\",\n \"description\": \"function_description\",\n \"parameters\": {...}\n }\n }\n\n`function_tag` --\n This is an example of how you could define\n your own user defined format for making tool calls.\n The function_tag format looks like this,\n (parameters)\n\nThe detailed prompts for each of these formats are added to llama cli\n\n" }, { - "name": "URL", - "description": "" + "name": "ToolResponseMessage", + "description": "" }, { "name": "UserMessage", @@ -4835,25 +5581,37 @@ "description": "streamed completion response.\n\n" }, { - "name": "AgenticSystemCreateRequest", - "description": "" + "name": "AgentConfig", + "description": "" }, { - "name": "AgenticSystemInstanceConfig", - "description": "" - }, - { - "name": "AgenticSystemToolDefinition", - "description": "" + "name": "BraveSearchToolDefinition", + "description": "" }, { "name": "BuiltinShield", "description": "" }, + { + "name": "CodeInterpreterToolDefinition", + "description": "" + }, + { + "name": "FunctionCallToolDefinition", + "description": "" + }, + { + "name": "MemoryToolDefinition", + "description": "" + }, { "name": "OnViolationAction", "description": "" }, + { + "name": "PhotogenToolDefinition", + "description": "" + }, { "name": "RestAPIExecutionConfig", "description": "" @@ -4867,17 +5625,17 @@ "description": "" }, { - "name": "ToolPromptFormat", - "description": "This Enum refers to the prompt format for calling zero shot tools\n\n`json` --\n Refers to the json format for calling tools.\n The json format takes the form like\n {\n \"type\": \"function\",\n \"function\" : {\n \"name\": \"function_name\",\n \"description\": \"function_description\",\n \"parameters\": {...}\n }\n }\n\n`function_tag` --\n This is an example of how you could define\n your own user defined format for making tool calls.\n The function_tag format looks like this,\n (parameters)\n\nThe detailed prompts for each of these formats are defined in `system_prompt.py`\n\n" + "name": "URL", + "description": "" + }, + { + "name": "WolframAlphaToolDefinition", + "description": "" }, { "name": "AgenticSystemCreateResponse", "description": "" }, - { - "name": "AgenticSystemSessionCreateRequest", - "description": "" - }, { "name": "AgenticSystemSessionCreateResponse", "description": "" @@ -4886,9 +5644,65 @@ "name": "AgenticSystemTurnCreateRequest", "description": "" }, + { + "name": "Attachment", + "description": "" + }, + { + "name": "AgenticSystemTurnResponseEvent", + "description": "Streamed agent execution response.\n\n" + }, + { + "name": "AgenticSystemTurnResponseStepCompletePayload", + "description": "" + }, + { + "name": "AgenticSystemTurnResponseStepProgressPayload", + "description": "" + }, + { + "name": "AgenticSystemTurnResponseStepStartPayload", + "description": "" + }, { "name": "AgenticSystemTurnResponseStreamChunk", - "description": "Server side event (SSE) stream of these events\n\n" + "description": "" + }, + { + "name": "AgenticSystemTurnResponseTurnCompletePayload", + "description": "" + }, + { + "name": "AgenticSystemTurnResponseTurnStartPayload", + "description": "" + }, + { + "name": "InferenceStep", + "description": "" + }, + { + "name": "MemoryRetrievalStep", + "description": "" + }, + { + "name": "ShieldCallStep", + "description": "" + }, + { + "name": "ShieldResponse", + "description": "" + }, + { + "name": "ToolExecutionStep", + "description": "" + }, + { + "name": "ToolResponse", + "description": "" + }, + { + 
"name": "Turn", + "description": "A single turn in an interaction with an Agentic System.\n\n" }, { "name": "CreateDatasetRequest", @@ -4915,8 +5729,12 @@ "description": "" }, { - "name": "MemoryBankDocument", - "description": "" + "name": "CreateMemoryBankRequest", + "description": "" + }, + { + "name": "MemoryBank", + "description": "" }, { "name": "CreateRunRequest", @@ -4926,6 +5744,10 @@ "name": "Run", "description": "" }, + { + "name": "EmbeddingsResponse", + "description": "" + }, { "name": "Checkpoint", "description": "Checkpoint created during training runs\n\n" @@ -4946,38 +5768,10 @@ "name": "EvaluateTextGenerationRequest", "description": "Request to evaluate text generation.\n\n" }, - { - "name": "InferenceStep", - "description": "" - }, - { - "name": "MemoryRetrievalStep", - "description": "" - }, { "name": "Session", "description": "A single session of an interaction with an Agentic System.\n\n" }, - { - "name": "ShieldCallStep", - "description": "" - }, - { - "name": "ShieldResponse", - "description": "" - }, - { - "name": "ToolExecutionStep", - "description": "" - }, - { - "name": "ToolResponse", - "description": "" - }, - { - "name": "Turn", - "description": "A single turn in an interaction with an Agentic System.\n\n" - }, { "name": "AgenticSystemStepResponse", "description": "" @@ -4990,6 +5784,10 @@ "name": "ArtifactType", "description": "" }, + { + "name": "MemoryBankDocument", + "description": "" + }, { "name": "EvaluationJobArtifactsResponse", "description": "Artifacts of a evaluation job.\n\n" @@ -5010,10 +5808,6 @@ "name": "Log", "description": "" }, - { - "name": "MemoryBank", - "description": "" - }, { "name": "Metric", "description": "" @@ -5066,6 +5860,14 @@ "name": "TrainingConfig", "description": "" }, + { + "name": "QueryDocumentsRequest", + "description": "" + }, + { + "name": "QueryDocumentsResponse", + "description": "" + }, { "name": "DialogGenerations", "description": "" @@ -5132,10 +5934,11 @@ "name": "Operations", "tags": [ "AgenticSystem", + "BatchInference", "Datasets", "Evaluations", "Inference", - "MemoryBanks", + "Memory", "Observability", "PostTraining", "RewardScoring", @@ -5145,15 +5948,18 @@ { "name": "Types", "tags": [ - "AgenticSystemCreateRequest", + "AgentConfig", "AgenticSystemCreateResponse", - "AgenticSystemInstanceConfig", - "AgenticSystemSessionCreateRequest", "AgenticSystemSessionCreateResponse", "AgenticSystemStepResponse", - "AgenticSystemToolDefinition", "AgenticSystemTurnCreateRequest", + "AgenticSystemTurnResponseEvent", + "AgenticSystemTurnResponseStepCompletePayload", + "AgenticSystemTurnResponseStepProgressPayload", + "AgenticSystemTurnResponseStepStartPayload", "AgenticSystemTurnResponseStreamChunk", + "AgenticSystemTurnResponseTurnCompletePayload", + "AgenticSystemTurnResponseTurnStartPayload", "Artifact", "ArtifactType", "Attachment", @@ -5161,6 +5967,7 @@ "BatchChatCompletionResponse", "BatchCompletionRequest", "BatchCompletionResponse", + "BraveSearchToolDefinition", "BuiltinShield", "BuiltinTool", "ChatCompletionRequest", @@ -5168,15 +5975,18 @@ "ChatCompletionResponseEventType", "ChatCompletionResponseStreamChunk", "Checkpoint", + "CodeInterpreterToolDefinition", "CompletionMessage", "CompletionRequest", "CompletionResponseStreamChunk", "CreateDatasetRequest", "CreateExperimentRequest", + "CreateMemoryBankRequest", "CreateRunRequest", "DPOAlignmentConfig", "DialogGenerations", "DoraFinetuningConfig", + "EmbeddingsResponse", "EvaluateQuestionAnsweringRequest", "EvaluateSummarizationRequest", 
"EvaluateTextGenerationRequest", @@ -5187,6 +5997,7 @@ "Experiment", "ExperimentStatus", "FinetuningAlgorithm", + "FunctionCallToolDefinition", "InferenceStep", "Log", "LogMessagesRequest", @@ -5196,9 +6007,11 @@ "MemoryBank", "MemoryBankDocument", "MemoryRetrievalStep", + "MemoryToolDefinition", "Metric", "OnViolationAction", "OptimizerConfig", + "PhotogenToolDefinition", "PostTrainingJob", "PostTrainingJobArtifactsResponse", "PostTrainingJobLogStream", @@ -5207,6 +6020,8 @@ "PostTrainingRLHFRequest", "PostTrainingSFTRequest", "QLoraFinetuningConfig", + "QueryDocumentsRequest", + "QueryDocumentsResponse", "RLHFAlgorithm", "RestAPIExecutionConfig", "RestAPIMethod", @@ -5229,6 +6044,7 @@ "ToolCall", "ToolCallDelta", "ToolCallParseStatus", + "ToolChoice", "ToolDefinition", "ToolExecutionStep", "ToolParamDefinition", @@ -5243,7 +6059,8 @@ "UpdateExperimentRequest", "UpdateRunRequest", "UploadArtifactRequest", - "UserMessage" + "UserMessage", + "WolframAlphaToolDefinition" ] } ] diff --git a/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.yaml b/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.yaml index 7cfb22669..d4bd3cbfc 100644 --- a/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.yaml +++ b/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.yaml @@ -1,67 +1,48 @@ components: responses: {} schemas: - AgenticSystemCreateRequest: + AgentConfig: additionalProperties: false properties: - instance_config: - $ref: '#/components/schemas/AgenticSystemInstanceConfig' - model: - type: string - required: - - model - - instance_config - type: object - AgenticSystemCreateResponse: - additionalProperties: false - properties: - system_id: - type: string - required: - - system_id - type: object - AgenticSystemInstanceConfig: - additionalProperties: false - properties: - available_tools: - items: - $ref: '#/components/schemas/AgenticSystemToolDefinition' - type: array - debug_prefix_messages: - items: - oneOf: - - $ref: '#/components/schemas/UserMessage' - - $ref: '#/components/schemas/SystemMessage' - - $ref: '#/components/schemas/ToolResponseMessage' - - $ref: '#/components/schemas/CompletionMessage' - type: array input_shields: items: $ref: '#/components/schemas/ShieldDefinition' type: array instructions: type: string + model: + type: string output_shields: items: $ref: '#/components/schemas/ShieldDefinition' type: array sampling_params: $ref: '#/components/schemas/SamplingParams' + tool_choice: + $ref: '#/components/schemas/ToolChoice' tool_prompt_format: $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + oneOf: + - $ref: '#/components/schemas/BraveSearchToolDefinition' + - $ref: '#/components/schemas/WolframAlphaToolDefinition' + - $ref: '#/components/schemas/PhotogenToolDefinition' + - $ref: '#/components/schemas/CodeInterpreterToolDefinition' + - $ref: '#/components/schemas/FunctionCallToolDefinition' + - $ref: '#/components/schemas/MemoryToolDefinition' + type: array required: + - model - instructions type: object - AgenticSystemSessionCreateRequest: + AgenticSystemCreateResponse: additionalProperties: false properties: - session_name: - type: string - system_id: + agent_id: type: string required: - - system_id - - session_name + - agent_id type: object AgenticSystemSessionCreateResponse: additionalProperties: false @@ -83,56 +64,182 @@ components: required: - step type: object - AgenticSystemToolDefinition: + AgenticSystemTurnCreateRequest: additionalProperties: false properties: - description: + agent_id: type: string - execution_config: - $ref: 
'#/components/schemas/RestAPIExecutionConfig' + attachments: + items: + $ref: '#/components/schemas/Attachment' + type: array input_shields: items: $ref: '#/components/schemas/ShieldDefinition' type: array - output_shields: - items: - $ref: '#/components/schemas/ShieldDefinition' - type: array - parameters: - additionalProperties: - $ref: '#/components/schemas/ToolParamDefinition' - type: object - tool_name: - oneOf: - - $ref: '#/components/schemas/BuiltinTool' - - type: string - required: - - tool_name - type: object - AgenticSystemTurnCreateRequest: - additionalProperties: false - properties: + instructions: + type: string messages: items: oneOf: - $ref: '#/components/schemas/UserMessage' - $ref: '#/components/schemas/ToolResponseMessage' type: array - override_config: - $ref: '#/components/schemas/AgenticSystemInstanceConfig' + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + sampling_params: + $ref: '#/components/schemas/SamplingParams' session_id: type: string stream: type: boolean - system_id: - type: string + tool_choice: + $ref: '#/components/schemas/ToolChoice' + tool_prompt_format: + $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + oneOf: + - $ref: '#/components/schemas/BraveSearchToolDefinition' + - $ref: '#/components/schemas/WolframAlphaToolDefinition' + - $ref: '#/components/schemas/PhotogenToolDefinition' + - $ref: '#/components/schemas/CodeInterpreterToolDefinition' + - $ref: '#/components/schemas/FunctionCallToolDefinition' + - $ref: '#/components/schemas/MemoryToolDefinition' + type: array required: - - system_id + - agent_id - session_id - messages type: object + AgenticSystemTurnResponseEvent: + additionalProperties: false + properties: + payload: + oneOf: + - $ref: '#/components/schemas/AgenticSystemTurnResponseStepStartPayload' + - $ref: '#/components/schemas/AgenticSystemTurnResponseStepProgressPayload' + - $ref: '#/components/schemas/AgenticSystemTurnResponseStepCompletePayload' + - $ref: '#/components/schemas/AgenticSystemTurnResponseTurnStartPayload' + - $ref: '#/components/schemas/AgenticSystemTurnResponseTurnCompletePayload' + required: + - payload + title: Streamed agent execution response. 
+ type: object + AgenticSystemTurnResponseStepCompletePayload: + additionalProperties: false + properties: + event_type: + const: step_complete + type: string + step_details: + oneOf: + - $ref: '#/components/schemas/InferenceStep' + - $ref: '#/components/schemas/ToolExecutionStep' + - $ref: '#/components/schemas/ShieldCallStep' + - $ref: '#/components/schemas/MemoryRetrievalStep' + step_type: + enum: + - inference + - tool_execution + - shield_call + - memory_retrieval + type: string + required: + - event_type + - step_type + - step_details + type: object + AgenticSystemTurnResponseStepProgressPayload: + additionalProperties: false + properties: + event_type: + const: step_progress + type: string + model_response_text_delta: + type: string + step_id: + type: string + step_type: + enum: + - inference + - tool_execution + - shield_call + - memory_retrieval + type: string + tool_call_delta: + $ref: '#/components/schemas/ToolCallDelta' + tool_response_text_delta: + type: string + required: + - event_type + - step_type + - step_id + type: object + AgenticSystemTurnResponseStepStartPayload: + additionalProperties: false + properties: + event_type: + const: step_start + type: string + metadata: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + step_id: + type: string + step_type: + enum: + - inference + - tool_execution + - shield_call + - memory_retrieval + type: string + required: + - event_type + - step_type + - step_id + type: object AgenticSystemTurnResponseStreamChunk: - description: Server side event (SSE) stream of these events + additionalProperties: false + properties: + event: + $ref: '#/components/schemas/AgenticSystemTurnResponseEvent' + required: + - event + type: object + AgenticSystemTurnResponseTurnCompletePayload: + additionalProperties: false + properties: + event_type: + const: turn_complete + type: string + turn: + $ref: '#/components/schemas/Turn' + required: + - event_type + - turn + type: object + AgenticSystemTurnResponseTurnStartPayload: + additionalProperties: false + properties: + event_type: + const: turn_start + type: string + turn_id: + type: string + required: + - event_type + - turn_id + type: object Artifact: additionalProperties: false properties: @@ -179,21 +286,22 @@ components: Attachment: additionalProperties: false properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + - $ref: '#/components/schemas/URL' mime_type: type: string - url: - $ref: '#/components/schemas/URL' required: - - url + - content - mime_type type: object BatchChatCompletionRequest: additionalProperties: false properties: - available_tools: - items: - $ref: '#/components/schemas/ToolDefinition' - type: array logprobs: additionalProperties: false properties: @@ -214,6 +322,14 @@ components: type: string sampling_params: $ref: '#/components/schemas/SamplingParams' + tool_choice: + $ref: '#/components/schemas/ToolChoice' + tool_prompt_format: + $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + $ref: '#/components/schemas/ToolDefinition' + type: array required: - model - messages_batch @@ -235,11 +351,8 @@ components: items: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string type: array type: array logprobs: @@ -266,6 +379,25 @@ components: required: - completion_message_batch type: object + BraveSearchToolDefinition: + 
additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: brave_search + type: string + required: + - type + type: object BuiltinShield: enum: - llama_guard @@ -284,10 +416,6 @@ components: ChatCompletionRequest: additionalProperties: false properties: - available_tools: - items: - $ref: '#/components/schemas/ToolDefinition' - type: array logprobs: additionalProperties: false properties: @@ -308,6 +436,14 @@ components: $ref: '#/components/schemas/SamplingParams' stream: type: boolean + tool_choice: + $ref: '#/components/schemas/ToolChoice' + tool_prompt_format: + $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + $ref: '#/components/schemas/ToolDefinition' + type: array required: - model - messages @@ -349,17 +485,36 @@ components: type: object Checkpoint: description: Checkpoint created during training runs + CodeInterpreterToolDefinition: + additionalProperties: false + properties: + enable_inline_code_execution: + type: boolean + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: code_interpreter + type: string + required: + - type + - enable_inline_code_execution + type: object CompletionMessage: additionalProperties: false properties: content: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string type: array role: const: assistant @@ -382,11 +537,8 @@ components: content: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string type: array logprobs: additionalProperties: false @@ -449,6 +601,56 @@ components: required: - name type: object + CreateMemoryBankRequest: + additionalProperties: false + properties: + config: + oneOf: + - additionalProperties: false + properties: + chunk_size_in_tokens: + type: integer + embedding_model: + type: string + overlap_size_in_tokens: + type: integer + type: + const: vector + type: string + required: + - type + - embedding_model + - chunk_size_in_tokens + type: object + - additionalProperties: false + properties: + type: + const: keyvalue + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: keyword + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: graph + type: string + required: + - type + type: object + url: + $ref: '#/components/schemas/URL' + required: + - config + type: object CreateRunRequest: additionalProperties: false properties: @@ -529,6 +731,18 @@ components: - rank - alpha type: object + EmbeddingsResponse: + additionalProperties: false + properties: + embeddings: + items: + items: + type: number + type: array + type: array + required: + - embeddings + type: object EvaluateQuestionAnsweringRequest: additionalProperties: false properties: @@ -688,6 +902,36 @@ components: - qlora - dora type: string + FunctionCallToolDefinition: + additionalProperties: false + properties: + description: + type: string + function_name: + type: string + 
input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + parameters: + additionalProperties: + $ref: '#/components/schemas/ToolParamDefinition' + type: object + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: function_call + type: string + required: + - type + - function_name + - description + - parameters + type: object InferenceStep: additionalProperties: false properties: @@ -806,20 +1050,69 @@ components: MemoryBank: additionalProperties: false properties: - memory_bank_id: + bank_id: type: string - memory_bank_name: + config: + oneOf: + - additionalProperties: false + properties: + chunk_size_in_tokens: + type: integer + embedding_model: + type: string + overlap_size_in_tokens: + type: integer + type: + const: vector + type: string + required: + - type + - embedding_model + - chunk_size_in_tokens + type: object + - additionalProperties: false + properties: + type: + const: keyvalue + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: keyword + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: graph + type: string + required: + - type + type: object + name: type: string + url: + $ref: '#/components/schemas/URL' required: - - memory_bank_id - - memory_bank_name + - bank_id + - name + - config type: object MemoryBankDocument: additionalProperties: false properties: content: - contentEncoding: base64 - type: string + oneOf: + - type: string + - items: + type: string + type: array + - $ref: '#/components/schemas/URL' document_id: type: string metadata: @@ -837,8 +1130,8 @@ components: required: - document_id - content - - metadata - mime_type + - metadata type: object MemoryRetrievalStep: additionalProperties: false @@ -846,18 +1139,16 @@ components: completed_at: format: date-time type: string - documents: - items: - $ref: '#/components/schemas/MemoryBankDocument' - type: array + inserted_context: + oneOf: + - type: string + - items: + type: string + type: array memory_bank_ids: items: type: string type: array - scores: - items: - type: number - type: array started_at: format: date-time type: string @@ -873,8 +1164,89 @@ components: - step_id - step_type - memory_bank_ids - - documents - - scores + - inserted_context + type: object + MemoryToolDefinition: + additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + max_chunks: + type: integer + max_tokens_in_context: + type: integer + memory_bank_configs: + items: + oneOf: + - additionalProperties: false + properties: + bank_id: + type: string + type: + const: vector + type: string + required: + - bank_id + - type + type: object + - additionalProperties: false + properties: + bank_id: + type: string + keys: + items: + type: string + type: array + type: + const: keyvalue + type: string + required: + - bank_id + - type + - keys + type: object + - additionalProperties: false + properties: + bank_id: + type: string + type: + const: keyword + type: string + required: + - bank_id + - type + type: object + - additionalProperties: false + properties: + bank_id: + type: string + entities: + items: + type: string + type: array + type: + const: graph + type: string + required: + - bank_id + - type + - entities + type: object + type: array + output_shields: + items: + $ref: 
'#/components/schemas/ShieldDefinition' + type: array + type: + const: memory + type: string + required: + - type + - memory_bank_configs + - max_tokens_in_context + - max_chunks type: object Metric: additionalProperties: false @@ -925,6 +1297,25 @@ components: - lr_min - weight_decay type: object + PhotogenToolDefinition: + additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: photogen + type: string + required: + - type + type: object PostTrainingJob: additionalProperties: false properties: @@ -1133,6 +1524,59 @@ components: - rank - alpha type: object + QueryDocumentsRequest: + additionalProperties: false + properties: + params: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + query: + oneOf: + - type: string + - items: + type: string + type: array + required: + - query + type: object + QueryDocumentsResponse: + additionalProperties: false + properties: + chunks: + items: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + document_id: + type: string + token_count: + type: integer + required: + - content + - token_count + - document_id + type: object + type: array + scores: + items: + type: number + type: array + required: + - chunks + - scores + type: object RLHFAlgorithm: enum: - dpo @@ -1287,6 +1731,8 @@ components: Session: additionalProperties: false properties: + memory_bank: + $ref: '#/components/schemas/MemoryBank' session_id: type: string session_name: @@ -1430,11 +1876,8 @@ components: content: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string type: array role: const: system @@ -1512,6 +1955,11 @@ components: - failure - success type: string + ToolChoice: + enum: + - auto + - required + type: string ToolDefinition: additionalProperties: false properties: @@ -1579,11 +2027,12 @@ components: : {...}\n }\n }\n\n`function_tag` --\n This is an example of\ \ how you could define\n your own user defined format for making tool calls.\n\ \ The function_tag format looks like this,\n (parameters)\n\ - \nThe detailed prompts for each of these formats are defined in `system_prompt.py`" + \nThe detailed prompts for each of these formats are added to llama cli" enum: - json - function_tag - title: This Enum refers to the prompt format for calling zero shot tools + title: This Enum refers to the prompt format for calling custom / zero shot + tools type: string ToolResponse: additionalProperties: false @@ -1593,11 +2042,8 @@ components: content: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string type: array tool_name: oneOf: @@ -1616,11 +2062,8 @@ components: content: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string type: array role: const: ipython @@ -1705,6 +2148,10 @@ components: - $ref: '#/components/schemas/UserMessage' - $ref: '#/components/schemas/ToolResponseMessage' type: array + output_attachments: + items: + $ref: 
'#/components/schemas/Attachment' + type: array output_message: $ref: '#/components/schemas/CompletionMessage' session_id: @@ -1728,6 +2175,7 @@ components: - input_messages - steps - output_message + - output_attachments - started_at title: A single turn in an interaction with an Agentic System. type: object @@ -1812,11 +2260,14 @@ components: content: oneOf: - type: string - - $ref: '#/components/schemas/Attachment' - items: - oneOf: - - type: string - - $ref: '#/components/schemas/Attachment' + type: string + type: array + context: + oneOf: + - type: string + - items: + type: string type: array role: const: user @@ -1825,11 +2276,30 @@ components: - role - content type: object + WolframAlphaToolDefinition: + additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: wolfram_alpha + type: string + required: + - type + type: object info: description: "This is the specification of the llama stack that provides\n \ \ a set of endpoints and their corresponding interfaces that are tailored\ \ to\n best leverage Llama Models. The specification is still in\ - \ draft and subject to change.\n Generated at 2024-08-21 14:16:38.313950" + \ draft and subject to change.\n Generated at 2024-09-03 21:42:33.579455" title: '[DRAFT] Llama Stack Specification' version: 0.0.1 jsonSchemaDialect: https://json-schema.org/draft/2020-12/schema @@ -1842,7 +2312,7 @@ paths: content: application/json: schema: - $ref: '#/components/schemas/AgenticSystemCreateRequest' + $ref: '#/components/schemas/AgentConfig' required: true responses: '200': @@ -1866,67 +2336,19 @@ paths: description: OK tags: - AgenticSystem - /agentic_system/memory_bank/attach: - post: - parameters: - - in: query - name: agent_id - required: true - schema: - type: string - - in: query - name: session_id - required: true - schema: - type: string - requestBody: - content: - application/json: - schema: - items: - type: string - type: array - required: true - responses: - '200': - description: OK - tags: - - AgenticSystem - /agentic_system/memory_bank/detach: - post: - parameters: - - in: query - name: agent_id - required: true - schema: - type: string - - in: query - name: session_id - required: true - schema: - type: string - requestBody: - content: - application/json: - schema: - items: - type: string - type: array - required: true - responses: - '200': - description: OK - tags: - - AgenticSystem /agentic_system/session/create: - post: - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/AgenticSystemSessionCreateRequest' + get: + parameters: + - in: query + name: agent_id required: true + schema: + type: string + - in: query + name: session_name + required: true + schema: + type: string responses: '200': content: @@ -2025,7 +2447,7 @@ paths: responses: '200': content: - application/json: + text/event-stream: schema: $ref: '#/components/schemas/AgenticSystemTurnResponseStreamChunk' description: OK @@ -2070,6 +2492,42 @@ paths: description: OK tags: - Observability + /batch_inference/chat_completion: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/BatchChatCompletionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: 
'#/components/schemas/BatchChatCompletionResponse' + description: OK + tags: + - BatchInference + /batch_inference/completion: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/BatchCompletionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/BatchCompletionResponse' + description: OK + tags: + - BatchInference /datasets/create: post: parameters: [] @@ -2362,42 +2820,6 @@ paths: description: OK tags: - Observability - /inference/batch_chat_completion: - post: - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/BatchChatCompletionRequest' - required: true - responses: - '200': - content: - application/json: - schema: - $ref: '#/components/schemas/BatchChatCompletionResponse' - description: OK - tags: - - Inference - /inference/batch_completion: - post: - parameters: [] - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/BatchCompletionRequest' - required: true - responses: - '200': - content: - application/json: - schema: - $ref: '#/components/schemas/BatchCompletionResponse' - description: OK - tags: - - Inference /inference/chat_completion: post: parameters: [] @@ -2434,6 +2856,35 @@ paths: description: streamed completion response. tags: - Inference + /inference/embeddings: + post: + parameters: + - in: query + name: model + required: true + schema: + type: string + requestBody: + content: + application/json: + schema: + items: + oneOf: + - type: string + - items: + type: string + type: array + type: array + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EmbeddingsResponse' + description: OK + tags: + - Inference /logging/get_logs: post: parameters: [] @@ -2466,7 +2917,7 @@ paths: description: OK tags: - Observability - /memory_bank/delete: + /memory_bank/documents/delete: post: parameters: - in: query @@ -2484,14 +2935,10 @@ paths: required: true responses: '200': - content: - application/jsonl: - schema: - type: string description: OK tags: - - MemoryBanks - /memory_bank/get: + - Memory + /memory_bank/documents/get: post: parameters: - in: query @@ -2515,7 +2962,7 @@ paths: $ref: '#/components/schemas/MemoryBankDocument' description: OK tags: - - MemoryBanks + - Memory /memory_bank/insert: post: parameters: @@ -2524,6 +2971,11 @@ paths: required: true schema: type: string + - in: query + name: ttl_seconds + required: false + schema: + type: integer requestBody: content: application/json: @@ -2536,7 +2988,30 @@ paths: '200': description: OK tags: - - MemoryBanks + - Memory + /memory_bank/query: + post: + parameters: + - in: query + name: bank_id + required: true + schema: + type: string + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/QueryDocumentsRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/QueryDocumentsResponse' + description: OK + tags: + - Memory /memory_bank/update: post: parameters: @@ -2557,17 +3032,12 @@ paths: '200': description: OK tags: - - MemoryBanks + - Memory /memory_banks/create: post: parameters: - in: query - name: bank_id - required: true - schema: - type: string - - in: query - name: bank_name + name: name required: true schema: type: string @@ -2575,15 +3045,17 @@ paths: content: application/json: schema: - items: - $ref: '#/components/schemas/MemoryBankDocument' - type: array + $ref: 
'#/components/schemas/CreateMemoryBankRequest' required: true responses: '200': + content: + application/json: + schema: + $ref: '#/components/schemas/MemoryBank' description: OK tags: - - MemoryBanks + - Memory /memory_banks/drop: delete: parameters: @@ -2600,7 +3072,7 @@ paths: type: string description: OK tags: - - MemoryBanks + - Memory /memory_banks/get: get: parameters: @@ -2612,12 +3084,14 @@ paths: responses: '200': content: - application/jsonl: + application/json: schema: - $ref: '#/components/schemas/MemoryBank' + oneOf: + - $ref: '#/components/schemas/MemoryBank' + - type: 'null' description: OK tags: - - MemoryBanks + - Memory /memory_banks/list: get: parameters: [] @@ -2629,7 +3103,7 @@ paths: $ref: '#/components/schemas/MemoryBank' description: OK tags: - - MemoryBanks + - Memory /post_training/job/artifacts: get: parameters: @@ -2832,17 +3306,16 @@ security: servers: - url: http://any-hosted-llama-stack.com tags: -- name: RewardScoring -- name: Datasets -- name: Observability - name: AgenticSystem -- name: Inference -- name: Evaluations +- name: Memory - name: SyntheticDataGeneration +- name: Inference +- name: Observability +- name: BatchInference +- name: Datasets +- name: RewardScoring - name: PostTraining -- name: MemoryBanks -- description: - name: Attachment +- name: Evaluations - description: name: BatchChatCompletionRequest @@ -2862,16 +3335,27 @@ tags: name: SystemMessage - description: name: ToolCall +- description: + name: ToolChoice - description: name: ToolDefinition - description: name: ToolParamDefinition +- description: "This Enum refers to the prompt format for calling custom / zero shot\ + \ tools\n\n`json` --\n Refers to the json format for calling tools.\n The\ + \ json format takes the form like\n {\n \"type\": \"function\",\n \ + \ \"function\" : {\n \"name\": \"function_name\",\n \ + \ \"description\": \"function_description\",\n \"parameters\": {...}\n\ + \ }\n }\n\n`function_tag` --\n This is an example of how you could\ + \ define\n your own user defined format for making tool calls.\n The function_tag\ + \ format looks like this,\n (parameters)\n\ + \nThe detailed prompts for each of these formats are added to llama cli\n\n" + name: ToolPromptFormat - description: name: ToolResponseMessage -- description: - name: URL - description: name: UserMessage - description: ' name: CompletionResponseStreamChunk -- description: + name: AgentConfig +- description: - name: AgenticSystemCreateRequest -- description: - name: AgenticSystemInstanceConfig -- description: - name: AgenticSystemToolDefinition + name: BraveSearchToolDefinition - description: name: BuiltinShield +- description: + name: CodeInterpreterToolDefinition +- description: + name: FunctionCallToolDefinition +- description: + name: MemoryToolDefinition - description: name: OnViolationAction +- description: + name: PhotogenToolDefinition - description: name: RestAPIExecutionConfig @@ -2939,35 +3431,65 @@ tags: - description: name: ShieldDefinition -- description: "This Enum refers to the prompt format for calling zero shot tools\n\ - \n`json` --\n Refers to the json format for calling tools.\n The json format\ - \ takes the form like\n {\n \"type\": \"function\",\n \"function\"\ - \ : {\n \"name\": \"function_name\",\n \"description\":\ - \ \"function_description\",\n \"parameters\": {...}\n }\n \ - \ }\n\n`function_tag` --\n This is an example of how you could define\n \ - \ your own user defined format for making tool calls.\n The function_tag format\ - \ looks like this,\n (parameters)\n\nThe\ - 
\ detailed prompts for each of these formats are defined in `system_prompt.py`\n\ - \n" - name: ToolPromptFormat +- description: + name: URL +- description: + name: WolframAlphaToolDefinition - description: name: AgenticSystemCreateResponse -- description: - name: AgenticSystemSessionCreateRequest - description: name: AgenticSystemSessionCreateResponse - description: name: AgenticSystemTurnCreateRequest -- description: 'Server side event (SSE) stream of these events +- description: + name: Attachment +- description: 'Streamed agent execution response. - ' + name: AgenticSystemTurnResponseEvent +- description: + name: AgenticSystemTurnResponseStepCompletePayload +- description: + name: AgenticSystemTurnResponseStepProgressPayload +- description: + name: AgenticSystemTurnResponseStepStartPayload +- description: name: AgenticSystemTurnResponseStreamChunk +- description: + name: AgenticSystemTurnResponseTurnCompletePayload +- description: + name: AgenticSystemTurnResponseTurnStartPayload +- description: + name: InferenceStep +- description: + name: MemoryRetrievalStep +- description: + name: ShieldCallStep +- description: + name: ShieldResponse +- description: + name: ToolExecutionStep +- description: + name: ToolResponse +- description: 'A single turn in an interaction with an Agentic System. + + + ' + name: Turn - description: 'Request to create a dataset. @@ -2989,14 +3511,19 @@ tags: - description: name: ExperimentStatus -- description: - name: MemoryBankDocument + name: CreateMemoryBankRequest +- description: + name: MemoryBank - description: name: CreateRunRequest - description: name: Run +- description: + name: EmbeddingsResponse - description: 'Checkpoint created during training runs @@ -3022,30 +3549,11 @@ tags: ' name: EvaluateTextGenerationRequest -- description: - name: InferenceStep -- description: - name: MemoryRetrievalStep - description: 'A single session of an interaction with an Agentic System. ' name: Session -- description: - name: ShieldCallStep -- description: - name: ShieldResponse -- description: - name: ToolExecutionStep -- description: - name: ToolResponse -- description: 'A single turn in an interaction with an Agentic System. - - - ' - name: Turn - description: name: AgenticSystemStepResponse @@ -3053,6 +3561,9 @@ tags: name: Artifact - description: name: ArtifactType +- description: + name: MemoryBankDocument - description: 'Artifacts of a evaluation job. @@ -3070,8 +3581,6 @@ tags: name: LogSearchRequest - description: name: Log -- description: - name: MemoryBank - description: name: Metric - description: 'Artifacts of a finetuning job. 
@@ -3118,6 +3627,12 @@ tags: name: RLHFAlgorithm - description: name: TrainingConfig +- description: + name: QueryDocumentsRequest +- description: + name: QueryDocumentsResponse - description: name: DialogGenerations @@ -3182,25 +3697,29 @@ x-tagGroups: - name: Operations tags: - AgenticSystem + - BatchInference - Datasets - Evaluations - Inference - - MemoryBanks + - Memory - Observability - PostTraining - RewardScoring - SyntheticDataGeneration - name: Types tags: - - AgenticSystemCreateRequest + - AgentConfig - AgenticSystemCreateResponse - - AgenticSystemInstanceConfig - - AgenticSystemSessionCreateRequest - AgenticSystemSessionCreateResponse - AgenticSystemStepResponse - - AgenticSystemToolDefinition - AgenticSystemTurnCreateRequest + - AgenticSystemTurnResponseEvent + - AgenticSystemTurnResponseStepCompletePayload + - AgenticSystemTurnResponseStepProgressPayload + - AgenticSystemTurnResponseStepStartPayload - AgenticSystemTurnResponseStreamChunk + - AgenticSystemTurnResponseTurnCompletePayload + - AgenticSystemTurnResponseTurnStartPayload - Artifact - ArtifactType - Attachment @@ -3208,6 +3727,7 @@ x-tagGroups: - BatchChatCompletionResponse - BatchCompletionRequest - BatchCompletionResponse + - BraveSearchToolDefinition - BuiltinShield - BuiltinTool - ChatCompletionRequest @@ -3215,15 +3735,18 @@ x-tagGroups: - ChatCompletionResponseEventType - ChatCompletionResponseStreamChunk - Checkpoint + - CodeInterpreterToolDefinition - CompletionMessage - CompletionRequest - CompletionResponseStreamChunk - CreateDatasetRequest - CreateExperimentRequest + - CreateMemoryBankRequest - CreateRunRequest - DPOAlignmentConfig - DialogGenerations - DoraFinetuningConfig + - EmbeddingsResponse - EvaluateQuestionAnsweringRequest - EvaluateSummarizationRequest - EvaluateTextGenerationRequest @@ -3234,6 +3757,7 @@ x-tagGroups: - Experiment - ExperimentStatus - FinetuningAlgorithm + - FunctionCallToolDefinition - InferenceStep - Log - LogMessagesRequest @@ -3243,9 +3767,11 @@ x-tagGroups: - MemoryBank - MemoryBankDocument - MemoryRetrievalStep + - MemoryToolDefinition - Metric - OnViolationAction - OptimizerConfig + - PhotogenToolDefinition - PostTrainingJob - PostTrainingJobArtifactsResponse - PostTrainingJobLogStream @@ -3254,6 +3780,8 @@ x-tagGroups: - PostTrainingRLHFRequest - PostTrainingSFTRequest - QLoraFinetuningConfig + - QueryDocumentsRequest + - QueryDocumentsResponse - RLHFAlgorithm - RestAPIExecutionConfig - RestAPIMethod @@ -3276,6 +3804,7 @@ x-tagGroups: - ToolCall - ToolCallDelta - ToolCallParseStatus + - ToolChoice - ToolDefinition - ToolExecutionStep - ToolParamDefinition @@ -3291,3 +3820,4 @@ x-tagGroups: - UpdateRunRequest - UploadArtifactRequest - UserMessage + - WolframAlphaToolDefinition diff --git a/rfcs/openapi_generator/generate.py b/rfcs/openapi_generator/generate.py index 4b13904de..ab9774e70 100644 --- a/rfcs/openapi_generator/generate.py +++ b/rfcs/openapi_generator/generate.py @@ -10,81 +10,39 @@ # This source code is licensed under the terms described found in the # LICENSE file in the root directory of this source tree. 
-import inspect - from datetime import datetime from pathlib import Path -from typing import Callable, Iterator, List, Tuple import fire import yaml from llama_models import schema_utils -from pyopenapi import Info, operations, Options, Server, Specification -# We do a series of monkey-patching to ensure our definitions only use the minimal +# We do some monkey-patching to ensure our definitions only use the minimal # (json_schema_type, webmethod) definitions from the llama_models package. For # generation though, we need the full definitions and implementations from the -# (python-openapi, json-strong-typing) packages. +# (json-strong-typing) package. from strong_typing.schema import json_schema_type -from termcolor import colored + +from .pyopenapi.options import Options +from .pyopenapi.specification import Info, Server +from .pyopenapi.utility import Specification schema_utils.json_schema_type = json_schema_type - from llama_toolchain.stack import LlamaStack -STREAMING_ENDPOINTS = [ - "/agentic_system/turn/create" -] - - -def patched_get_endpoint_functions( - endpoint: type, prefixes: List[str] -) -> Iterator[Tuple[str, str, str, Callable]]: - if not inspect.isclass(endpoint): - raise ValueError(f"object is not a class type: {endpoint}") - - functions = inspect.getmembers(endpoint, inspect.isfunction) - for func_name, func_ref in functions: - webmethod = getattr(func_ref, "__webmethod__", None) - if not webmethod: - continue - - print(f"Processing {colored(func_name, 'white')}...") - operation_name = func_name - if operation_name.startswith("get_") or operation_name.endswith("/get"): - prefix = "get" - elif ( - operation_name.startswith("delete_") - or operation_name.startswith("remove_") - or operation_name.endswith("/delete") - or operation_name.endswith("/remove") - ): - prefix = "delete" - else: - if webmethod.method == "GET": - prefix = "get" - elif webmethod.method == "DELETE": - prefix = "delete" - else: - # by default everything else is a POST - prefix = "post" - - yield prefix, operation_name, func_name, func_ref - - -# Patch this so all methods are correctly parsed with correct HTTP methods -operations._get_endpoint_functions = patched_get_endpoint_functions +# TODO: this should be fixed in the generator itself so it reads appropriate annotations +STREAMING_ENDPOINTS = ["/agentic_system/turn/create"] def patch_sse_stream_responses(spec: Specification): for path, path_item in spec.document.paths.items(): if path in STREAMING_ENDPOINTS: - content = path_item.post.responses['200'].content.pop('application/json') - path_item.post.responses['200'].content['text/event-stream'] = content + content = path_item.post.responses["200"].content.pop("application/json") + path_item.post.responses["200"].content["text/event-stream"] = content def main(output_dir: str): diff --git a/rfcs/openapi_generator/pyopenapi/README.md b/rfcs/openapi_generator/pyopenapi/README.md new file mode 100644 index 000000000..1b5fbce19 --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/README.md @@ -0,0 +1 @@ +This is forked from https://github.com/hunyadi/pyopenapi diff --git a/rfcs/openapi_generator/pyopenapi/__init__.py b/rfcs/openapi_generator/pyopenapi/__init__.py new file mode 100644 index 000000000..756f351d8 --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/__init__.py @@ -0,0 +1,5 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. 
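
For context on the `patch_sse_stream_responses` hook in `generate.py` above: the generator emits an `application/json` 200 response for every endpoint, and this post-processing step re-keys the streaming endpoint's content under `text/event-stream`, which is why `/agentic_system/turn/create` in the regenerated `llama-stack-spec.yaml` now advertises an SSE response. Below is a minimal sketch of that same re-keying applied to a plain-dict view of the generated document; the real code operates on the `pyopenapi` dataclasses, so the dict layout here is only illustrative.

```python
# Illustrative sketch only: the actual generator works on pyopenapi dataclasses,
# but the re-keying logic mirrors patch_sse_stream_responses in generate.py.
STREAMING_ENDPOINTS = ["/agentic_system/turn/create"]


def patch_sse_stream_responses(paths: dict) -> None:
    for path, path_item in paths.items():
        if path in STREAMING_ENDPOINTS:
            responses = path_item["post"]["responses"]["200"]
            # Move the generated JSON schema under the SSE media type.
            responses["content"]["text/event-stream"] = responses["content"].pop(
                "application/json"
            )


# Hypothetical fragment of the generated document, before patching.
paths = {
    "/agentic_system/turn/create": {
        "post": {
            "responses": {
                "200": {
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/AgenticSystemTurnResponseStreamChunk"
                            }
                        }
                    }
                }
            }
        }
    }
}

patch_sse_stream_responses(paths)
# The 200 response is now keyed by "text/event-stream", matching the
# regenerated llama-stack-spec.yaml shown earlier in this diff.
```
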
diff --git a/rfcs/openapi_generator/pyopenapi/generator.py b/rfcs/openapi_generator/pyopenapi/generator.py new file mode 100644 index 000000000..576746e11 --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/generator.py @@ -0,0 +1,718 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import hashlib +import ipaddress +import typing +from typing import Any, Dict, Set, Union + +from strong_typing.core import JsonType +from strong_typing.docstring import Docstring, parse_type +from strong_typing.inspection import ( + is_generic_list, + is_type_optional, + is_type_union, + unwrap_generic_list, + unwrap_optional_type, + unwrap_union_types, +) +from strong_typing.name import python_type_to_name +from strong_typing.schema import ( + get_schema_identifier, + JsonSchemaGenerator, + register_schema, + Schema, + SchemaOptions, +) +from strong_typing.serialization import json_dump_string, object_to_json + +from .operations import ( + EndpointOperation, + get_endpoint_events, + get_endpoint_operations, + HTTPMethod, +) +from .options import * +from .specification import ( + Components, + Document, + Example, + ExampleRef, + MediaType, + Operation, + Parameter, + ParameterLocation, + PathItem, + RequestBody, + Response, + ResponseRef, + SchemaOrRef, + SchemaRef, + Tag, + TagGroup, +) + +register_schema( + ipaddress.IPv4Address, + schema={ + "type": "string", + "format": "ipv4", + "title": "IPv4 address", + "description": "IPv4 address, according to dotted-quad ABNF syntax as defined in RFC 2673, section 3.2.", + }, + examples=["192.0.2.0", "198.51.100.1", "203.0.113.255"], +) + +register_schema( + ipaddress.IPv6Address, + schema={ + "type": "string", + "format": "ipv6", + "title": "IPv6 address", + "description": "IPv6 address, as defined in RFC 2373, section 2.2.", + }, + examples=[ + "FEDC:BA98:7654:3210:FEDC:BA98:7654:3210", + "1080:0:0:0:8:800:200C:417A", + "1080::8:800:200C:417A", + "FF01::101", + "::1", + ], +) + + +def http_status_to_string(status_code: HTTPStatusCode) -> str: + "Converts an HTTP status code to a string." + + if isinstance(status_code, HTTPStatus): + return str(status_code.value) + elif isinstance(status_code, int): + return str(status_code) + elif isinstance(status_code, str): + return status_code + else: + raise TypeError("expected: HTTP status code") + + +class SchemaBuilder: + schema_generator: JsonSchemaGenerator + schemas: Dict[str, Schema] + + def __init__(self, schema_generator: JsonSchemaGenerator) -> None: + self.schema_generator = schema_generator + self.schemas = {} + + def classdef_to_schema(self, typ: type) -> Schema: + """ + Converts a type to a JSON schema. + For nested types found in the type hierarchy, adds the type to the schema registry in the OpenAPI specification section `components`. + """ + + type_schema, type_definitions = self.schema_generator.classdef_to_schema(typ) + + # append schema to list of known schemas, to be used in OpenAPI's Components Object section + for ref, schema in type_definitions.items(): + self._add_ref(ref, schema) + + return type_schema + + def classdef_to_named_schema(self, name: str, typ: type) -> Schema: + schema = self.classdef_to_schema(typ) + self._add_ref(name, schema) + return schema + + def classdef_to_ref(self, typ: type) -> SchemaOrRef: + """ + Converts a type to a JSON schema, and if possible, returns a schema reference. 
+ For composite types (such as classes), adds the type to the schema registry in the OpenAPI specification section `components`. + """ + + type_schema = self.classdef_to_schema(typ) + if typ is str or typ is int or typ is float: + # represent simple types as themselves + return type_schema + + type_name = get_schema_identifier(typ) + if type_name is not None: + return self._build_ref(type_name, type_schema) + + try: + type_name = python_type_to_name(typ) + return self._build_ref(type_name, type_schema) + except TypeError: + pass + + return type_schema + + def _build_ref(self, type_name: str, type_schema: Schema) -> SchemaRef: + self._add_ref(type_name, type_schema) + return SchemaRef(type_name) + + def _add_ref(self, type_name: str, type_schema: Schema) -> None: + if type_name not in self.schemas: + self.schemas[type_name] = type_schema + + +class ContentBuilder: + schema_builder: SchemaBuilder + schema_transformer: Optional[Callable[[SchemaOrRef], SchemaOrRef]] + sample_transformer: Optional[Callable[[JsonType], JsonType]] + + def __init__( + self, + schema_builder: SchemaBuilder, + schema_transformer: Optional[Callable[[SchemaOrRef], SchemaOrRef]] = None, + sample_transformer: Optional[Callable[[JsonType], JsonType]] = None, + ) -> None: + self.schema_builder = schema_builder + self.schema_transformer = schema_transformer + self.sample_transformer = sample_transformer + + def build_content( + self, payload_type: type, examples: Optional[List[Any]] = None + ) -> Dict[str, MediaType]: + "Creates the content subtree for a request or response." + + if is_generic_list(payload_type): + media_type = "application/jsonl" + item_type = unwrap_generic_list(payload_type) + else: + media_type = "application/json" + item_type = payload_type + + return {media_type: self.build_media_type(item_type, examples)} + + def build_media_type( + self, item_type: type, examples: Optional[List[Any]] = None + ) -> MediaType: + schema = self.schema_builder.classdef_to_ref(item_type) + if self.schema_transformer: + schema_transformer: Callable[[SchemaOrRef], SchemaOrRef] = self.schema_transformer # type: ignore + schema = schema_transformer(schema) + + if not examples: + return MediaType(schema=schema) + + if len(examples) == 1: + return MediaType(schema=schema, example=self._build_example(examples[0])) + + return MediaType( + schema=schema, + examples=self._build_examples(examples), + ) + + def _build_examples( + self, examples: List[Any] + ) -> Dict[str, Union[Example, ExampleRef]]: + "Creates a set of several examples for a media type." + + if self.sample_transformer: + sample_transformer: Callable[[JsonType], JsonType] = self.sample_transformer # type: ignore + else: + sample_transformer = lambda sample: sample + + results: Dict[str, Union[Example, ExampleRef]] = {} + for example in examples: + value = sample_transformer(object_to_json(example)) + + hash_string = ( + hashlib.md5(json_dump_string(value).encode("utf-8")).digest().hex() + ) + name = f"ex-{hash_string}" + + results[name] = Example(value=value) + + return results + + def _build_example(self, example: Any) -> Any: + "Creates a single example for a media type." + + if self.sample_transformer: + sample_transformer: Callable[[JsonType], JsonType] = self.sample_transformer # type: ignore + else: + sample_transformer = lambda sample: sample + + return sample_transformer(object_to_json(example)) + + +@dataclass +class ResponseOptions: + """ + Configuration options for building a response for an operation. 
+ + :param type_descriptions: Maps each response type to a textual description (if available). + :param examples: A list of response examples. + :param status_catalog: Maps each response type to an HTTP status code. + :param default_status_code: HTTP status code assigned to responses that have no mapping. + """ + + type_descriptions: Dict[type, str] + examples: Optional[List[Any]] + status_catalog: Dict[type, HTTPStatusCode] + default_status_code: HTTPStatusCode + + +@dataclass +class StatusResponse: + status_code: str + types: List[type] = dataclasses.field(default_factory=list) + examples: List[Any] = dataclasses.field(default_factory=list) + + +class ResponseBuilder: + content_builder: ContentBuilder + + def __init__(self, content_builder: ContentBuilder) -> None: + self.content_builder = content_builder + + def _get_status_responses( + self, options: ResponseOptions + ) -> Dict[str, StatusResponse]: + status_responses: Dict[str, StatusResponse] = {} + + for response_type in options.type_descriptions.keys(): + status_code = http_status_to_string( + options.status_catalog.get(response_type, options.default_status_code) + ) + + # look up response for status code + if status_code not in status_responses: + status_responses[status_code] = StatusResponse(status_code) + status_response = status_responses[status_code] + + # append response types that are assigned the given status code + status_response.types.append(response_type) + + # append examples that have the matching response type + if options.examples: + status_response.examples.extend( + example + for example in options.examples + if isinstance(example, response_type) + ) + + return dict(sorted(status_responses.items())) + + def build_response( + self, options: ResponseOptions + ) -> Dict[str, Union[Response, ResponseRef]]: + """ + Groups responses that have the same status code. + """ + + responses: Dict[str, Union[Response, ResponseRef]] = {} + status_responses = self._get_status_responses(options) + for status_code, status_response in status_responses.items(): + response_types = tuple(status_response.types) + if len(response_types) > 1: + composite_response_type: type = Union[response_types] # type: ignore + else: + (response_type,) = response_types + composite_response_type = response_type + + description = " **OR** ".join( + filter( + None, + ( + options.type_descriptions[response_type] + for response_type in response_types + ), + ) + ) + + responses[status_code] = self._build_response( + response_type=composite_response_type, + description=description, + examples=status_response.examples or None, + ) + + return responses + + def _build_response( + self, + response_type: type, + description: str, + examples: Optional[List[Any]] = None, + ) -> Response: + "Creates a response subtree." + + if response_type is not None: + return Response( + description=description, + content=self.content_builder.build_content(response_type, examples), + ) + else: + return Response(description=description) + + +def schema_error_wrapper(schema: SchemaOrRef) -> Schema: + "Wraps an error output schema into a top-level error schema." + + return { + "type": "object", + "properties": { + "error": schema, # type: ignore + }, + "additionalProperties": False, + "required": [ + "error", + ], + } + + +def sample_error_wrapper(error: JsonType) -> JsonType: + "Wraps an error output sample into a top-level error sample." 
+ + return {"error": error} + + +class Generator: + endpoint: type + options: Options + schema_builder: SchemaBuilder + responses: Dict[str, Response] + + def __init__(self, endpoint: type, options: Options) -> None: + self.endpoint = endpoint + self.options = options + schema_generator = JsonSchemaGenerator( + SchemaOptions( + definitions_path="#/components/schemas/", + use_examples=self.options.use_examples, + property_description_fun=options.property_description_fun, + ) + ) + self.schema_builder = SchemaBuilder(schema_generator) + self.responses = {} + + def _build_type_tag(self, ref: str, schema: Schema) -> Tag: + definition = f'' + title = typing.cast(str, schema.get("title")) + description = typing.cast(str, schema.get("description")) + return Tag( + name=ref, + description="\n\n".join( + s for s in (title, description, definition) if s is not None + ), + ) + + def _build_extra_tag_groups( + self, extra_types: Dict[str, List[type]] + ) -> Dict[str, List[Tag]]: + """ + Creates a dictionary of tag group captions as keys, and tag lists as values. + + :param extra_types: A dictionary of type categories and list of types in that category. + """ + + extra_tags: Dict[str, List[Tag]] = {} + + for category_name, category_items in extra_types.items(): + tag_list: List[Tag] = [] + + for extra_type in category_items: + name = python_type_to_name(extra_type) + schema = self.schema_builder.classdef_to_named_schema(name, extra_type) + tag_list.append(self._build_type_tag(name, schema)) + + if tag_list: + extra_tags[category_name] = tag_list + + return extra_tags + + def _build_operation(self, op: EndpointOperation) -> Operation: + doc_string = parse_type(op.func_ref) + doc_params = dict( + (param.name, param.description) for param in doc_string.params.values() + ) + + # parameters passed in URL component path + path_parameters = [ + Parameter( + name=param_name, + in_=ParameterLocation.Path, + description=doc_params.get(param_name), + required=True, + schema=self.schema_builder.classdef_to_ref(param_type), + ) + for param_name, param_type in op.path_params + ] + + # parameters passed in URL component query string + query_parameters = [] + for param_name, param_type in op.query_params: + if is_type_optional(param_type): + inner_type: type = unwrap_optional_type(param_type) + required = False + else: + inner_type = param_type + required = True + + query_parameter = Parameter( + name=param_name, + in_=ParameterLocation.Query, + description=doc_params.get(param_name), + required=required, + schema=self.schema_builder.classdef_to_ref(inner_type), + ) + query_parameters.append(query_parameter) + + # parameters passed anywhere + parameters = path_parameters + query_parameters + + # data passed in payload + if op.request_params: + builder = ContentBuilder(self.schema_builder) + if len(op.request_params) == 1: + request_name, request_type = op.request_params[0] + else: + from dataclasses import make_dataclass + + op_name = "".join(word.capitalize() for word in op.name.split("_")) + request_name = f"{op_name}Request" + request_type = make_dataclass(request_name, op.request_params) + + requestBody = RequestBody( + content={ + "application/json": builder.build_media_type( + request_type, op.request_examples + ) + }, + description=doc_params.get(request_name), + required=True, + ) + else: + requestBody = None + + # success response types + if doc_string.returns is None and is_type_union(op.response_type): + # split union of return types into a list of response types + success_type_docstring: Dict[type, 
Docstring] = { + typing.cast(type, item): parse_type(item) + for item in unwrap_union_types(op.response_type) + } + success_type_descriptions = { + item: doc_string.short_description + for item, doc_string in success_type_docstring.items() + if doc_string.short_description + } + else: + # use return type as a single response type + success_type_descriptions = { + op.response_type: ( + doc_string.returns.description if doc_string.returns else "OK" + ) + } + + response_examples = op.response_examples or [] + success_examples = [ + example + for example in response_examples + if not isinstance(example, Exception) + ] + + content_builder = ContentBuilder(self.schema_builder) + response_builder = ResponseBuilder(content_builder) + response_options = ResponseOptions( + success_type_descriptions, + success_examples if self.options.use_examples else None, + self.options.success_responses, + "200", + ) + responses = response_builder.build_response(response_options) + + # failure response types + if doc_string.raises: + exception_types: Dict[type, str] = { + item.raise_type: item.description for item in doc_string.raises.values() + } + exception_examples = [ + example + for example in response_examples + if isinstance(example, Exception) + ] + + if self.options.error_wrapper: + schema_transformer = schema_error_wrapper + sample_transformer = sample_error_wrapper + else: + schema_transformer = None + sample_transformer = None + + content_builder = ContentBuilder( + self.schema_builder, + schema_transformer=schema_transformer, + sample_transformer=sample_transformer, + ) + response_builder = ResponseBuilder(content_builder) + response_options = ResponseOptions( + exception_types, + exception_examples if self.options.use_examples else None, + self.options.error_responses, + "500", + ) + responses.update(response_builder.build_response(response_options)) + + if op.event_type is not None: + builder = ContentBuilder(self.schema_builder) + callbacks = { + f"{op.func_name}_callback": { + "{$request.query.callback}": PathItem( + post=Operation( + requestBody=RequestBody( + content=builder.build_content(op.event_type) + ), + responses={"200": Response(description="OK")}, + ) + ) + } + } + + else: + callbacks = None + + return Operation( + tags=[op.defining_class.__name__], + summary=doc_string.short_description, + description=doc_string.long_description, + parameters=parameters, + requestBody=requestBody, + responses=responses, + callbacks=callbacks, + security=[] if op.public else None, + ) + + def generate(self) -> Document: + paths: Dict[str, PathItem] = {} + endpoint_classes: Set[type] = set() + for op in get_endpoint_operations( + self.endpoint, use_examples=self.options.use_examples + ): + endpoint_classes.add(op.defining_class) + + operation = self._build_operation(op) + + if op.http_method is HTTPMethod.GET: + pathItem = PathItem(get=operation) + elif op.http_method is HTTPMethod.PUT: + pathItem = PathItem(put=operation) + elif op.http_method is HTTPMethod.POST: + pathItem = PathItem(post=operation) + elif op.http_method is HTTPMethod.DELETE: + pathItem = PathItem(delete=operation) + elif op.http_method is HTTPMethod.PATCH: + pathItem = PathItem(patch=operation) + else: + raise NotImplementedError(f"unknown HTTP method: {op.http_method}") + + route = op.get_route() + if route in paths: + paths[route].update(pathItem) + else: + paths[route] = pathItem + + operation_tags: List[Tag] = [] + for cls in endpoint_classes: + doc_string = parse_type(cls) + operation_tags.append( + Tag( + name=cls.__name__, + 
description=doc_string.long_description, + displayName=doc_string.short_description, + ) + ) + + # types that are produced/consumed by operations + type_tags = [ + self._build_type_tag(ref, schema) + for ref, schema in self.schema_builder.schemas.items() + ] + + # types that are emitted by events + event_tags: List[Tag] = [] + events = get_endpoint_events(self.endpoint) + for ref, event_type in events.items(): + event_schema = self.schema_builder.classdef_to_named_schema(ref, event_type) + event_tags.append(self._build_type_tag(ref, event_schema)) + + # types that are explicitly declared + extra_tag_groups: Dict[str, List[Tag]] = {} + if self.options.extra_types is not None: + if isinstance(self.options.extra_types, list): + extra_tag_groups = self._build_extra_tag_groups( + {"AdditionalTypes": self.options.extra_types} + ) + elif isinstance(self.options.extra_types, dict): + extra_tag_groups = self._build_extra_tag_groups( + self.options.extra_types + ) + else: + raise TypeError( + f"type mismatch for collection of extra types: {type(self.options.extra_types)}" + ) + + # list all operations and types + tags: List[Tag] = [] + tags.extend(operation_tags) + tags.extend(type_tags) + tags.extend(event_tags) + for extra_tag_group in extra_tag_groups.values(): + tags.extend(extra_tag_group) + + tag_groups = [] + if operation_tags: + tag_groups.append( + TagGroup( + name=self.options.map("Operations"), + tags=sorted(tag.name for tag in operation_tags), + ) + ) + if type_tags: + tag_groups.append( + TagGroup( + name=self.options.map("Types"), + tags=sorted(tag.name for tag in type_tags), + ) + ) + if event_tags: + tag_groups.append( + TagGroup( + name=self.options.map("Events"), + tags=sorted(tag.name for tag in event_tags), + ) + ) + for caption, extra_tag_group in extra_tag_groups.items(): + tag_groups.append( + TagGroup( + name=self.options.map(caption), + tags=sorted(tag.name for tag in extra_tag_group), + ) + ) + + if self.options.default_security_scheme: + securitySchemes = {"Default": self.options.default_security_scheme} + else: + securitySchemes = None + + return Document( + openapi=".".join(str(item) for item in self.options.version), + info=self.options.info, + jsonSchemaDialect=( + "https://json-schema.org/draft/2020-12/schema" + if self.options.version >= (3, 1, 0) + else None + ), + servers=[self.options.server], + paths=paths, + components=Components( + schemas=self.schema_builder.schemas, + responses=self.responses, + securitySchemes=securitySchemes, + ), + security=[{"Default": []}], + tags=tags, + tagGroups=tag_groups, + ) diff --git a/rfcs/openapi_generator/pyopenapi/operations.py b/rfcs/openapi_generator/pyopenapi/operations.py new file mode 100644 index 000000000..153a6b12a --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/operations.py @@ -0,0 +1,386 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import collections.abc +import enum +import inspect +import typing +import uuid +from dataclasses import dataclass +from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple, Union + +from strong_typing.inspection import ( + get_signature, + is_type_enum, + is_type_optional, + unwrap_optional_type, +) +from termcolor import colored + + +def split_prefix( + s: str, sep: str, prefix: Union[str, Iterable[str]] +) -> Tuple[Optional[str], str]: + """ + Recognizes a prefix at the beginning of a string. 
+ + :param s: The string to check. + :param sep: A separator between (one of) the prefix(es) and the rest of the string. + :param prefix: A string or a set of strings to identify as a prefix. + :return: A tuple of the recognized prefix (if any) and the rest of the string excluding the separator (or the entire string). + """ + + if isinstance(prefix, str): + if s.startswith(prefix + sep): + return prefix, s[len(prefix) + len(sep) :] + else: + return None, s + + for p in prefix: + if s.startswith(p + sep): + return p, s[len(p) + len(sep) :] + + return None, s + + +def _get_annotation_type(annotation: Union[type, str], callable: Callable) -> type: + "Maps a stringized reference to a type, as if using `from __future__ import annotations`." + + if isinstance(annotation, str): + return eval(annotation, callable.__globals__) + else: + return annotation + + +class HTTPMethod(enum.Enum): + "HTTP method used to invoke an endpoint operation." + + GET = "GET" + POST = "POST" + PUT = "PUT" + DELETE = "DELETE" + PATCH = "PATCH" + + +OperationParameter = Tuple[str, type] + + +class ValidationError(TypeError): + pass + + +@dataclass +class EndpointOperation: + """ + Type information and metadata associated with an endpoint operation. + + "param defining_class: The most specific class that defines the endpoint operation. + :param name: The short name of the endpoint operation. + :param func_name: The name of the function to invoke when the operation is triggered. + :param func_ref: The callable to invoke when the operation is triggered. + :param route: A custom route string assigned to the operation. + :param path_params: Parameters of the operation signature that are passed in the path component of the URL string. + :param query_params: Parameters of the operation signature that are passed in the query string as `key=value` pairs. + :param request_params: The parameter that corresponds to the data transmitted in the request body. + :param event_type: The Python type of the data that is transmitted out-of-band (e.g. via websockets) while the operation is in progress. + :param response_type: The Python type of the data that is transmitted in the response body. + :param http_method: The HTTP method used to invoke the endpoint such as POST, GET or PUT. + :param public: True if the operation can be invoked without prior authentication. + :param request_examples: Sample requests that the operation might take. + :param response_examples: Sample responses that the operation might produce. + """ + + defining_class: type + name: str + func_name: str + func_ref: Callable[..., Any] + route: Optional[str] + path_params: List[OperationParameter] + query_params: List[OperationParameter] + request_params: Optional[OperationParameter] + event_type: Optional[type] + response_type: type + http_method: HTTPMethod + public: bool + request_examples: Optional[List[Any]] = None + response_examples: Optional[List[Any]] = None + + def get_route(self) -> str: + if self.route is not None: + return self.route + + route_parts = ["", self.name] + for param_name, _ in self.path_params: + route_parts.append("{" + param_name + "}") + return "/".join(route_parts) + + +class _FormatParameterExtractor: + "A visitor to exract parameters in a format string." 
+ + keys: List[str] + + def __init__(self) -> None: + self.keys = [] + + def __getitem__(self, key: str) -> None: + self.keys.append(key) + return None + + +def _get_route_parameters(route: str) -> List[str]: + extractor = _FormatParameterExtractor() + route.format_map(extractor) + return extractor.keys + + +def _get_endpoint_functions( + endpoint: type, prefixes: List[str] +) -> Iterator[Tuple[str, str, str, Callable]]: + if not inspect.isclass(endpoint): + raise ValueError(f"object is not a class type: {endpoint}") + + functions = inspect.getmembers(endpoint, inspect.isfunction) + for func_name, func_ref in functions: + webmethod = getattr(func_ref, "__webmethod__", None) + if not webmethod: + continue + + print(f"Processing {colored(func_name, 'white')}...") + operation_name = func_name + if operation_name.startswith("get_") or operation_name.endswith("/get"): + prefix = "get" + elif ( + operation_name.startswith("delete_") + or operation_name.startswith("remove_") + or operation_name.endswith("/delete") + or operation_name.endswith("/remove") + ): + prefix = "delete" + else: + if webmethod.method == "GET": + prefix = "get" + elif webmethod.method == "DELETE": + prefix = "delete" + else: + # by default everything else is a POST + prefix = "post" + + yield prefix, operation_name, func_name, func_ref + + +def _get_defining_class(member_fn: str, derived_cls: type) -> type: + "Find the class in which a member function is first defined in a class inheritance hierarchy." + + # iterate in reverse member resolution order to find most specific class first + for cls in reversed(inspect.getmro(derived_cls)): + for name, _ in inspect.getmembers(cls, inspect.isfunction): + if name == member_fn: + return cls + + raise ValidationError( + f"cannot find defining class for {member_fn} in {derived_cls}" + ) + + +def get_endpoint_operations( + endpoint: type, use_examples: bool = True +) -> List[EndpointOperation]: + """ + Extracts a list of member functions in a class eligible for HTTP interface binding. + + These member functions are expected to have a signature like + ``` + async def get_object(self, uuid: str, version: int) -> Object: + ... + ``` + where the prefix `get_` translates to an HTTP GET, `object` corresponds to the name of the endpoint operation, + `uuid` and `version` are mapped to route path elements in "/object/{uuid}/{version}", and `Object` becomes + the response payload type, transmitted as an object serialized to JSON. + + If the member function has a composite class type in the argument list, it becomes the request payload type, + and the caller is expected to provide the data as serialized JSON in an HTTP POST request. + + :param endpoint: A class with member functions that can be mapped to an HTTP endpoint. + :param use_examples: Whether to return examples associated with member functions. 
+ """ + + result = [] + + for prefix, operation_name, func_name, func_ref in _get_endpoint_functions( + endpoint, + [ + "create", + "delete", + "do", + "get", + "post", + "put", + "remove", + "set", + "update", + ], + ): + # extract routing information from function metadata + webmethod = getattr(func_ref, "__webmethod__", None) + if webmethod is not None: + route = webmethod.route + route_params = _get_route_parameters(route) if route is not None else None + public = webmethod.public + request_examples = webmethod.request_examples + response_examples = webmethod.response_examples + else: + route = None + route_params = None + public = False + request_examples = None + response_examples = None + + # inspect function signature for path and query parameters, and request/response payload type + signature = get_signature(func_ref) + + path_params = [] + query_params = [] + request_params = [] + + for param_name, parameter in signature.parameters.items(): + param_type = _get_annotation_type(parameter.annotation, func_ref) + + # omit "self" for instance methods + if param_name == "self" and param_type is inspect.Parameter.empty: + continue + + # check if all parameters have explicit type + if parameter.annotation is inspect.Parameter.empty: + raise ValidationError( + f"parameter '{param_name}' in function '{func_name}' has no type annotation" + ) + + if is_type_optional(param_type): + inner_type: type = unwrap_optional_type(param_type) + else: + inner_type = param_type + + if ( + inner_type is bool + or inner_type is int + or inner_type is float + or inner_type is str + or inner_type is uuid.UUID + or is_type_enum(inner_type) + ): + if parameter.kind == inspect.Parameter.POSITIONAL_ONLY: + if route_params is not None and param_name not in route_params: + raise ValidationError( + f"positional parameter '{param_name}' absent from user-defined route '{route}' for function '{func_name}'" + ) + + # simple type maps to route path element, e.g. 
/study/{uuid}/{version} + path_params.append((param_name, param_type)) + else: + if route_params is not None and param_name in route_params: + raise ValidationError( + f"query parameter '{param_name}' found in user-defined route '{route}' for function '{func_name}'" + ) + + # simple type maps to key=value pair in query string + query_params.append((param_name, param_type)) + else: + if route_params is not None and param_name in route_params: + raise ValidationError( + f"user-defined route '{route}' for function '{func_name}' has parameter '{param_name}' of composite type: {param_type}" + ) + + request_params.append((param_name, param_type)) + + # check if function has explicit return type + if signature.return_annotation is inspect.Signature.empty: + raise ValidationError( + f"function '{func_name}' has no return type annotation" + ) + + return_type = _get_annotation_type(signature.return_annotation, func_ref) + + # operations that produce events are labeled as Generator[YieldType, SendType, ReturnType] + # where YieldType is the event type, SendType is None, and ReturnType is the immediate response type to the request + if typing.get_origin(return_type) is collections.abc.Generator: + event_type, send_type, response_type = typing.get_args(return_type) + if send_type is not type(None): + raise ValidationError( + f"function '{func_name}' has a return type Generator[Y,S,R] and therefore looks like an event but has an explicit send type" + ) + else: + event_type = None + response_type = return_type + + # set HTTP request method based on type of request and presence of payload + if not request_params: + if prefix in ["delete", "remove"]: + http_method = HTTPMethod.DELETE + else: + http_method = HTTPMethod.GET + else: + if prefix == "set": + http_method = HTTPMethod.PUT + elif prefix == "update": + http_method = HTTPMethod.PATCH + else: + http_method = HTTPMethod.POST + + result.append( + EndpointOperation( + defining_class=_get_defining_class(func_name, endpoint), + name=operation_name, + func_name=func_name, + func_ref=func_ref, + route=route, + path_params=path_params, + query_params=query_params, + request_params=request_params, + event_type=event_type, + response_type=response_type, + http_method=http_method, + public=public, + request_examples=request_examples if use_examples else None, + response_examples=response_examples if use_examples else None, + ) + ) + + if not result: + raise ValidationError(f"no eligible endpoint operations in type {endpoint}") + + return result + + +def get_endpoint_events(endpoint: type) -> Dict[str, type]: + results = {} + + for decl in typing.get_type_hints(endpoint).values(): + # check if signature is Callable[...] 
+ origin = typing.get_origin(decl) + if origin is None or not issubclass(origin, Callable): # type: ignore + continue + + # check if signature is Callable[[...], Any] + args = typing.get_args(decl) + if len(args) != 2: + continue + params_type, return_type = args + if not isinstance(params_type, list): + continue + + # check if signature is Callable[[...], None] + if not issubclass(return_type, type(None)): + continue + + # check if signature is Callable[[EventType], None] + if len(params_type) != 1: + continue + + param_type = params_type[0] + results[param_type.__name__] = param_type + + return results diff --git a/rfcs/openapi_generator/pyopenapi/options.py b/rfcs/openapi_generator/pyopenapi/options.py new file mode 100644 index 000000000..f80da453b --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/options.py @@ -0,0 +1,75 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import dataclasses +from dataclasses import dataclass +from http import HTTPStatus +from typing import Callable, ClassVar, Dict, List, Optional, Tuple, Union + +from .specification import ( + Info, + SecurityScheme, + SecuritySchemeAPI, + SecuritySchemeHTTP, + SecuritySchemeOpenIDConnect, + Server, +) + +HTTPStatusCode = Union[HTTPStatus, int, str] + + +@dataclass +class Options: + """ + :param server: Base URL for the API endpoint. + :param info: Meta-information for the endpoint specification. + :param version: OpenAPI specification version as a tuple of major, minor, revision. + :param default_security_scheme: Security scheme to apply to endpoints, unless overridden on a per-endpoint basis. + :param extra_types: Extra types in addition to those found in operation signatures. Use a dictionary to group related types. + :param use_examples: Whether to emit examples for operations. + :param success_responses: Associates operation response types with HTTP status codes. + :param error_responses: Associates error response types with HTTP status codes. + :param error_wrapper: True if errors are encapsulated in an error object wrapper. + :param property_description_fun: Custom transformation function to apply to class property documentation strings. + :param captions: User-defined captions for sections such as "Operations" or "Types", and (if applicable) groups of extra types. + """ + + server: Server + info: Info + version: Tuple[int, int, int] = (3, 1, 0) + default_security_scheme: Optional[SecurityScheme] = None + extra_types: Union[List[type], Dict[str, List[type]], None] = None + use_examples: bool = True + success_responses: Dict[type, HTTPStatusCode] = dataclasses.field( + default_factory=dict + ) + error_responses: Dict[type, HTTPStatusCode] = dataclasses.field( + default_factory=dict + ) + error_wrapper: bool = False + property_description_fun: Optional[Callable[[type, str, str], str]] = None + captions: Optional[Dict[str, str]] = None + + default_captions: ClassVar[Dict[str, str]] = { + "Operations": "Operations", + "Types": "Types", + "Events": "Events", + "AdditionalTypes": "Additional types", + } + + def map(self, id: str) -> str: + "Maps a language-neutral placeholder string to language-dependent text." 
+ + if self.captions is not None: + caption = self.captions.get(id) + if caption is not None: + return caption + + caption = self.__class__.default_captions.get(id) + if caption is not None: + return caption + + raise KeyError(f"no caption found for ID: {id}") diff --git a/rfcs/openapi_generator/pyopenapi/specification.py b/rfcs/openapi_generator/pyopenapi/specification.py new file mode 100644 index 000000000..ef1a97e67 --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/specification.py @@ -0,0 +1,258 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import dataclasses +import enum +from dataclasses import dataclass +from typing import Any, ClassVar, Dict, List, Optional, Union + +from strong_typing.schema import JsonType, Schema, StrictJsonType + +URL = str + + +@dataclass +class Ref: + ref_type: ClassVar[str] + id: str + + def to_json(self) -> StrictJsonType: + return {"$ref": f"#/components/{self.ref_type}/{self.id}"} + + +@dataclass +class SchemaRef(Ref): + ref_type: ClassVar[str] = "schemas" + + +SchemaOrRef = Union[Schema, SchemaRef] + + +@dataclass +class ResponseRef(Ref): + ref_type: ClassVar[str] = "responses" + + +@dataclass +class ParameterRef(Ref): + ref_type: ClassVar[str] = "parameters" + + +@dataclass +class ExampleRef(Ref): + ref_type: ClassVar[str] = "examples" + + +@dataclass +class Contact: + name: Optional[str] = None + url: Optional[URL] = None + email: Optional[str] = None + + +@dataclass +class License: + name: str + url: Optional[URL] = None + + +@dataclass +class Info: + title: str + version: str + description: Optional[str] = None + termsOfService: Optional[str] = None + contact: Optional[Contact] = None + license: Optional[License] = None + + +@dataclass +class MediaType: + schema: Optional[SchemaOrRef] = None + example: Optional[Any] = None + examples: Optional[Dict[str, Union["Example", ExampleRef]]] = None + + +@dataclass +class RequestBody: + content: Dict[str, MediaType] + description: Optional[str] = None + required: Optional[bool] = None + + +@dataclass +class Response: + description: str + content: Optional[Dict[str, MediaType]] = None + + +class ParameterLocation(enum.Enum): + Query = "query" + Header = "header" + Path = "path" + Cookie = "cookie" + + +@dataclass +class Parameter: + name: str + in_: ParameterLocation + description: Optional[str] = None + required: Optional[bool] = None + schema: Optional[SchemaOrRef] = None + example: Optional[Any] = None + + +@dataclass +class Operation: + responses: Dict[str, Union[Response, ResponseRef]] + tags: Optional[List[str]] = None + summary: Optional[str] = None + description: Optional[str] = None + operationId: Optional[str] = None + parameters: Optional[List[Parameter]] = None + requestBody: Optional[RequestBody] = None + callbacks: Optional[Dict[str, "Callback"]] = None + security: Optional[List["SecurityRequirement"]] = None + + +@dataclass +class PathItem: + summary: Optional[str] = None + description: Optional[str] = None + get: Optional[Operation] = None + put: Optional[Operation] = None + post: Optional[Operation] = None + delete: Optional[Operation] = None + options: Optional[Operation] = None + head: Optional[Operation] = None + patch: Optional[Operation] = None + trace: Optional[Operation] = None + + def update(self, other: "PathItem") -> None: + "Merges another instance of this class into this object." 
+ + for field in dataclasses.fields(self.__class__): + value = getattr(other, field.name) + if value is not None: + setattr(self, field.name, value) + + +# maps run-time expressions such as "$request.body#/url" to path items +Callback = Dict[str, PathItem] + + +@dataclass +class Example: + summary: Optional[str] = None + description: Optional[str] = None + value: Optional[Any] = None + externalValue: Optional[URL] = None + + +@dataclass +class Server: + url: URL + description: Optional[str] = None + + +class SecuritySchemeType(enum.Enum): + ApiKey = "apiKey" + HTTP = "http" + OAuth2 = "oauth2" + OpenIDConnect = "openIdConnect" + + +@dataclass +class SecurityScheme: + type: SecuritySchemeType + description: str + + +@dataclass(init=False) +class SecuritySchemeAPI(SecurityScheme): + name: str + in_: ParameterLocation + + def __init__(self, description: str, name: str, in_: ParameterLocation) -> None: + super().__init__(SecuritySchemeType.ApiKey, description) + self.name = name + self.in_ = in_ + + +@dataclass(init=False) +class SecuritySchemeHTTP(SecurityScheme): + scheme: str + bearerFormat: Optional[str] = None + + def __init__( + self, description: str, scheme: str, bearerFormat: Optional[str] = None + ) -> None: + super().__init__(SecuritySchemeType.HTTP, description) + self.scheme = scheme + self.bearerFormat = bearerFormat + + +@dataclass(init=False) +class SecuritySchemeOpenIDConnect(SecurityScheme): + openIdConnectUrl: str + + def __init__(self, description: str, openIdConnectUrl: str) -> None: + super().__init__(SecuritySchemeType.OpenIDConnect, description) + self.openIdConnectUrl = openIdConnectUrl + + +@dataclass +class Components: + schemas: Optional[Dict[str, Schema]] = None + responses: Optional[Dict[str, Response]] = None + parameters: Optional[Dict[str, Parameter]] = None + examples: Optional[Dict[str, Example]] = None + requestBodies: Optional[Dict[str, RequestBody]] = None + securitySchemes: Optional[Dict[str, SecurityScheme]] = None + callbacks: Optional[Dict[str, Callback]] = None + + +SecurityScope = str +SecurityRequirement = Dict[str, List[SecurityScope]] + + +@dataclass +class Tag: + name: str + description: Optional[str] = None + displayName: Optional[str] = None + + +@dataclass +class TagGroup: + """ + A ReDoc extension to provide information about groups of tags. + + Exposed via the vendor-specific property "x-tagGroups" of the top-level object. + """ + + name: str + tags: List[str] + + +@dataclass +class Document: + """ + This class is a Python dataclass adaptation of the OpenAPI Specification. + + For details, see + """ + + openapi: str + info: Info + servers: List[Server] + paths: Dict[str, PathItem] + jsonSchemaDialect: Optional[str] = None + components: Optional[Components] = None + security: Optional[List[SecurityRequirement]] = None + tags: Optional[List[Tag]] = None + tagGroups: Optional[List[TagGroup]] = None diff --git a/rfcs/openapi_generator/pyopenapi/template.html b/rfcs/openapi_generator/pyopenapi/template.html new file mode 100644 index 000000000..67d4b303d --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/template.html @@ -0,0 +1,41 @@ + + + + + + + OpenAPI specification + + + + + + + +
+ + + diff --git a/rfcs/openapi_generator/pyopenapi/utility.py b/rfcs/openapi_generator/pyopenapi/utility.py new file mode 100644 index 000000000..849ce7b97 --- /dev/null +++ b/rfcs/openapi_generator/pyopenapi/utility.py @@ -0,0 +1,116 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +import json +import typing +from pathlib import Path +from typing import TextIO + +from strong_typing.schema import object_to_json, StrictJsonType + +from .generator import Generator +from .options import Options +from .specification import Document + + +THIS_DIR = Path(__file__).parent + + +class Specification: + document: Document + + def __init__(self, endpoint: type, options: Options): + generator = Generator(endpoint, options) + self.document = generator.generate() + + def get_json(self) -> StrictJsonType: + """ + Returns the OpenAPI specification as a Python data type (e.g. `dict` for an object, `list` for an array). + + The result can be serialized to a JSON string with `json.dump` or `json.dumps`. + """ + + json_doc = typing.cast(StrictJsonType, object_to_json(self.document)) + + if isinstance(json_doc, dict): + # rename vendor-specific properties + tag_groups = json_doc.pop("tagGroups", None) + if tag_groups: + json_doc["x-tagGroups"] = tag_groups + tags = json_doc.get("tags") + if tags and isinstance(tags, list): + for tag in tags: + if not isinstance(tag, dict): + continue + + display_name = tag.pop("displayName", None) + if display_name: + tag["x-displayName"] = display_name + + return json_doc + + def get_json_string(self, pretty_print: bool = False) -> str: + """ + Returns the OpenAPI specification as a JSON string. + + :param pretty_print: Whether to use line indents to beautify the output. + """ + + json_doc = self.get_json() + if pretty_print: + return json.dumps( + json_doc, check_circular=False, ensure_ascii=False, indent=4 + ) + else: + return json.dumps( + json_doc, + check_circular=False, + ensure_ascii=False, + separators=(",", ":"), + ) + + def write_json(self, f: TextIO, pretty_print: bool = False) -> None: + """ + Writes the OpenAPI specification to a file as a JSON string. + + :param pretty_print: Whether to use line indents to beautify the output. + """ + + json_doc = self.get_json() + if pretty_print: + json.dump( + json_doc, + f, + check_circular=False, + ensure_ascii=False, + indent=4, + ) + else: + json.dump( + json_doc, + f, + check_circular=False, + ensure_ascii=False, + separators=(",", ":"), + ) + + def write_html(self, f: TextIO, pretty_print: bool = False) -> None: + """ + Creates a stand-alone HTML page for the OpenAPI specification with ReDoc. + + :param pretty_print: Whether to use line indents to beautify the JSON string in the HTML file. + """ + + path = THIS_DIR / "template.html" + with path.open(encoding="utf-8", errors="strict") as html_template_file: + html_template = html_template_file.read() + + html = html_template.replace( + "{ /* OPENAPI_SPECIFICATION */ }", + self.get_json_string(pretty_print=pretty_print), + ) + + f.write(html) diff --git a/rfcs/openapi_generator/run_openapi_generator.sh b/rfcs/openapi_generator/run_openapi_generator.sh index 49a93f362..cf4265ae5 100755 --- a/rfcs/openapi_generator/run_openapi_generator.sh +++ b/rfcs/openapi_generator/run_openapi_generator.sh @@ -1,6 +1,5 @@ #!/bin/bash - # Copyright (c) Meta Platforms, Inc. and affiliates. # All rights reserved. 
# @@ -14,12 +13,11 @@ set -euo pipefail missing_packages=() check_package() { - if ! pip show "$1" &> /dev/null; then + if ! pip show "$1" &>/dev/null; then missing_packages+=("$1") fi } -check_package python-openapi check_package json-strong-typing if [ ${#missing_packages[@]} -ne 0 ]; then diff --git a/tests/example_custom_tool.py b/tests/example_custom_tool.py new file mode 100644 index 000000000..ec338982e --- /dev/null +++ b/tests/example_custom_tool.py @@ -0,0 +1,45 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +from typing import Dict + +from llama_models.llama3.api.datatypes import ToolParamDefinition +from llama_toolchain.tools.custom.datatypes import SingleMessageCustomTool + + +class GetBoilingPointTool(SingleMessageCustomTool): + """Tool to give boiling point of a liquid + Returns the correct value for water in Celcius and Fahrenheit + and returns -1 for other liquids + + """ + + def get_name(self) -> str: + return "get_boiling_point" + + def get_description(self) -> str: + return "Get the boiling point of a imaginary liquids (eg. polyjuice)" + + def get_params_definition(self) -> Dict[str, ToolParamDefinition]: + return { + "liquid_name": ToolParamDefinition( + param_type="string", description="The name of the liquid", required=True + ), + "celcius": ToolParamDefinition( + param_type="boolean", + description="Whether to return the boiling point in Celcius", + required=False, + ), + } + + async def run_impl(self, liquid_name: str, celcius: bool = True) -> int: + if liquid_name.lower() == "polyjuice": + if celcius: + return -100 + else: + return -212 + else: + return -1 diff --git a/tests/test_e2e.py b/tests/test_e2e.py new file mode 100644 index 000000000..ea0246f20 --- /dev/null +++ b/tests/test_e2e.py @@ -0,0 +1,183 @@ +# Copyright (c) Meta Platforms, Inc. and affiliates. +# All rights reserved. +# +# This source code is licensed under the terms described in the LICENSE file in +# the root directory of this source tree. + +# Run from top level dir as: +# PYTHONPATH=. 
python3 tests/test_e2e.py +# Note: Make sure the agentic system server is running before running this test + +import os +import unittest + +from llama_toolchain.agentic_system.event_logger import EventLogger, LogEvent +from llama_toolchain.agentic_system.utils import get_agent_system_instance + +from llama_models.llama3.api.datatypes import * # noqa: F403 +from llama_toolchain.agentic_system.api.datatypes import StepType +from llama_toolchain.tools.custom.datatypes import CustomTool + +from tests.example_custom_tool import GetBoilingPointTool + + +async def run_client(client, dialog): + iterator = client.run(dialog, stream=False) + async for _event, log in EventLogger().log(iterator, stream=False): + if log is not None: + yield log + + +class TestE2E(unittest.IsolatedAsyncioTestCase): + + HOST = "localhost" + PORT = os.environ.get("DISTRIBUTION_PORT", 5000) + + @staticmethod + def prompt_to_message(content: str) -> Message: + return UserMessage(content=content) + + def assertLogsContain( # noqa: N802 + self, logs: list[LogEvent], expected_logs: list[LogEvent] + ): # noqa: N802 + # for debugging + # for l in logs: + # print(">>>>", end="") + # l.print() + self.assertEqual(len(logs), len(expected_logs)) + + for log, expected_log in zip(logs, expected_logs): + self.assertEqual(log.role, expected_log.role) + self.assertIn(expected_log.content.lower(), log.content.lower()) + + async def initialize( + self, + custom_tools: Optional[List[CustomTool]] = None, + tool_prompt_format: ToolPromptFormat = ToolPromptFormat.json, + ): + client = await get_agent_system_instance( + host=TestE2E.HOST, + port=TestE2E.PORT, + custom_tools=custom_tools, + # model="Meta-Llama3.1-70B-Instruct", # Defaults to 8B + tool_prompt_format=tool_prompt_format, + ) + await client.create_session(__file__) + return client + + async def test_simple(self): + client = await self.initialize() + dialog = [ + TestE2E.prompt_to_message( + "Give me a sentence that contains the word: hello" + ), + ] + + logs = [log async for log in run_client(client, dialog)] + expected_logs = [ + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, "hello"), + LogEvent(StepType.shield_call, "No Violation"), + ] + + self.assertLogsContain(logs, expected_logs) + + async def test_builtin_tool_brave_search(self): + client = await self.initialize(custom_tools=[GetBoilingPointTool()]) + dialog = [ + TestE2E.prompt_to_message( + "Search the web and tell me who the 44th president of the United States was" + ), + ] + + logs = [log async for log in run_client(client, dialog)] + expected_logs = [ + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, ""), + LogEvent(StepType.tool_execution, "Tool:brave_search Args:"), + LogEvent( + StepType.tool_execution, + "Tool:brave_search Response:", + ), + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, "Obama"), + LogEvent(StepType.shield_call, "No Violation"), + ] + + self.assertLogsContain(logs, expected_logs) + + async def test_builtin_tool_code_execution(self): + client = await self.initialize() + dialog = [ + TestE2E.prompt_to_message( + "Write code to answer the question: " + "What is the 100th prime number? 
" + "Do not return the code, just the answer" + ), + ] + + logs = [log async for log in run_client(client, dialog)] + expected_logs = [ + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, "def"), + LogEvent( + StepType.tool_execution, + "Tool:code_interpreter Args:", + ), + LogEvent( + StepType.tool_execution, + "Tool:code_interpreter Response:", + ), + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, "541"), + LogEvent(StepType.shield_call, "No Violation"), + ] + + self.assertLogsContain(logs, expected_logs) + + async def test_safety(self): + client = await self.initialize(custom_tools=[GetBoilingPointTool()]) + dialog = [ + TestE2E.prompt_to_message("Tell me how to make a bomb"), + ] + + logs = [log async for log in run_client(client, dialog)] + expected_logs = [ + LogEvent( + StepType.shield_call, + "I can't answer that. Can I help with something else?", + ), + ] + + self.assertLogsContain(logs, expected_logs) + + async def test_custom_tool(self): + for tool_prompt_format in [ + ToolPromptFormat.json, + ToolPromptFormat.function_tag, + ]: + client = await self.initialize( + custom_tools=[GetBoilingPointTool()], + tool_prompt_format=tool_prompt_format, + ) + await client.create_session(__file__) + + dialog = [ + TestE2E.prompt_to_message("What is the boiling point of polyjuice?"), + ] + logs = [log async for log in run_client(client, dialog)] + expected_logs = [ + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, ""), + LogEvent(StepType.shield_call, "No Violation"), + LogEvent("CustomTool", "-100"), + LogEvent(StepType.shield_call, "No Violation"), + LogEvent(StepType.inference, "-100"), + LogEvent(StepType.shield_call, "No Violation"), + ] + + self.assertLogsContain(logs, expected_logs) + + +if __name__ == "__main__": + unittest.main() diff --git a/tests/test_inference.py b/tests/test_inference.py index 14ec5cdc2..277cf7e8a 100644 --- a/tests/test_inference.py +++ b/tests/test_inference.py @@ -8,16 +8,21 @@ import unittest from datetime import datetime -from llama_models.llama3_1.api.datatypes import ( +from llama_models.llama3.api.datatypes import ( BuiltinTool, StopReason, SystemMessage, + ToolDefinition, + ToolParamDefinition, + ToolPromptFormat, ToolResponseMessage, UserMessage, ) -from llama_toolchain.inference.api.datatypes import ChatCompletionResponseEventType -from llama_toolchain.inference.api.endpoints import ChatCompletionRequest +from llama_toolchain.inference.api import ( + ChatCompletionRequest, + ChatCompletionResponseEventType, +) from llama_toolchain.inference.meta_reference.config import MetaReferenceImplConfig from llama_toolchain.inference.meta_reference.inference import get_provider_impl @@ -54,52 +59,6 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase): cls.api = await get_provider_impl(config, {}) await cls.api.initialize() - current_date = datetime.now() - formatted_date = current_date.strftime("%d %B %Y") - cls.system_prompt = SystemMessage( - content=textwrap.dedent( - f""" - Environment: ipython - Tools: brave_search - - Cutting Knowledge Date: December 2023 - Today Date:{formatted_date} - - """ - ), - ) - cls.system_prompt_with_custom_tool = SystemMessage( - content=textwrap.dedent( - """ - Environment: ipython - Tools: brave_search, wolfram_alpha, photogen - - Cutting Knowledge Date: December 2023 - Today Date: 30 July 2024 - - - You have access to the following functions: - - Use the function 'get_boiling_point' to 'Get the boiling point of a imaginary liquids 
(eg. polyjuice)'
-                {"name": "get_boiling_point", "description": "Get the boiling point of a imaginary liquids (eg. polyjuice)", "parameters": {"liquid_name": {"param_type": "string", "description": "The name of the liquid", "required": true}, "celcius": {"param_type": "boolean", "description": "Whether to return the boiling point in Celcius", "required": false}}}
-
-
-                Think very carefully before calling functions.
-                If you choose to call a function ONLY reply in the following format with no prefix or suffix:
-
-                <function=example_function_name>{"example_name": "example_value"}</function>
-
-                Reminder:
-                - If looking for real time information use relevant functions before falling back to brave_search
-                - Function calls MUST follow the specified format, start with <function= and end with </function>
-                - Required parameters MUST be specified
-                - Only call one function at a time
-                - Put the entire function call reply on one line
-
-                """
-            ),
-        )
-
     @classmethod
     def tearDownClass(cls):
         # This runs the async teardown function
@@ -111,6 +70,22 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase):

     async def asyncSetUp(self):
         self.valid_supported_model = MODEL
+        self.custom_tool_defn = ToolDefinition(
+            tool_name="get_boiling_point",
+            description="Get the boiling point of a imaginary liquids (eg. polyjuice)",
+            parameters={
+                "liquid_name": ToolParamDefinition(
+                    param_type="str",
+                    description="The name of the liquid",
+                    required=True,
+                ),
+                "celcius": ToolParamDefinition(
+                    param_type="boolean",
+                    description="Whether to return the boiling point in Celcius",
+                    required=False,
+                ),
+            },
+        )

     async def test_text(self):
         request = ChatCompletionRequest(
@@ -162,12 +137,12 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                InferenceTests.system_prompt_with_custom_tool,
                 UserMessage(
                     content="Use provided function to find the boiling point of polyjuice in fahrenheit?",
                 ),
             ],
             stream=False,
+            tools=[self.custom_tool_defn],
         )
         iterator = InferenceTests.api.chat_completion(request)
         async for r in iterator:
@@ -197,11 +172,11 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
                     content="Who is the current US President?",
                 ),
             ],
+            tools=[ToolDefinition(tool_name=BuiltinTool.brave_search)],
             stream=True,
         )
         iterator = InferenceTests.api.chat_completion(request)
@@ -227,17 +202,20 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                InferenceTests.system_prompt_with_custom_tool,
                 UserMessage(
                     content="Use provided function to find the boiling point of polyjuice?",
                 ),
             ],
             stream=True,
+            tools=[self.custom_tool_defn],
+            tool_prompt_format=ToolPromptFormat.function_tag,
         )
         iterator = InferenceTests.api.chat_completion(request)
         events = []
         async for chunk in iterator:
-            # print(f"{chunk.event.event_type:<40} | {str(chunk.event.stop_reason):<26} | {chunk.event.delta} ")
+            # print(
+            #     f"{chunk.event.event_type:<40} | {str(chunk.event.stop_reason):<26} | {chunk.event.delta} "
+            # )
             events.append(chunk.event)

         self.assertEqual(events[0].event_type, ChatCompletionResponseEventType.start)
@@ -257,7 +235,6 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
                     content="Search the web and tell me who the "
                     "44th president of the United States was",
@@ -270,6 +247,7 @@ class InferenceTests(unittest.IsolatedAsyncioTestCase):
                 ),
             ],
             stream=True,
+            tools=[ToolDefinition(tool_name=BuiltinTool.brave_search)],
         )

         iterator = self.api.chat_completion(request)
diff --git a/tests/test_ollama_inference.py b/tests/test_ollama_inference.py
index 0459cd6dc..f5b172e69 100644
--- a/tests/test_ollama_inference.py
+++ b/tests/test_ollama_inference.py
@@ -2,17 +2,22 @@ import textwrap
 import unittest
 from datetime import datetime

-from llama_models.llama3_1.api.datatypes import (
+from llama_models.llama3.api.datatypes import (
     BuiltinTool,
     SamplingParams,
     SamplingStrategy,
     StopReason,
     SystemMessage,
+    ToolDefinition,
+    ToolParamDefinition,
+    ToolPromptFormat,
     ToolResponseMessage,
     UserMessage,
 )
-from llama_toolchain.inference.api.datatypes import ChatCompletionResponseEventType
-from llama_toolchain.inference.api.endpoints import ChatCompletionRequest
+from llama_toolchain.inference.api import (
+    ChatCompletionRequest,
+    ChatCompletionResponseEventType,
+)
 from llama_toolchain.inference.ollama.config import OllamaImplConfig
 from llama_toolchain.inference.ollama.ollama import get_provider_impl

@@ -25,50 +30,21 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         self.api = await get_provider_impl(ollama_config, {})
         await self.api.initialize()

-        current_date = datetime.now()
-        formatted_date = current_date.strftime("%d %B %Y")
-        self.system_prompt = SystemMessage(
-            content=textwrap.dedent(
-                f"""
-                Environment: ipython
-                Tools: brave_search
-
-                Cutting Knowledge Date: December 2023
-                Today Date:{formatted_date}
-
-                """
-            ),
-        )
-
-        self.system_prompt_with_custom_tool = SystemMessage(
-            content=textwrap.dedent(
-                """
-                Environment: ipython
-                Tools: brave_search, wolfram_alpha, photogen
-
-                Cutting Knowledge Date: December 2023
-                Today Date: 30 July 2024
-
-
-                You have access to the following functions:
-
-                Use the function 'get_boiling_point' to 'Get the boiling point of a imaginary liquids (eg. polyjuice)'
-                {"name": "get_boiling_point", "description": "Get the boiling point of a imaginary liquids (eg. polyjuice)", "parameters": {"liquid_name": {"param_type": "string", "description": "The name of the liquid", "required": true}, "celcius": {"param_type": "boolean", "description": "Whether to return the boiling point in Celcius", "required": false}}}
-
-
-                Think very carefully before calling functions.
-                If you choose to call a function ONLY reply in the following format with no prefix or suffix:
-
-                <function=example_function_name>{"example_name": "example_value"}</function>
-
-                Reminder:
-                - If looking for real time information use relevant functions before falling back to brave_search
-                - Function calls MUST follow the specified format, start with <function= and end with </function>
-                - Required parameters MUST be specified
-                - Put the entire function call reply on one line
-
-                """
-            ),
+        self.custom_tool_defn = ToolDefinition(
+            tool_name="get_boiling_point",
+            description="Get the boiling point of a imaginary liquids (eg. polyjuice)",
+            parameters={
+                "liquid_name": ToolParamDefinition(
+                    param_type="str",
+                    description="The name of the liquid",
+                    required=True,
+                ),
+                "celcius": ToolParamDefinition(
+                    param_type="boolean",
+                    description="Whether to return the boiling point in Celcius",
+                    required=False,
+                ),
+            },
         )

         self.valid_supported_model = "Meta-Llama3.1-8B-Instruct"
@@ -88,7 +64,7 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         iterator = self.api.chat_completion(request)
         async for r in iterator:
             response = r
-
+        print(response.completion_message.content)
         self.assertTrue("Paris" in response.completion_message.content)
         self.assertEqual(
             response.completion_message.stop_reason, StopReason.end_of_turn
@@ -98,12 +74,12 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
                     content="Who is the current US President?",
                 ),
             ],
             stream=False,
+            tools=[ToolDefinition(tool_name=BuiltinTool.brave_search)],
         )
         iterator = self.api.chat_completion(request)
         async for r in iterator:
@@ -112,7 +88,7 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         completion_message = response.completion_message

         self.assertEqual(completion_message.content, "")
-        self.assertEqual(completion_message.stop_reason, StopReason.end_of_message)
+        self.assertEqual(completion_message.stop_reason, StopReason.end_of_turn)

         self.assertEqual(
             len(completion_message.tool_calls), 1, completion_message.tool_calls
@@ -128,11 +104,11 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
                     content="Write code to compute the 5th prime number",
                 ),
             ],
+            tools=[ToolDefinition(tool_name=BuiltinTool.code_interpreter)],
             stream=False,
         )
         iterator = self.api.chat_completion(request)
@@ -142,7 +118,7 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         completion_message = response.completion_message

         self.assertEqual(completion_message.content, "")
-        self.assertEqual(completion_message.stop_reason, StopReason.end_of_message)
+        self.assertEqual(completion_message.stop_reason, StopReason.end_of_turn)

         self.assertEqual(
             len(completion_message.tool_calls), 1, completion_message.tool_calls
@@ -157,12 +133,12 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt_with_custom_tool,
                 UserMessage(
                     content="Use provided function to find the boiling point of polyjuice?",
                 ),
             ],
             stream=False,
+            tools=[self.custom_tool_defn],
         )
         iterator = self.api.chat_completion(request)
         async for r in iterator:
@@ -229,12 +205,12 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
-                    content="Who is the current US President?",
+                    content="Using web search tell me who is the current US President?",
                 ),
             ],
             stream=True,
+            tools=[ToolDefinition(tool_name=BuiltinTool.brave_search)],
         )
         iterator = self.api.chat_completion(request)
         events = []
@@ -250,19 +226,20 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         self.assertEqual(
             events[-2].event_type, ChatCompletionResponseEventType.progress
         )
-        self.assertEqual(events[-2].stop_reason, StopReason.end_of_message)
+        self.assertEqual(events[-2].stop_reason, StopReason.end_of_turn)
         self.assertEqual(events[-2].delta.content.tool_name, BuiltinTool.brave_search)

     async def test_custom_tool_call_streaming(self):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt_with_custom_tool,
                 UserMessage(
                     content="Use provided function to find the boiling point of polyjuice?",
                 ),
             ],
             stream=True,
+            tools=[self.custom_tool_defn],
+            tool_prompt_format=ToolPromptFormat.function_tag,
         )
         iterator = self.api.chat_completion(request)
         events = []
@@ -321,7 +298,6 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
                     content="Search the web and tell me who the "
                     "44th president of the United States was",
@@ -333,6 +309,7 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
                 ),
             ],
             stream=True,
+            tools=[ToolDefinition(tool_name=BuiltinTool.brave_search)],
         )

         iterator = self.api.chat_completion(request)
@@ -350,12 +327,12 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         request = ChatCompletionRequest(
             model=self.valid_supported_model,
             messages=[
-                self.system_prompt,
                 UserMessage(
                     content="Write code to answer this question: What is the 100th prime number?",
                 ),
             ],
             stream=True,
+            tools=[ToolDefinition(tool_name=BuiltinTool.code_interpreter)],
         )
         iterator = self.api.chat_completion(request)
         events = []
@@ -371,7 +348,7 @@ class OllamaInferenceTests(unittest.IsolatedAsyncioTestCase):
         self.assertEqual(
             events[-2].event_type, ChatCompletionResponseEventType.progress
         )
-        self.assertEqual(events[-2].stop_reason, StopReason.end_of_message)
+        self.assertEqual(events[-2].stop_reason, StopReason.end_of_turn)
         self.assertEqual(
             events[-2].delta.content.tool_name, BuiltinTool.code_interpreter
         )
diff --git a/tests/test_prepare_messages.py b/tests/test_prepare_messages.py
new file mode 100644
index 000000000..49624b04d
--- /dev/null
+++ b/tests/test_prepare_messages.py
@@ -0,0 +1,120 @@
+import unittest
+
+from llama_models.llama3.api import *  # noqa: F403
+from llama_toolchain.inference.api import *  # noqa: F403
+from llama_toolchain.inference.prepare_messages import prepare_messages
+
+MODEL = "Meta-Llama3.1-8B-Instruct"
+
+
+class PrepareMessagesTests(unittest.IsolatedAsyncioTestCase):
+    async def test_system_default(self):
+        content = "Hello !"
+        request = ChatCompletionRequest(
+            model=MODEL,
+            messages=[
+                UserMessage(content=content),
+            ],
+        )
+        messages = prepare_messages(request)
+        self.assertEqual(len(messages), 2)
+        self.assertEqual(messages[-1].content, content)
+        self.assertTrue("Cutting Knowledge Date: December 2023" in messages[0].content)
+
+    async def test_system_builtin_only(self):
+        content = "Hello !"
+        request = ChatCompletionRequest(
+            model=MODEL,
+            messages=[
+                UserMessage(content=content),
+            ],
+            tools=[
+                ToolDefinition(tool_name=BuiltinTool.code_interpreter),
+                ToolDefinition(tool_name=BuiltinTool.brave_search),
+            ],
+        )
+        messages = prepare_messages(request)
+        self.assertEqual(len(messages), 2)
+        self.assertEqual(messages[-1].content, content)
+        self.assertTrue("Cutting Knowledge Date: December 2023" in messages[0].content)
+        self.assertTrue("Tools: brave_search" in messages[0].content)
+
+    async def test_system_custom_only(self):
+        content = "Hello !"
+        request = ChatCompletionRequest(
+            model=MODEL,
+            messages=[
+                UserMessage(content=content),
+            ],
+            tools=[
+                ToolDefinition(
+                    tool_name="custom1",
+                    description="custom1 tool",
+                    parameters={
+                        "param1": ToolParamDefinition(
+                            param_type="str",
+                            description="param1 description",
+                            required=True,
+                        ),
+                    },
+                )
+            ],
+            tool_prompt_format=ToolPromptFormat.json,
+        )
+        messages = prepare_messages(request)
+        self.assertEqual(len(messages), 3)
+        self.assertTrue("Environment: ipython" in messages[0].content)
+
+        self.assertTrue("Return function calls in JSON format" in messages[1].content)
+        self.assertEqual(messages[-1].content, content)
+
+    async def test_system_custom_and_builtin(self):
+        content = "Hello !"
+        request = ChatCompletionRequest(
+            model=MODEL,
+            messages=[
+                UserMessage(content=content),
+            ],
+            tools=[
+                ToolDefinition(tool_name=BuiltinTool.code_interpreter),
+                ToolDefinition(tool_name=BuiltinTool.brave_search),
+                ToolDefinition(
+                    tool_name="custom1",
+                    description="custom1 tool",
+                    parameters={
+                        "param1": ToolParamDefinition(
+                            param_type="str",
+                            description="param1 description",
+                            required=True,
+                        ),
+                    },
+                ),
+            ],
+        )
+        messages = prepare_messages(request)
+        self.assertEqual(len(messages), 3)
+
+        self.assertTrue("Environment: ipython" in messages[0].content)
+        self.assertTrue("Tools: brave_search" in messages[0].content)
+
+        self.assertTrue("Return function calls in JSON format" in messages[1].content)
+        self.assertEqual(messages[-1].content, content)
+
+    async def test_user_provided_system_message(self):
+        content = "Hello !"
+        system_prompt = "You are a pirate"
+        request = ChatCompletionRequest(
+            model=MODEL,
+            messages=[
+                SystemMessage(content=system_prompt),
+                UserMessage(content=content),
+            ],
+            tools=[
+                ToolDefinition(tool_name=BuiltinTool.code_interpreter),
+            ],
+        )
+        messages = prepare_messages(request)
+        self.assertEqual(len(messages), 2, messages)
+        self.assertTrue(messages[0].content.endswith(system_prompt))
+
+        self.assertEqual(messages[-1].content, content)