Refactor codebase to implement MCP server for GPT Researcher
Replaced FastAPI app with an MCP server implementation, enhancing flexibility and modularity for research operations. Deprecated `phoenix_technologies` package, updated server logic, added utility functions, and revised dependencies in `requirements.txt`. Updated Dockerfile and README to align with the new architecture.
Parent: 73e929ca00
Commit: 44b91b9375
10 changed files with 481 additions and 281 deletions
Dockerfile

```diff
@@ -22,4 +22,4 @@ COPY src/ /app/
 EXPOSE 8000
 
 # Set the default command to run the app with `uvicorn`
-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
+CMD ["python", "server.py"]
```
README.md (223 lines changed)

````diff
@@ -1,175 +1,72 @@
-# README for FastAPI-Based Report GPT Generation Service
+# Project Overview
 
-## Description
+## Overview
 
-This repository contains the implementation of a **FastAPI**-based service designed to generate research reports. The service processes user-provided queries and report types, performing advanced research powered by `GPTResearcher` and responding with comprehensive results, including details, cost, context, images, and other associated metadata.
+This project is a server-side application built with Python that facilitates research-related operations. It provides functionalities to manage researchers, handle resources, process queries, and generate in-depth research reports. The application features reusable utility functions to streamline responses, handle exceptions gracefully, and format data for client consumption. A `Dockerfile` is provided for easy containerization and deployment.
 
 ## Features
 
-- **RESTful API** to handle user queries and generate reports.
-- **Streaming responses** to deliver research output in chunks.
-- **Secure API access** with API Key authentication.
-- Completely containerized setup with Docker.
-- Built with modular design for easier scalability and maintenance.
-
----
-
-## System Architecture
-
-### Core Components
-
-1. **FastAPI App (`main.py`)**:
-   - Hosts the API endpoints.
-   - Handles API Key authentication for secure use.
-   - Accepts user inputs (query and report type) and generates a chunked streaming response.
-
-2. **Research Logic (`deepresearch.py`)**:
-   - Encapsulates research and report generation.
-   - Utilizes `GPTResearcher` to conduct research, generate reports, and retrieve extended data like images, contexts, or costs.
-
-3. **Docker Integration**:
-   - The application is containerized with a well-defined `Dockerfile`.
-   - Includes dependency installation, environment setup, and FastAPI server configuration for rapid deployment.
-
----
-
-## Prerequisites
-
-Before running the application, ensure the following are installed on your system:
-
-- **Docker**: Version 24.0+
-- **Python**: Version 3.13+
-- **pip**: Pre-installed Python package manager.
-
----
-
-## Running the Application Locally
-
-### Cloning the Repository
-
-Clone the repository to a directory of your choice:
-
-```shell script
-git clone https://git.kvant.cloud/phoenix/gpt-researcher.git
-cd gpt-researcher
-```
-
-### Environment Variable Configuration
-
-Create a `.env` file in the root of the project and define:
-
-```
-API_KEY=your_api_key # Replace "your_api_key" with your desired key
-OPENAI_BASE_URL=
-OPENAI_API_KEY=
-EMBEDDING=
-FAST_LLM=
-SMART_LLM=
-STRATEGIC_LLM=
-OPENAI_API_VERSION=
-SERPER_API_KEY=
-RETRIEVER=serper
-```
-
-### Installing Dependencies
-
-Install the required Python modules based on the generated `requirements.txt`.
-
-```shell script
-pip install --no-cache-dir -r requirements.txt
-```
-
-### Running the App
-
-Run the FastAPI app locally:
-
-```shell script
-uvicorn main:app --host 0.0.0.0 --port 8000
-```
-
-After running, your app will be available at `http://127.0.0.1:8000`.
-
----
-
-## Using Docker for Deployment
-
-### Building the Docker Image
-
-Build the Docker image using the **Dockerfile** provided:
-
-```shell script
-docker build -t fastapi-report-service .
-```
-
-### Running the Docker Container
-
-Spin up a container and map FastAPI's default port, `8000`:
-
-```shell script
-docker run --env-file .env -p 8000:8000 fastapi-report-service
-```
-
----
-
-## API Usage
-
-### 1. **`/get_report`**
-
-- **Method**: `POST`
-- **Description**: Generates a report based on user input.
-- **Headers**:
-  - `X-API-KEY`: API Key for authentication.
-- **Request Body** (`JSON`):
-
-```json
-{
-  "query": "Research on AI in healthcare",
-  "report_type": "research_report|resource_report|outline_report|custom_report|detailed_report|subtopic_report|deep"
-}
-```
-
-- **Streaming Response**: Research and report are provided in chunks.
-
----
-
-## Code Structure
-
-```
-├── Dockerfile           # Configuration for Dockerizing the application
-├── requirements.txt     # Python dependencies list
-├── main.py              # FastAPI server entry point
-├── deepresearch.py      # Research-related logic and GPTResearcher integration
-└── src/                 # Other project files and assets
-```
-
----
-
-## Features Under the Hood
-
-1. **Authentication**:
-   - An API key mechanism ensures that only authorized users can access endpoints.
-
-2. **Streaming Response**:
-   - Large research reports are sent incrementally using `StreamingResponse` for better experience and efficiency.
-
-3. **Modular Research Logic**:
-   - Research and generation tasks are handled by a dedicated class (`ReportGenerator`), making the application extensible.
-
----
-
-## Future Enhancements
-
-- **Asynchronous Enhancements**:
-  - Improve async handling for long-running queries.
-
-- **Database Integration**:
-  - Save request history for auditing and reference purposes.
-
-- **Web Interface**:
-  - A user-friendly web application for interacting with the API.
-
----
-
-## Contributing
-
-Contributions are welcome! Feel free to fork the repository, make updates, and submit a pull request.
+### Server Functionality
+
+The main server functionalities are defined in `server.py`, which includes:
+
+- **research_resource**: Management of research resources.
+- **deep_research**: Conducts detailed research operations.
+- **write_report**: Creates comprehensive reports based on researched data.
+- **get_research_sources**: Retrieves information sources for research.
+- **get_research_context**: Provides contextual information tied to research.
+- **research_query**: Handles incoming research-related queries.
+- **run_server**: Initializes and runs the server.
+
+### Utility Functions
+
+The `utils.py` file provides additional support, including:
+
+- **Response Handling**:
+  - `create_error_response`
+  - `create_success_response`
+- **Error & Exception Management**:
+  - `handle_exception`
+- **Data Operations**:
+  - `get_researcher_by_id`
+  - `format_sources_for_response`
+  - `format_context_with_sources`
+  - `store_research_results`
+  - `create_research_prompt`
+
+### Docker Support
+
+The included `Dockerfile` allows for simple containerized deployment:
+
+- Uses a lightweight Python 3.13 image.
+- Installs required dependencies from `requirements.txt`.
+- Configures the application to run via `server.py` on port `8000` using `CMD ["python", "server.py"]`.
+
+## Setup and Usage
+
+### Prerequisites
+
+- Python 3.13 or later.
+- `pip` for dependency management.
+- Docker (optional, for containerized deployment).
+
+### Installation
+
+1. Clone this repository.
+2. Install dependencies:
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+3. Run the application:
+
+   ```bash
+   python server.py
+   ```
+
+### Using Docker
+
+Build and run the application as a Docker container:
+
+1. Build the Docker image:
+
+   ```bash
+   docker build -t research-app .
+   ```
+
+2. Run the Docker container:
+
+   ```bash
+   docker run -p 8000:8000 research-app
+   ```
+
+The application will be accessible at `http://localhost:8000`.
+
+## Folder Structure
+
+```
+|-- src/
+|   |-- server.py        # Main server logic
+|   |-- utils.py         # Reusable utility functions
+|-- Dockerfile           # Containerization setup
+|-- requirements.txt     # Dependencies file
+|-- README.md            # Documentation (this file)
+```
````
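For context on what the rewrite removes: a client consumed the old `/get_report` endpoint roughly as sketched below. This is an illustrative reconstruction from the removed README's API Usage section, not code from the repository; the key and query values are placeholders.

```python
# Hypothetical client for the removed /get_report endpoint (placeholders throughout).
import requests

response = requests.post(
    "http://localhost:8000/get_report",
    headers={"X-API-KEY": "your_api_key"},
    json={"query": "Research on AI in healthcare", "report_type": "research_report"},
    stream=True,  # the endpoint returned a chunked streaming response
)
for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="")
```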
requirements.txt

```diff
@@ -1,5 +1,12 @@
-fastapi
-uvicorn
-pydantic
-gpt-researcher
-asyncio
+# GPT Researcher dependencies
+gpt-researcher>=0.12.16
+python-dotenv
+
+# MCP dependencies
+mcp>=1.6.0
+fastapi>=0.103.1
+uvicorn>=0.23.2
+pydantic>=2.3.0
+
+# Utility dependencies
+loguru>=0.7.0
```
src/__init__.py (new file, 8 lines)

```python
"""
GPT Researcher MCP Server

This module provides an MCP server implementation for GPT Researcher,
allowing AI assistants to perform web research and generate reports via the MCP protocol.
"""

__version__ = "0.1.0"
```
src/main.py (deleted, 55 lines)

```python
import uvicorn
from fastapi import FastAPI, HTTPException, Request, Depends
from pydantic import BaseModel
from phoenix_technologies import ReportGenerator, CustomLogsHandler
from fastapi.responses import StreamingResponse
from typing import Dict, Any, AsyncGenerator, Coroutine, Generator
import os
import asyncio
import time

# FastAPI app instance
app = FastAPI()

# Define a request body structure using Pydantic
class ReportRequest(BaseModel):
    query: str
    report_type: str

# Shared log array using asyncio.Queue
log_queue = asyncio.Queue()

# Define a dependency to validate the API Key
def verify_api_key(request: Request):
    # Define the API key from the environment variables
    expected_api_key = os.getenv("API_KEY", None)
    if not expected_api_key:
        raise HTTPException(
            status_code=500, detail="API key is not configured on the server."
        )

    # Get the API key from the request headers
    provided_api_key = request.headers.get("X-API-KEY", None)

    # Check if the API key is correct
    if not provided_api_key or provided_api_key != expected_api_key:
        raise HTTPException(status_code=403, detail="Invalid or missing API key.")


@app.post("/get_report", dependencies=[Depends(verify_api_key)])
async def get_report_endpoint(request: ReportRequest):
    """
    Expose the `get_report` function as a POST API endpoint, with a streaming response.
    """

    def fake_data_streamer():
        for i in range(5):
            yield f"My custom Log: {i}"
            time.sleep(5)

    # Return streaming response
    return StreamingResponse(fake_data_streamer(), media_type="text/plain")


if __name__ == "__main__":
    uvicorn.run(app='main:app', host="127.0.0.1", port=8000)
```
phoenix_technologies/__init__.py (deleted, 4 lines)

```python
# phoenix-technologies/__init__.py
from .gptresearch.deepresearch import ReportGenerator, CustomLogsHandler

__all__ = ["ReportGenerator", "CustomLogsHandler"]
```
phoenix_technologies/gptresearch/deepresearch.py (deleted, 53 lines)

```python
from gpt_researcher import GPTResearcher
from typing import Dict, Any, AsyncGenerator, Coroutine


class CustomLogsHandler:
    """A custom Logs handler class to handle JSON data."""
    def __init__(self):
        self.logs = []  # Initialize logs to store data

    async def send_json(self, data: Dict[str, Any]) -> None:
        """Send JSON data and log it."""
        self.logs.append(data)  # Append data to logs
        print(f"My custom Log: {data}")  # For demonstration, print the log


class ReportGenerator:
    def __init__(self, query: str, report_type: str):
        """
        Initializes the ReportGenerator with a query and report type.
        """
        self.query = query
        self.report_type = report_type
        # Initialize researcher with a custom WebSocket
        self.custom_logs_handler = CustomLogsHandler()
        self.complete = False

        self.researcher = GPTResearcher(query, report_type, websocket=self.custom_logs_handler)

    def init(self) -> CustomLogsHandler:
        return self.custom_logs_handler

    async def generate_report(self) -> None:
        """
        Conducts research and generates the report along with additional information.
        """
        # Conduct research
        research_result = await self.researcher.conduct_research()
        report = await self.researcher.write_report()

        # Retrieve additional information
        research_context = self.researcher.get_research_context()
        research_costs = self.researcher.get_costs()
        research_images = self.researcher.get_research_images()
        research_sources = self.researcher.get_research_sources()
        self.complete = True

    def get_query_details(self):
        """
        Returns details of the query and report type.
        """
        return {
            "query": self.query,
            "report_type": self.report_type
        }
```
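For reference, the removed `ReportGenerator` was driven roughly as sketched below. This is an illustrative reconstruction from the class's own methods, not code from the repository; the `asyncio.run` entry point is an assumption.

```python
# Hypothetical usage of the removed ReportGenerator (asyncio.run is an assumption).
import asyncio

generator = ReportGenerator("Research on AI in healthcare", "research_report")
logs_handler = generator.init()           # exposes the CustomLogsHandler
asyncio.run(generator.generate_report())  # conducts research and writes the report
print(generator.get_query_details())      # {'query': ..., 'report_type': ...}
print(len(logs_handler.logs), "log entries captured")
```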
src/server.py (new file, 261 lines)

```python
"""
GPT Researcher MCP Server

This script implements an MCP server for GPT Researcher, allowing AI assistants
to conduct web research and generate reports via the MCP protocol.
"""

import os
import sys
import uuid
import logging
from typing import Dict, Any, Optional
from dotenv import load_dotenv
from mcp.server.fastmcp import FastMCP
from gpt_researcher import GPTResearcher

# Load environment variables
load_dotenv()

from utils import (
    research_store,
    create_success_response,
    handle_exception,
    get_researcher_by_id,
    format_sources_for_response,
    format_context_with_sources,
    store_research_results,
    create_research_prompt
)

logging.basicConfig(
    level=logging.INFO,
    format='[%(asctime)s][%(levelname)s] - %(message)s',
)

logger = logging.getLogger(__name__)

# Initialize FastMCP server
mcp = FastMCP("GPT Researcher")

# Initialize researchers dictionary
if not hasattr(mcp, "researchers"):
    mcp.researchers = {}


@mcp.resource("research://{topic}")
async def research_resource(topic: str) -> str:
    """
    Provide research context for a given topic directly as a resource.

    This allows LLMs to access web-sourced information without explicit function calls.

    Args:
        topic: The research topic or query

    Returns:
        String containing the research context with source information
    """
    # Check if we've already researched this topic
    if topic in research_store:
        logger.info(f"Returning cached research for topic: {topic}")
        return research_store[topic]["context"]

    # If not, conduct the research
    logger.info(f"Conducting new research for resource on topic: {topic}")

    # Initialize GPT Researcher
    researcher = GPTResearcher(topic)

    try:
        # Conduct the research
        await researcher.conduct_research()

        # Get the context and sources
        context = researcher.get_research_context()
        sources = researcher.get_research_sources()
        source_urls = researcher.get_source_urls()

        # Format with sources included
        formatted_context = format_context_with_sources(topic, context, sources)

        # Store for future use
        store_research_results(topic, context, sources, source_urls, formatted_context)

        return formatted_context
    except Exception as e:
        return f"Error conducting research on '{topic}': {str(e)}"


@mcp.tool()
async def deep_research(query: str) -> Dict[str, Any]:
    """
    Conduct a deep web research on a given query using GPT Researcher.
    Use this tool when you need time-sensitive, real-time information like stock prices, news, people, specific knowledge, etc.
    You must include citations that back your responses when using this tool.

    Args:
        query: The research query or topic

    Returns:
        Dict containing research status, ID, and the actual research context and sources
        that can be used directly by LLMs for context enrichment
    """
    logger.info(f"Conducting research on query: {query}...")

    # Generate a unique ID for this research session
    research_id = str(uuid.uuid4())

    # Initialize GPT Researcher
    researcher = GPTResearcher(query)

    # Start research
    try:
        await researcher.conduct_research()
        mcp.researchers[research_id] = researcher
        logger.info(f"Research completed for ID: {research_id}")

        # Get the research context and sources
        context = researcher.get_research_context()
        sources = researcher.get_research_sources()
        source_urls = researcher.get_source_urls()

        # Store in the research store for the resource API
        store_research_results(query, context, sources, source_urls)

        return create_success_response({
            "research_id": research_id,
            "query": query,
            "source_count": len(sources),
            "context": context,
            "sources": format_sources_for_response(sources),
            "source_urls": source_urls
        })
    except Exception as e:
        return handle_exception(e, "Research")


@mcp.tool()
async def write_report(research_id: str, custom_prompt: Optional[str] = None) -> Dict[str, Any]:
    """
    Generate a report based on previously conducted research.

    Args:
        research_id: The ID of the research session from conduct_research
        custom_prompt: Optional custom prompt for report generation

    Returns:
        Dict containing the report content and metadata
    """
    success, researcher, error = get_researcher_by_id(mcp.researchers, research_id)
    if not success:
        return error

    logger.info(f"Generating report for research ID: {research_id}")

    try:
        # Generate report
        report = await researcher.write_report(custom_prompt=custom_prompt)

        # Get additional information
        sources = researcher.get_research_sources()
        costs = researcher.get_costs()

        return create_success_response({
            "report": report,
            "source_count": len(sources),
            "costs": costs
        })
    except Exception as e:
        return handle_exception(e, "Report generation")


@mcp.tool()
async def get_research_sources(research_id: str) -> Dict[str, Any]:
    """
    Get the sources used in the research.

    Args:
        research_id: The ID of the research session

    Returns:
        Dict containing the research sources
    """
    success, researcher, error = get_researcher_by_id(mcp.researchers, research_id)
    if not success:
        return error

    sources = researcher.get_research_sources()
    source_urls = researcher.get_source_urls()

    return create_success_response({
        "sources": format_sources_for_response(sources),
        "source_urls": source_urls
    })


@mcp.tool()
async def get_research_context(research_id: str) -> Dict[str, Any]:
    """
    Get the full context of the research.

    Args:
        research_id: The ID of the research session

    Returns:
        Dict containing the research context
    """
    success, researcher, error = get_researcher_by_id(mcp.researchers, research_id)
    if not success:
        return error

    context = researcher.get_research_context()

    return create_success_response({
        "context": context
    })


@mcp.prompt()
def research_query(topic: str, goal: str, report_format: str = "research_report") -> str:
    """
    Create a research query prompt for GPT Researcher.

    Args:
        topic: The topic to research
        goal: The goal or specific question to answer
        report_format: The format of the report to generate

    Returns:
        A formatted prompt for research
    """
    return create_research_prompt(topic, goal, report_format)


def run_server():
    """Run the MCP server using FastMCP's built-in event loop handling."""
    # Check if API keys are set
    if not os.getenv("OPENAI_API_KEY"):
        logger.error("OPENAI_API_KEY not found. Please set it in your .env file.")
        return

    # Add startup message
    logger.info("Starting GPT Researcher MCP Server...")
    print("🚀 GPT Researcher MCP Server starting... Check researcher_mcp_server.log for details")

    # Let FastMCP handle the event loop
    try:
        mcp.run("sse")
        # Note: If we reach here, the server has stopped
        logger.info("MCP Server has stopped")
    except Exception as e:
        logger.error(f"Error running MCP server: {str(e)}")
        print(f"❌ MCP Server error: {str(e)}")
        return

    print("✅ MCP Server stopped")


if __name__ == "__main__":
    # Use the non-async approach to avoid asyncio nesting issues
    run_server()
```
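With `run_server()` starting FastMCP in SSE mode, an MCP client can connect over HTTP and invoke the tools. Below is a minimal sketch using the `mcp` client package; the server URL and the `/sse` path (FastMCP's default mount) are assumptions here, not something this commit configures explicitly.

```python
# Sketch of an MCP client calling the deep_research tool over SSE.
# Assumes the server from src/server.py is running on localhost:8000;
# "/sse" is FastMCP's default SSE mount path and may differ in other setups.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("deep_research", {"query": "AI in healthcare"})
            print(result)

asyncio.run(main())
```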
src/utils.py (new file, 139 lines)

```python
"""
GPT Researcher MCP Server Utilities

This module provides utility functions and helpers for the GPT Researcher MCP Server.
"""

import sys
from typing import Dict, List, Optional, Tuple, Any
from loguru import logger

# Configure logging for console only (no file logging)
logger.configure(handlers=[{"sink": sys.stderr, "level": "INFO"}])

# Research store to track ongoing research topics and contexts
research_store = {}

# API Response Utilities
def create_error_response(message: str) -> Dict[str, Any]:
    """Create a standardized error response"""
    return {"status": "error", "message": message}


def create_success_response(data: Dict[str, Any]) -> Dict[str, Any]:
    """Create a standardized success response"""
    return {"status": "success", **data}


def handle_exception(e: Exception, operation: str) -> Dict[str, Any]:
    """Handle exceptions in a consistent way"""
    error_message = str(e)
    logger.error(f"{operation} failed: {error_message}")
    return create_error_response(error_message)


def get_researcher_by_id(researchers_dict: Dict, research_id: str) -> Tuple[bool, Any, Dict[str, Any]]:
    """
    Helper function to retrieve a researcher by ID.

    Args:
        researchers_dict: Dictionary of research objects
        research_id: The ID of the research session

    Returns:
        Tuple containing (success, researcher_object, error_response)
    """
    if not researchers_dict or research_id not in researchers_dict:
        return False, None, create_error_response("Research ID not found. Please conduct research first.")
    return True, researchers_dict[research_id], {}


def format_sources_for_response(sources: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Format source information for API responses.

    Args:
        sources: List of source dictionaries

    Returns:
        Formatted source list for API responses
    """
    return [
        {
            "title": source.get("title", "Unknown"),
            "url": source.get("url", ""),
            "content_length": len(source.get("content", ""))
        }
        for source in sources
    ]


def format_context_with_sources(topic: str, context: str, sources: List[Dict[str, Any]]) -> str:
    """
    Format research context with sources for display.

    Args:
        topic: Research topic
        context: Research context
        sources: List of sources

    Returns:
        Formatted context string with sources
    """
    formatted_context = f"## Research: {topic}\n\n{context}\n\n"
    formatted_context += "## Sources:\n"
    for i, source in enumerate(sources):
        formatted_context += f"{i+1}. {source.get('title', 'Unknown')}: {source.get('url', '')}\n"
    return formatted_context


def store_research_results(topic: str, context: str, sources: List[Dict[str, Any]],
                           source_urls: List[str], formatted_context: Optional[str] = None):
    """
    Store research results in the research store.

    Args:
        topic: Research topic
        context: Research context
        sources: List of sources
        source_urls: List of source URLs
        formatted_context: Optional pre-formatted context
    """
    research_store[topic] = {
        "context": formatted_context or context,
        "sources": sources,
        "source_urls": source_urls
    }


def create_research_prompt(topic: str, goal: str, report_format: str = "research_report") -> str:
    """
    Create a research query prompt for GPT Researcher.

    Args:
        topic: The topic to research
        goal: The goal or specific question to answer
        report_format: The format of the report to generate

    Returns:
        A formatted prompt for research
    """
    return f"""
Please research the following topic: {topic}

Goal: {goal}

You have two methods to access web-sourced information:

1. Use the "research://{topic}" resource to directly access context about this topic if it exists
   or if you want to get straight to the information without tracking a research ID.

2. Use the conduct_research tool to perform new research and get a research_id for later use.
   This tool also returns the context directly in its response, which you can use immediately.

After getting context, you can:
- Use it directly in your response
- Use the write_report tool with a custom prompt to generate a structured {report_format}

You can also use get_research_sources to view additional details about the information sources.
"""
```
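A quick illustration of the response contract these helpers establish, as the server's tools consume it; the empty dict stands in for `mcp.researchers` before any research has run.

```python
# Illustration of the utils.py response contract used by the server's tools.
from utils import create_success_response, get_researcher_by_id

researchers = {}  # stands in for mcp.researchers before any research has run
success, researcher, error = get_researcher_by_id(researchers, "unknown-id")
assert success is False
assert error == {"status": "error", "message": "Research ID not found. Please conduct research first."}

payload = create_success_response({"report": "...", "source_count": 3})
# -> {"status": "success", "report": "...", "source_count": 3}
```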