DeepSearch Experimental

An AI-powered research assistant that performs comprehensive web searches across multiple search engines and analyzes results using LLMs.

Features

  • Multi-engine search across Google, Bing, and Yahoo
  • Intelligent query generation using OpenAI's GPT models
  • Content summarization with local Ollama models
  • In-depth analysis using Google's Gemini AI
  • Concurrent web scraping with retry mechanisms
  • Structured output in Markdown format

How It Works

  • Generates nine sub-queries using OpenAI's ChatGPT.
  • Distributes them across search engines: three queries are searched on Google, three on Bing, and three on Yahoo.
  • Aggregates all retrieved content and processes it with Google Gemini to generate a comprehensive research report.

Example run on the following query: How to use search and AST to improve RAG for large codebases? (A sketch of the query fan-out follows.)
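
At a high level, the fan-out could look like the sketch below. All class and method names here are hypothetical and only illustrate the flow; the actual llm/ and search/ implementations differ.

// Hypothetical sketch of the query fan-out; names are illustrative only.
List<String> subQueries = openAiClient.generateSubQueries(userQuery, 9);

List<String> googleQueries = subQueries.subList(0, 3);
List<String> bingQueries   = subQueries.subList(3, 6);
List<String> yahooQueries  = subQueries.subList(6, 9);

List<SearchResult> results = new ArrayList<>();
results.addAll(googleSearch.search(googleQueries));
results.addAll(bingSearch.search(bingQueries));
results.addAll(yahooSearch.search(yahooQueries));

String report = geminiClient.analyze(userQuery, results);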

Missing Features

  • Support for YouTube transcripts
  • Support for online PDF documents
  • Support for scraping Reddit pages

Prerequisites

  • Java 17+
  • Chrome WebDriver (used by the scraper; see the sketch after this list)
  • Ollama (for local summarization, optional)
  • API keys:
    • OpenAI
    • Google (Gemini)
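
The Chrome WebDriver prerequisite suggests pages are fetched through a Selenium-driven Chrome session, though the project may wire this up differently. A minimal sketch of such a fetch (assuming Selenium is on the classpath and ChromeDriver is installed; the real web/ classes add concurrency and retries on top):

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

class PageFetcher {
    // Fetches the rendered HTML of a page using a headless Chrome session.
    static String fetch(String url) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(url);
            return driver.getPageSource();
        } finally {
            driver.quit();
        }
    }
}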

Environment Variables

Create a .env file with:

OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
OPENAI_MODEL_NAME=gpt-3.5-turbo
GOOGLE_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
GEMINI_MODEL_NAME=gemini-2.0-pro-exp-02-05
OUTPUT_DIRECTORY=search_results

GENERATE_NEW_QUERIES=false
GENERATE_SUMMARIES=false
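
These values are read at startup. A minimal sketch of loading them, assuming a dotenv-style loader such as dotenv-java (the project may resolve them differently, e.g. from plain environment variables):

import io.github.cdimascio.dotenv.Dotenv;

// Load the .env file and fall back to real environment variables if it is absent.
Dotenv dotenv = Dotenv.configure().ignoreIfMissing().load();

String openAiKey   = dotenv.get("OPENAI_API_KEY");
String openAiModel = dotenv.get("OPENAI_MODEL_NAME", "gpt-3.5-turbo");
String geminiKey   = dotenv.get("GOOGLE_API_KEY");
String geminiModel = dotenv.get("GEMINI_MODEL_NAME");
String outputDir   = dotenv.get("OUTPUT_DIRECTORY", "search_results");

boolean generateNewQueries = Boolean.parseBoolean(dotenv.get("GENERATE_NEW_QUERIES", "false"));
boolean generateSummaries  = Boolean.parseBoolean(dotenv.get("GENERATE_SUMMARIES", "false"));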

Installation

  1. Clone the repository
  2. Install dependencies with Maven
  3. Install Ollama and the llama3.1 model (optional)
  4. Set up environment variables

Usage

Run the main class:

java com.devoxx.agentic.Main

Enter your research query when prompted. The program will:

  1. Generate optimized sub-queries
  2. Search across multiple engines
  3. Scrape and analyze content
  4. Generate summaries (if enabled)
  5. Create a comprehensive report

Results are saved in Markdown format in the specified output directory.
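
Put together, a single run might look roughly like the sketch below. Class and method names are hypothetical and only mirror the five steps above.

// Hypothetical end-to-end flow; names are illustrative only.
String userQuery = console.readLine("Enter your research query: ");

List<String> subQueries = generateNewQueries
        ? openAiClient.generateSubQueries(userQuery, 9)            // step 1
        : List.of(userQuery);

List<SearchResult> hits = searchEngines.searchAll(subQueries);     // step 2
List<PageContent> pages = scraper.fetchAll(hits);                  // step 3
List<String> summaries  = generateSummaries
        ? ollamaClient.summarizeAll(pages)                         // step 4
        : List.of();

String report = geminiClient.writeReport(userQuery, pages, summaries); // step 5
resultWriter.saveMarkdown(outputDir, report);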

Architecture

  • llm/ - LLM client implementations (OpenAI, Gemini, Ollama)
  • web/ - Web scraping and search functionality
  • search/ - Search engine specific implementations
  • model/ - Data models and content storage
  • util/ - Utility classes for retry logic and result writing
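
As an example of what lives in util/, the retry logic around flaky page loads could look roughly like this hypothetical sketch (not the actual class):

import java.util.concurrent.Callable;

// Hypothetical retry helper with simple exponential backoff.
final class Retry {
    static <T> T withRetries(Callable<T> task, int maxAttempts) throws Exception {
        long delayMs = 500;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;               // give up after the last attempt
                }
                Thread.sleep(delayMs);     // back off before retrying
                delayMs *= 2;
            }
        }
    }
}

Such a helper could then wrap a page fetch, e.g. Retry.withRetries(() -> PageFetcher.fetch(url), 3), keeping the retry policy out of the scraping code itself.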

Contributing

Contributions welcome! Please read our contributing guidelines and submit pull requests.

License

This project is licensed under the MIT License.