An AI-powered research assistant that performs comprehensive web searches across multiple search engines and analyzes results using LLMs.
- Multi-engine search across Google, Bing, and Yahoo
- Intelligent query generation using OpenAI's GPT models
- Content summarization with local Ollama models
- In-depth analysis using Google's Gemini AI
- Concurrent web scraping with retry mechanisms
- Structured output in Markdown format
- Generates nine sub-queries using an OpenAI GPT model.
- Distributes them across search engines: three queries are searched on Google, three on Bing, and three on Yahoo.
- Aggregates all retrieved content and processes it with Google Gemini to generate a comprehensive research report.
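The three-per-engine split described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `QueryDistributor` class and its `distribute` method are hypothetical names, and the sketch assumes the sub-queries are handed out in consecutive chunks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits sub-queries evenly across search engines. */
final class QueryDistributor {

    /**
     * Assigns consecutive, equally sized chunks of queries to each engine,
     * so nine queries and three engines give three queries per engine.
     */
    static Map<String, List<String>> distribute(List<String> queries, List<String> engines) {
        if (engines.isEmpty() || queries.size() % engines.size() != 0) {
            throw new IllegalArgumentException("query count must be a multiple of the engine count");
        }
        int perEngine = queries.size() / engines.size();
        // LinkedHashMap keeps the engines in the order they were given.
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (int i = 0; i < engines.size(); i++) {
            int from = i * perEngine;
            plan.put(engines.get(i), new ArrayList<>(queries.subList(from, from + perEngine)));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 1; i <= 9; i++) {
            queries.add("sub-query " + i);
        }
        Map<String, List<String>> plan = distribute(queries, List.of("Google", "Bing", "Yahoo"));
        plan.forEach((engine, assigned) -> System.out.println(engine + " -> " + assigned));
    }
}
```

Chunking keeps each engine's workload equal; a round-robin assignment would work just as well if the order of the sub-queries matters.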
Example for the following query: How to use search and AST to improve RAG for large codebases?
- Support YouTube transcripts
- Support online PDF documents
- Support scraping of Reddit pages
- Java 17+
- Chrome WebDriver
- Ollama (for local summarization, optional)
- API keys:
  - OpenAI
  - Google (Gemini)
Create a .env file with:
OPENAI_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
OPENAI_MODEL_NAME=gpt-3.5-turbo
GOOGLE_API_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
GEMINI_MODEL_NAME=gemini-2.0-pro-exp-02-05
OUTPUT_DIRECTORY=search_results
GENERATE_NEW_QUERIES=false
GENERATE_SUMMARIES=false
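To illustrate how settings like these might be consumed, here is a small sketch that parses KEY=VALUE lines and reads the two boolean flags. The `EnvSettings` class is a hypothetical name, and the project may well use a dedicated dotenv library instead; treat this as one plausible approach rather than the actual loader.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for KEY=VALUE settings; not the project's actual loader. */
final class EnvSettings {

    /** Parses KEY=VALUE lines, skipping blank lines and lines starting with '#'. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // no '=' or an empty key: ignore the line
            }
            settings.put(line.substring(0, eq).strip(), line.substring(eq + 1).strip());
        }
        return settings;
    }

    /** Reads a boolean flag such as GENERATE_SUMMARIES; a missing key counts as false. */
    static boolean flag(Map<String, String> settings, String key) {
        return Boolean.parseBoolean(settings.getOrDefault(key, "false"));
    }

    public static void main(String[] args) {
        // Sample lines stand in for the contents of a real .env file.
        Map<String, String> settings = parse(List.of(
                "OUTPUT_DIRECTORY=search_results",
                "GENERATE_NEW_QUERIES=false",
                "GENERATE_SUMMARIES=true"));
        System.out.println("Output directory: " + settings.get("OUTPUT_DIRECTORY"));
        System.out.println("Generate new queries: " + flag(settings, "GENERATE_NEW_QUERIES"));
        System.out.println("Generate summaries: " + flag(settings, "GENERATE_SUMMARIES"));
    }
}
```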
- Clone the repository
- Install dependencies with Maven
- Install Ollama and the llama3.1 model (optional)
- Set up environment variables
Run the main class:
java com.devoxx.agentic.Main
Enter your research query when prompted. The program will:
- Generate optimized sub-queries
- Search across multiple engines
- Scrape and analyze content
- Generate summaries (if enabled)
- Create a comprehensive report
Results are saved in Markdown format in the specified output directory.
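As a rough idea of what the final step could look like, the sketch below writes a report to a Markdown file inside the configured output directory. The `MarkdownReportWriter` class, the slug-based file name, and the report layout are all assumptions made for illustration; the project's real writer may name and structure its files differently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/** Hypothetical writer that stores one Markdown report per research query. */
final class MarkdownReportWriter {

    /** Turns a query into a file-system-friendly name; falls back to "report" if nothing usable remains. */
    static String slug(String query) {
        String s = query.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")   // collapse anything non-alphanumeric into a dash
                .replaceAll("^-+|-+$", "");      // trim leading and trailing dashes
        return s.isEmpty() ? "report" : s;
    }

    /** Creates the output directory if needed and writes the report as UTF-8 Markdown. */
    static Path write(Path outputDirectory, String query, String reportBody) throws IOException {
        Files.createDirectories(outputDirectory);
        Path file = outputDirectory.resolve(slug(query) + ".md");
        String nl = System.lineSeparator();
        String markdown = "# " + query + nl + nl + reportBody + nl;
        return Files.writeString(file, markdown, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The directory defaults to the OUTPUT_DIRECTORY value from the sample .env.
        Path dir = Path.of(args.length > 0 ? args[0] : "search_results");
        Path file = write(dir, "How to use search and AST to improve RAG for large codebases?",
                "Report text would go here.");
        System.out.println("Saved " + file.toAbsolutePath());
    }
}
```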
- llm/ - LLM client implementations (OpenAI, Gemini, Ollama)
- web/ - Web scraping and search functionality
- search/ - Search engine specific implementations
- model/ - Data models and content storage
- util/ - Utility classes for retry logic and result writing
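For readers curious about the retry logic mentioned for util/, a generic retry helper might look like the sketch below. `RetrySketch` and `withRetry` are hypothetical names, and the fixed delay between attempts is an assumption; the project's own utility may use a different policy such as exponential backoff.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not the project's actual utility class. */
final class RetrySketch {

    /** Runs the task up to maxAttempts times, sleeping delayMillis after each failed attempt but the last. */
    static <T> T withRetry(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated page fetch that fails twice before succeeding.
        String page = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated timeout");
            }
            return "<html>ok</html>";
        }, 5, 10);
        System.out.println(page + " after " + calls[0] + " attempts");
    }
}
```

Wrapping each scrape call this way keeps the retry policy in one place, so the search and scraping code only has to describe a single attempt.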
Contributions welcome! Please read our contributing guidelines and submit pull requests.
This project is licensed under the MIT License.