DeepDoc is a tool that performs deep research on your local resources instead of the internet. It uses a research-style workflow to explore your documents, organize the findings, and generate a clear markdown report. This way, you can quickly uncover insights from your own files without manually digging through them.
- Start by uploading local resources (PDF, DOCX, JPG, TXT, etc.).
- The system extracts text and splits it into page-wise chunks.
- These chunks are stored in a vector database for semantic similarity search.
- Based on your instruction query, a content structure is generated.
- You can provide feedback to refine the structure.
- The tool then generates report sections and section topics.
- For each section, research agents:
  - Generate knowledge for the section.
  - Create research queries.
  - Run search agents over the chunked local data.
  - Use reflection agents to refine results.
  - Generate final section content.
- Section-wise content is compiled and passed to a final report writer.
- The output is a complete, structured report in markdown format.
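The pipeline above can be sketched as a short, runnable Python outline. Every function here is a hypothetical stand-in for an LLM-backed agent, reduced to simple string logic so the control flow is visible; none of these names are the project's real API.

```python
# Minimal sketch of the DeepDoc workflow. All functions are illustrative
# stand-ins for LLM agents, not the project's actual implementation.

def generate_structure(instruction: str) -> list[str]:
    # Real tool: an LLM proposes section topics from the instruction query.
    return [f"{instruction}: background", f"{instruction}: findings"]

def search(chunks: list[str], query: str) -> list[str]:
    # Real tool: semantic similarity search in the vector database.
    word = query.split()[-1].lower()
    return [c for c in chunks if word in c.lower()]

def write_section(topic: str, evidence: list[str]) -> str:
    # Real tool: an LLM writes prose grounded in the retrieved chunks.
    return f"## {topic}\n" + "\n".join(f"- {e}" for e in evidence)

def run_deepdoc(chunks: list[str], instruction: str) -> str:
    sections = []
    for topic in generate_structure(instruction):
        queries = [topic]            # real tool: several queries, plus reflection rounds
        evidence = []
        for q in queries:
            evidence.extend(search(chunks, q))
        sections.append(write_section(topic, evidence))
    return "\n\n".join(sections)     # compiled markdown report
```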
This diagram shows how Local DeepResearcher takes your local resources and instructions, processes and analyzes the content, and turns it into a structured report.
Follow these steps to set up and run the project locally.
`uv` is required to manage the virtual environment and dependencies. You can download it from the official uv GitHub repository, which includes platform-specific installation instructions.
```
git clone https://github.com/Datalore-ai/deepdoc.git
cd deepdoc
```
Use `uv` to create a virtual environment:

```
uv venv
```
Activate the environment depending on your OS:

Windows:

```
.venv\Scripts\activate
```

macOS/Linux:

```
source .venv/bin/activate
```
Copy the example `.env` file and add your API keys:

```
cp .env.example .env
```
Open the `.env` file in a text editor and fill in the required fields:

```
MISTRAL_API_KEY=
TAVILY_API_KEY=
OPENAI_API_KEY=

# Default
QDRANT_URL=http://localhost:6333
COLLECTION_NAME=knowledge_base
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
QDRANT_DISABLE_THREADING=true  # Don't change this
```
The three API keys are required for the application to run; the remaining values can usually be left at their defaults.
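As a quick sanity check before launching, a stdlib-only snippet like the one below can confirm the required keys are actually filled in. The key names come from the `.env` file above; the parsing here is a simplified sketch, not how the application itself loads its configuration.

```python
from pathlib import Path

# Keys the application needs, per the .env template above.
REQUIRED_KEYS = ["MISTRAL_API_KEY", "TAVILY_API_KEY", "OPENAI_API_KEY"]

def missing_env_keys(env_path: str) -> list[str]:
    """Return the required keys that are absent or empty in a .env file."""
    values = {}
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments like "true  # Don't change this".
        values[key.strip()] = value.split("#")[0].strip()
    return [k for k in REQUIRED_KEYS if not values.get(k)]
```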
Install the required packages:

```
uv pip install -r requirements.txt
```
Make sure you have Docker and Docker Compose installed. Then start the required services (e.g., Qdrant) using:

```
docker-compose up --build
```
This builds and starts the necessary services (add the `-d` flag if you want them to run in the background).
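Before launching the app, you can verify Qdrant is reachable on its default port with a small stdlib check. This is a convenience sketch, not part of the project:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the default Qdrant endpoint from the .env file.
# is_port_open("localhost", 6333)
```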
Once the environment and services are ready, start the application:

```
python main.py
```
You're all set! The application will guide you through the report creation process step by step, and the final report will be saved in the `output_files` directory.
You can customize how the tool behaves via the `configuration.py` file, which exposes two settings: the LLM configuration and the thread configuration.
```python
import uuid

LLM_CONFIG = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.5,
}

THREAD_CONFIG = {
    "configurable": {
        "thread_id": str(uuid.uuid4()),
        "max_queries": 3,
        "search_depth": 2,
        "num_reflections": 2,
        "n_points": 1,
    }
}
```
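If you tweak these values often, a small helper like the one below keeps the `THREAD_CONFIG` shape consistent while generating a fresh `thread_id` per run. This helper is hypothetical (it is not in the project), and the parameter meanings are inferred from their names:

```python
import uuid

def make_thread_config(max_queries: int = 3, search_depth: int = 2,
                       num_reflections: int = 2, n_points: int = 1) -> dict:
    """Build a THREAD_CONFIG-shaped dict with a fresh thread_id per call.

    Hypothetical helper; field meanings are inferred: max_queries is the
    number of research queries per section, search_depth and num_reflections
    bound the search/reflection loops.
    """
    return {
        "configurable": {
            "thread_id": str(uuid.uuid4()),  # unique id so runs don't share state
            "max_queries": max_queries,
            "search_depth": search_depth,
            "num_reflections": num_reflections,
            "n_points": n_points,
        }
    }
```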
If something here could be improved, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the `LICENSE` file for more details.