threadsrecon

OSINT Tool for threads.net

Introduction

This tool is designed to scrape, analyze, visualize data, and generate reports from threads.net. It is built using Python and leverages several open source Python libraries.

Features

  • Profile Analysis: Detailed user profile information and statistics
  • Content Scraping: Automated collection of posts, replies, and media
  • Engagement Analytics: Track likes, replies, and interaction patterns
  • Sentiment Analysis: Analyze emotional tone and content sentiment
  • Network Visualization: Map connections and hashtag relationships
  • Alert System: Real-time monitoring with Telegram notifications
  • Custom Reporting: Generate comprehensive PDF reports
  • Data Export: Export findings in JSON format

Technologies & Libraries

  • Selenium: Web automation and data scraping
  • BeautifulSoup4: HTML parsing and data extraction
  • Pandas: Data manipulation and analysis
  • NLTK: Natural language processing and sentiment analysis
  • NetworkX: Network analysis and visualization
  • Matplotlib/Plotly: Data visualization and charting
  • python-telegram-bot: Telegram bot integration
  • pdfkit/wkhtmltopdf: PDF report generation
  • PyYAML: Configuration management
  • requests: HTTP requests handling
  • chromium/chromedriver: Browser automation
  • logging: Debug and error logging
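To illustrate the network-analysis piece, here is a minimal NetworkX sketch of a hashtag co-occurrence graph. The input shape and the weighting scheme are assumptions for illustration, not the tool's actual internals:

```python
from itertools import combinations

import networkx as nx

# Hypothetical input: hashtags extracted per post (not the tool's real schema).
posts_hashtags = [
    ["osint", "recon", "python"],
    ["osint", "privacy"],
    ["python", "recon"],
]

G = nx.Graph()
for tags in posts_hashtags:
    # Connect every pair of hashtags appearing in the same post,
    # accumulating a co-occurrence weight on repeated pairs.
    for a, b in combinations(sorted(set(tags)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Degree centrality highlights the most connected hashtags.
central = nx.degree_centrality(G)
```

A graph like this can then be rendered with Matplotlib or Plotly, as the tool's visualization stage does.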

Requirements

  • Python 3.8+
  • 2GB RAM minimum
  • Unix-based OS or Windows 10+
  • Google Chrome/Chromium (version 90+) with a matching ChromeDriver
  • Telegram bot
  • wkhtmltopdf installed
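Chrome and ChromeDriver must share the same major version. A small sketch of that check; the binary names `google-chrome` and `chromedriver` are assumptions and vary by OS:

```python
import re
import shutil
import subprocess

def major_version(version_output: str) -> int:
    """Extract the major version number from a --version string."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if not match:
        raise ValueError(f"no version found in: {version_output!r}")
    return int(match.group(1))

def binary_version(binary: str) -> int:
    """Run `<binary> --version` and return its major version."""
    out = subprocess.run([binary, "--version"], capture_output=True, text=True).stdout
    return major_version(out)

# Only compare when both binaries are actually on PATH.
if shutil.which("google-chrome") and shutil.which("chromedriver"):
    if binary_version("google-chrome") != binary_version("chromedriver"):
        print("Warning: ChromeDriver major version does not match Chrome")
```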

Installation

Generic

Install the ChromeDriver matching your Chrome version and OS.

Create your Telegram bot and obtain your bot token and chat ID.

Install wkhtmltopdf for your OS.

macOS chromedriver (via Homebrew)

brew install chromedriver
xattr -d com.apple.quarantine /opt/homebrew/bin/chromedriver

Install the required Python libraries:

python3 -m pip install -r requirements.txt

Quick Start


# Clone the repository
git clone https://github.com/offseq/threadsrecon.git
cd threadsrecon

# Install dependencies
python3 -m pip install -r requirements.txt

# Create and configure settings.yaml
touch settings.yaml
nano settings.yaml  #Paste and edit with your settings

# Run the tool
python main.py all

Docker

Prerequisites

  • Docker installed
  • Docker Compose installed
  • 2GB RAM minimum (for Chrome in container)
  • Settings file configured (settings.yaml)

Installation & Usage

  1. Build the container:
docker-compose build
  2. Run specific commands:
#Run complete pipeline
docker-compose run threadsrecon all

#Run scrape only
docker-compose run threadsrecon scrape

#Run analyze only
docker-compose run threadsrecon analyze

#Run visualize only
docker-compose run threadsrecon visualize

#Run report only
docker-compose run threadsrecon report
  3. Run with specific user permissions:
#Run complete pipeline
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon all

#Run scrape only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon scrape

#Run analyze only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon analyze

#Run visualize only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon visualize

#Run report only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon report

Docker Volume Mounts

  • ./settings.yaml:/app/settings.yaml:ro (read-only configuration)
  • ./data:/app/data (persistent data storage)

Environment Variables

  • PYTHONUNBUFFERED=1 (unbuffered Python output)
  • DISPLAY=:99 (for Chrome)
  • UID (container user ID)
  • GID (container group ID)
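A hypothetical docker-compose.yml consistent with the mounts and variables above (the actual file in the repository may differ):

```yaml
services:
  threadsrecon:
    build: .
    shm_size: '2gb'          # Chrome needs a larger shared-memory segment
    environment:
      - PYTHONUNBUFFERED=1
      - DISPLAY=:99
    volumes:
      - ./settings.yaml:/app/settings.yaml:ro
      - ./data:/app/data
```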

Configuration

Settings File Structure

The settings.yaml file contains all configuration parameters. Key sections include:

  • Credentials: Authentication details
  • ScraperSettings: Scraping parameters and browser configuration
  • AnalysisSettings: Data processing and output preferences
  • WarningSystem: Alert configuration
  • ReportGeneration: Report generation settings

Create settings.yaml file in the root directory.

Example configuration:

Credentials:  # if not set, anonymous access will be used
  instagram_username: your_username
  instagram_password: your_password

ScraperSettings:
  base_url: https://www.threads.net
  chromedriver: ./chromedriver  # path to chromedriver or /usr/local/bin/chromedriver for docker
  usernames:
    - target_username
    - target_username2
  timeouts:
    page_load: 20
    element_wait: 10
  retries:
    max_attempts: 3
    initial_delay: 1
  delays:
    min_wait: 1
    max_wait: 3
  user_agents:
    - 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    - 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    - 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
  browser_options:
    headless: true # False for debugging
    window_size:
      width: 1920
      height: 1080
    disabled_features:
      - gpu
      - sandbox
      - dev-shm-usage
      - extensions
      - infobars
      - logging
      - popup-blocking
      
AnalysisSettings:
 input_file: data/profiles.json
 archive_file: data/archived_profiles.json
 output_file: data/analyzed_profiles.json
 visualization_dir: data/visualizations
 keywords: 
  - keyword1
  - keyword2
 date_range: 
  start: null  # or "2024-01-01"
  end: null    # or "2024-12-31"

WarningSystem:
  token: your_telegram_bot_token
  chat_id: your_chat_id
  priority_keywords:
    HIGH:
      - "urgent"
      - "emergency"
      - "critical"
    MEDIUM:
      - "important"
      - "attention"
      - "warning"
    LOW:
      - "update"
      - "info"
      - "notice"

ReportGeneration:
# Docker environment path /usr/bin/wkhtmltopdf 
 path_to_wkhtmltopdf: your\path\to\wkhtmltopdf.exe # Example location: C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe
 output_path: data/reports/report.pdf
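The WarningSystem section drives keyword-based alerting. A hedged sketch of how priority matching and the 200-character truncation (mentioned under Troubleshooting) might work; the function names and matching logic are assumptions, not the tool's actual code:

```python
from typing import Optional

PRIORITY_KEYWORDS = {
    "HIGH": ["urgent", "emergency", "critical"],
    "MEDIUM": ["important", "attention", "warning"],
    "LOW": ["update", "info", "notice"],
}
MAX_ALERT_LENGTH = 200  # Telegram alerts are truncated to 200 characters

def classify_priority(text: str) -> Optional[str]:
    """Return the highest priority level whose keywords appear in the text."""
    lowered = text.lower()
    for level in ("HIGH", "MEDIUM", "LOW"):
        if any(kw in lowered for kw in PRIORITY_KEYWORDS[level]):
            return level
    return None

def format_alert(text: str) -> str:
    """Truncate long alert bodies so they fit the alert length limit."""
    if len(text) <= MAX_ALERT_LENGTH:
        return text
    return text[:MAX_ALERT_LENGTH - 3] + "..."
```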

Running

Run complete pipeline.

python main.py all 

Only scrape data.


python main.py scrape  

Only analyze existing data.


python main.py analyze 

Generate visualizations.


python main.py visualize 

Create PDF report.


python main.py report  

Security Considerations

  • This tool respects threads.net's robots.txt.
  • Data collected should be used in accordance with local privacy laws.
  • Avoid scraping with your personal account.
  • Consider using a VPN and running in a virtual environment when collecting data.
  • Credentials are stored in plaintext in settings.yaml; restrict its file permissions and keep it out of version control.

Troubleshooting

Common issues and solutions:

  1. ChromeDriver version mismatch
Error: "ChromeDriver only supports Chrome version XX"
Solution: Download matching ChromeDriver version from https://sites.google.com/chromium.org/driver/downloads
  2. Authentication Issues
Error: "Account requires additional verification"
Solution: Log into Instagram manually first and complete verification

Error: "Suspicious login attempt detected"
Solution: Verify your account manually and try again

Error: "Account has been temporarily blocked"
Solution: Wait for the block to expire or use anonymous access
  3. Network and Connection Issues
Error: "Connection timed out while accessing threads.net"
Solution: Check your internet connection and try again

Error: "Could not resolve the host name"
Solution: Verify the URL and DNS settings

Error: "Connection refused by threads.net"
Solution: The server might be down or blocking requests. Try using a VPN

Error: "Proxy connection failed"
Solution: Check your proxy settings or disable proxy
  4. Scraping Issues
Error: "Required element not found"
Solution: The page structure might have changed. Update selectors

Error: "Element is no longer attached to the DOM"
Solution: Page was updated during scraping. Increase wait times

Error: "Could not interact with element"
Solution: Element might be covered. Try running in non-headless mode
  5. Data Processing Issues
Error: "Module not found"
Solution: Run `pip install -r requirements.txt`

Error: "Failed to process sentiment analysis"
Solution: Run `python -c "import nltk; nltk.download('vader_lexicon')"`
  6. Report Generation Issues
Error: "wkhtmltopdf not found"
Solution: Install wkhtmltopdf and update path in settings.yaml

Error: "Failed to generate PDF"
Solution: Check write permissions in output directory
  7. Warning System Issues
Error: "Failed to send Telegram alert"
Solution: Verify bot token and chat ID in settings.yaml

Error: "Message too long"
Solution: Alerts are automatically truncated to 200 characters
  8. Configuration Issues
Error: "Invalid settings"
Solution: Validate settings.yaml against example configuration

Error: "Missing required field"
Solution: Check all required fields are present in settings.yaml
  9. Docker Issues
Error: "docker.errors.DockerException: Error while fetching server API version"
Solution: Ensure Docker daemon is running with `sudo systemctl start docker`

Error: "Error response from daemon: OCI runtime create failed"
Solution: Ensure sufficient memory (2GB minimum) and check Docker permissions

Error: "chrome_driver.exceptions.WebDriverException: unknown error: Chrome failed to start: crashed"
Solution: Add or increase shm-size in docker-compose.yml:
```yaml
services:
  threadsrecon:
    shm_size: '2gb'
```

Error: "Permission denied: '/app/data'"
Solution: Run with correct UID/GID:
```bash
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon all
```

Error: "wkhtmltopdf not found"
Solution: In settings.yaml, set path_to_wkhtmltopdf to "/usr/bin/wkhtmltopdf"
  10. Container Resource Issues
Error: "Container killed due to memory limit"
Solution: Increase Docker memory limit or reduce Chrome instances

Error: "No space left on device"
Solution: Clean up unused Docker images and volumes:
```bash
docker system prune -a
docker volume prune
```

For persistent issues:

  1. Check the logs in the data directory
  2. Try running in non-headless mode for debugging
  3. Clear browser cache and cookies
  4. Verify all dependencies are installed correctly
  5. Ensure you have the latest version of Chrome installed