threadsrecon

OSINT Tool for threads.net

Introduction

This tool is designed to scrape, analyze, visualize data, and generate reports from threads.net. It is built using Python and leverages several open source Python libraries.

Features

  • Profile Analysis: Detailed user profile information and statistics
  • Content Scraping: Automated collection of posts, replies, and media
  • Engagement Analytics: Track likes, replies, and interaction patterns
  • Sentiment Analysis: Analyze emotional tone and content sentiment
  • Network Visualization: Map connections and hashtag relationships
  • Alert System: Real-time monitoring with Telegram notifications
  • Custom Reporting: Generate comprehensive PDF reports
  • Data Export: Export findings in JSON format

Technologies & Libraries

  • Selenium: Web automation and data scraping
  • BeautifulSoup4: HTML parsing and data extraction
  • Pandas: Data manipulation and analysis
  • NLTK: Natural language processing and sentiment analysis
  • NetworkX: Network analysis and visualization
  • Matplotlib/Plotly: Data visualization and charting
  • python-telegram-bot: Telegram bot integration
  • pdfkit/wkhtmltopdf: PDF report generation
  • PyYAML: Configuration management
  • requests: HTTP requests handling
  • chromium/chromedriver: Browser automation
  • logging: Debug and error logging
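To illustrate the network-analysis piece, here is a minimal NetworkX sketch of a hashtag co-occurrence graph. The input shape and the weighting scheme are assumptions for illustration, not the tool's actual internals:

```python
from itertools import combinations

import networkx as nx

# Hypothetical input: hashtags extracted per post (not the tool's real schema).
posts_hashtags = [
    ["osint", "recon", "python"],
    ["osint", "privacy"],
    ["python", "recon"],
]

G = nx.Graph()
for tags in posts_hashtags:
    # Connect every pair of hashtags appearing in the same post,
    # accumulating a co-occurrence weight on repeated pairs.
    for a, b in combinations(sorted(set(tags)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Degree centrality highlights the most connected hashtags.
central = nx.degree_centrality(G)
```

A graph like this can then be rendered with Matplotlib or Plotly, as the tool's visualization stage does.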

Requirements

  • Python 3.8+
  • 2GB RAM minimum
  • Unix-based OS or Windows 10+
  • Google Chrome/Chromium (version 90+) with a matching ChromeDriver
  • Telegram bot
  • wkhtmltopdf installed
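Chrome and ChromeDriver must share the same major version. A small sketch of that check; the binary names `google-chrome` and `chromedriver` are assumptions and vary by OS:

```python
import re
import shutil
import subprocess

def major_version(version_output: str) -> int:
    """Extract the major version number from a --version string."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if not match:
        raise ValueError(f"no version found in: {version_output!r}")
    return int(match.group(1))

def binary_version(binary: str) -> int:
    """Run `<binary> --version` and return its major version."""
    out = subprocess.run([binary, "--version"], capture_output=True, text=True).stdout
    return major_version(out)

# Only compare when both binaries are actually on PATH.
if shutil.which("google-chrome") and shutil.which("chromedriver"):
    if binary_version("google-chrome") != binary_version("chromedriver"):
        print("Warning: ChromeDriver major version does not match Chrome")
```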

Installation

Generic

Install the ChromeDriver matching your Chrome version and OS.

Create your Telegram bot and obtain your bot token and chat ID.

Install wkhtmltopdf for your OS.

macOS chromedriver (via Homebrew)

brew install chromedriver
xattr -d com.apple.quarantine /opt/homebrew/bin/chromedriver

Install the required Python libraries:

python3 -m pip install -r requirements.txt

Quick Start


# Clone the repository
git clone https://github.com/offseq/threadsrecon.git
cd threadsrecon

# Install dependencies
python3 -m pip install -r requirements.txt

# Create and configure settings.yaml
touch settings.yaml
nano settings.yaml  #Paste and edit with your settings

# Run the tool
python main.py all

Docker

Prerequisites

  • Docker installed
  • Docker Compose installed
  • 2GB RAM minimum (for Chrome in container)
  • Settings file configured (settings.yaml)

Installation & Usage

  1. Build the container:
docker-compose build
  2. Run specific commands:
#Run complete pipeline
docker-compose run threadsrecon all

#Run scrape only
docker-compose run threadsrecon scrape

#Run analyze only
docker-compose run threadsrecon analyze

#Run visualize only
docker-compose run threadsrecon visualize

#Run report only
docker-compose run threadsrecon report
  3. Run with specific user permissions:
#Run complete pipeline
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon all

#Run scrape only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon scrape

#Run analyze only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon analyze

#Run visualize only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon visualize

#Run report only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon report

Docker Volume Mounts

  • ./settings.yaml:/app/settings.yaml:ro (read-only configuration)
  • ./data:/app/data (persistent data storage)

Environment Variables

  • PYTHONUNBUFFERED=1 (unbuffered Python output)
  • DISPLAY=:99 (for Chrome)
  • UID (container user ID)
  • GID (container group ID)
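A hypothetical docker-compose.yml consistent with the mounts and variables above (the actual file in the repository may differ):

```yaml
services:
  threadsrecon:
    build: .
    shm_size: '2gb'          # Chrome needs a larger shared-memory segment
    environment:
      - PYTHONUNBUFFERED=1
      - DISPLAY=:99
    volumes:
      - ./settings.yaml:/app/settings.yaml:ro
      - ./data:/app/data
```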

Configuration

Settings File Structure

The settings.yaml file contains all configuration parameters. Key sections include:

  • Credentials: Authentication details
  • ScraperSettings: Scraping parameters and browser configuration
  • AnalysisSettings: Data processing and output preferences
  • WarningSystem: Alert configuration
  • ReportGeneration: Report generation settings

Create settings.yaml file in the root directory.

Example configuration:

Credentials:  # if not set, anonymous access will be used
  instagram_username: your_username
  instagram_password: your_password

ScraperSettings:
  base_url: https://www.threads.net
  chromedriver: ./chromedriver  # path to chromedriver or /usr/local/bin/chromedriver for docker
  usernames:
    - target_username
    - target_username2
  timeouts:
    page_load: 20
    element_wait: 10
  retries:
    max_attempts: 3
    initial_delay: 1
  delays:
    min_wait: 1
    max_wait: 3
  user_agents:
    - 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    - 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    - 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
  browser_options:
    headless: true # False for debugging
    window_size:
      width: 1920
      height: 1080
    disabled_features:
      - gpu
      - sandbox
      - dev-shm-usage
      - extensions
      - infobars
      - logging
      - popup-blocking
      
AnalysisSettings:
 input_file: data/profiles.json
 archive_file: data/archived_profiles.json
 output_file: data/analyzed_profiles.json
 visualization_dir: data/visualizations
 keywords: 
  - keyword1
  - keyword2
 date_range: 
  start: null  # or "2024-01-01"
  end: null    # or "2024-12-31"

WarningSystem:
  token: your_telegram_bot_token
  chat_id: your_chat_id
  priority_keywords:
    HIGH:
      - "urgent"
      - "emergency"
      - "critical"
    MEDIUM:
      - "important"
      - "attention"
      - "warning"
    LOW:
      - "update"
      - "info"
      - "notice"

ReportGeneration:
# Docker environment path /usr/bin/wkhtmltopdf 
 path_to_wkhtmltopdf: your\path\to\wkhtmltopdf.exe # Example location: C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe
 output_path: data/reports/report.pdf
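The WarningSystem section drives keyword-based alerting. A hedged sketch of how priority matching and the 200-character truncation (mentioned under Troubleshooting) might work; the function names and matching logic are assumptions, not the tool's actual code:

```python
from typing import Optional

PRIORITY_KEYWORDS = {
    "HIGH": ["urgent", "emergency", "critical"],
    "MEDIUM": ["important", "attention", "warning"],
    "LOW": ["update", "info", "notice"],
}
MAX_ALERT_LENGTH = 200  # Telegram alerts are truncated to 200 characters

def classify_priority(text: str) -> Optional[str]:
    """Return the highest priority level whose keywords appear in the text."""
    lowered = text.lower()
    for level in ("HIGH", "MEDIUM", "LOW"):
        if any(kw in lowered for kw in PRIORITY_KEYWORDS[level]):
            return level
    return None

def format_alert(text: str) -> str:
    """Truncate long alert bodies so they fit the alert length limit."""
    if len(text) <= MAX_ALERT_LENGTH:
        return text
    return text[:MAX_ALERT_LENGTH - 3] + "..."
```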

Running

Run complete pipeline.

python main.py all 

Only scrape data.


python main.py scrape  

Only analyze existing data.


python main.py analyze 

Generate visualizations.


python main.py visualize 

Create PDF report.


python main.py report  

Security Considerations

  • This tool respects threads.net's robots.txt.
  • Data collected should be used in accordance with local privacy laws.
  • Avoid scraping with your personal account.
  • Consider using a VPN and running in a virtual environment when collecting data.
  • Credentials are stored in plaintext in settings.yaml; restrict its file permissions and keep it out of version control.

Troubleshooting

Common issues and solutions:

  1. ChromeDriver version mismatch
Error: "ChromeDriver only supports Chrome version XX"
Solution: Download matching ChromeDriver version from https://sites.google.com/chromium.org/driver/downloads
  2. Authentication Issues
Error: "Account requires additional verification"
Solution: Log into Instagram manually first and complete verification

Error: "Suspicious login attempt detected"
Solution: Verify your account manually and try again

Error: "Account has been temporarily blocked"
Solution: Wait for the block to expire or use anonymous access
  3. Network and Connection Issues
Error: "Connection timed out while accessing threads.net"
Solution: Check your internet connection and try again

Error: "Could not resolve the host name"
Solution: Verify the URL and DNS settings

Error: "Connection refused by threads.net"
Solution: The server might be down or blocking requests. Try using a VPN

Error: "Proxy connection failed"
Solution: Check your proxy settings or disable proxy
  4. Scraping Issues
Error: "Required element not found"
Solution: The page structure might have changed. Update selectors

Error: "Element is no longer attached to the DOM"
Solution: Page was updated during scraping. Increase wait times

Error: "Could not interact with element"
Solution: Element might be covered. Try running in non-headless mode
  5. Data Processing Issues
Error: "Module not found"
Solution: Run `pip install -r requirements.txt`

Error: "Failed to process sentiment analysis"
Solution: Run `python -c "import nltk; nltk.download('vader_lexicon')"`
  6. Report Generation Issues
Error: "wkhtmltopdf not found"
Solution: Install wkhtmltopdf and update path in settings.yaml

Error: "Failed to generate PDF"
Solution: Check write permissions in output directory
  7. Warning System Issues
Error: "Failed to send Telegram alert"
Solution: Verify bot token and chat ID in settings.yaml

Error: "Message too long"
Solution: Alerts are automatically truncated to 200 characters
  8. Configuration Issues
Error: "Invalid settings"
Solution: Validate settings.yaml against example configuration

Error: "Missing required field"
Solution: Check all required fields are present in settings.yaml
  9. Docker Issues
Error: "docker.errors.DockerException: Error while fetching server API version"
Solution: Ensure Docker daemon is running with `sudo systemctl start docker`

Error: "Error response from daemon: OCI runtime create failed"
Solution: Ensure sufficient memory (2GB minimum) and check Docker permissions

Error: "chrome_driver.exceptions.WebDriverException: unknown error: Chrome failed to start: crashed"
Solution: Add or increase shm-size in docker-compose.yml:
```yaml
services:
  threadsrecon:
    shm_size: '2gb'
```

Error: "Permission denied: '/app/data'"
Solution: Run with correct UID/GID:
```bash
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon all
```

Error: "wkhtmltopdf not found"
Solution: In settings.yaml, set path_to_wkhtmltopdf to "/usr/bin/wkhtmltopdf"
  10. Container Resource Issues
Error: "Container killed due to memory limit"
Solution: Increase Docker memory limit or reduce Chrome instances

Error: "No space left on device"
Solution: Clean up unused Docker images and volumes:
```bash
docker system prune -a
docker volume prune
```

For persistent issues:

  1. Check the logs in the data directory
  2. Try running in non-headless mode for debugging
  3. Clear browser cache and cookies
  4. Verify all dependencies are installed correctly
  5. Ensure you have the latest version of Chrome installed