OSINT Tool for threads.net
This tool scrapes data from threads.net, analyzes and visualizes it, and generates reports. It is built in Python and relies on several open-source libraries.
Features:
- Profile Analysis: Detailed user profile information and statistics
- Content Scraping: Automated collection of posts, replies, and media
- Engagement Analytics: Track likes, replies, and interaction patterns
- Sentiment Analysis: Analyze emotional tone and content sentiment
- Network Visualization: Map connections and hashtag relationships
- Alert System: Real-time monitoring with Telegram notifications
- Custom Reporting: Generate comprehensive PDF reports
- Data Export: Export findings in JSON format
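As an illustration of the hashtag relationships mentioned in the feature list, the following is a minimal sketch that counts how often two hashtags appear in the same post. It is not the tool's actual implementation: the function name `hashtag_edges`, the regular expression, and the sample posts are assumptions made for this example, and it uses only the standard library rather than NetworkX.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_edges(posts):
    """Count how often each pair of hashtags appears in the same post."""
    edges = Counter()
    for text in posts:
        # Lowercase and de-duplicate so "#OSINT #osint" is one tag, not a pair.
        tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
        for pair in combinations(tags, 2):
            edges[pair] += 1
    return edges


# Hypothetical sample posts, for illustration only.
posts = [
    "New report on #osint and #privacy",
    "#OSINT tooling for #threads and #privacy",
    "No tags here",
]
edges = hashtag_edges(posts)
```

Each `(tag_a, tag_b)` key with its count could then be added to a NetworkX graph as a weighted edge, for example with `Graph.add_edge(tag_a, tag_b, weight=count)`.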
Libraries and tools:
- Selenium: Web automation and data scraping
- BeautifulSoup4: HTML parsing and data extraction
- Pandas: Data manipulation and analysis
- NLTK: Natural language processing and sentiment analysis
- NetworkX: Network analysis and visualization
- Matplotlib/Plotly: Data visualization and charting
- python-telegram-bot: Telegram bot integration
- pdfkit/wkhtmltopdf: PDF report generation
- PyYAML: Configuration management
- requests: HTTP requests handling
- chromium/chromedriver: Browser automation
- logging: Debug and error logging
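To make the alert feature more concrete, here is a hedged sketch of how a post could be matched against the `priority_keywords` levels shown in the example configuration and shortened to the 200-character limit mentioned in the troubleshooting notes. The function names `classify_priority` and `build_alert`, the message layout, and the simple substring matching are assumptions for this example, not the tool's confirmed behavior.

```python
# Keyword levels copied from the example settings.yaml in this README.
PRIORITY_KEYWORDS = {
    "HIGH": ["urgent", "emergency", "critical"],
    "MEDIUM": ["important", "attention", "warning"],
    "LOW": ["update", "info", "notice"],
}


def classify_priority(text, priority_keywords=PRIORITY_KEYWORDS):
    """Return the highest priority level whose keywords occur in text, or None."""
    lowered = text.lower()
    for level in ("HIGH", "MEDIUM", "LOW"):
        # Simplification: substring match, so "info" also matches "information".
        if any(keyword in lowered for keyword in priority_keywords.get(level, [])):
            return level
    return None


def build_alert(username, text, max_length=200):
    """Build an alert line, or return None when no keyword matches."""
    level = classify_priority(text)
    if level is None:
        return None
    # Assumption: only the post text is truncated, not the whole message.
    snippet = text if len(text) <= max_length else text[: max_length - 3] + "..."
    return f"[{level}] @{username}: {snippet}"


long_alert = build_alert("someone", "critical " + "x" * 300)
```

The resulting string could be passed to whatever Telegram client the alert system uses. Checking levels from `HIGH` to `LOW` means a post that matches several levels is reported once, at its most severe level.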
Requirements:
- Python 3.8+
- 2GB RAM minimum
- Unix-based OS or Windows 10+
- Google Chrome/Chromium with appropriate chromedriver version 90+
- Telegram bot
- wkhtmltopdf installed
Install the chromedriver that matches your Chrome version and OS.
Create your Telegram bot and obtain your bot token and chat ID.
Install wkhtmltopdf for your OS.
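Because chromedriver has to match the installed Chrome version, it can help to compare the two major versions before a run. The sketch below assumes you have already captured the text printed by `chromedriver --version` and by your browser's `--version` flag; the helper names and the sample version strings are illustrative only.

```python
import re


def major_version(version_output):
    """Extract the major version from text such as 'ChromeDriver 130.0.6723.58 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, chromedriver_output):
    """Return True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_output) == major_version(chromedriver_output)


# Hypothetical sample outputs, for illustration only.
same = versions_match("Google Chrome 130.0.6723.58", "ChromeDriver 130.0.6723.69 (abc123)")
different = versions_match("Chromium 130.0.6723.58", "ChromeDriver 129.0.6668.89 (abc123)")
```

Only the major version is compared here, which mirrors the "ChromeDriver only supports Chrome version XX" error described in the troubleshooting notes.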
macOS chromedriver (via Homebrew):
brew install chromedriver
xattr -d com.apple.quarantine /opt/homebrew/bin/chromedriver
Install the required libraries for Python:
python3 -m pip install -r requirements.txt
# Clone the repository
git clone https://github.com/offseq/threadsrecon.git
cd threadsrecon
# Install dependencies
python3 -m pip install -r requirements.txt
# Create and configure settings.yaml
touch settings.yaml
nano settings.yaml #Paste and edit with your settings
# Run the tool
python main.py all
Docker prerequisites:
- Docker installed
- Docker Compose installed
- 2GB RAM minimum (for Chrome in container)
- Settings file configured (settings.yaml)
- Build the container:
docker-compose build
- Run specific commands:
#Run complete pipeline
docker-compose run threadsrecon all
#Run scrape only
docker-compose run threadsrecon scrape
#Run analyze only
docker-compose run threadsrecon analyze
#Run visualize only
docker-compose run threadsrecon visualize
#Run report only
docker-compose run threadsrecon report
- Run with specific user permissions:
#Run complete pipeline
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon all
#Run scrape only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon scrape
#Run analyze only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon analyze
#Run visualize only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon visualize
#Run report only
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon report
Volumes:
- `./settings.yaml:/app/settings.yaml:ro` (read-only configuration)
- `./data:/app/data` (persistent data storage)
Environment variables:
- `PYTHONUNBUFFERED=1` (unbuffered Python output)
- `DISPLAY=:99` (for Chrome)
- `UID` (container user ID)
- `GID` (container group ID)
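The subcommands used in the commands above (`all`, `scrape`, `analyze`, `visualize`, `report`) suggest a simple dispatch in which `all` runs every stage in turn. How `main.py` is actually written is not shown in this README, so the following is only a sketch: the `STAGES` order and the `parse_stages` helper are assumptions.

```python
import argparse

# Assumption: "all" runs the stages in the order they are listed in this README.
STAGES = ["scrape", "analyze", "visualize", "report"]


def parse_stages(argv):
    """Translate command-line arguments into the list of stages to run."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("command", choices=["all", *STAGES])
    args = parser.parse_args(argv)
    return list(STAGES) if args.command == "all" else [args.command]


full_run = parse_stages(["all"])
report_only = parse_stages(["report"])
```

With `choices` set, argparse rejects an unknown subcommand with a usage message instead of silently doing nothing.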
The settings.yaml file contains all configuration parameters. Key sections include:
- Credentials: Authentication details
- ScraperSettings: Scraping parameters and browser configuration
- AnalysisSettings: Data processing and output preferences
- WarningSystem: Alert configuration
- ReportGeneration: Report generation settings
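A quick way to catch a missing section early is to check the parsed file before running the pipeline. The sketch below is an assumption about how such a check might look, not the tool's own validation: it treats `Credentials` as optional (anonymous access is used when it is absent) and the other four sections as required, and the helper names are hypothetical.

```python
# Assumption: every section except Credentials is required.
REQUIRED_SECTIONS = ["ScraperSettings", "AnalysisSettings", "WarningSystem", "ReportGeneration"]


def missing_sections(settings, required=REQUIRED_SECTIONS):
    """Return the required section names that are absent or not mappings."""
    return [name for name in required if not isinstance(settings.get(name), dict)]


def load_settings(path="settings.yaml"):
    """Load settings.yaml with PyYAML and fail early if a section is missing."""
    import yaml  # PyYAML; imported here so missing_sections works without it

    with open(path, encoding="utf-8") as handle:
        settings = yaml.safe_load(handle) or {}  # an empty file parses to None
    missing = missing_sections(settings)
    if missing:
        raise ValueError(f"settings.yaml is missing sections: {', '.join(missing)}")
    return settings


partial = {"ScraperSettings": {}, "AnalysisSettings": {}}
still_missing = missing_sections(partial)
```

`yaml.safe_load` is used rather than `yaml.load` so that the configuration file cannot construct arbitrary Python objects.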
Create a settings.yaml file in the project root directory.
Example configuration:
Credentials: # if not set, anonymous access will be used
  instagram_username: your_username
  instagram_password: your_password
ScraperSettings:
  base_url: https://www.threads.net
  chromedriver: ./chromedriver # path to chromedriver or /usr/local/bin/chromedriver for docker
  usernames:
    - target_username
    - target_username2
  timeouts:
    page_load: 20
    element_wait: 10
  retries:
    max_attempts: 3
    initial_delay: 1
  delays:
    min_wait: 1
    max_wait: 3
  user_agents:
    - 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    - 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36'
    - 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36'
  browser_options:
    headless: true # False for debugging
    window_size:
      width: 1920
      height: 1080
    disabled_features:
      - gpu
      - sandbox
      - dev-shm-usage
      - extensions
      - infobars
      - logging
      - popup-blocking
AnalysisSettings:
  input_file: data/profiles.json
  archive_file: data/archived_profiles.json
  output_file: data/analyzed_profiles.json
  visualization_dir: data/visualizations
  keywords:
    - keyword1
    - keyword2
  date_range:
    start: null # or "2024-01-01"
    end: null # or "2024-12-31"
WarningSystem:
  token: your_telegram_bot_token
  chat_id: your_chat_id
  priority_keywords:
    HIGH:
      - "urgent"
      - "emergency"
      - "critical"
    MEDIUM:
      - "important"
      - "attention"
      - "warning"
    LOW:
      - "update"
      - "info"
      - "notice"
ReportGeneration:
  # Docker environment path /usr/bin/wkhtmltopdf
  path_to_wkhtmltopdf: your\path\to\wkhtmltopdf.exe # Example location: C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe
  output_path: data/reports/report.pdf
Run complete pipeline.
python main.py all
Only scrape data.
python main.py scrape
Only analyze existing data.
python main.py analyze
Generate visualizations.
python main.py visualize
Create PDF report.
python main.py report
- This tool respects threads.net's robots.txt.
- Data collected should be used in accordance with local privacy laws.
- Avoid scraping with your personal account.
- Consider using a VPN and running in a virtual environment when collecting data.
- Credentials in settings.yaml are stored in plain text, so restrict access to the file and keep it out of version control.
Common issues and solutions:
- ChromeDriver version mismatch
Error: "ChromeDriver only supports Chrome version XX"
Solution: Download matching ChromeDriver version from https://sites.google.com/chromium.org/driver/downloads
- Authentication Issues
Error: "Account requires additional verification"
Solution: Log into Instagram manually first and complete verification
Error: "Suspicious login attempt detected"
Solution: Verify your account manually and try again
Error: "Account has been temporarily blocked"
Solution: Wait for the block to expire or use anonymous access
- Network and Connection Issues
Error: "Connection timed out while accessing threads.net"
Solution: Check your internet connection and try again
Error: "Could not resolve the host name"
Solution: Verify the URL and DNS settings
Error: "Connection refused by threads.net"
Solution: The server might be down or blocking requests. Try using a VPN
Error: "Proxy connection failed"
Solution: Check your proxy settings or disable proxy
- Scraping Issues
Error: "Required element not found"
Solution: The page structure might have changed. Update selectors
Error: "Element is no longer attached to the DOM"
Solution: Page was updated during scraping. Increase wait times
Error: "Could not interact with element"
Solution: Element might be covered. Try running in non-headless mode
- Data Processing Issues
Error: "Module not found"
Solution: Run `pip install -r requirements.txt`
Error: "Failed to process sentiment analysis"
Solution: Run `python -c "import nltk; nltk.download('vader_lexicon')"`
- Report Generation Issues
Error: "wkhtmltopdf not found"
Solution: Install wkhtmltopdf and update path in settings.yaml
Error: "Failed to generate PDF"
Solution: Check write permissions in output directory
- Warning System Issues
Error: "Failed to send Telegram alert"
Solution: Verify bot token and chat ID in settings.yaml
Error: "Message too long"
Solution: Alerts are automatically truncated to 200 characters
- Configuration Issues
Error: "Invalid settings"
Solution: Validate settings.yaml against example configuration
Error: "Missing required field"
Solution: Check all required fields are present in settings.yaml
- Docker Issues
Error: "docker.errors.DockerException: Error while fetching server API version"
Solution: Ensure Docker daemon is running with `sudo systemctl start docker`
Error: "Error response from daemon: OCI runtime create failed"
Solution: Ensure sufficient memory (2GB minimum) and check Docker permissions
Error: "chrome_driver.exceptions.WebDriverException: unknown error: Chrome failed to start: crashed"
Solution: Add or increase shm-size in docker-compose.yml:
```yaml
services:
  threadsrecon:
    shm_size: '2gb'
```
Error: "Permission denied: '/app/data'"
Solution: Run with correct UID/GID:
```bash
sudo -E UID=$(id -u) GID=$(id -g) docker-compose run threadsrecon all
```
Error: "wkhtmltopdf not found"
Solution: In settings.yaml, set path_to_wkhtmltopdf to "/usr/bin/wkhtmltopdf"
- Container Resource Issues
Error: "Container killed due to memory limit"
Solution: Increase Docker memory limit or reduce Chrome instances
Error: "No space left on device"
Solution: Clean up unused Docker images and volumes:
```bash
docker system prune -a
docker volume prune
```
For persistent issues:
- Check the logs in the `data` directory
- Try running in non-headless mode for debugging
- Clear browser cache and cookies
- Verify all dependencies are installed correctly
- Ensure you have the latest version of Chrome installed
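Some of the network and scraping errors listed under common issues are transient, which is what the `retries` settings (`max_attempts`, `initial_delay`) in the example configuration are for. The following is a minimal sketch of a retry loop driven by those two values. Doubling the delay after each failure is an assumption, as are the names `with_retries` and `flaky_fetch`; the tool's real retry logic may differ.

```python
import time


def with_retries(operation, max_attempts=3, initial_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:  # real code should catch narrower errors, e.g. timeouts
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # assumption: double the wait after each failure


# Example: a hypothetical operation that fails twice, then succeeds.
attempts = []
waits = []


def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("Connection timed out")
    return "ok"


# Record the waits in a list instead of actually sleeping.
result = with_retries(flaky_fetch, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the loop easy to test, since the waits can be recorded rather than slept through.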