Degen Scraper

Pipeline for generating AI character files and training datasets by scraping public figures' online presence across Twitter and blogs.

⚠️ IMPORTANT: Create a new Twitter account for this tool. DO NOT use your main account as it may trigger Twitter's automation detection and result in account restrictions.

Setup

  1. Install dependencies:

    npm install
  2. Copy the .env.example into a .env file:

    # (Required) Twitter Authentication
    TWITTER_USERNAME=     # your twitter username
    TWITTER_PASSWORD=     # your twitter password
    
    # (Optional) Blog Configuration
    BLOG_URLS_FILE=      # path to file containing blog URLs
    
    # (Optional) Scraping Configuration
    MAX_TWEETS=          # max tweets to scrape
    MAX_RETRIES=         # max retries for scraping
    RETRY_DELAY=         # delay between retries
    MIN_DELAY=           # minimum delay between requests
    MAX_DELAY=           # maximum delay between requests

Usage

Twitter Collection

npm run twitter -- username

Example: npm run twitter -- pmarca

Blog Collection

npm run blog

Generate Character

npm run character -- username

Example: npm run character -- pmarca

Finetune

npm run finetune

Finetune (with test)

npm run finetune:test