Webpack Source File Extractor

A Python tool that automatically discovers and extracts original source files from webpack bundles using sourcemaps. This tool is useful for analyzing and understanding the structure of webpack-based applications by recovering the original source code from production builds.

Features

  • Automatic Discovery: Scans web pages to find JavaScript files and their associated sourcemaps
  • Recursive Chunk Detection: Identifies and downloads dynamically loaded webpack chunks
  • Source Extraction: Extracts original source files from sourcemaps while preserving directory structure
  • Concurrent Downloads: Uses threading for efficient parallel downloading
  • Smart Filtering: Excludes node_modules and webpack internals
  • Next.js Support: Includes patterns for Next.js applications

Requirements

  • Python 3.6+
  • Required packages (auto-installed if missing):
    • requests
    • beautifulsoup4

Installation

Clone or download the sourcemap_downloader.py script to your local machine.

# Install dependencies manually if needed
pip install requests beautifulsoup4

Usage

Basic Usage

python sourcemap_downloader.py <URL>

Command Line Options

python sourcemap_downloader.py <URL> [options]

Arguments:
  URL                    Target website URL (e.g., https://example.com)

Options:
  -o, --output DIR       Output directory for extracted files (default: webpack_sources)
  -w, --workers N        Number of concurrent download workers (default: 10)
  -h, --help             Show help message

Examples

  1. Extract sources from a website:

    python sourcemap_downloader.py https://example.com

  2. Specify a custom output directory:

    python sourcemap_downloader.py https://example.com -o my_sources

  3. Adjust the number of concurrent workers for faster or slower downloads:

    python sourcemap_downloader.py https://example.com -w 20
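
The -w option sets how many downloads run at once. As a rough sketch of how a worker pool of that kind can be built with the standard library (the function name download_all and the injected fetch callable are assumptions for illustration, not the script's actual internals):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, workers=10):
    """Run fetch(url) for every URL on a thread pool; return {url: result}.

    Failed downloads are recorded as None instead of aborting the batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the URL it was created for
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = None
    return results
```

With requests installed, fetch could be something like lambda u: requests.get(u, timeout=10).text. A smaller workers value trades speed for fewer simultaneous connections to the target server.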

How It Works

  1. Discovery Phase:

    • Fetches the main page HTML
    • Identifies all JavaScript files linked in <script> tags
    • Finds preloaded JavaScript resources

  2. Recursive Chunk Detection:

    • Downloads each JavaScript file
    • Searches for references to other webpack chunks
    • Builds a complete list of all JavaScript assets

  3. Sourcemap Extraction:

    • Looks for sourcemap references in JavaScript files (//# sourceMappingURL=)
    • Attempts to download .map files for each .js file
    • Checks that each downloaded file is a valid sourcemap

  4. Source File Recovery:

    • Parses sourcemap JSON data
    • Extracts original source code from sourcesContent field
    • Recreates original directory structure
    • Saves files with their original paths
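
The recovery step (step 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the function name extract_sources and the simple path cleanup are hypothetical, and the sketch relies only on the standard sourcemap fields sources and sourcesContent.

```python
import json
import os


def extract_sources(sourcemap_text, output_dir):
    """Write every source that has embedded content; return the relative paths written."""
    data = json.loads(sourcemap_text)
    sources = data.get("sources", [])
    contents = data.get("sourcesContent") or []
    root = os.path.abspath(output_dir)
    written = []
    # zip stops at the shorter list, so sources without a content entry are skipped
    for name, content in zip(sources, contents):
        if content is None:
            continue  # the map lists this file but does not embed its source
        # Assumed cleanup: drop the "webpack://" prefix and any ".", ".." or empty segments
        parts = [p for p in name.replace("webpack://", "").split("/")
                 if p not in ("", ".", "..")]
        if not parts:
            continue
        target = os.path.join(root, *parts)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as fh:
            fh.write(content)
        written.append(os.path.relpath(target, root))
    return written
```

Dropping ".." segments keeps every file inside output_dir, which matters because sourcemap paths come from an untrusted server.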

Output Structure

The tool creates a directory structure that mirrors the original source code organization:

webpack_sources/
├── components/
│   ├── Header.tsx
│   ├── Footer.tsx
│   └── ...
├── pages/
│   ├── index.tsx
│   ├── about.tsx
│   └── ...
├── utils/
│   └── helpers.ts
└── ...
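
Paths stored in a sourcemap usually carry a bundler prefix such as webpack:///./ and must be mapped to a relative path before they can be mirrored on disk. The sketch below shows one way to do that mapping; the function name source_to_relative_path and the exact normalization rules are assumptions, and the script's own rules may differ.

```python
import posixpath
import re


def source_to_relative_path(source):
    """Map a sourcemap "sources" entry to a relative path safe to join onto the output dir."""
    # Drop a URL-like scheme such as "webpack://"
    path = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", source)
    # Drop a trailing query string, which some loaders append to the path
    path = path.split("?", 1)[0]
    # Anchor at "/" so normpath resolves "." and ".." without escaping the root,
    # then strip the leading slashes to make the result relative
    return posixpath.normpath("/" + path).lstrip("/")
```

For example, webpack:///./components/Header.tsx maps to components/Header.tsx, matching the layout shown above.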

Supported Patterns

The tool recognizes various webpack and framework patterns:

  • Standard webpack sourcemap references
  • Next.js chunk loading patterns
  • Dynamic imports and code splitting
  • Various sourcemap URL formats
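
As an illustration of the sourcemap reference formats, the sketch below collects candidate sourcemap URLs for one JavaScript file. The regular expression, the function name candidate_map_urls, and the ".map" fallback guess are assumptions made for this example; the script's real patterns are not reproduced here.

```python
import re
from urllib.parse import urljoin

# Matches "//# sourceMappingURL=..." and the legacy "//@ sourceMappingURL=..." form
SOURCE_MAP_RE = re.compile(r"//[#@]\s*sourceMappingURL=(\S+)")


def candidate_map_urls(js_url, js_text):
    """Return sourcemap URLs to try for one JavaScript file, most likely first."""
    candidates = []
    matches = SOURCE_MAP_RE.findall(js_text)
    if matches:
        ref = matches[-1]  # use the last reference, which normally sits at the end of the file
        if not ref.startswith("data:"):  # inline data: URIs need decoding, not downloading
            # Resolves relative, root-relative, and absolute references alike
            candidates.append(urljoin(js_url, ref))
    # Fallback guess: the conventional "<file>.js.map" next to the script
    guess = js_url.split("?", 1)[0] + ".map"
    if guess not in candidates:
        candidates.append(guess)
    return candidates
```

Trying the fallback URL covers builds that publish .map files but strip the reference comment from the bundle.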

Limitations

  • Only works with websites that serve sourcemaps alongside their production bundles
  • Requires sourcemaps to contain the sourcesContent field
  • Cannot recover sources if sourcemaps are missing or incomplete
  • Skips minified/processed node_modules code
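
To see which of these limitations applies to a given sourcemap, a quick check along the following lines can help. It is a sketch under assumptions: summarize_sourcemap and the SKIP_MARKERS tuple are hypothetical names, and the substring filter is a crude stand-in for whatever filtering the script actually applies.

```python
import json

# Assumed markers for paths that would be filtered out; the real list may differ
SKIP_MARKERS = ("node_modules/", "webpack/")


def summarize_sourcemap(sourcemap_text):
    """Count recoverable, missing, and filtered sources; return None if not a sourcemap."""
    try:
        data = json.loads(sourcemap_text)
    except ValueError:
        return None  # not JSON, e.g. an HTML error page served in place of the map
    if not isinstance(data, dict) or not isinstance(data.get("sources"), list):
        return None
    contents = data.get("sourcesContent") or []
    summary = {"recoverable": 0, "missing_content": 0, "filtered": 0}
    for index, path in enumerate(data["sources"]):
        content = contents[index] if index < len(contents) else None
        if any(marker in str(path) for marker in SKIP_MARKERS):
            summary["filtered"] += 1
        elif content is None:
            summary["missing_content"] += 1
        else:
            summary["recoverable"] += 1
    return summary
```

A high missing_content count means the map was built without embedded sources, so no tool can recover those files from it.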

Security Considerations

This tool is designed for legitimate purposes such as:

  • Analyzing your own applications
  • Security research with proper authorization
  • Educational purposes
  • Understanding webpack bundle structures

Always ensure you have permission to analyze the target website and comply with applicable laws and terms of service.

Troubleshooting

No sourcemaps found:

  • The website may not include sourcemaps in production
  • Check whether a development build of the site is available, since development builds are more likely to expose sourcemaps

Incomplete source extraction:

  • Some sourcemaps omit the sourcesContent field, so there is no source code to extract from them
  • Check if all JavaScript chunks were discovered

Connection errors:

  • Reduce the number of workers, for example with -w 5
  • Check your internet connection
  • Verify the URL is accessible

License

This tool is provided as-is for educational and analysis purposes. Users are responsible for ensuring their use complies with all applicable laws and regulations.