Crawls HTML pages to the specified depth and saves all documents for offline viewing. This implementation supports:
- multi-threaded processing of different pages (see the pool sketch after this list)
- limit on the number of pages processed
- limit on the number of pages loaded simultaneously from a single host
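
A minimal sketch of how the two thread pools behind the multi-threaded processing could be wired; the `Pools` class and its layout are illustrative assumptions, not the actual implementation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Assumed threading layout: one fixed pool for downloading pages,
// another for extracting links from downloaded pages.
class Pools implements AutoCloseable {
    final ExecutorService downloadPool;
    final ExecutorService extractPool;

    Pools(int downloaders, int extractors) {
        this.downloadPool = Executors.newFixedThreadPool(downloaders);
        this.extractPool = Executors.newFixedThreadPool(extractors);
    }

    @Override
    public void close() {
        downloadPool.shutdown();  // stop accepting tasks; let queued work finish
        extractPool.shutdown();
    }
}
```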
Usage:

```java
try (Crawler crawler = new WebCrawler(
        new SimpleDownloader(),
        downloaders,
        extractors,
        perHost,
        directory)) {
    Result result = crawler.download("https://github.com/", depth);
}
```
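
The example above implies a small API. A minimal sketch of the shapes involved, assumed from the usage rather than taken from the actual sources:

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Assumed interface shape: closes its thread pools via AutoCloseable.
public interface Crawler extends AutoCloseable {
    // Crawls the given URL to the given depth and reports what was saved.
    Result download(String url, int depth);

    @Override
    void close();
}

// Assumed result shape: successfully saved pages plus per-URL failures.
class Result {
    private final List<String> downloaded;          // URLs saved for offline viewing
    private final Map<String, IOException> errors;  // URLs that failed, with causes

    Result(List<String> downloaded, Map<String, IOException> errors) {
        this.downloaded = downloaded;
        this.errors = errors;
    }

    public List<String> getDownloaded() { return downloaded; }
    public Map<String, IOException> getErrors() { return errors; }
}
```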
Parameters:
- `downloaders` - maximum number of simultaneously loaded pages
- `extractors` - maximum number of pages to extract links from simultaneously
- `perHost` - maximum number of pages loaded simultaneously from a single host (see the sketch after this list)
- `directory` - directory for saving files
- `depth` - crawl depth
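
The `perHost` limit can be enforced with one counting semaphore per host. A minimal sketch; the `HostLimiter` helper and its names are illustrative, not part of the actual implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical helper: bounds concurrent downloads per host.
class HostLimiter {
    private final int perHost;
    private final Map<String, Semaphore> semaphores = new ConcurrentHashMap<>();

    HostLimiter(int perHost) {
        this.perHost = perHost;
    }

    void download(String host, Runnable task) throws InterruptedException {
        Semaphore s = semaphores.computeIfAbsent(host, h -> new Semaphore(perHost));
        s.acquire();   // block while perHost downloads from this host are in flight
        try {
            task.run();
        } finally {
            s.release();
        }
    }
}
```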
- My implementation uses its own HTML parser. To avoid parsing errors, you can use an existing library such as jsoup.
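
For example, link extraction with jsoup could look like this; the snippet replaces the hand-written parser with one common way to use the library, and the `LinkExtractor` class is illustrative:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.util.List;

class LinkExtractor {
    // Extracts absolute link targets from a page with jsoup
    // instead of a hand-written HTML parser.
    static List<String> extractLinks(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        return doc.select("a[href]").stream()
                .map(a -> a.absUrl("href"))       // resolve relative hrefs against the page URL
                .filter(href -> !href.isEmpty())
                .toList();
    }
}
```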
- Pages that load scripts and CSS dynamically may not display correctly.
- Downloaded files are saved under a random UUID, with the original file extension preserved.
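
The naming scheme could be implemented roughly as follows; this is a sketch of the idea, and the `FileNames` helper is illustrative rather than the actual code (query strings are ignored for brevity):

```java
import java.nio.file.Path;
import java.util.UUID;

class FileNames {
    // Builds a target path: random UUID as the name, original extension kept.
    static Path uuidName(Path directory, String url) {
        String ext = "";
        int dot = url.lastIndexOf('.');
        int slash = url.lastIndexOf('/');
        if (dot > slash) {              // extension only if the dot is in the last path segment
            ext = url.substring(dot);   // includes the leading '.'
        }
        return directory.resolve(UUID.randomUUID() + ext);
    }
}
```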