
Rust library for downloading image, script, and CSS resources and embedding them into a webpage

Primary language: Rust · License: Apache License 2.0 (Apache-2.0)

web-archive


Library for archiving a web page, along with its linked resources (images, CSS, and JavaScript), for local use.
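Embedding generally means turning each downloaded resource into something the HTML can carry inline. A common technique, especially for binary assets such as images, is a `data:` URI with base64-encoded content (RFC 2397). The sketch below illustrates that technique with the standard library only. It is an assumption about what embedding involves, not code taken from web-archive, and `base64_encode` and `to_data_uri` are hypothetical names.

```rust
/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes as padded standard base64.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into one 24-bit group (missing bytes count as 0).
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let group = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit symbols; pad with '=' where input bytes were missing.
        out.push(ALPHABET[((group >> 18) & 0x3f) as usize] as char);
        out.push(ALPHABET[((group >> 12) & 0x3f) as usize] as char);
        out.push(if chunk.len() > 1 { ALPHABET[((group >> 6) & 0x3f) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[(group & 0x3f) as usize] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: wrap a downloaded resource in a `data:` URI.
fn to_data_uri(mime: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mime, base64_encode(bytes))
}

fn main() {
    // A tiny stylesheet standing in for a downloaded resource.
    // prints data:text/css;base64,Ym9keXt9
    println!("{}", to_data_uri("text/css", b"body{}"));
}
```

Text resources such as CSS and JavaScript can also be inlined directly into `<style>` and `<script>` elements, which avoids the roughly one-third size overhead of base64.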

Example

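[dependencies]
web-archive = { version = "0.3.0", features = ["blocking"] }  # "blocking" is only needed for the blocking API

use web_archive::{archive, blocking};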

// Build a collection of the resources linked from the page, using either API:

// async API (must be awaited inside an async context)
let archive = archive("http://example.com", Default::default()).await.unwrap();

// blocking API (requires the `blocking` feature)
let archive = blocking::archive("http://example.com", Default::default()).unwrap();


// Embed the resources into the HTML
let page = archive.embed_resources();

println!("{}", page);
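The embed step can be pictured as rewriting each resource reference in the markup so it points at an inlined copy. What follows is a deliberately naive, std-only sketch under that assumption: `embed` is a hypothetical function (not web-archive's `embed_resources`), and a real implementation would work on parsed HTML rather than raw strings.

```rust
use std::collections::HashMap;

/// Naive sketch: replace each `src="<url>"` whose URL is a key in `resources`
/// with `src="<inlined value>"`. URLs without an entry are left unchanged.
/// Plain string replacement is an assumption made for brevity; it ignores
/// single quotes, whitespace around `=`, and `href` attributes.
fn embed(html: &str, resources: &HashMap<String, String>) -> String {
    let mut out = html.to_string();
    for (url, inlined) in resources {
        let from = format!("src=\"{}\"", url);
        let to = format!("src=\"{}\"", inlined);
        out = out.replace(&from, &to);
    }
    out
}

fn main() {
    let mut resources = HashMap::new();
    // Placeholder payload, not a real image.
    resources.insert(
        "http://example.com/logo.png".to_string(),
        "data:image/png;base64,AAAA".to_string(),
    );

    let html = r#"<img src="http://example.com/logo.png"><img src="http://example.com/missing.png">"#;
    // Only the first image is rewritten; the second has no downloaded resource.
    println!("{}", embed(html, &resources));
}
```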

Feature flags

  • blocking - enable the blocking (synchronous) API
  • socks - enable SOCKS proxy support

Testing

The main library contains unit tests for the parsing functionality; dynamic tests that run against a local web server live in the dynamic_tests directory. The dynamic tests are built with Rocket, which requires nightly Rust; the main library itself builds on stable.

cargo test
cd dynamic_tests && cargo run
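To give a rough idea of what a parsing unit test looks like, here is a hypothetical, naive extractor that collects `src` attribute values, together with tests in the usual `#[cfg(test)]` style that `cargo test` picks up. `extract_src_urls` is an illustrative name, not part of web-archive's API, and the string scanning stands in for a real HTML parser.

```rust
/// Hypothetical helper: collect every double-quoted `src="..."` value in
/// document order. Naive by design: it skips single-quoted or unquoted
/// attributes, does not decode entities, and also matches names that end
/// in `src` (e.g. `data-src`).
fn extract_src_urls(html: &str) -> Vec<String> {
    const NEEDLE: &str = "src=\"";
    let mut urls = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(NEEDLE) {
        let after = &rest[start + NEEDLE.len()..];
        match after.find('"') {
            Some(end) => {
                urls.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // Unterminated attribute: stop instead of guessing.
            None => break,
        }
    }
    urls
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn finds_image_and_script_sources() {
        let html = r#"<img src="a.png"><script src="app.js"></script>"#;
        assert_eq!(extract_src_urls(html), vec!["a.png", "app.js"]);
    }

    #[test]
    fn ignores_unterminated_attribute() {
        assert!(extract_src_urls("<img src=\"broken").is_empty());
    }
}

fn main() {
    let html = r#"<img src="logo.png"><script src="app.js"></script>"#;
    println!("{:?}", extract_src_urls(html));
}
```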

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.