
Web Data Retrieval Helper


Just another small helper library for common web-based data retrieval tasks.

This library wraps Apache HttpComponents HttpClient to make it more accessible for easier data retrieval and processing.

Examples

Basic GET retrieval

// basic configuration
HttpRetrieval retrieval = new HttpRetrieval()
    .setUserAgent("TestClient/0.1"); // it's always nice to properly identify your program

// go fetch
boolean success = retrieval.requestByGet("https://www.energiequant.de/");
System.out.println(success ? "success" : "failed");

// let's see what we got
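String body = new String(retrieval.getResponseBodyBytes(), StandardCharsets.UTF_8); // assumes a UTF-8 body; without an explicit charset the platform default would be used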
System.out.println(body);

// let's check some more details...
System.out.println(retrieval.hasCompleteContentResponseStatus()); // Was the content transferred completely?
System.out.println(retrieval.getLastRetrievedLocation()); // What was the URL after following redirects?
System.out.println(retrieval.getResponseHeaders().getFirstByName("content-type")); // inspect content-type HTTP response header

GET retrieval through promises

In addition to retrieving data directly as shown above, you can obtain results through a CompletableFuture. Instantiate a reusable HttpPromiseBuilder and provide it with a decoder Function that processes the HttpRetrieval resulting from each request. Several decoders are available from DefaultHttpRetrievalDecoders and can easily be chained. Provide a configuration template via HttpPromiseBuilder#withConfiguration and you are good to go.
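The idea of a chainable decoder Function feeding a CompletableFuture can be sketched with plain JDK types. This is only an illustration of the pattern: FakeResponse, request and the two decoder constants are hypothetical stand-ins, not part of this library, and no network access takes place.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class DecoderChainSketch {
    // Hypothetical stand-in for a finished request; NOT the library's HttpRetrieval.
    static final class FakeResponse {
        final byte[] body;

        FakeResponse(byte[] body) {
            this.body = body;
        }
    }

    // Decoder: turns a finished response into a String.
    static final Function<FakeResponse, String> BODY_AS_UTF8 =
        response -> new String(response.body, StandardCharsets.UTF_8);

    // Chained decoder: reuses the String decoder, then counts lines.
    static final Function<FakeResponse, Integer> LINE_COUNT =
        BODY_AS_UTF8.andThen(s -> s.split("\n").length);

    // Produces the canned "response" asynchronously and applies the decoder to it.
    static <T> CompletableFuture<T> request(byte[] cannedBody, Function<FakeResponse, T> decoder) {
        return CompletableFuture
            .supplyAsync(() -> new FakeResponse(cannedBody))
            .thenApply(decoder);
    }

    public static void main(String[] args) throws Exception {
        byte[] canned = "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8);
        System.out.println(request(canned, BODY_AS_UTF8).get()); // decoded body
        System.out.println(request(canned, LINE_COUNT).get());   // number of lines
    }
}
```

Because the decoder is an ordinary Function, further processing steps can be appended with andThen without touching the code that issues the request.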

The following example retrieves the content (HTTP response body) of http://www.energiequant.de/ again and automatically decodes it to a String using the character set indicated by the server (falling back to UTF-8 if unavailable). Since we are still interested in the final location, we wrap the decoder in withMetaData, which places the content into a RetrievedData container:

DefaultHttpRetrievalDecoders decoders = new DefaultHttpRetrievalDecoders();
HttpPromiseBuilder<RetrievedData<String>> builder = new HttpPromiseBuilder<RetrievedData<String>>(
    decoders.withMetaData(
        decoders.bodyAsStringWithHeaderCharacterSet(StandardCharsets.UTF_8)
    )
).withConfiguration(
    new HttpRetrieval()
        .setUserAgent("TestClient/0.1")
);

RetrievedData<String> retrievedData = builder.requestByGet("http://www.energiequant.de/").get();
System.out.println(retrievedData.getData()); // HTTP response body
System.out.println(retrievedData.getRetrievedLocation()); // final location after following all redirects
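To make the "character set indicated by the server, falling back to UTF-8" behavior more concrete, here is a minimal sketch of how such a fallback could work using only the JDK. It is not this library's implementation: charsetFromContentType and decode are hypothetical helper names, and the parsing is deliberately simplistic.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetFallbackSketch {
    // Extracts the charset parameter from a Content-Type header value,
    // returning the fallback if the header or parameter is missing or unusable.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes a response body using the header charset, defaulting to UTF-8.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, "text/html; charset=ISO-8859-1"));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```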

The promise-based approach becomes much more practical when the same processing is applied repeatedly and yields a different target type. To keep the example simple, let's just count the number of lines on some websites by chaining a lambda after decoding the response body to a String, then perform two requests for different URLs:

DefaultHttpRetrievalDecoders decoders = new DefaultHttpRetrievalDecoders();
HttpPromiseBuilder<Integer> builder = new HttpPromiseBuilder<Integer>(
    decoders //
        .bodyAsStringWithHeaderCharacterSet(StandardCharsets.UTF_8)
        .andThen(s -> s.split("\n").length)
).withConfiguration(
    new HttpRetrieval()
        .setUserAgent("TestClient/0.1")
);

String[] urls = { "http://www.energiequant.de/", "https://www.github.com/" };
for (String url : urls) {
    System.out.printf("%5d %s%n", builder.requestByGet(url).get(), url);
}

Note that a real application should perform proper error and exception handling and identify itself uniquely by setting a proper user agent string.
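As a rough idea of what such handling might look like around a CompletableFuture, the following sketch waits with a timeout and separates the failure cases that Future#get can report. The await helper is a hypothetical name, and how this library's futures behave on a failed request is not covered here; the example only uses JDK futures completed by hand.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PromiseErrorHandlingSketch {
    // Waits for the result; returns an empty Optional on any failure.
    static <T> Optional<T> await(CompletableFuture<T> future, long timeoutSeconds) {
        try {
            return Optional.ofNullable(future.get(timeoutSeconds, TimeUnit.SECONDS));
        } catch (ExecutionException e) {
            // The asynchronous computation itself failed; the cause holds the original error.
            System.err.println("request failed: " + e.getCause());
        } catch (TimeoutException e) {
            // Give up waiting and mark the future as cancelled.
            future.cancel(true);
            System.err.println("request timed out after " + timeoutSeconds + " s");
        } catch (CancellationException e) {
            System.err.println("request was cancelled");
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still react to it.
            Thread.currentThread().interrupt();
            System.err.println("interrupted while waiting");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        CompletableFuture<String> ok = CompletableFuture.completedFuture("body");
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IOException("connection refused"));

        System.out.println(await(ok, 5).isPresent());     // successful result available
        System.out.println(await(failed, 5).isPresent()); // failure mapped to empty
    }
}
```

Returning an Optional keeps the calling loop simple; an application that needs to distinguish failure causes would more likely rethrow or return a dedicated result type.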

License

The implementation and accompanying files are released under the MIT license.

As this library requires runtime dependencies, further licenses apply on distribution and at runtime. Please check the licenses of all dependencies (not limited to those listed on this page), including any transitive dependencies, individually.

Major Runtime Dependencies

The following dependencies have a major impact on this library's operation at runtime: