
Java browser with JavaScript support

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

Paginator

Paginator fetches HTML documents with JavaScript support.


Requirements

  • Java 8 or later
  • Google Chrome installed on the machine

Docker image

Configurations

ENV VARIABLE   DEFAULT        DESCRIPTION
SERVER_PORT    8089           Server port
N/A            10000          HTML page cache size limit
N/A            10800000 ms    HTML page cache lifetime (3 hours)
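The two cache defaults are listed with N/A, i.e. no environment variable is documented for them, and 10800000 ms works out to 3 hours. The minimal Java sketch below only illustrates how these defaults relate; the class PaginatorDefaults and the method resolvePort are hypothetical names, and falling back to 8089 when SERVER_PORT is unset is an assumption based on the table, not Paginator's own code.

```java
import java.time.Duration;

// Hypothetical helper mirroring the documented defaults; these names are
// illustrative and are not part of Paginator's code base.
public class PaginatorDefaults {
    // Defaults taken from the Configurations table
    static final int DEFAULT_SERVER_PORT = 8089;
    static final int PAGE_CACHE_SIZE_LIMIT = 10_000;
    static final long PAGE_CACHE_LIFETIME_MS = 10_800_000L;

    // Assumption: use SERVER_PORT when it is set and non-empty, otherwise 8089
    static int resolvePort(String envValue) {
        if (envValue == null || envValue.trim().isEmpty()) {
            return DEFAULT_SERVER_PORT;
        }
        return Integer.parseInt(envValue.trim());
    }

    public static void main(String[] args) {
        int port = resolvePort(System.getenv("SERVER_PORT"));
        Duration lifetime = Duration.ofMillis(PAGE_CACHE_LIFETIME_MS);
        System.out.println("port=" + port
                + ", cache size limit=" + PAGE_CACHE_SIZE_LIMIT
                + ", cache lifetime=" + lifetime.toHours() + "h");
    }
}
```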

Endpoints

METHOD    URL                 REQUEST BODY                                 RETURN BODY                      DESCRIPTION
GET/PUT   /pages              url, page_cache_ms* [optional]                                                Get the HTML page from url
GET/PUT   /pages/elements     url, css_queries: Map<queryId, cssQuery>,    Map<queryId, List<Elements>>     Get specific HTML elements
                              page_cache_ms* [optional]
GET/PUT   /pages              url, content, page_cache_ms* [optional]                                       Manually add an HTML page to the cache
GET/PUT   /pages/statistics                                                size, maxLifeTime, sizeLimit     Get cache statistics

* page_cache_ms is optional. When the same page is requested a second time, the new page_cache_ms value does not overwrite the previous one.
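One way to read the page_cache_ms note is "the first lifetime registered for a URL wins". The Java sketch below models that reading with ConcurrentHashMap.putIfAbsent. It is an illustration under that assumption, not Paginator's actual cache implementation, and CacheLifetimeRule and register are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the documented rule: a later page_cache_ms for the
// same URL does not overwrite the value stored by the first call.
public class CacheLifetimeRule {
    private final Map<String, Long> lifetimeByUrl = new ConcurrentHashMap<>();

    // Returns the lifetime (ms) that applies to the URL after this call
    long register(String url, long pageCacheMs) {
        // putIfAbsent only stores the value when the URL has no entry yet
        Long previous = lifetimeByUrl.putIfAbsent(url, pageCacheMs);
        return previous != null ? previous : pageCacheMs;
    }

    public static void main(String[] args) {
        CacheLifetimeRule rule = new CacheLifetimeRule();
        System.out.println(rule.register("parse.example.com", 60_000L)); // 60000
        System.out.println(rule.register("parse.example.com", 5_000L));  // still 60000
    }
}
```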

Examples

Get elements from HTML page

  • Request: GET http://localhost:8089/pages/elements
  • Body:
{
  "url": "parse.example.com",
  "css_queries": {
    "form_text": "form p"
  }
}
  • Response:
{
  "form_text": [
    {
      "tag": "P",
      "text": "Some example text here.",
      "selector": "html > body > div > form > p:nth-child(1)",
      "attributes": {
      },
      "children": [
      ]
    }
  ]
}
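Both /pages/elements and /pages/statistics are called with GET, and the elements call carries a JSON body. The sketch below builds those requests with java.net.http.HttpRequest, which needs Java 11 or later on the client side even though Paginator itself runs on Java 8. It assumes the service accepts the body as Content-Type: application/json on localhost:8089; PaginatorRequests, elements and statistics are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client helpers (Java 11+ java.net.http); the base URL assumes
// the default SERVER_PORT 8089 on localhost.
public class PaginatorRequests {
    static final String BASE_URL = "http://localhost:8089";

    // GET /pages/elements carries a JSON body, so the generic method(...)
    // builder is used because HttpRequest.Builder.GET() cannot attach a body.
    static HttpRequest elements(String jsonBody) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/pages/elements"))
                .header("Content-Type", "application/json")
                .method("GET", HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // GET /pages/statistics needs no body; the response lists size, maxLifeTime and sizeLimit
    static HttpRequest statistics() {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/pages/statistics"))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Same body as the "Get elements from HTML page" example
        String body = "{\"url\":\"parse.example.com\",\"css_queries\":{\"form_text\":\"form p\"}}";
        HttpRequest request = elements(body);
        // Sending requires a running Paginator instance, e.g.
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(request.method() + " " + request.uri());
    }
}
```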

Cache custom HTML pages

  • Request: POST http://localhost:8089/pages
  • Body:
{
  "url": "my.own.example.com",
  "content": "<!doctype html><html><head><title>Example Domain</title></head><body><div><h1>Example page</h1></div></body></html>"
}
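Raw HTML placed in the "content" field must be escaped as a JSON string, because attribute values and scripts often contain quotes, backslashes or newlines. The Java sketch below does minimal JSON string escaping by hand and wraps the result in a POST to /pages, following the example above. CustomPageUpload, jsonString, body and request are hypothetical names, and a JSON library would normally replace the hand-written escaper.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for caching custom HTML pages: builds the
// {"url": ..., "content": ...} body and a POST request to /pages (Java 11+).
public class CustomPageUpload {

    // Minimal JSON string escaping: quotes, backslashes and control characters
    static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the four-digit hex escape form
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    static String body(String url, String html) {
        return "{\"url\":" + jsonString(url) + ",\"content\":" + jsonString(html) + "}";
    }

    // Assumes the default port 8089 and a JSON content type
    static HttpRequest request(String url, String html) {
        return HttpRequest.newBuilder(URI.create("http://localhost:8089/pages"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(url, html)))
                .build();
    }

    public static void main(String[] args) {
        String html = "<div class=\"note\">Example page</div>";
        System.out.println(body("my.own.example.com", html));
        // {"url":"my.own.example.com","content":"<div class=\"note\">Example page</div>"}
    }
}
```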

Docker build image example

  • Create the jar file: mvn clean -Dmaven.test.skip=true package
  • Build the local image: docker build -t paginator .
  • Tag the image as latest for your repository: docker tag paginator SOME_REPO_PATH/paginator:latest
  • Docker image push to repo: docker push SOME_REPO_PATH/paginator:latest

TODO

  • Async page call implementation [remove synchronized]
  • Endpoint to clear the cache
  • Configurable default cache limits
    ////((((((((((((((((((((((((((((((* **         
    //////////////////////////////////* */(/.      
    //////////////////////////////////* */////*    
    //////////////////////////////////* *////////. 
    //////////////////////////////////*            
    ///////......................,////////////////.
    //////////////////////////////////////////////.
    ///////...............................,///////.
    ///////******************************/////////.
    //////////////////////////////////////////////.
    //////*.           PAGINATOR          ,///////.
    //////////////////////////////////////////////.
    **********************************************.
    **********************************************.
    ********,....*********************************.
    ********,    *********************************.
            .,***********,    ,*******************.
             ,,,,,,,,,,,,,    ,*,,,,,      .,,,,,,.
             ,,,,,,,,,    ,,,,,,,,,,,      .,,,,,,.
      ................    .......,,,.   .......... 
      ,,,,,,.                    ,,,.  .,,,.       
      ,,,,,,.       ....     ,,,.                  
                    ,,,.     ,,,.                  
                ....                               
                ....                               
                    ....