
scraper-selenium


Problem 1:

https://docs.google.com/document/d/1SpCG_ACuzhi0JF6PmOyZfxBSao-oYS1Nw9iIABwe1xc/edit

Solution:

At Yodlee we used the proprietary Yodlee SDK, a set of wrapper APIs around IE DOM elements. I no longer have access to those APIs,
so I had to find equivalent libraries to manipulate DOM elements and simulate headless browser behaviour.
I quickly evaluated jsoup, HtmlUnit and Selenium.
I am using Selenium for this implementation.

Download / view the screencast on YouTube

https://youtu.be/jC06z8uRCJ0

Download the runnable jar and log files from Google Drive
https://goo.gl/MdXRBQ (19.2 MB fat jar with all dependencies)


Run directly from the command line
$ java -jar scraper-selenium-jar-with-dependencies.jar https://www.shoppersstop.com/haute-curry-women-cotton-anarkali-printed-churidar-suit/p-9758640
$ java -jar scraper-selenium-jar-with-dependencies.jar https://www.shoppersstop.com/kaya-pigmentation-reducing-complex-get-rs-500-off-on-kaya-products-for-rs-2490-/p-9787102

Download the log file from the last run: https://goo.gl/vGvVyZ

Compile from source

  1. Check out the code from GitHub
    $ pwd
    /home/yarish
    $ git clone https://github.com/yarish/scraper-selenium.git
    or download it directly as a zip: https://github.com/yarish/scraper-selenium/archive/master.zip
  2. Build with Maven
    $ cd scraper-selenium
    $ mvn clean install
  3. Run the jar
    $ cd target
    $ java -jar scraper-selenium-jar-with-dependencies.jar https://www.shoppersstop.com/haute-curry-women-cotton-anarkali-printed-churidar-suit/p-9758640
    $ java -jar scraper-selenium-jar-with-dependencies.jar https://www.shoppersstop.com/kaya-pigmentation-reducing-complex-get-rs-500-off-on-kaya-products-for-rs-2490-/p-9787102

Test cases handled:
1) The URL parameter is missing from the command line
2) The URL is not valid
3) The product no longer exists
4) A pop-up (advertisement / promo) appears over the page
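
The first two test cases can be illustrated with a minimal sketch using only the Java standard library. This is not the code from this repository: the class name UrlArgCheck, the method validate and the error messages are hypothetical, and the sketch assumes that only http and https URLs with a host are acceptable. It does not cover cases 3 and 4, which need a live browser session.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the repository: checks the command-line
// argument before any browser is started.
public class UrlArgCheck {

    /** Returns null when args[0] is a usable http(s) URL, otherwise an error message. */
    public static String validate(String[] args) {
        // Case 1: no URL parameter was given on the command line.
        if (args == null || args.length == 0 || args[0].trim().isEmpty()) {
            return "Usage: java -jar scraper-selenium-jar-with-dependencies.jar <product-url>";
        }
        // Case 2: the parameter is present but is not a valid URL.
        try {
            URI uri = new URI(args[0].trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            if (!http || uri.getHost() == null) {
                return "Not a valid http(s) URL: " + args[0];
            }
        } catch (URISyntaxException e) {
            // Thrown for illegal characters such as spaces.
            return "Malformed URL: " + args[0];
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(1);
        }
        System.out.println("URL accepted: " + args[0].trim());
    }
}
```

Validating the argument up front keeps the failure cheap: the program can exit with a usage message before Selenium launches a browser.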