-
RUN
-
TODO
- change properties in src/main/resources/fec.properties
- you can choose to run single-threaded (+webdriver) / or multi-threaded (only tested with Apache HTTP client)
- for webdriver run:
- install xvfb so you can run firefox headless with webdriver
- on Linux, apt-get install xvfb
- sudo Xvfb :10 -ac ;starts Xvfb
- in another window do
- export DISPLAY :10
- java -jar fec-1.0-SNAPSHOT-jar-with-dependencies.jar
- install xvfb so you can run firefox headless with webdriver
-
- check http client to add timeout, for sites that do not load
-
- parse all sites and display results
-
- display results in xlsx
-
- add crawl logic, to parse all pages of a site with a specified depth
-
- add custom logic for sites that do not match
-
- added contact url parse and follow
-
- added meta http refresh parse and follow
-
- added frames parse and follow, still doesn't work properly
-
- after first frame follow, the email rules should get the content from each frame and parse that
-
- need to add an extra isFrameSite param for that, when true, email rules parse on frame content, not just page content
-
- added frames parse and follow, still doesn't work properly
-
- add threads
-
- work fine now, only with HttpClient, I added timeouts for lazy sites
-
- add threads
-
- make selenium webdriver work with multiple threads