Web Crawler using Spring Boot and Completable Futures & EhCache
An example application using Spring boot and Completable Future & EhCache
Technology stack
- Spring Boot
- Swagger
- Spring Boot Test/JUnit/Mockito/RestAssured
Build Instruction
- Maven
- JDK 8 or above
Run
Internet Without proxy
mvn spring-boot:run
Internet With proxy
JVM Property | Description |
---|---|
http.proxyHost | proxy host name |
https.proxyHost | https proxy host name |
http.proxyPort | proxy port number |
https.proxyPort | https proxy port number |
http.proxyUser | proxy user name |
https.proxyUser | https proxy user name |
http.proxyPassword | proxy user password |
https.proxyPassword | https proxy user password |
Pass all applicable jvm arguments
mvn -D[jvmProperty]=[value] spring-boot:run
Testing
Endpoint:
http://localhost:8080/crawl?depth=5&breadth=10&url=https://example.com
Query Param name | Description |
---|---|
url | [Mandatory] url to crawl |
depth | Depth of the page links to be crawled, default is 2 max 10 |
breadth | Max number of page links to be considered to crawl, default is 10 and max 20 |