Java onboarding challenge

This is a Spring Boot project for crawling web links. The crawler accepts a URL and the depth to which it should crawl, runs asynchronously, and caches the results in an H2 database. A second endpoint fetches the cached results based on a requested count.
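To illustrate what crawling to a given depth asynchronously means, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not the project's actual implementation: the class name DepthLimitedCrawlSketch, the linkExtractor function, and the in-memory link graph are hypothetical stand-ins for real page fetching, and the sketch assumes the start URL counts as depth 0.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class DepthLimitedCrawlSketch {

    /**
     * Breadth-first crawl that records each discovered URL with its depth.
     * Assumption: the seed is depth 0 and links are followed only from
     * pages whose depth is below maxDepth.
     */
    public static Map<String, Integer> crawl(
            String seed, int maxDepth, Function<String, List<String>> linkExtractor) {
        Map<String, Integer> depthByUrl = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        depthByUrl.put(seed, 0);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = depthByUrl.get(current);
            if (depth >= maxDepth) {
                continue; // depth limit reached: do not follow this page's links
            }
            for (String link : linkExtractor.apply(current)) {
                // putIfAbsent returns null only for URLs not seen before
                if (depthByUrl.putIfAbsent(link, depth + 1) == null) {
                    queue.add(link);
                }
            }
        }
        return depthByUrl;
    }

    /** Runs the crawl on a background thread so the caller is not blocked. */
    public static CompletableFuture<Map<String, Integer>> crawlAsync(
            String seed, int maxDepth, Function<String, List<String>> linkExtractor) {
        return CompletableFuture.supplyAsync(() -> crawl(seed, maxDepth, linkExtractor));
    }

    public static void main(String[] args) {
        // Hypothetical link graph standing in for real web pages.
        Map<String, List<String>> graph = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("D"),
                "D", List.of("E"));
        Function<String, List<String>> extractor =
                url -> graph.getOrDefault(url, List.<String>of());
        Map<String, Integer> pages = crawlAsync("A", 2, extractor).join();
        System.out.println(pages); // prints {A=0, B=1, C=1, D=2}
    }
}
```

In this sketch, page E is never visited because it sits at depth 3, beyond the requested depth of 2.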

The project is structured according to the Domain-Driven Design (DDD) pattern.

Installation

Import the project into IntelliJ IDEA, run the eureka-server first, and then run the crawler service.

Usage

Once both services are running, open this URL in a browser: http://localhost:8100/swagger-ui.html

Two endpoints are listed there.

a) The first endpoint (/crawler/init) starts the crawler. Use this JSON for a demo:

{
  "url": "https://en.wikipedia.org/wiki/Europe",
  "depth": 2
}
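The same request can also be sent from code instead of the Swagger UI. The following is a minimal sketch using the JDK's java.net.http client (Java 11+). It assumes the endpoint accepts a POST with a JSON body, which the README does not state explicitly; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CrawlerInitRequestSketch {

    // Host and port taken from the Swagger URL in this README.
    static final String BASE_URL = "http://localhost:8100";

    /** Builds the JSON payload, escaping backslashes and quotes in the URL. */
    public static String buildBody(String url, int depth) {
        String escaped = url.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\": \"" + escaped + "\", \"depth\": " + depth + "}";
    }

    /** Assumption: /crawler/init is a POST endpoint that consumes JSON. */
    public static HttpRequest buildInitRequest(String url, int depth) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/crawler/init"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(url, depth)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildInitRequest("https://en.wikipedia.org/wiki/Europe", 2);
        // Requires the crawler service to be running locally.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```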

The crawled data can be inspected in the H2 console at http://localhost:8100/h2-console/ (the username and password are in the application.yml file).

Alternatively, watch the debug messages in IntelliJ as the results are saved.

b) The second endpoint (/crawler/getNameSortedResults/count/{count}) fetches the results sorted by page name; the {count} parameter sets the number of results to return.
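As a rough sketch, this endpoint can be called from Java as follows. It assumes the endpoint responds to a plain GET and that the service runs on port 8100 as above; the class and method names are hypothetical, and the response format is simply printed as-is because it is not described here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SortedResultsRequestSketch {

    // Host and port taken from the Swagger URL in this README.
    static final String BASE_URL = "http://localhost:8100";

    /** Fills the {count} path parameter of the endpoint. */
    public static URI resultsUri(int count) {
        return URI.create(BASE_URL + "/crawler/getNameSortedResults/count/" + count);
    }

    /** Assumption: the endpoint is served over GET. */
    public static HttpRequest buildRequest(int count) {
        return HttpRequest.newBuilder(resultsUri(count)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the crawler service to be running locally.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(10), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```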

Note: There is also a crawler4j.yml file, where you can configure the number of crawlers and the file location for the crawler cache.