Description

For this exercise, we're asking you to write an HTML parser. In the /html directory, you'll find a few HTML files that are typical of the kind of HTML we parse every day. The files were all pulled from a single site, and the original URL for each page appears at the top of its HTML file (so you can open the original page in your browser, or even check out other property pages). Your task is to construct a parser that extracts the following:

  • price
  • number of bedrooms
  • number of bathrooms
  • address
  • image URLs
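As a starting point, here is a minimal sketch of one of the extractions above (image URLs), using only the Java standard library. It is an illustration, not the expected solution: the class name `ImageUrlSketch`, the method `extractImageUrls`, and the sample markup are all hypothetical, and the real files in /html may hold their images in different attributes or elements. A regular expression is used only to keep the example dependency-free; a dedicated HTML parsing library is usually the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the src attribute of every <img> tag.
final class ImageUrlSketch {

    // Matches <img ... src="..."> or <img ... src='...'>, ignoring case.
    // Requiring whitespace before "src" skips attributes such as data-src.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(2)); // group 2 is the quoted attribute value
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample markup, not taken from the /html directory.
        String sample =
                "<div><img class=\"photo\" src=\"https://example.com/1.jpg\">"
              + "<IMG SRC='https://example.com/2.jpg' alt=\"x\"></div>";
        System.out.println(extractImageUrls(sample));
    }
}
```

The same shape (one small method per field, each returning a plain value) would extend to price, bedrooms, bathrooms, and address once you have looked at how the sample pages mark them up.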

The output can take any form you like: a JSON file, a CSV, System.out, and so on. You can either explain your design decisions and walk us through your code during the follow-up call, or include a short write-up. This exercise has been intentionally left very open-ended, so please design things as you see fit.
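To make the output options concrete, here is one possible sketch of a result holder that serializes itself to JSON. Everything in it is an assumption for illustration: the class name `ListingSketch`, the field names and types (for example, price kept as a string and bathrooms as a double to allow half baths), and the sample values. The hand-rolled escaping covers only quotes and backslashes; a JSON library would be the safer choice in real code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical value class holding the five extracted fields.
final class ListingSketch {
    final String price;
    final int bedrooms;
    final double bathrooms;
    final String address;
    final List<String> imageUrls;

    ListingSketch(String price, int bedrooms, double bathrooms,
                  String address, List<String> imageUrls) {
        this.price = price;
        this.bedrooms = bedrooms;
        this.bathrooms = bathrooms;
        this.address = address;
        this.imageUrls = imageUrls;
    }

    // Minimal escaping: backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"price\":").append(quote(price))
          .append(",\"bedrooms\":").append(bedrooms)
          .append(",\"bathrooms\":").append(bathrooms)
          .append(",\"address\":").append(quote(address))
          .append(",\"imageUrls\":[");
        for (int i = 0; i < imageUrls.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(imageUrls.get(i)));
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the /html directory.
        ListingSketch listing = new ListingSketch(
                "$450,000", 3, 1.5, "1 Main St",
                Arrays.asList("a.jpg", "b.jpg"));
        System.out.println(listing.toJson());
    }
}
```

Keeping the extracted data in one small object like this separates parsing from output, so switching to CSV or plain System.out later would only mean adding another formatting method.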

For this exercise, you can use any version of Java that's 8 or above. Also, feel free to use any publicly available Java-based libraries, frameworks, or utilities that help you with the task.

Code

Start by creating your own branch in which to do all your work. When you are finished, open a pull request for us to review. Once we have had a chance to review your code, we'll schedule a follow-up call to talk about it. If you have any questions or would like any clarification regarding this exercise, please don't hesitate to contact Simon.