/crawler

string-factory.com

Primary LanguageJavaScript

#D11Hack Twitter URL

Problem Statement

We have a website called string-factory.com. The website has a page that looks like this —

A typical page contains two types of content —

  1. Links to other similar pages, it is marked with a red colored anchor tags.
  2. A set of strings in white.

Our goal is to find the lexicographically smallest string on the website.

To achieve this we should first implement a crawler inside crawler.js. The crawler opens the website's home page ie — http://localhost:8080, extracts the strings that are displayed in white color and moves on to the next set of pages using the links that are displayed in red. This is done till the time all pages have not been parsed. Once all the the pages have been parsed and the strings in white have been extracted from all of them, we need to find the lexicographically smallest string amongst them. That's all!

For Example: Lets say there are just three pages on the website viz. a0, a1 and a2.

  • From a0 you can go to both a1 and a2.

  • From a1 you can only goto a2

  • From a2 you can go back to a0 & a1

In the above case I first load page a0 I extract all the strings in white from that page,

{
  strings: [
+ 'rga',
+ 'abd',
+ 'ebd',
+ 'bde'
  ]
}

Next I should go to a1 and then extract all the strings from there —

{
  strings: [
  'rga',
  'abd',
  'ebd',
  'bde',
+ 'wwd',
+ 'asw',
+ 'mjl'
  ]
}

Finally I should go to a2, extract all the strings and add it to my list and stop.

{
  strings: [
  'rga',
  'abd',
  'ebd',
  'bde',
  'wwd',
  'asw',
  'mjl',
+ 'bcc',
+ 'foo',
+ 'aax'
  ]
}

Here we can do a simple comparison between all the strings and find that aax is the lexicographically smallest string. So the answer here would be aax.

Getting Started

The project is built using express and pug. To run it locally follow these steps.

  1. Install node >= 8
  2. Clone the project git clone git@github.com:dream11/crawler.git
  3. Install npm dependencies npm i
  4. Start the server npm start
  5. Visit http://localhost:8080 in the browser.

The project consists of a mock webserver that serves string-factory.com. The server is throttles responses and sends a 429 response in case of overload, this is to mimic real world latency.

crawler.js is where you should start writing the logic. You can also test your code using npm test. The test cases runs and asserts if your logic works. It will also log the run time on the console for benchmarking.

The team would be glad to assist you in setting up the project on your computer. Feel free to stop by and ask questions.

Judging Criteria

Once your tests pass you should submit a pull request with your solution. Each pull request will be executed on travis to compute the run time of your logic. If a pull request is found that uses any form of plagiarism or uses hard coded results, you will be disqualified. In the end, if your solution has the least run time, you win!

Solution

Like seriously??

FAQ

  • What does lexicographically smallest string mean? The lexicographically smallest string would be the first string in an english dictionary where all the words are sorted in alphabetical order. Eg: a < b or aa < ab.

  • Can we libraries? Sure go ahead!

Please don't hesitate to get in touch with the team or shoot us a query directly on @d11engg.

  • Can we use libraries?