We have a website called string-factory.com. The website has a page that looks like this —
A typical page contains two types of content —
- Links to other similar pages, it is marked with a
red
colored anchor tags. - A set of strings in
white
.
Our goal is to find the lexicographically smallest string on the website.
To achieve this we should first implement a crawler inside crawler.js
. The crawler opens the website's home page ie — http://localhost:8080, extracts the strings that are displayed in white
color and moves on to the next set of pages using the links that are displayed in red
. This is done till the time all pages have not been parsed. Once all the the pages have been parsed and the strings in white
have been extracted from all of them, we need to find the lexicographically smallest string amongst them. That's all!
For Example: Lets say there are just three pages on the website viz. a0
, a1
and a2
.
-
From
a0
you can go to botha1
anda2
. -
From
a1
you can only gotoa2
-
From
a2
you can go back toa0
&a1
In the above case I first load page a0
I extract all the strings in white from that page,
{
strings: [
+ 'rga',
+ 'abd',
+ 'ebd',
+ 'bde'
]
}
Next I should go to a1
and then extract all the strings from there —
{
strings: [
'rga',
'abd',
'ebd',
'bde',
+ 'wwd',
+ 'asw',
+ 'mjl'
]
}
Finally I should go to a2
, extract all the strings and add it to my list and stop.
{
strings: [
'rga',
'abd',
'ebd',
'bde',
'wwd',
'asw',
'mjl',
+ 'bcc',
+ 'foo',
+ 'aax'
]
}
Here we can do a simple comparison between all the strings and find that aax
is the lexicographically smallest string. So the answer here would be aax
.
The project is built using express
and pug
. To run it locally follow these steps.
- Install
node >= 8
- Clone the project
git clone git@github.com:dream11/crawler.git
- Install npm dependencies
npm i
- Start the server
npm start
- Visit http://localhost:8080 in the browser.
The project consists of a mock webserver that serves string-factory.com. The server is throttles responses and sends a 429
response in case of overload, this is to mimic real world latency.
crawler.js
is where you should start writing the logic. You can also test your code using npm test
. The test cases runs and asserts if your logic works. It will also log the run time on the console for benchmarking.
The team would be glad to assist you in setting up the project on your computer. Feel free to stop by and ask questions.
Once your tests pass you should submit a pull request with your solution. Each pull request will be executed on travis to compute the run time of your logic. If a pull request is found that uses any form of plagiarism or uses hard coded results, you will be disqualified. In the end, if your solution has the least run time, you win!
Like seriously??
-
What does lexicographically smallest string mean? The lexicographically smallest string would be the first string in an english dictionary where all the words are sorted in alphabetical order. Eg:
a < b
oraa < ab
. -
Can we libraries? Sure go ahead!
Please don't hesitate to get in touch with the team or shoot us a query directly on @d11engg.
- Can we use libraries?