pracstrat/Internship

3. Extract HTML Page (3 hours)


Extract the HTML page

Requirement

  • Given a URL, fetch the HTML page (for example, the GitHub home page).
  • Extract all the links and print each link's href and inner HTML.

Example 1:

<a href="/eishay">eishay</a>

The output should be

/eishay:eishay

Example 2:

<a class="header-logo-blacktocat" href="https://github.com/">
  <span class="mega-icon mega-icon-blacktocat"></span>
</a>

The output should be

https://github.com/:<span class="mega-icon mega-icon-blacktocat"></span>

Also, each line of the complete output should be prefixed with its line number, for example:

1:/eishay:eishay
2:https://github.com/:<span class="mega-icon mega-icon-blacktocat"></span>
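
One possible approach to the requirement, as a minimal sketch in Python using only the standard library. The names `LinkExtractor`, `format_links`, and `fetch` are made up for illustration and are not part of the task. The sketch assumes that anchors without an `href` are skipped, that anchors are not nested, and that leading and trailing whitespace in the inner HTML is trimmed, as example 2 suggests.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect (href, inner_html) pairs for every <a href=...> element."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.links = []     # finished (href, inner_html) pairs
        self._href = None   # href of the anchor currently open, if any
        self._parts = []    # raw pieces of the open anchor's inner HTML

    def handle_starttag(self, tag, attrs):
        if self._href is not None:
            # Inside an anchor: keep the nested tag exactly as written.
            self._parts.append(self.get_starttag_text())
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # assumption: skip anchors without href
                self._href, self._parts = href, []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept once, as written.
        if self._href is not None:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._href is None:
            return
        if tag == "a":
            # Assumption: trim surrounding whitespace, as in example 2.
            self.links.append((self._href, "".join(self._parts).strip()))
            self._href = None
        else:
            self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._href is not None:
            self._parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._href is not None:
            self._parts.append(f"&#{name};")


def format_links(html):
    """Return output lines of the form 'N:href:inner_html', numbered from 1."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [f"{n}:{href}:{inner}"
            for n, (href, inner) in enumerate(parser.links, start=1)]


def fetch(url):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

To run it against a live page, something like `print("\n".join(format_links(fetch("https://github.com/"))))` should work. Inner HTML that contains line breaks in the middle is kept as is, so a single link may still span several output lines; whether that should be collapsed is left open by the requirement.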