farkaskid/WebCrawler

Use regex to find URLs.

farkaskid opened this issue · 3 comments

Currently, URLs are extracted by a loop that searches for href=" matches. Replacing that loop with a tight, efficient regex should give better performance as well as cleaner code.

The regex <a[^>]*>([^<]+)<\/a> can be used to detect all the anchor tags.
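For illustration, here is a minimal Python sketch of how the regex approach might look. The project's actual language is not stated in this issue, so Python is an assumption. Note that the capture group in the pattern above returns the link text, not the URL, so the sketch adds a second pattern, HREF_RE, to pull out the href value. HREF_RE and the function names extract_urls and extract_anchor_texts are hypothetical and do not come from the repository.

```python
import re
from typing import List

# Pattern quoted in the comment above: matches <a ...>text</a>
# and captures the text between the tags (not the URL).
ANCHOR_RE = re.compile(r"<a[^>]*>([^<]+)<\/a>")

# Hypothetical companion pattern (an assumption, not from the issue):
# captures the double-quoted href value inside an opening <a ...> tag.
HREF_RE = re.compile(r'<a\b[^>]*?\bhref\s*=\s*"([^"]*)"', re.IGNORECASE)


def extract_anchor_texts(html: str) -> List[str]:
    """Return the link text of every anchor matched by ANCHOR_RE."""
    return ANCHOR_RE.findall(html)


def extract_urls(html: str) -> List[str]:
    """Return the href value of every anchor matched by HREF_RE."""
    return HREF_RE.findall(html)


if __name__ == "__main__":
    page = '<p><a href="https://example.com/a">A</a> and <a class="x" href="/b">B</a></p>'
    print(extract_urls(page))
    print(extract_anchor_texts(page))
```

This sketch only handles double-quoted href attributes and anchors whose content is plain text. Anchors with nested tags or single-quoted attributes would need a broader pattern or an HTML parser.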

Implemented and tested.