vezaynk/Sitemap-Generator-Crawler

Switch from arrays to hashtables

vezaynk opened this issue · 1 comment

The current implementation stores values in arrays and iterates over them to search for a value. On larger sites such as blog.codinghorror.com this becomes slow, because every lookup is O(n) in the number of stored entries. Hash tables offer the same functionality with O(1) average-case lookups.
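To illustrate the difference, here is a minimal sketch. It is not the code from sitemap.php: the function and variable names (`seenInList`, `seenInTable`, `$list`, `$table`) and the example URLs are hypothetical. It assumes the values being searched are URL strings, and uses them as keys of a PHP associative array so that `isset()` can replace the `in_array()` scan.

```php
<?php
// Hypothetical helper names; none of this is taken from sitemap.php.

// Array approach: in_array() walks the list element by element, O(n) per lookup.
function seenInList(array $list, string $url): bool
{
    return in_array($url, $list, true);
}

// Hash table approach: the URL is the array key, so isset() is a hash lookup,
// O(1) on average regardless of how many URLs are stored.
function seenInTable(array $table, string $url): bool
{
    return isset($table[$url]);
}

// Example input with one duplicate URL (made-up values).
$urls = [
    'https://example.com/',
    'https://example.com/a',
    'https://example.com/',
];

$list = [];
$table = [];
foreach ($urls as $url) {
    if (!seenInList($list, $url)) {
        $list[] = $url;        // append the value to the list
    }
    if (!seenInTable($table, $url)) {
        $table[$url] = true;   // store the URL as a key; the value is unused
    }
}

// Both structures end up holding the same unique URLs in insertion order.
$uniqueFromTable = array_keys($table);
```

One caveat with this approach: PHP casts keys that look like decimal integers (for example `"123"`) to integers, which does not affect URL strings but matters if other kinds of values are stored the same way.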

Before:

time php sitemap.php site=https://blog.codinghorror.com
 [+] Sitemap has been generated in 746.65 secondsand saved to sitemap.xml
 [+] Scanned a total of 3137 pages and indexed 1704 pages.
 [+] Operation Completed
php sitemap.php site=https://blog.codinghorror.com  6.04s user 1.42s system 0% cpu 12:26.71 total

After:

time php sitemap.php site=https://blog.codinghorror.com
 [+] Sitemap has been generatedin 570.19 secondsand saved to sitemap.xml
 [+] Scanned a total of 3137 pages and indexed 1704 pages.
 [+] Operation Completed
php sitemap.php site=https://blog.codinghorror.com  4.67s user 1.45s system 1% cpu 9:30.23 total

The performance difference on the server is negligible.

#26

Pushed in 4b0a38f