
A spam spider that targets the 'Untitled' spam pages found in Google search results.

Primary language: Python. License: MIT.

Google 'Untitled' Spam Spider

A tiny web spider that starts crawling from a given page and keeps going as long as it finds links on those pages that point to similar spam pages.

This spider targets the 'Untitled' spam pages that show up in Google search results.
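The crawl described above can be pictured as a breadth-first search that only follows links out of pages it has already classified as spam. This is a minimal sketch, not the repository's implementation: `crawl`, `fetch`, and the `looks_like_untitled_spam` heuristic (a page counts as spam if its `<title>` is 'Untitled') are hypothetical names and assumptions. Fetching is passed in as a function so the sketch runs without network access; in practice it could wrap any HTTP client.

```python
from __future__ import annotations

from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class _PageParser(HTMLParser):
    """Collects the <title> text and every <a href> target of one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_untitled_spam(title: str) -> bool:
    # Hypothetical heuristic (assumption): a page titled 'Untitled' is spam.
    return title.strip().lower() == "untitled"


def crawl(start_url: str,
          fetch: Callable[[str], str | None],
          is_spam: Callable[[str], bool] = looks_like_untitled_spam,
          max_pages: int = 1000) -> list[str]:
    """Breadth-first crawl that only expands pages judged to be spam.

    `fetch` returns the HTML for a URL, or None if it cannot be retrieved.
    Returns the spam URLs in the order they were found.
    """
    queue = deque([start_url])
    seen = {start_url}
    spam: list[str] = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = _PageParser()
        parser.feed(html)
        if not is_spam(parser.title):
            continue  # Not spam: record nothing and follow none of its links.
        spam.append(url)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return spam
```

Keeping only spam pages on the frontier is what stops the crawl from spreading into the rest of the web, and the `seen` set prevents pages that link to each other from being fetched twice.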

I wrote several articles about these spam pages, in which I discuss the background of the spam network behind them.

I crawled 105,009 Google 'Untitled' Spam Pages in 7 days and 700,504 other linked Spam Pages
— David Wolf
david.wolf.gdn

Usage

from google_spam_spider import GoogleSpamSpider

spider = GoogleSpamSpider(
    url='http://zone-casino.fr/2hephe/torch-functional-unfold.html',  # The URL to start crawling from
    direct_spam_logs='direct_spam.log',  # File to log direct spam pages to
    external_spam_logs='external_spam.log',  # File to log external spam pages to
)