LogoExtraction

This Python program extracts the logo from a website using the scrapy package.

Python Code

This program uses the scrapy package to parse a website for logo extraction. The name of the spider that performs logo extraction is: logo

#Method Detail

This program only processes three HTML tags in order to extract the logo. There are three cases in which a logo is extracted:

Case 1: when an `<a>` contains an `<img>` with the substring logo in its @src

Case 2: when a `<div>` contains an `<img>` with the substring logo in its @src

Case 3: when an `<a>` has the home page address or an index page as its @href and contains an `<img>` with an image file extension (such as .png, .gif or .jpg) and the substring logo in its @class, @title or @alt
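The three cases can be sketched in plain Python. This is only an illustration using the standard library's `html.parser` rather than scrapy selectors, and it assumes the containers are `<a>` and `<div>` and the image is an `<img>`. The class name `LogoFinder`, the extension list, and the rule for what counts as a home page or index link are hypothetical choices, not taken from the project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed extension list; the README only gives .png, .gif, .jpg "etc".
IMAGE_EXTENSIONS = (".png", ".gif", ".jpg", ".jpeg")


class LogoFinder(HTMLParser):
    """Collect <img> URLs that match the three cases (tag names are assumptions)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.open_containers = []  # stack of (tag, attrs) for open <a>/<div>
        self.logos = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("a", "div"):
            self.open_containers.append((tag, attrs))
        elif tag == "img" and self.open_containers and self._matches(attrs):
            self.logos.append(urljoin(self.page_url, attrs["src"]))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching container, tolerating sloppy HTML.
        for i in range(len(self.open_containers) - 1, -1, -1):
            if self.open_containers[i][0] == tag:
                del self.open_containers[i:]
                break

    def _links_home(self):
        # Case 3 precondition: an enclosing <a> points at the site root or an index page.
        site = urlparse(self.page_url).netloc
        for tag, attrs in self.open_containers:
            if tag != "a" or not attrs.get("href"):
                continue
            target = urlparse(urljoin(self.page_url, attrs["href"]))
            last_segment = target.path.rsplit("/", 1)[-1]
            if target.netloc == site and (
                target.path in ("", "/") or last_segment.startswith("index.")
            ):
                return True
        return False

    def _matches(self, attrs):
        src = (attrs.get("src") or "").lower()
        if not src:
            return False
        if "logo" in src:  # cases 1 and 2: "logo" appears in @src
            return True
        labels = " ".join((attrs.get(k) or "") for k in ("class", "title", "alt")).lower()
        has_image_extension = urlparse(src).path.endswith(IMAGE_EXTENSIONS)
        return self._links_home() and has_image_extension and "logo" in labels
```

To try it, create `LogoFinder("https://example.com/page")`, call `feed(html)` with a page's markup, and read the resolved image URLs from `logos`.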

Limitation

1 - This program doesn't process CSS (style sheets) when parsing for logo extraction.
2 - This program doesn't process HTML pages having only [...] instead of [...] for logo extraction.

Run

To run this program, use the following command in a terminal inside the LogoExtraction project:

scrapy crawl logo

#Output

It extracts the logo URL and the web page URL and saves them in a CSV file.
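For illustration, writing such rows with Python's standard `csv` module could look like the sketch below. The column names `logo_url` and `page_url` and the helper `write_logos` are assumptions; the README does not state the actual file name or column layout.

```python
import csv

# Assumed column names; the project's real CSV header is not documented here.
FIELDNAMES = ["logo_url", "page_url"]


def write_logos(rows, fileobj):
    """Write an iterable of {"logo_url": ..., "page_url": ...} dicts as CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a real file, open it with `newline=""` as the `csv` documentation recommends. Scrapy's built-in feed exports can also produce CSV, for example `scrapy crawl logo -o logos.csv`; whether this project relies on that or on its own pipeline is not stated here.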