LogoExtraction
Python Code
This program use scrapy package to parse a website for logo extraction. The name of spider to perform logo extraction is : logo
#Method Detail
This program only process
, and tag in order to extract logo.
There are three case to extract logo:
with logo substring in its @src
Case 1: when containsCase 2: when
with possible file extension as like (.png, .gif, .jpg etc) and logo substring in its @class or @title or @alt
Case 3: when contains @href as home page address or index. andLimitation
1 - This program don't process CSS (style sheet) to parse for LogoExtraction 2 - This program don't process HTML pages having only
instead of for Logo Extraction.
Run
In order to run this program. you can use following command at terminal inside LogoExtraction project
scrapy crawl logo
#Output It will extract the logo url and web page url and save in csv file.