= About SuperMario =

SuperMario is an advanced web crawler library written in Python. It provides a number of methods for mining data from various kinds of sites.

== License ==

BSD License. See 'LICENSE' for details.

== Requirements ==

Platform: *nix-like system (Unix, Linux, Mac OS X, etc.)
Python: 2.5+
Storage: MongoDB

Other required Python modules:
- simplejson
- BeautifulSoup
- eventlet
- PIL
- pycurl
- chardet
- feedparser
- mongokit
- templatemaker
- flickrapi
- pyyaml
- MySQLdb
- dateutil

== Features ==

+ robots.txt protocol support;
+ caching of each URL's HTML;
+ URL normalization;
+ conversion of all content to Unicode;
+ extraction of the main text from HTML by specifying a *link-threshold*;
+ conversion of partial RSS feeds to full RSS feeds;
+ proxy list support;
+ cookie persistence support;
+ login support.
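
The URL normalization and robots.txt features listed above can be illustrated with a minimal sketch. It does not use SuperMario's own API, which this README does not document. The function names `normalize_url` and `is_allowed`, the user agent string, and the `example.com` URLs are all hypothetical. The sketch uses only the standard library of Python 3 rather than the Python 2.5+ the library itself targets, and the normalization rules shown (lowercase scheme and host, drop the default port and the fragment, use "/" for an empty path) are one common choice, not necessarily the rules SuperMario applies.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Default ports that can be omitted from a normalized URL.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return a canonical form of an absolute http(s) URL.

    Hypothetical helper, not part of SuperMario. Userinfo
    (user:password@) is dropped, and IPv6 hosts are not handled.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it differs from the scheme default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = "%s:%d" % (host, parts.port)
    # An empty path and "/" address the same resource.
    path = parts.path or "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file.

    Hypothetical helper. The robots.txt content is passed in as a
    string so that no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    url = normalize_url("HTTP://Example.COM:80/private/data.html#top")
    print(url)
    print(is_allowed(robots, "ExampleBot", url))
```

Normalizing before caching or checking robots.txt means that two spellings of the same address map to a single cache entry and a single permission check.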
AmoebaFactor/SuperMario
A Python module that supports another project, bububa.Lego, and provides several advanced web-scraping functions.
Python