/url-pattern-algorithm

The fundamental algorithm used for URL normalization, web page classification, and web information integration in web search engines.

Primary Language: Java

url-pattern-algorithm

The fundamental algorithm used for URL normalization, web page classification, and web information integration in web search engines. The idea for this algorithm came from the paper "A Pattern Tree-based Approach to Learning URL Normalization Rules" (from WWW), and I made several modifications to fit the actual application.
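To make the idea of a URL pattern concrete, here is a minimal sketch. It is not the code in this repository and not the full method from the paper. It only assumes that a URL can be decomposed into key-value pairs (scheme, host, path segments, query parameters) and that a group of similar URLs can be generalized by keeping the values they share and replacing the rest with a wildcard. The class name `UrlPatternSketch`, the method names `decompose` and `generalize`, the key names such as `path_0` and `query_id`, and the `example.com` URLs are all hypothetical and chosen for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UrlPatternSketch {

    /** Splits a URL into ordered key-value pairs (hypothetical key naming). */
    public static Map<String, String> decompose(String url) {
        URI uri = URI.create(url);
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("scheme", uri.getScheme() == null ? "" : uri.getScheme());
        pairs.put("host", uri.getHost() == null ? "" : uri.getHost());

        // Each non-empty path segment becomes path_0, path_1, ...
        String path = uri.getPath() == null ? "" : uri.getPath();
        int index = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                pairs.put("path_" + index++, segment);
            }
        }

        // Each query parameter becomes query_<name>.
        String query = uri.getQuery();
        if (query != null) {
            for (String param : query.split("&")) {
                int eq = param.indexOf('=');
                String key = eq < 0 ? param : param.substring(0, eq);
                String value = eq < 0 ? "" : param.substring(eq + 1);
                pairs.put("query_" + key, value);
            }
        }
        return pairs;
    }

    /** Keeps values shared by every URL and replaces all others with "*". */
    public static Map<String, String> generalize(List<String> urls) {
        Map<String, String> pattern = new LinkedHashMap<>();
        boolean first = true;
        for (String url : urls) {
            Map<String, String> pairs = decompose(url);
            if (first) {
                pattern.putAll(pairs);
                first = false;
                continue;
            }
            // A key whose value differs (or is missing) in this URL is wildcarded.
            for (Map.Entry<String, String> entry : pattern.entrySet()) {
                if (!entry.getValue().equals(pairs.get(entry.getKey()))) {
                    entry.setValue("*");
                }
            }
            // A key seen only in this URL is wildcarded as well.
            for (String key : pairs.keySet()) {
                pattern.putIfAbsent(key, "*");
            }
        }
        return pattern;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://example.com/item/view?id=1&ref=home",
                "http://example.com/item/view?id=2&ref=search");
        // prints {scheme=http, host=example.com, path_0=item, path_1=view, query_id=*, query_ref=*}
        System.out.println(generalize(urls));
    }
}
```

A pattern like this groups URLs that share the same structure. Deciding which wildcarded keys can be dropped or rewritten to obtain a normalized URL is the part that a learned rule set would have to supply, and this sketch does not attempt it.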