aruneshmathur/dark-patterns

Implement domain categorization


[from today's discussions] One of the alternatives for building a list of shopping sites is to categorize the Alexa top 1M sites using an external (non-Alexa) API.

The following script can be used to query the Bluecoat (Symantec Site Review) API:
https://github.com/PoorBillionaire/sitereview/blob/master/sitereview.py

For instance, it categorizes myntra.com as a shopping site:

python sitereview.py https://www.myntra.com/

======================
Symantec Site Review
======================

URL: https://www.myntra.com:443/
Last Time Rated/Reviewed: > 7 days 
Category: Shopping
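
To apply this to the whole list, something along these lines might work. This is only a rough sketch: it assumes the Alexa list is a CSV of `rank,domain` rows, and `lookup_category` is a hypothetical callable standing in for whatever wraps the Site Review query (it is not part of sitereview.py).

```python
import csv
import io
from typing import Callable, Iterable, List


def read_domains(csv_text: str) -> List[str]:
    """Parse 'rank,domain' rows (assumed list format) into a list of domains."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[1].strip() for row in rows if len(row) >= 2]


def shopping_sites(
    domains: Iterable[str], lookup_category: Callable[[str], str]
) -> List[str]:
    """Keep the domains whose reported category string mentions 'Shopping'."""
    return [d for d in domains if "shopping" in lookup_category(d).lower()]


def fake_lookup(domain: str) -> str:
    """Stand-in for illustration only; a real lookup would query the API."""
    return {"myntra.com": "Shopping"}.get(domain, "Unknown")


if __name__ == "__main__":
    domains = read_domains("1,example.org\n2,myntra.com\n")
    print(shopping_sites(domains, fake_lookup))
```

Passing the lookup in as a function keeps the filtering logic testable without hitting the API, and makes it easy to add rate limiting or caching around the real query later.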

Fixed in 617b393 and 7823d82

When a site has multiple categories, we keep only the first one and drop the rest.
Example: http://www.game.co.uk/ Informational, Shopping and Games -> Informational
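
One possible way to keep all of the categories is to split the category string before checking it. A minimal sketch, assuming the categories arrive as a single string joined by commas and "and" as in the example above, and that no individual category name contains the word "and" itself (`split_categories` and `is_shopping` are hypothetical helper names, not existing code in this repo):

```python
import re
from typing import List


def split_categories(text: str) -> List[str]:
    """Split a string like 'A, B and C' into ['A', 'B', 'C'].

    Assumes ',' and the standalone word 'and' are only used as separators.
    """
    parts = re.split(r",|\band\b", text)
    return [p.strip() for p in parts if p.strip()]


def is_shopping(text: str) -> bool:
    """True if any of the listed categories is exactly 'Shopping'."""
    return "Shopping" in split_categories(text)


if __name__ == "__main__":
    print(split_categories("Informational, Shopping and Games"))
    print(is_shopping("Informational, Shopping and Games"))
```

With this, game.co.uk would be picked up as a shopping site even though "Shopping" is not the first category listed.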