MarginaliaSearch/MarginaliaSearch

(blacklist) Blacklist Transparency

Closed · 1 comment

Create a view for listing or exporting the blacklisted domains, so that it is transparent what the search engine excludes. It needs to cope with a large amount of data. Maybe sort by initial and use a Patricia trie for basic searching, like on encyclopedia.marginalia.nu? For scale, the entry counts per initial:

SELECT LEFT(URL_DOMAIN, 1) AS INITIAL, COUNT(*) FROM EC_DOMAIN_BLACKLIST GROUP BY INITIAL;
+---------+----------+
| INITIAL | COUNT(*) |
+---------+----------+
| 0       |      564 |
| 1       |      738 |
| 2       |      595 |
| 3       |      596 |
| 4       |      713 |
| 5       |      600 |
| 6       |      556 |
| 7       |      609 |
| 8       |      659 |
| 9       |      559 |
| a       |     1629 |
| b       |     1676 |
| c       |     1879 |
| d       |     1060 |
| e       |     1347 |
| f       |     1074 |
| g       |     1302 |
| h       |     1268 |
| i       |     1290 |
| j       |      994 |
| k       |      502 |
| l       |      901 |
| m       |     1773 |
| n       |      684 |
| o       |      601 |
| p       |     1359 |
| q       |      153 |
| r       |      867 |
| s       |     3021 |
| t       |     2116 |
| u       |      331 |
| v       |      475 |
| w       |      847 |
| x       |      252 |
| y       |      219 |
| z       |      233 |
+---------+----------+
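
To make the "sort by initial plus trie search" idea concrete, here is a minimal sketch in Python. It is not how encyclopedia.marginalia.nu or the search engine implements anything: `DomainTrie`, `count_by_initial`, and the sample domains are hypothetical names made up for illustration, and the structure is a plain character trie rather than a real Patricia trie, which would also collapse single-child chains to save memory.

```python
from collections import Counter
from typing import Dict, Iterable, List


class DomainTrie:
    """Plain character trie over domain names (hypothetical helper).

    A Patricia trie would additionally merge chains of single-child nodes.
    """

    _END = ""  # sentinel key marking the end of a stored domain

    def __init__(self) -> None:
        self._root: Dict[str, dict] = {}

    def insert(self, domain: str) -> None:
        node = self._root
        for ch in domain.lower():
            node = node.setdefault(ch, {})
        node[self._END] = {}

    def with_prefix(self, prefix: str, limit: int = 100) -> List[str]:
        """Return up to `limit` stored domains starting with `prefix`, sorted."""
        prefix = prefix.lower()
        node = self._root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        results: List[str] = []
        self._collect(node, prefix, results, limit)
        return results

    def _collect(self, node: dict, path: str, results: List[str], limit: int) -> None:
        # Visiting keys in sorted order yields results in lexicographic order;
        # the empty sentinel sorts first, so a shorter domain precedes its extensions.
        for key in sorted(node):
            if len(results) >= limit:
                return
            if key == self._END:
                results.append(path)
            else:
                self._collect(node[key], path + key, results, limit)


def count_by_initial(domains: Iterable[str]) -> Counter:
    """Same grouping as the SQL query above: entries per first character."""
    return Counter(d[0].lower() for d in domains if d)


# Made-up sample data, not real blacklist entries.
sample = ["spam.example", "spam-farm.example", "scam.example", "ads.example"]
trie = DomainTrie()
for d in sample:
    trie.insert(d)

matches = trie.with_prefix("spam")
initials = count_by_initial(sample)
```

The `limit` parameter is what keeps a listing view manageable for large buckets such as `s`: a page only walks as much of the trie as it needs to fill itself.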

Blacklist data is now available at https://downloads.marginalia.nu/exports/. This probably doesn't need a GUI.