benjaminvdb/DBRD

Detecting non-Dutch reviews


Following up on the language issue reported in #1, I wrote a quick script that detects the language of each review using the Polyglot library. It classifies 470 reviews as non-Dutch.

Script:
https://gist.github.com/1cae9033fe6310bae9f45d3c0a8c3883

List of files classified as non-Dutch:
https://gist.github.com/a84355e3898a5f1a9e995fa1c43fc2bf
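
For anyone who wants to reproduce this without opening the gist, here is a minimal sketch of the general approach. It is not the gist verbatim: the function names, the `*.txt` file pattern, and the `expected="nl"` default are my assumptions for illustration. The only Polyglot call used is `Detector(text, quiet=True).language.code`, and the detector is passed in as a parameter so another library can be swapped in.

```python
from pathlib import Path
from typing import Callable, List


def polyglot_language_code(text: str) -> str:
    """Return the language code Polyglot assigns to `text`."""
    # Imported lazily so the rest of the sketch runs without Polyglot installed.
    from polyglot.detect import Detector

    # quiet=True suppresses the exception Polyglot raises on unreliable detections.
    return Detector(text, quiet=True).language.code


def find_non_dutch(
    root: Path,
    detect: Callable[[str], str] = polyglot_language_code,
    expected: str = "nl",
) -> List[Path]:
    """Return every *.txt file under `root` whose detected language is not `expected`."""
    flagged = []
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        try:
            code = detect(text)
        except Exception:
            # Treat detector failures as "not the expected language" so they get reviewed.
            code = "unknown"
        if code != expected:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    # "DBRD" is a placeholder for wherever the dataset was extracted.
    for review_path in find_non_dutch(Path("DBRD")):
        print(review_path)
```

Files where detection fails are flagged rather than skipped, so the output is best treated as a list of candidates to check by hand rather than a definitive count.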

Cheers