Detecting non-Dutch reviews
andreasvc commented
Following up on the language issue reported in #1, I wrote a quick script to detect the language of each review with the Polyglot library. There appear to be 470 non-Dutch reviews.
Script:
https://gist.github.com/1cae9033fe6310bae9f45d3c0a8c3883
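For anyone who can't open the gist, here is a rough sketch of how a check like this could look. It is not the script linked above, just an illustration under some assumptions: the reviews are available as (name, text) pairs, Dutch is identified by the code "nl", and Polyglot's `Detector` is called with `quiet=True`. The names `polyglot_detect` and `find_non_dutch` and the "unknown" label are made up for this example.

```python
from typing import Callable, Iterable, List, Tuple


def polyglot_detect(text: str) -> str:
    """Return the language code that Polyglot guesses for `text`."""
    # Imported lazily so the rest of the sketch can be tried without Polyglot.
    from polyglot.detect import Detector
    return Detector(text, quiet=True).language.code


def find_non_dutch(
    reviews: Iterable[Tuple[str, str]],
    detect: Callable[[str], str] = polyglot_detect,
    expected: str = "nl",
) -> List[Tuple[str, str]]:
    """Return (name, detected_code) for every review not detected as `expected`."""
    flagged = []
    for name, text in reviews:
        try:
            code = detect(text)
        except Exception:
            # Hypothetical fallback label: if detection fails outright,
            # flag the review for manual inspection instead of skipping it.
            code = "unknown"
        if code != expected:
            flagged.append((name, code))
    return flagged
```

Calling `find_non_dutch(pairs)` on the (filename, text) pairs of the corpus would give a list to check by hand. Very short reviews are likely to be misclassified, so the result is best treated as a list of candidates.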
List of files classified as non-Dutch:
https://gist.github.com/a84355e3898a5f1a9e995fa1c43fc2bf
Cheers