Rensvandeschoot/software-overview-machine-learning-for-screening-text

Colandr: missing information

gimoAI opened this issue · 0 comments

There are a few features/properties of Colandr that I was unable to find in the literature or the documentation:

  • According to the website, Word2Vec is used for feature extraction. However, I found this Colandr GitHub repo (possibly outdated?) that combines Word2Vec and TF-IDF for feature extraction; see code fragment.
  • I could not find which classifier is used, other than this code fragment in the same GitHub repo, which implements an SGD classifier.
  • No balancing strategy is mentioned. Is one used?
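
To make the first point concrete, here is a minimal sketch of one common way to combine Word2Vec and TF-IDF: a TF-IDF-weighted average of word vectors per document. This is an assumption for illustration only, not confirmed to be what Colandr does. The names `TOY_VECTORS`, `idf_weights` and `doc_vector` are hypothetical, and the hand-written 2-dimensional vectors stand in for trained Word2Vec embeddings.

```python
import math
from collections import Counter

# Toy 2-dimensional vectors standing in for trained Word2Vec embeddings
# (hypothetical values, chosen only for illustration).
TOY_VECTORS = {
    "forest": [1.0, 0.0],
    "carbon": [0.0, 1.0],
    "review": [1.0, 1.0],
}


def idf_weights(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    # Count each token once per document to get its document frequency.
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def doc_vector(doc, idf, vectors):
    """TF-IDF-weighted average of the word vectors of in-vocabulary tokens."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, tf in Counter(doc).items():
        if tok not in vectors or tok not in idf:
            continue  # skip tokens without an embedding or an IDF weight
        w = tf * idf[tok]
        total = [t + w * v for t, v in zip(total, vectors[tok])]
        weight_sum += w
    # Documents with no known tokens keep the all-zero vector.
    return [t / weight_sum for t in total] if weight_sum else total


# Three tiny tokenized "abstracts" (assumed input format: lists of tokens).
docs = [["forest", "carbon", "review"], ["forest", "review"], ["forest", "forest"]]
idf = idf_weights(docs)
features = [doc_vector(d, idf, TOY_VECTORS) for d in docs]
```

Each entry of `features` is a fixed-length vector that could be passed to a linear classifier such as an SGD classifier. Rare tokens (here "carbon") get a higher IDF and so pull the document vector more strongly than tokens that appear everywhere.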