taalbrecht/JournalVis

Article count confusion when using TopMine

Closed · 1 comment

When the articles are written to file for TopMine, the article count changes. This could break document alignment or multiword token identification: a multiword token may not be identified properly if a single source article is accidentally split across several documents.

Actions to solve problem:

  1. Fix single articles being written to multiple lines in writeLines
  2. Ensure that only multiword tokens are taken from TopMine, not the document contents (this should already be in place)
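
A minimal sketch of how the first action could be checked, assuming the extra lines come from line breaks embedded in the article text (the issue does not confirm the cause). The sample `articles` vector and the `flatten_articles` helper are hypothetical and are not taken from the JournalVis code:

```r
# Hypothetical article texts; the second one contains an embedded newline.
articles <- c(
  "topic models for text",
  "multiword phrases\nsplit across lines",
  "a third article"
)

# Hypothetical helper: collapse any run of line breaks into a single space
# so that each article occupies exactly one line in the output file.
flatten_articles <- function(x) {
  gsub("[\r\n]+", " ", x)
}

raw_file <- tempfile(fileext = ".txt")
clean_file <- tempfile(fileext = ".txt")

# writeLines puts each element on its own line, but an embedded "\n"
# inside an element still starts a new line in the file.
writeLines(articles, raw_file)
writeLines(flatten_articles(articles), clean_file)

# Count the lines that a line-oriented reader would see in each file.
raw_count <- length(readLines(raw_file))
clean_count <- length(readLines(clean_file))
```

Comparing `raw_count` and `clean_count` against `length(articles)` before handing the file to TopMine is one way to catch the mismatch early.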

Fixed in commit 6e6073a by only pulling multiword vocab results from topMine
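
The idea behind the fix can be sketched as follows. This is not the code from the commit: the `vocab` vector is hypothetical, and it assumes the mined vocabulary is available as a character vector in which the words of a multiword token are separated by whitespace:

```r
# Hypothetical vocabulary read back after phrase mining (assumed format).
vocab <- c("topic", "topic model", "latent dirichlet allocation", "corpus")

# Keep only entries that contain internal whitespace, i.e. multiword tokens.
# trimws() guards against leading or trailing spaces counting as a match.
is_multiword <- grepl("\\s", trimws(vocab))
multiword_vocab <- vocab[is_multiword]
```

Because only the vocabulary is used, the document contents stay under the caller's control, and a changed line count in the TopMine file no longer affects document alignment.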