suminb/translator

Duplicated translation entries

Closed this issue · 2 comments

I still see a lot of duplicated translation entries, even though the following code should've prevented them.

translation = Translation.query.filter_by(original_text_hash=original_text_hash, source=source, target=target, mode=mode).first()

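In many cases, the timestamps of the duplicates are within a fraction of a second of each other, so I'd assume it happens when a user sends the same translation request twice in quick succession: both requests run the lookup above before either one has inserted its row.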

Solutions:

  1. Define a unique key (original_text_hash, source, target, mode).
  2. Disable 'translate' button when translations are in progress.
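
To illustrate the first option, here is a minimal sketch of how a unique key rejects the second of two identical inserts. It uses the standard-library `sqlite3` module rather than the project's ORM so that it runs on its own. The table layout, the `translated_text` column, the `store_translation` helper, and the sample values are all assumptions made for the example.

```python
import sqlite3

# In-memory database with a hypothetical table layout; only the four key
# columns come from the issue, the rest is assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE translation (
        id INTEGER PRIMARY KEY,
        original_text_hash TEXT NOT NULL,
        source TEXT NOT NULL,
        target TEXT NOT NULL,
        mode TEXT NOT NULL,
        translated_text TEXT,
        UNIQUE (original_text_hash, source, target, mode)
    )
    """
)


def store_translation(conn, text_hash, source, target, mode, translated_text):
    """Insert a row; return False if a row with the same key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO translation "
                "(original_text_hash, source, target, mode, translated_text) "
                "VALUES (?, ?, ?, ?, ?)",
                (text_hash, source, target, mode, translated_text),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique key rejected the duplicate, even if an earlier
        # "does it exist?" lookup had missed it.
        return False


first = store_translation(conn, "abc123", "en", "ko", "1", "안녕하세요")
second = store_translation(conn, "abc123", "en", "ko", "1", "안녕하세요")
row_count = conn.execute("SELECT COUNT(*) FROM translation").fetchone()[0]
```

The point of the constraint is that the database enforces uniqueness atomically, so two near-simultaneous requests cannot both succeed the way they can with a lookup followed by an insert.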

75dbedb partially addresses this issue. Need to write a script to clean up existing duplicates.
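
A cleanup script could look roughly like the sketch below. It is not what 75dbedb does; it assumes a `translation` table with an integer `id` primary key (a hypothetical layout), uses the standard-library `sqlite3` module for self-containment, and keeps the row with the lowest `id` for each (original_text_hash, source, target, mode) key. The `remove_duplicates` name and the sample rows are made up for the example.

```python
import sqlite3


def remove_duplicates(conn):
    """Delete duplicate rows, keeping the lowest id per key.

    Returns the number of rows deleted.
    """
    with conn:  # run the delete in a single transaction
        cur = conn.execute(
            """
            DELETE FROM translation
            WHERE id NOT IN (
                SELECT MIN(id) FROM translation
                GROUP BY original_text_hash, source, target, mode
            )
            """
        )
    return cur.rowcount


# Hypothetical sample data: the first, second, and fifth rows share a key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translation ("
    "id INTEGER PRIMARY KEY, original_text_hash TEXT, "
    "source TEXT, target TEXT, mode TEXT)"
)
rows = [
    ("abc123", "en", "ko", "1"),
    ("abc123", "en", "ko", "1"),
    ("abc123", "en", "ja", "1"),
    ("def456", "en", "ko", "1"),
    ("abc123", "en", "ko", "1"),
]
conn.executemany(
    "INSERT INTO translation (original_text_hash, source, target, mode) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

deleted = remove_duplicates(conn)
remaining = conn.execute("SELECT COUNT(*) FROM translation").fetchone()[0]
```

Running the cleanup before adding a unique key matters, since most databases refuse to create a unique constraint over rows that already violate it.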

After all, the raw translation records are not that important, because the only reason we store them is to serve as a cache. What we are really interested in is building a multilingual corpus database. Thus, I'm closing this issue.