birchill/10ten-ja-reader

Duplicate forms returned for name entries


As reported by email:

When looking up 堀口大學, 東京藝術大学 or 慶應大学 (all in JMnedict), both the 新字体 (shinjitai, new-style) and 旧字体 (kyūjitai, old-style) forms appear twice in the 10ten popup.

What these terms have in common is that the 旧字体 forms were originally separate entries (before being deleted and merged with the 新字体 forms earlier this year). Not sure why this would cause duplication, though.

I was able to reproduce this in both the preview:

[Screenshot: reproduction in the preview]

and the names tab:

[Screenshot: reproduction in the names tab]

For names, we combine entries whose readings and translations match, so I guess that when we process the 旧字体 variant we match the same entry again and merge it into itself, adding its kanji forms a second time.

```ts
// We group together entries where the kana readings and translation
// details are all equal.
const nameContents = getNameEntryHash(name);
```
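To make the suspected cause concrete, here is a minimal TypeScript sketch of that grouping step. The `NameEntry` shape (`id`, `k`, `r`, `tr`) and the `nameEntryHash` stand-in for `getNameEntryHash` are hypothetical simplifications, not 10ten's actual types or hash. The sample data also assumes that the lookup returns the merged entry once for each kanji form that matched, which is the guess above rather than something confirmed.

```ts
// Hypothetical, simplified name entry; the real 10ten types differ.
interface NameEntry {
  id: number;
  k?: Array<string>; // kanji forms
  r: Array<string>; // kana readings
  tr: Array<string>; // translation details (simplified to strings)
}

// Hypothetical stand-in for getNameEntryHash: key on readings and
// translations only, ignoring the kanji forms.
function nameEntryHash(entry: NameEntry): string {
  return JSON.stringify([entry.r, entry.tr]);
}

// Naive grouping: when the hash matches, append the kanji forms.
function groupNamesNaive(names: Array<NameEntry>): Array<NameEntry> {
  const groups = new Map<string, NameEntry>();
  for (const name of names) {
    const hash = nameEntryHash(name);
    const existingEntry = groups.get(hash);
    if (existingEntry) {
      const k = existingEntry.k ?? [];
      k.push(...(name.k ?? []));
      existingEntry.k = k;
    } else {
      // Copy the entry so merging never mutates the lookup results.
      const copy: NameEntry = { ...name };
      if (name.k) copy.k = [...name.k];
      groups.set(hash, copy);
    }
  }
  return Array.from(groups.values());
}

// A merged entry with both forms (the id is made up for illustration).
const keio: NameEntry = {
  id: 1,
  k: ['慶応大学', '慶應大学'],
  r: ['けいおうだいがく'],
  tr: ['Keio University'],
};

// Suppose the lookup yields the same entry once per matched kanji form.
const grouped = groupNamesNaive([keio, keio]);
console.log(JSON.stringify(grouped[0]?.k));
// → ["慶応大学","慶應大学","慶応大学","慶應大学"]
```

Because the hash ignores `k`, the second copy of the entry lands in the same group and both forms are appended again, which matches the doubled forms in the popup.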

We could either make sure we take the unique set of kanji forms when we merge them:

```ts
existingEntry.k.push(...name.k);
```
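A minimal TypeScript sketch of this first option follows. The `NameEntry` shape, `nameEntryHash` and `mergeKanji` are hypothetical, simplified stand-ins rather than 10ten's actual code, and the sample data assumes the lookup returns the merged entry once per matched kanji form. Deduplicating through a `Set` keeps the kanji forms in first-seen order.

```ts
// Hypothetical, simplified name entry; the real 10ten types differ.
interface NameEntry {
  id: number;
  k?: Array<string>; // kanji forms
  r: Array<string>; // kana readings
  tr: Array<string>; // translation details (simplified to strings)
}

// Hypothetical stand-in for getNameEntryHash.
function nameEntryHash(entry: NameEntry): string {
  return JSON.stringify([entry.r, entry.tr]);
}

// Hypothetical helper: union of two kanji lists, preserving first-seen order.
function mergeKanji(
  existing: Array<string> = [],
  incoming: Array<string> = []
): Array<string> {
  return Array.from(new Set([...existing, ...incoming]));
}

function groupNamesUnique(names: Array<NameEntry>): Array<NameEntry> {
  const groups = new Map<string, NameEntry>();
  for (const name of names) {
    const hash = nameEntryHash(name);
    const existingEntry = groups.get(hash);
    if (existingEntry) {
      // Only keep kanji forms we have not already recorded.
      const k = mergeKanji(existingEntry.k, name.k);
      if (k.length) existingEntry.k = k;
    } else {
      // Copy the entry so merging never mutates the lookup results.
      const copy: NameEntry = { ...name };
      if (name.k) copy.k = [...name.k];
      groups.set(hash, copy);
    }
  }
  return Array.from(groups.values());
}

// A merged entry with both forms (the id is made up for illustration).
const keio: NameEntry = {
  id: 1,
  k: ['慶応大学', '慶應大学'],
  r: ['けいおうだいがく'],
  tr: ['Keio University'],
};

console.log(JSON.stringify(groupNamesUnique([keio, keio])[0]?.k));
// → ["慶応大学","慶應大学"]
```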

Or we could track the IDs of the entries we've already matched and skip them.
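The second option could look something like the following sketch, again using hypothetical, simplified names (`NameEntry`, `nameEntryHash`, `groupNamesSkippingSeen`) rather than 10ten's real ones. It assumes each entry carries a stable `id` that is identical across repeated lookup results.

```ts
// Hypothetical, simplified name entry; the real 10ten types differ.
interface NameEntry {
  id: number;
  k?: Array<string>; // kanji forms
  r: Array<string>; // kana readings
  tr: Array<string>; // translation details (simplified to strings)
}

// Hypothetical stand-in for getNameEntryHash.
function nameEntryHash(entry: NameEntry): string {
  return JSON.stringify([entry.r, entry.tr]);
}

function groupNamesSkippingSeen(names: Array<NameEntry>): Array<NameEntry> {
  const groups = new Map<string, NameEntry>();
  const seenIds = new Set<number>();
  for (const name of names) {
    // Skip an entry we have already processed, e.g. one returned again
    // because a second kanji form matched.
    if (seenIds.has(name.id)) {
      continue;
    }
    seenIds.add(name.id);

    const hash = nameEntryHash(name);
    const existingEntry = groups.get(hash);
    if (existingEntry) {
      // Distinct entries with equal readings/translations still merge.
      const k = existingEntry.k ?? [];
      k.push(...(name.k ?? []));
      existingEntry.k = k;
    } else {
      // Copy the entry so merging never mutates the lookup results.
      const copy: NameEntry = { ...name };
      if (name.k) copy.k = [...name.k];
      groups.set(hash, copy);
    }
  }
  return Array.from(groups.values());
}

// A merged entry with both forms (the id is made up for illustration).
const keio: NameEntry = {
  id: 1,
  k: ['慶応大学', '慶應大学'],
  r: ['けいおうだいがく'],
  tr: ['Keio University'],
};

console.log(JSON.stringify(groupNamesSkippingSeen([keio, keio])[0]?.k));
// → ["慶応大学","慶應大学"]
```

One difference between the two approaches: tracking IDs only guards against the same entry being seen twice. If two distinct entries with identical readings and translations happened to share a kanji form, the plain `push` would still duplicate that form, whereas taking the unique set of kanji forms would not.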