

MIT License

Japanese Subtitles Word Frequency List

A word frequency and kanji frequency list derived from subtitles from Japanese drama, anime and films.

The data set comprised 12,277 subtitle files taken from https://github.com/Matchoo95/JP-Subtitles. The frequency lists were generated with JParser and cb's Japanese Text Analysis Tool.

[Note from sschmidtu: the data is mostly anime/JDrama. This fork just serves to preserve the data in case the original repo goes down.]

Format of Word Frequency Report:

  • Field 1: Number of times word was encountered
  • Field 2: Word
  • Field 3: Frequency Group
  • Field 4: Frequency Rank
  • Field 5: Percentage (Field 1 / Total number of words)
  • Field 6: Cumulative percentage
  • Field 7: Part-of-speech
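
A minimal Python sketch of reading a report with the seven fields above. It assumes the report is tab-delimited with one word per line, and that the numeric fields are plain numbers; neither is stated here, so check the actual files first. The names `WordEntry` and `parse_word_report` are hypothetical, and the sample line in the usage note is invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WordEntry:
    count: int                    # Field 1: number of times word was encountered
    word: str                     # Field 2: word
    freq_group: str               # Field 3: frequency group (kept as text)
    freq_rank: int                # Field 4: frequency rank
    percentage: float             # Field 5: Field 1 / total number of words
    cumulative_percentage: float  # Field 6: cumulative percentage
    part_of_speech: str           # Field 7: part-of-speech


def parse_word_report(stream, delimiter="\t"):
    """Parse report lines into WordEntry objects.

    Assumption: fields are separated by `delimiter` (tab by default).
    Lines with fewer than seven fields are skipped.
    """
    reader = csv.reader(stream, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) < 7:
            continue
        entries.append(
            WordEntry(
                count=int(row[0]),
                word=row[1],
                freq_group=row[2],
                freq_rank=int(row[3]),
                percentage=float(row[4]),
                cumulative_percentage=float(row[5]),
                part_of_speech=row[6],
            )
        )
    return entries
```

For example, `parse_word_report(io.StringIO("10\tの\t1\t1\t5.0\t5.0\t助詞\n"))` returns a single entry whose `word` is `の`. For a file on disk, open it with `encoding="utf-8"` and `newline=""` and pass the file object instead.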

Format of Kanji Frequency Report:

  • Field 1: Number of times kanji was encountered
  • Field 2: Kanji
  • Field 3: Frequency Group
  • Field 4: Frequency Rank
  • Field 5: Percentage (Field 1 / Total number of kanji)
  • Field 6: Cumulative percentage
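
The relationship between Fields 1, 5 and 6 can be sketched in Python as follows. This is an illustration rather than the tool's actual implementation: it assumes that the percentage is scaled to 0-100 and that the cumulative percentage is a running sum over entries sorted by descending count. The function name `percentage_columns` is hypothetical.

```python
from itertools import accumulate


def percentage_columns(counts):
    """Return (count, percentage, cumulative_percentage) tuples.

    Assumptions: entries are ranked by descending count, percentage is
    100 * count / total, and cumulative percentage is the running sum.
    """
    total = sum(counts)
    if total == 0:
        return []
    ordered = sorted(counts, reverse=True)
    percentages = [100.0 * c / total for c in ordered]
    cumulative = list(accumulate(percentages))
    return list(zip(ordered, percentages, cumulative))
```

With invented counts `[20, 50, 30]`, the most frequent entry accounts for 50% of the total, the top two for 80%, and all three for 100%.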