/CSJ_and_NWJC_yomitan_freq_dict

Frequency dictionaries for Yomitan, based on the Corpus of Spontaneous Japanese (CSJ) and the NINJAL Web Japanese Corpus (NWJC) datasets.

Primary language: Python

Corpus of Spontaneous Japanese and NINJAL Web Japanese Corpus Yomitan Frequency Dictionaries

A fork of forsakeninfinity’s script, extended to support converting the CSJ and NWJC frequency lists. See the original repo for more information.
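For orientation, here is a minimal sketch of what a conversion like this produces. It is not this repo's script: the function name `build_freq_dict`, the input row shape, and the sample rows are hypothetical, and the archive layout (an `index.json` plus a `term_meta_bank_1.json` holding `[term, "freq", data]` entries) follows the Yomitan dictionary format as commonly documented, so check it against the Yomitan schemas before relying on it.

```python
import io
import json
import zipfile


def build_freq_dict(rows, title):
    """Return the bytes of a Yomitan-style frequency dictionary zip.

    rows: iterable of (term, reading, rank) tuples (hypothetical input shape).
    """
    # One entry per term: [term, "freq", data]; data carries reading and rank.
    bank = [
        [term, "freq", {"reading": reading, "frequency": rank}]
        for term, reading, rank in rows
    ]
    # Dictionary metadata; "rank-based" means lower numbers are more common.
    index = {
        "title": title,
        "format": 3,
        "revision": "example",
        "frequencyMode": "rank-based",
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.json", json.dumps(index, ensure_ascii=False))
        zf.writestr("term_meta_bank_1.json", json.dumps(bank, ensure_ascii=False))
    return buf.getvalue()


# Made-up sample rows, not taken from either corpus.
sample = [("こと", "こと", 1), ("言う", "いう", 2)]
archive = build_freq_dict(sample, "Example-Freq")
```

Writing `archive` to a `.zip` file should give something that can be imported like any other Yomitan dictionary. Large word lists are usually split across several numbered bank files, which this sketch skips.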

The Corpus of Spontaneous Japanese - CSJ

Frequency values go up to 31,605.

Download here

“The Corpus of Spontaneous Japanese” (or CSJ) is a database containing a large collection of Japanese spoken language data and information for use in linguistic research; jointly developed by NINJAL, NICT and the Tokyo Institute of Technology, the CSJ is world-class in both the quantity and quality of the available data.

The CSJ covers several domains; dictionaries for the individual domains can be downloaded from the CSJ Releases folder.

More information can be found here

NINJAL Web Japanese Corpus - NWJC

Frequency values go up to 106,762.

Download here

More information can be found here (in Japanese)