Introduction

The Institute of Linguistics, Academia Sinica, compiled a set of 93,826 Chinese words from printed works and published it as the Word List with Accumulated Word Frequency in Sinica Corpus.

For each word, the list reports its rank, frequency, percentage, and cumulative percentage in daily usage.

We wanted to know how frequently the words that appear in category norms are used.

We adopted a set of category norms from a thesis by Yeh & Huang (2002).

We fed these words into the word list website and used regular expressions to extract each word's frequency from the results.
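The regular-expression step might look roughly like the sketch below. The layout of the website's result text is not described here, so the sample string, its numbers, the "Frequency:" label, and the function name extract_frequency are all assumptions made for illustration only.

```python
import re

# Hypothetical result text with made-up numbers; the real page layout
# is an assumption and will need a different pattern.
SAMPLE_RESULT = "Word: 蘋果  Rank: 5321  Frequency: 87  Percent: 0.002"

# Capture the digits (with optional thousands separators) that follow
# the assumed "Frequency:" label.
FREQUENCY_PATTERN = re.compile(r"Frequency:\s*([\d,]+)")


def extract_frequency(text):
    """Return the frequency as an int, or None if the label is absent."""
    match = FREQUENCY_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


frequency = extract_frequency(SAMPLE_RESULT)
```

Returning None for a missing label keeps words that the list does not contain distinguishable from words with a low count.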

The results were exported to CSV files.
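A minimal sketch of the export step, using Python's standard csv module. The column names, the example rows, and the function name write_frequencies are hypothetical; the actual files may be organised differently.

```python
import csv
import io


def write_frequencies(rows, handle):
    """Write (category, word, frequency) rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["category", "word", "frequency"])  # assumed header
    writer.writerows(rows)


# Made-up example rows, not real results.
example_rows = [("fruit", "蘋果", 87), ("fruit", "香蕉", 45)]

# An in-memory buffer stands in for a file so the sketch has no side effects.
buffer = io.StringIO()
write_frequencies(example_rows, buffer)
```

To write a real file, pass a handle opened with open(path, "w", newline="", encoding="utf-8-sig"); the byte order mark written by that encoding helps some spreadsheet programs display Chinese characters correctly.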