/hanakotoba

Exploring 花言葉 in Japanese and other literary corpora


Literature in Bloom



Purpose

A project to explore 花言葉 (hanakotoba, lit. flower language) in Japanese and other literary corpora.


Dataset

The data for this project were drawn from the following sources:

  • Aozora Bunko Corpus for Japanese full text works
  • Hanakotoba for flower names, translations, and associated characteristics
  • Wikipedia for conversions of Japanese decimal classification codes (分類番号)
  • Wikipedia for a list of major Japanese eras (時代)
  • This page for a list of sub-eras (元年)

Some of these sources turned out not to be necessary for the main project, but they are included with the accompanying code for genre and date conversions.
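
As an illustration of the kind of date conversion the era lists support, here is a minimal sketch that turns a Japanese era date into a Gregorian year. It is not the project's own code: the names `ERA_START`, `era_to_gregorian`, and `parse_era_date` are hypothetical, only the five modern eras are covered, and the layout of the project's era files is not assumed.

```python
import re

# Gregorian year in which each modern era began (its year 1, written 元年).
# Hypothetical lookup table for illustration; earlier eras are omitted.
ERA_START = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}


def era_to_gregorian(era: str, year: int) -> int:
    """Convert an era name and era year (1-based) to a Gregorian year."""
    if year < 1:
        raise ValueError("era years start at 1 (元年)")
    return ERA_START[era] + year - 1


def parse_era_date(text: str):
    """Parse strings such as '昭和元年' or '明治45年'; return None if no match."""
    eras = "|".join(ERA_START)
    match = re.fullmatch(rf"({eras})(元|\d+)年", text.strip())
    if match is None:
        return None
    era, year = match.groups()
    # 元年 is the conventional way of writing year 1 of an era.
    return era_to_gregorian(era, 1 if year == "元" else int(year))
```

For example, `parse_era_date("明治45年")` and `parse_era_date("大正元年")` both resolve to 1912, the year in which one era ended and the next began.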

Outputs

  • The main report, compiled with Datapane and also available in HTML format
  • Historical era dataframe : Jidai.csv
  • Sub-era dataframe : Gannen.csv
  • Japanese genre code dataframe : Genres.csv
  • Dataframe of all flowers/plants and associated characteristics : Hk_df.csv
  • Dataframe with all text metadata, calculated date columns, and tagged flower occurrences with their locations in the text : All_df.csv
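
To give a rough idea of what tagging flower occurrences with their locations can involve, the sketch below records the character offset of every match of a flower name in a text. It is an assumption about the general approach, not the method used to build All_df.csv: `tag_flowers` is a hypothetical helper, and it uses plain substring matching with no morphological analysis.

```python
import re


def tag_flowers(text: str, flower_names: list[str]) -> dict[str, list[int]]:
    """Map each flower name found in `text` to the character offsets of its matches."""
    # Try longer names first so that a longer name is not shadowed by a shorter
    # name it contains.
    names = sorted(set(flower_names), key=len, reverse=True)
    if not names:
        return {}
    pattern = re.compile("|".join(re.escape(name) for name in names))
    hits: dict[str, list[int]] = {}
    for match in pattern.finditer(text):
        hits.setdefault(match.group(), []).append(match.start())
    return hits


sample = "庭に桜が咲き、梅も咲いた。桜は散った。"
print(tag_flowers(sample, ["桜", "梅"]))
# → {'桜': [2, 13], '梅': [7]}
```

Substring matching will also catch flower characters that appear inside unrelated words or names, so a real pipeline would likely need a tokenizer or manual filtering on top of this.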