literacy-knowledge-base

Literacy situation models knowledge base creation

Fictional characters analysis

Analyzing literary works is a commonly taught skill that people often find simple, but it remains a challenge for machines. Machines lack the human knowledge, common sense, and contextual awareness that such analysis depends on. Many researchers have tackled these problems, some more successfully than others. In our work, we address three tasks: character extraction, sentiment analysis of character relationships, and protagonist and antagonist detection. All of these tasks are performed on our newly created and annotated corpus of fables.

Dataset

The dataset is scraped from the Project Gutenberg website, which provides free eBooks with a focus on older works for which U.S. copyright has expired. We use a collection of fables by the Greek author Aesop, The Fables of Aesop, collected and translated by Joseph Jacobs. We collected 55 of these fables and annotated them by hand. For each fable, we annotated:

  • characters,
  • sentiment relationships between the characters,
  • protagonist and antagonist of the story.

You can find the dataset and the annotations in the following directory: data/aesop/. Annotations are saved in JSON format.
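
A minimal sketch of reading the annotation files, assuming only what is stated above: the files live under data/aesop/ and are saved as JSON. The helper name `load_annotations` is hypothetical, and the sketch deliberately assumes nothing about the keys inside each file, since the schema is not described here.

```python
import json
from pathlib import Path


def load_annotations(directory="data/aesop"):
    """Return {relative file path: parsed JSON} for every .json file under `directory`.

    Hypothetical helper: the directory layout and the schema of each file are
    not specified in this README, so each file is returned exactly as parsed.
    """
    root = Path(directory)
    annotations = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Key by path relative to the root so files in subfolders do not collide.
            annotations[path.relative_to(root).as_posix()] = json.load(handle)
    return annotations
```

Calling `load_annotations()` from the repository root returns a dictionary that can be inspected to see how characters, sentiments, and protagonist/antagonist labels are actually stored. A missing directory yields an empty dictionary rather than an error.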

Instructions

Installation

  1. Install Anaconda or make sure that your Python version is 3.8.x. If you are using Anaconda, you can create and activate a new environment by running:
conda create -n <env_name> python=3.8
conda activate <env_name>
  2. Clone this repository:
git clone https://github.com/anzemur/literacy-knowledge-base.git
  3. Move into the project directory:
cd literacy-knowledge-base
  4. Install the dependencies:
pip install -r requirements.txt
  5. Download and install the language models:
python -m spacy download en_core_web_trf
python -m spacy download en_core_web_sm
pip install allennlp-models
python src/downloads.py
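
After installation, a quick sanity check can confirm that the interpreter and model packages are in place. The sketch below is not part of the repository: `check_environment` is a hypothetical helper, and it assumes that the spaCy models and allennlp-models are importable under the package names `en_core_web_trf`, `en_core_web_sm`, and `allennlp_models`. It does not verify whatever src/downloads.py fetches, since that is not described here.

```python
import importlib.util
import sys

# Assumed import names for the packages installed in the steps above.
REQUIRED_PACKAGES = ("spacy", "allennlp_models", "en_core_web_trf", "en_core_web_sm")


def check_environment(packages=REQUIRED_PACKAGES):
    """Report whether Python is 3.8.x and which of the given packages can be found."""
    report = {"python_3_8": sys.version_info[:2] == (3, 8)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report
```

Printing `check_environment()` shows one True/False entry per check, so any entry that is False points at the installation step to repeat.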

Running the code

While running the code, you may encounter some CUDA-related warnings; these can be ignored. Running all of the steps below takes about 1-2 hours in total.

1. Character recognition

To generate the results of character recognition you should run the following command:

python src/characters/run_ner.py

To evaluate the obtained results, run:

python src/characters/eval_ner.py

2. Character sentiments

To generate the results of character sentiments & protagonist/antagonist detection you should run the following command:

python src/characters/character_sentiments.py

To evaluate the obtained results for character sentiments, run:

python src/characters/eval_sentiments.py

3. Protagonist/antagonist detection

To evaluate the obtained results for protagonist/antagonist detection, run:

python src/characters/eval_leads.py
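
The steps above can also be chained in a single script. This is a minimal sketch rather than something shipped with the repository: `run_pipeline` is a hypothetical helper, and it assumes the scripts take no arguments (none are shown above) and are launched from the repository root.

```python
import subprocess
import sys

# Script paths copied from the commands above, in the order they are listed.
PIPELINE = [
    "src/characters/run_ner.py",
    "src/characters/eval_ner.py",
    "src/characters/character_sentiments.py",
    "src/characters/eval_sentiments.py",
    "src/characters/eval_leads.py",
]


def run_pipeline(scripts=PIPELINE, dry_run=False):
    """Run each script with the current interpreter, stopping at the first failure.

    Returns the list of commands; with dry_run=True nothing is executed.
    """
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for command in commands:
            print("Running:", " ".join(command))
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` lists the commands without executing them, which is useful for checking the order before committing to the full run.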