A toolkit for analyzing register, genre and style

Primary language: Python. License: MIT.

register

register is a toolkit for analyzing the language use patterns that characterize registers, genres and styles. It provides a wide range of features and covers various languages. Note that not all feature packages are supported for all languages (see the documentation).

Installation

register requires Python >= 3.6.

  1. Clone/download the repository
  2. In the current folder run:
pip install -r requirements.txt
  3. Download the spaCy model for your language (if spaCy does not support your language, you can still use basic feature packages such as character n-grams or token n-grams), e.g.:
python -m spacy download de_core_news_sm

Some features need further resources:

  • For features based on constituency parse trees, download the benepar model for your language:
import benepar
benepar.download('benepar_de2')
  • The feature package emotion needs additional language data (the TextBlob corpora):

    python -m textblob.download_corpora
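
Before running register, it can help to confirm that the packages and the spaCy model from the steps above are actually installed. The following is a minimal sketch, not part of register itself: the helper `missing_modules` and the list `REQUIRED` are hypothetical names, and the list assumes the German model from the example command. It only checks that the names are importable; it does not verify that the benepar model or the TextBlob corpora have been downloaded.

```python
import importlib.util

# Importable names to check (an assumption based on the installation steps).
# The spaCy model name matches the example download command; swap in the
# model for your language.
REQUIRED = ["spacy", "benepar", "textblob", "de_core_news_sm"]


def missing_modules(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All checked packages are importable.")
```

spaCy models are installed as regular Python packages, which is why the model name can be checked the same way as the libraries.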
    

Run register

From the src directory, run register with:

python run_register.py path/to/your/configuration_file.json

If you don't specify your own JSON configuration file, the config.json file in the src directory is used. Edit this file to suit your needs or create your own configuration file following the documentation. register provides many configuration options, such as the choice of features to extract from your text and the machine learning models to use.
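
The fallback behavior described above can be sketched as follows. This is an illustration only and not the actual code of run_register.py: the helper names `resolve_config_path` and `load_config` are hypothetical, and the sketch makes no assumption about which keys the configuration file contains (see the documentation for those).

```python
import json
import sys

DEFAULT_CONFIG = "config.json"  # default file name taken from the text above


def resolve_config_path(argv):
    """Return the first command-line argument, or the default config path."""
    return argv[1] if len(argv) > 1 else DEFAULT_CONFIG


def load_config(path):
    """Parse the JSON configuration file into a Python object."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# With no argument after the script name, the default file is selected.
print(resolve_config_path(["run_register.py"]))
print(resolve_config_path(["run_register.py", "path/to/your/configuration_file.json"]))
```

Loading the file with `load_config` before a long run is a cheap way to catch JSON syntax errors early.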

A Question of Style

To reproduce the results for the feature-based models used in our paper 'A Question of Style: A Dataset for Analyzing Formality on Different Levels', use the configuration files config_pt16.json and config_c18.json in the src directory. Edit the path to point to your local copy of in_formal sentences. (Note: constituency parsing for the PT16 model may take a while.)
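
Since both configurations are run through the same script, the two invocations can be generated rather than typed by hand. The sketch below only builds and prints the commands; the helper `reproduction_commands` is a hypothetical name, and it assumes the script name and calling convention shown in the Run register section.

```python
import sys

# Configuration file names as given in the text; both live in the src directory.
CONFIGS = ["config_pt16.json", "config_c18.json"]


def reproduction_commands(configs=CONFIGS, script="run_register.py"):
    """Build one command (as an argument list) per configuration file."""
    return [[sys.executable, script, config] for config in configs]


# Print the commands for inspection instead of executing them.
for command in reproduction_commands():
    print(" ".join(command))
```

To execute them, each argument list could be passed to `subprocess.run(command, check=True)` from within the src directory.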

Citation

When using register, please cite:

@inproceedings{eder-etal-2023,
    title = "A Question of Style: A Dataset for Analyzing Formality on Different Levels",
    author = "Eder, Elisabeth  and
      Krieg-Holz, Ulrike  and
      Wiegand, Michael",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-eacl.42",
    pages = "580--593"
}

register builds on external resources. If you use them, please cite these resources appropriately (see the documentation).