
CentralBankRoBERTa is a large language model. It combines an economic agent classifier that distinguishes five basic macroeconomic agents with a binary sentiment classifier that identifies the emotional content of sentences in central bank communications.


CentralBankRoBERTa

A Fine-Tuned Large Language Model for Central Bank Communications

Central bank communications are an important tool for guiding the economy and fulfilling monetary policy goals. Natural language processing (NLP) algorithms have been used to analyze central bank communications, but they often ignore context. Recent research has introduced deep-learning-based NLP algorithms, also known as large language models (LLMs), which take context into account. We apply LLMs to central bank communications and construct CentralBankRoBERTa, a state-of-the-art economic agent classifier that distinguishes five basic macroeconomic agents (households, firms, banks, the government, and the central bank itself) and a binary sentiment classifier that identifies the emotional content of sentences in central bank communications. A detailed discussion of the motivation and results for this model can be found in Pfeifer, M. and Marohl, V.P. (2023), "CentralBankRoBERTa: A Fine-Tuned Large Language Model for Central Bank Communications". Here we release our data, models, and code.

● Data

The training data of speeches from the Fed, the ECB, and the BIS are in this folder. Overall, we have collected 19,381 speeches. To train our economic agent classifier, we labeled 6,205 randomized sentences from the Fed database as speaking about either households, firms, the financial sector, the government, or the central bank itself. To train our sentiment classifier, we labeled 6,683 sentences from the Fed database as either positive (1) or negative (0).

  • 🤗 The Hugging Face dataset card for the pre-labeled datasets can be found here

● Meta-labelling

The scripts and methodology for generating additional meta-labels.

● Economic agent classification

This folder contains the scripts testing different large language models, such as BERT (Devlin et al., 2018), XLNet (Yang et al., 2019), FinBERT (Huang et al., 2022), and RoBERTa (Liu et al., 2019), on our economic agent classification task.

● Sentiment classification

The scripts testing different large language models (BERT, FinBERT, XLNet, and RoBERTa) and machine learning models such as a Support Vector Machine (SVM), a Random Forest, and a two-step TF-IDF and Naïve Bayes (NB) model on our sentiment classification task are in this folder.
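To illustrate the two-step TF-IDF and Naïve Bayes baseline mentioned above, here is a minimal sketch using scikit-learn. The sentences and labels below are toy stand-ins for illustration only, not examples from the labeled Fed dataset; the actual training scripts are in this folder.

```python
# Sketch of a TF-IDF + Naive Bayes sentiment baseline with scikit-learn.
# The sentences below are invented toy examples, NOT from the labeled dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "Economic growth remains robust and employment is strong.",
    "Inflation pressures have eased and confidence is improving.",
    "Financial conditions deteriorated sharply amid rising defaults.",
    "Output contracted and unemployment rose unexpectedly.",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative, matching the paper's coding

# Step 1: TF-IDF vectorization; step 2: multinomial Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(sentences, labels)

preds = model.predict(["Growth is strong and inflation is easing."])
```

In the paper this kind of bag-of-words baseline serves as a point of comparison for the transformer models, which additionally capture word order and context.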

● Model loader

Go to this folder if you want to use CentralBankRoBERTa for your own analysis of central bank communications. It contains both the economic agent classifier and the sentiment classifier, along with a step-by-step guide for implementation.
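As a quick orientation before consulting the step-by-step guide, usage might look like the following sketch with the Hugging Face transformers `pipeline` API. The Hub model IDs below are placeholders we have assumed for illustration; substitute the exact IDs given in the model loader folder.

```python
# Sketch of applying the two classifiers via the transformers pipeline API.
# The Hub model IDs are ASSUMED placeholders -- replace them with the IDs
# listed in the model loader's step-by-step guide.
from transformers import pipeline

AGENT_MODEL = "Moritz-Pfeifer/CentralBankRoBERTa-agent-classifier"      # assumed ID
SENTIMENT_MODEL = "Moritz-Pfeifer/CentralBankRoBERTa-sentiment-classifier"  # assumed ID

# The five macroeconomic agents the agent classifier distinguishes.
AGENTS = {"households", "firms", "financial sector", "government", "central bank"}

def classify(sentence: str):
    """Run both classifiers on one sentence (downloads the models on first use)."""
    agent_clf = pipeline("text-classification", model=AGENT_MODEL)
    sentiment_clf = pipeline("text-classification", model=SENTIMENT_MODEL)
    return agent_clf(sentence)[0], sentiment_clf(sentence)[0]

if __name__ == "__main__":
    agent, sentiment = classify(
        "Mortgage rates have declined, easing the burden on homeowners."
    )
    print(agent, sentiment)  # each a dict with "label" and "score" keys
```

Each pipeline call returns a label and a confidence score per sentence, which can then be aggregated over a full speech or communication corpus.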

☆☆☆

Please cite this model as Pfeifer, M. and Marohl, V.P. (2023), "CentralBankRoBERTa: A Fine-Tuned Large Language Model for Central Bank Communications", The Journal of Finance and Data Science. https://doi.org/10.1016/j.jfds.2023.100114
Moritz Pfeifer
Institute for Economic Policy, University of Leipzig
04109 Leipzig, Germany
pfeifer@wifa.uni-leipzig.de
Vincent P. Marohl
Department of Mathematics, Columbia University
New York NY 10027, USA
vincent.marohl@columbia.edu

BibTeX entry and citation info

@article{Pfeifer2023,
  title = {CentralBankRoBERTa: A fine-tuned large language model for central bank communications},
  journal = {The Journal of Finance and Data Science},
  volume = {9},
  pages = {100114},
  year = {2023},
  issn = {2405-9188},
  doi = {10.1016/j.jfds.2023.100114},
  url = {https://www.sciencedirect.com/science/article/pii/S2405918823000302},
  author = {Moritz Pfeifer and Vincent P. Marohl},
}