# CentralBankRoBERTa: A Fine-Tuned Large Language Model for Central Bank Communications
Central bank communications are an important tool for guiding the economy and fulfilling monetary policy goals. Natural language processing (NLP) algorithms have been used to analyze central bank communications, but they often ignore context. Recent research has introduced deep-learning-based NLP algorithms, also known as large language models (LLMs), which take context into account. We apply LLMs to central bank communications and construct CentralBankRoBERTa, a state-of-the-art economic agent classifier that distinguishes five basic macroeconomic agents (households, firms, banks, the government, and the central bank itself) combined with a binary sentiment classifier that identifies the emotional content of sentences in central bank communications. A detailed discussion of the motivation and results for this model can be found in Pfeifer, M. and Marohl, V.P. (2023), "CentralBankRoBERTa: A Fine-Tuned Large Language Model for Central Bank Communications". Here we release our data, models, and code.
## Data
The training data of speeches from the Fed, the ECB, and the BIS are in this folder. Overall, we have collected 19,381 speeches. To train our economic agent classifier, we labeled 6,205 randomized sentences from the Fed database as speaking either about households, firms, the financial sector, the government, or the central bank itself. To train our sentiment classifier, we labeled 6,683 sentences from the Fed database as either positive (1) or negative (0).
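To illustrate the labeling scheme described above, the sketch below builds a toy frame with the five agent classes and checks the class balance, which matters when training a five-way classifier. The column names `sentence` and `label` are assumptions for illustration, not the actual schema of the released files.

```python
import pandas as pd

# Toy rows mimicking an assumed schema of the labeled agent dataset:
# one sentence per row, with one of the five macroeconomic agent labels.
df = pd.DataFrame({
    "sentence": [
        "Households have reduced consumption in response to higher rates.",
        "Firms report tighter credit conditions.",
        "The central bank stands ready to act.",
    ],
    "label": ["households", "firms", "central bank"],
})

# Inspect the class distribution before training.
print(df["label"].value_counts())
```

With the real data, the same `value_counts()` check reveals whether any of the five classes is underrepresented and may need rebalancing.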
- 🤗 The Hugging Face dataset card for the pre-labeled datasets can be found here
## Meta-labelling
The scripts and methodology for generating additional meta-labels are in this folder.
## Economic agent classification
This folder contains the scripts testing different large language models such as BERT (Devlin et al., 2018), XLNET (Yang et al., 2019), FinBERT (Huang et al., 2022), and RoBERTa (Liu et al., 2019) on our economic agent classification task.
## Sentiment classification

The scripts testing different large language models (BERT, FinBERT, XLNET, and RoBERTa) and machine learning models such as Support Vector Machines (SVM), Random Forest, and a two-step TF-IDF and Naïve Bayes (NB) model on our sentiment classification task are in this folder.
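As a rough illustration of the two-step TF-IDF and Naïve Bayes baseline mentioned above, the sketch below chains the two steps with scikit-learn on toy sentences. The toy training sentences and labels are invented for illustration; the actual baseline is trained on the labeled Fed sentences in the Data folder.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy sentiment data following the dataset's convention: 1 = positive, 0 = negative.
train_sentences = [
    "The economy is growing strongly and employment is robust.",
    "Inflation remains contained and confidence is improving.",
    "Output contracted sharply and unemployment rose.",
    "Financial conditions deteriorated amid rising defaults.",
]
train_labels = [1, 1, 0, 0]

# Step 1: TF-IDF features; step 2: multinomial Naive Bayes classifier.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_sentences, train_labels)

pred = clf.predict(["Employment is robust and growing."])
print(pred)
```

The pipeline object exposes the usual `fit`/`predict` interface, so swapping in a different feature extractor or classifier for comparison only changes the `make_pipeline` call.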
## Model loader
Go to this folder if you want to use CentralBankRoBERTa for your own analysis of central bank communications. Both the economic agent classifier and the sentiment classifier, along with a step-by-step implementation guide, are in here.
- 🤗 The Hugging Face pipeline for both models can be found here
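A minimal usage sketch with the Hugging Face `pipeline` API is shown below. The model IDs are assumptions about the repository naming and should be checked against the model cards linked above; the heavy model download happens only when the function is called.

```python
# Assumed Hugging Face model IDs -- verify against the linked model cards.
AGENT_MODEL = "Moritzpfeifer/CentralBankRoBERTa-agent-classifier"
SENTIMENT_MODEL = "Moritzpfeifer/CentralBankRoBERTa-sentiment-classifier"


def classify_sentences(sentences, model_id):
    """Run a text-classification pipeline over a list of sentences.

    Requires `pip install transformers` and downloads the model weights
    on first use.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)
    return clf(sentences)
```

For example, `classify_sentences(["Households have increased their savings."], AGENT_MODEL)` returns a list of dicts with the predicted label and its score, following the standard `text-classification` pipeline output format.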
---
Please cite this model as Pfeifer, M. and Marohl, V.P. (2023), "CentralBankRoBERTa: A Fine-Tuned Large Language Model for Central Bank Communications", The Journal of Finance and Data Science. https://doi.org/10.1016/j.jfds.2023.100114
Moritz Pfeifer, Institute for Economic Policy, University of Leipzig, 04109 Leipzig, Germany (pfeifer@wifa.uni-leipzig.de)

Vincent P. Marohl, Department of Mathematics, Columbia University, New York, NY 10027, USA (vincent.marohl@columbia.edu)
```bibtex
@article{Pfeifer2023,
  title   = {CentralBankRoBERTa: A fine-tuned large language model for central bank communications},
  journal = {The Journal of Finance and Data Science},
  volume  = {9},
  pages   = {100114},
  year    = {2023},
  issn    = {2405-9188},
  doi     = {https://doi.org/10.1016/j.jfds.2023.100114},
  url     = {https://www.sciencedirect.com/science/article/pii/S2405918823000302},
  author  = {Moritz Pfeifer and Vincent P. Marohl},
}
```