causal_agribusiness: A Python repository from ichalkiad

Structural changes and statistical causal relationships in agricultural commodities markets: the impact of public news sentiment and institutional announcements

A code repository to accompany the paper "Structural changes and statistical causal relationships in agricultural commodities markets: the impact of public news sentiment and institutional announcements", by Ioannis Chalkiadakis, Gareth W. Peters, Guillaume Bagnarosa and Alexandre Gohin.

Abstract

We study novel empirical evidence on how agricultural commodities futures markets process information. The significant effect of institutional announcements, such as those of the United States Department of Agriculture (USDA), on participants in these markets has been well documented in the literature. However, existing studies consider measures of market "surprise" or analysts' "sentiment" that do not stem directly from the unstructured text of official reports or public news. In this work, we aim to verify the structural changes induced in the corn and wheat markets by the release of USDA reports, while also considering higher-order structural information of several market-related processes. Furthermore, we investigate whether there is evidence of statistical causal relationships between the market reaction, in terms of price, volume and volatility, and market participants' sentiment as induced by public news. To address these goals, we rely on a recently published efficient algorithm for statistical causality analysis in multivariate time series based on Gaussian Processes (Zaremba & Peters, 2022). Market and public news text signals are jointly modelled as a Gaussian Process, whose properties we leverage to study linear and non-linear causal effects between the different time-series signals. The participants' sentiment is extracted from public news data using Natural Language Processing (NLP) methods from statistical machine learning. A novel framework for text-to-time-series embedding (Chalkiadakis et al., 2021) is employed to construct a sentiment index from publicly available news articles. The conducted studies offer a more comprehensive perspective on the information available to investors and on how it is incorporated into the agricultural commodities market.

Key points

The empirical analysis in this paper is close to the difference-of-opinions literature: the proposed methodology, which succinctly summarises textual news into an investors' sentiment signal, provides a cumulative-over-time view of investors' sentiment that accounts for all potentially contradictory interpretations of new information observed in public news. To study how the agricultural commodities market incorporates new information, we conduct four sets of studies:

We investigate whether an "extreme" event had an impact on the market, where "extreme" is understood as one of two event types: i) the sentiment signal is at the highest or lowest percentile of its historical distribution up to that date, or ii) the USDA published one or more reports. At this stage, we test for association between the observed event and a structural change in the market price, volume and volatility processes at various lead-lag relationships.
We investigate whether the information summarised in the public news text-based sentiment signal has a statistical causal relationship with market price, volume and volatility, with and without an additional set of explanatory variables ("side information"). Here we test whether there is an association between the content of public news reporting around the time of a USDA announcement (as summarised by the proposed sentiment signal) and market movements.
We investigate whether there is an interventional causal relationship between the publication of multiple contemporaneous (same-day) USDA reports and subsequent market behaviour, where the intervention is understood as the publication of additional reports after the first one on a given day.
We conduct a synthetic control experiment to study the market process in the counterfactual setting, namely, what we would have observed in the market process had a USDA report not been published.
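The first study treats a day as a sentiment "extreme" when the signal reaches the top or bottom percentile of its distribution observed so far. The sketch below shows one way to compute such a flag with an expanding window. It assumes a daily sentiment series held in a NumPy array, 1st/99th percentile thresholds and a 30-day warm-up period. The helper `flag_extreme_sentiment` and these parameter values are hypothetical choices for illustration; they are not taken from the repository.

```python
import numpy as np


def flag_extreme_sentiment(sentiment, lower_pct=1.0, upper_pct=99.0, min_history=30):
    """Flag days whose sentiment is at or beyond the lower/upper percentile
    of all values observed strictly before that day (expanding window).

    Hypothetical helper: percentile levels and min_history are illustrative.
    """
    sentiment = np.asarray(sentiment, dtype=float)
    flags = np.zeros(sentiment.shape[0], dtype=bool)
    for t in range(min_history, sentiment.shape[0]):
        # Only past values enter the reference distribution (no look-ahead).
        lo, hi = np.percentile(sentiment[:t], [lower_pct, upper_pct])
        flags[t] = sentiment[t] <= lo or sentiment[t] >= hi
    return flags


# Usage on a synthetic series with one injected outlier.
rng = np.random.default_rng(0)
s = rng.normal(size=500)
s[400] = 10.0
event_days = np.flatnonzero(flag_extreme_sentiment(s))
```

The reference distribution excludes the current day, so each flag uses only information available before the event. Including day t itself would be an equally defensible reading of "historically to date".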

With these four studies we aim to answer questions such as:

How often do structural changes in the market (in price, volume, volatility processes) occur when a USDA report is published?
How likely is it that the USDA reports are causing the observed structural change?
Is there a pattern to the timing (for instance, seasonal according to crop cycle) of the publication's impact on the market?
How would the market price, volume and volatility change had the report not been published, based on information prior to that?
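The last question concerns a counterfactual market path, which the synthetic control study estimates. The sketch below uses the standard synthetic control formulation. Non-negative donor weights summing to one are fitted so that a weighted combination of donor series tracks the treated series before the event. The same weights then extrapolate the counterfactual after the event. The donor series, the function names and the projected-gradient solver are assumptions for illustration. They do not describe how the repository implements this study.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def synthetic_control_weights(y_pre, X_pre, n_iter=5000):
    """Convex weights w (w >= 0, sum w = 1) minimising ||y_pre - X_pre @ w||^2,
    fitted by projected gradient descent on the pre-event window."""
    w = np.full(X_pre.shape[1], 1.0 / X_pre.shape[1])
    # Step 1/L, where L = 2 * sigma_max(X_pre)^2 is the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(X_pre, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * X_pre.T @ (X_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w


# Usage: 200 pre-event and 20 post-event days, three hypothetical donor series.
rng = np.random.default_rng(42)
X = rng.normal(size=(220, 3))
y = 0.3 * X[:, 0] + 0.7 * X[:, 1]
y[200:] += 1.0  # simulated post-event shift in the treated series
w = synthetic_control_weights(y[:200], X[:200])
counterfactual = X[200:] @ w  # estimated path had the event not occurred
effect = y[200:] - counterfactual  # estimated event effect per post-event day
```

The convexity constraint keeps the counterfactual inside the range spanned by the donors. This is the usual argument for synthetic control over unconstrained regression.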

To address the research questions, the paper employs a statistical causality framework based on Gaussian Processes (GPs). Compared with the linear regression models commonly applied in past literature (e.g. Bollerslev et al., 2018), it is a much more flexible framework for assessing the interaction between news sentiment and market processes. A formal inference procedure is used that can readily accommodate testing for general causality structures, including linear and non-linear relationships between time-series processes, whilst also incorporating side information. The framework generalises the classical concept of Granger statistical causality (Granger, 1969) by quantifying the causal relationships between multiple signals from a conditional probability perspective. The framework used here (Zaremba & Peters, 2022) can explicitly and readily test for causal relationships in the trend and/or covariance structure. This is a significant development, as it was previously not easily achievable with classical time-series models for statistical causal analysis. Hence, our formulation allows us to study much more flexible model structures that can incorporate non-stationarity and non-linearity in the causal relationships. These can be of first order, i.e. mean-based statistical causality, or of second order, i.e. covariance-based statistical causality. At the same time, our GP model mitigates model risk, as the test retains good power even under a misspecified model (Zaremba, 2022).
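For orientation, the classical linear Granger test (Granger, 1969) that this framework generalises can be sketched in a few lines. It compares a restricted regression of y on its own lags, optionally with lags of side-information series z, against an unrestricted regression that also includes lags of x. The sketch below is a NumPy illustration of that linear, mean-based baseline only. The helpers `lag_matrix` and `granger_f_stat` and the lag order p are hypothetical. This is not the GP-based procedure of Zaremba & Peters (2022), which also covers non-linear and covariance-based causality.

```python
import numpy as np


def lag_matrix(series, p):
    """Columns hold series lagged by 1..p, aligned with the targets series[p:]."""
    n = series.shape[0]
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def granger_f_stat(y, x, p=2, z=None):
    """F statistic for H0: lags of x add no linear predictive power for y,
    given lags of y and (optionally) lags of side-information series z."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    n = target.shape[0]
    restricted = [np.ones((n, 1)), lag_matrix(y, p)]
    if z is not None:
        z = np.asarray(z, dtype=float).reshape(y.shape[0], -1)
        restricted += [lag_matrix(z[:, j], p) for j in range(z.shape[1])]
    X_r = np.hstack(restricted)
    X_u = np.hstack([X_r, lag_matrix(x, p)])  # unrestricted: add lags of x

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return float(np.sum((target - X @ beta) ** 2))

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_den = n - X_u.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df_den)


# Usage: x drives y with a one-day lag, but not the other way round.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)
f_xy = granger_f_stat(y, x)  # large: x Granger-causes y
f_yx = granger_f_stat(x, y)  # small: no evidence in the reverse direction
```

Under the classical Gaussian linear assumptions, the statistic follows an F(p, n − k) distribution under H0. Here k is the number of unrestricted regressors, and a p-value can be obtained with, e.g., `scipy.stats.f.sf`. These are exactly the linearity and mean-only assumptions that the GP framework relaxes.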

In addition to the novel insights into agricultural commodities markets, we make a significant contribution to the methodology applied in this literature. We present a novel approach to Granger causal analysis that combines highly structured time-series data (price, volume, volatility) with unstructured data (text). Whilst asset price series are well-structured time series, natural language text must be carefully processed into a sentiment index with a structured format that can be studied in a causal analysis alongside the observed price series. The challenge lies both in combining multiple sources of unstructured text data and in constructing non-trivial sentiment index models that represent the text of multiple documents on a time-series scale consistent with the price signal.
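Aligning text with prices has one concrete sub-problem: news arrives on arbitrary calendar days, while prices exist only on trading days. The sketch below maps document-level sentiment scores onto trading days. News published since the previous trading day, including weekends, is averaged into the next trading day, and the last value is carried forward on days without news. The helper `daily_sentiment_index`, the simple averaging and the 0.0 starting value are assumptions for illustration. This is a naive stand-in, not the text-to-time-series embedding of Chalkiadakis et al. (2021) used in the paper.

```python
from datetime import date
from statistics import mean


def daily_sentiment_index(articles, trading_days):
    """Align (publication_date, score) pairs with trading days.

    Scores published after the previous trading day and up to (and including)
    the current one are averaged; days without news carry the last value forward.
    Hypothetical helper: a naive stand-in, not the paper's embedding framework.
    """
    index, prev_day, last_value = {}, None, 0.0
    for day in sorted(trading_days):
        scores = [s for d, s in articles if (prev_day is None or d > prev_day) and d <= day]
        if scores:
            last_value = mean(scores)
        index[day] = last_value
        prev_day = day
    return index


# Usage: weekend news (9-10 Jan 2021) is folded into Monday 11 Jan.
news = [(date(2021, 1, 8), 0.5), (date(2021, 1, 9), -0.1),
        (date(2021, 1, 10), 0.3), (date(2021, 1, 11), 0.2)]
idx = daily_sentiment_index(news, [date(2021, 1, 8), date(2021, 1, 11), date(2021, 1, 12)])
```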

Finally, this paper presents a comprehensive lexicon of terms relevant to the agricultural commodities space and Agribusiness, which we hope will be an asset for researchers wanting to mine and leverage text data either from the Finance/Econometrics space (Zhou et al., 2024) or from disciplines that study different aspects of the commodities markets (Blair et al., 2021).

Repository use

To install the agribusiness package, clone the repository and execute the following from within the agribusiness/ folder:

pip install -e .

The repository is organised as follows:

agribusiness/ contains the Python and R code developed for the paper. The agribusiness/src/ folder contains preprocessing and text processing scripts, while the remaining folders correspond to the code used in each of the studies in the paper. Numbering in script names (_0, _1, _2, etc.) indicates the execution order, since later scripts depend on output files produced by earlier ones.
causality_matlab/ contains MATLAB code that implements the statistical causality framework of Zaremba & Peters, 2022 (please cite the paper if you use the MATLAB code). The top-level scripts are runner_causality_total_0.m (for the statistical causality studies) and runner_0.m (for the structural change studies).
data_nlp/ contains auxiliary files for the text processing part of the paper.
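Because the numeric suffix in a script name encodes its execution order, the scripts of a study folder can be listed in run order with a small helper. `order_scripts` below is a hypothetical convenience function, not part of the repository. It assumes the order index is the last `_<n>` before the file extension, as in `runner_0.m`.

```python
import re
from pathlib import PurePath

# Trailing "_<n>" immediately before the file extension, e.g. runner_0.m -> 0.
_ORDER_SUFFIX = re.compile(r"_(\d+)$")


def order_scripts(filenames):
    """Return script names sorted by their numeric suffix.

    Names without a trailing _<n> are dropped; ties are broken alphabetically.
    """
    numbered = []
    for name in filenames:
        match = _ORDER_SUFFIX.search(PurePath(name).stem)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


# Usage with hypothetical file names (list a real folder via pathlib.Path.iterdir()).
ordered = order_scripts(["b_2.py", "a_0.py", "c_10.R", "helpers.py", "d_1.m"])
# ordered == ['a_0.py', 'd_1.m', 'b_2.py', 'c_10.R']
```

Sorting on the integer value rather than the raw string keeps `_10` after `_2`.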
