This repository contains a Python implementation of the distance metric described in the paper: An Earth Mover's Distance Based Graph Distance Metric For Financial Statements
Paper: https://ieeexplore.ieee.org/document/9776204
If you find the code useful, please consider citing this paper.
@INPROCEEDINGS{9776204,
author={Noels, Sander and Vandermarliere, Benjamin and Bastiaensen, Ken and De Bie, Tijl},
booktitle={2022 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)},
title={An Earth Mover's Distance Based Graph Distance Metric For Financial Statements},
year={2022},
volume={},
number={},
pages={1-8},
doi={10.1109/CIFEr52523.2022.9776204}}
Quantifying the similarity between a group of companies has proven to be useful for several purposes, including company benchmarking, fraud detection, and searching for investment opportunities. This exercise can be done using a variety of data sources, such as company activity data and financial data. However, ledger account data is widely available and is standardized to a large extent. Such ledger accounts within a financial statement can be represented by means of a tree, i.e. a special type of graph, representing both the values of the ledger accounts and the relationships between them. Given their broad availability and rich information content, financial statements form a prime data source based on which company similarities or distances could be computed.
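As a rough illustration of the tree representation described above, the following sketch models a few ledger accounts as a vertex-weighted tree using only the standard library. The `Account` class, the `vertex_weights` helper, the account names, and the amounts are all hypothetical and chosen for illustration. They are not taken from this repository or from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    """One ledger account: a vertex with a value and optional child accounts."""

    name: str
    value: float = 0.0  # only meaningful for leaf accounts in this sketch
    children: list[Account] = field(default_factory=list)

    def total(self) -> float:
        """Return the leaf value, or the sum of the children's totals."""
        if not self.children:
            return self.value
        return sum(child.total() for child in self.children)


def vertex_weights(node: Account) -> dict[str, float]:
    """Collect the weight of every vertex in the tree, keyed by account name."""
    weights = {node.name: node.total()}
    for child in node.children:
        weights.update(vertex_weights(child))
    return weights


# Hypothetical, heavily simplified asset side of a balance sheet.
balance_sheet = Account("assets", children=[
    Account("fixed assets", children=[
        Account("tangible fixed assets", 120.0),
        Account("financial fixed assets", 30.0),
    ]),
    Account("current assets", children=[
        Account("inventories", 40.0),
        Account("cash", 10.0),
    ]),
])

weights = vertex_weights(balance_sheet)
print(weights["fixed assets"], weights["assets"])
```

Each parent account carries the sum of its children, so the tree captures both the values of the ledger accounts and how they roll up into one another.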
We present a graph distance metric that enables one to compute the similarity between the financial statements of two companies. This method may be useful for investors looking for investment opportunities, government officials attempting to identify fraudulent companies, and accountants looking to benchmark a group of companies based on their financial statements.
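To give an intuition for an earth mover's distance on a tree, the sketch below computes the minimum cost of moving mass between two distributions placed on the vertices of the same tree, where every edge has length 1. It relies on a standard property of tree metrics: the optimal cost equals the sum, over all edges, of the absolute net mass that has to cross that edge. This is not the implementation in this repository, and the paper's exact weighting and ground distance may differ. The function name `tree_emd`, the account names, the amounts, and the choice to place mass only on leaf accounts are assumptions made for illustration.

```python
def tree_emd(parent, p, q):
    """Earth mover's distance between two vertex distributions on a tree.

    parent maps every non-root vertex to its parent; all edges have length 1.
    p and q map vertices to non-negative amounts and are normalised to sum to 1.
    """
    vertices = set(parent) | set(parent.values())

    def normalise(weights):
        total = sum(weights.get(v, 0.0) for v in vertices)
        return {v: weights.get(v, 0.0) / total for v in vertices}

    def depth(vertex):
        steps = 0
        while vertex in parent:
            vertex = parent[vertex]
            steps += 1
        return steps

    p, q = normalise(p), normalise(q)
    # surplus[v] accumulates the net mass (p - q) inside the subtree rooted at v.
    surplus = {v: p[v] - q[v] for v in vertices}

    cost = 0.0
    # Visit the deepest vertices first so children are handled before parents.
    for vertex in sorted(parent, key=depth, reverse=True):
        cost += abs(surplus[vertex])  # mass crossing the edge to the parent
        surplus[parent[vertex]] += surplus[vertex]
    return cost


# Hypothetical account hierarchy shared by all companies.
parent = {
    "fixed assets": "assets",
    "current assets": "assets",
    "tangible": "fixed assets",
    "financial": "fixed assets",
    "inventories": "current assets",
    "cash": "current assets",
}

company_a = {"tangible": 60, "financial": 15, "inventories": 20, "cash": 5}
company_b = {"tangible": 30, "financial": 45, "inventories": 20, "cash": 5}
company_c = {"tangible": 30, "financial": 15, "inventories": 50, "cash": 5}

print(tree_emd(parent, company_a, company_b))  # mass moves within "fixed assets"
print(tree_emd(parent, company_a, company_c))  # mass moves across branches
```

In this toy example, company B differs from company A only within the fixed assets branch, while company C shifts the same share of mass to a different branch. The second distance is therefore larger, which is the behaviour one would want from a metric that takes the account hierarchy into account.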
The following frameworks/libraries were utilized to get this project started:
To set up this project locally, follow the installation steps below to get a local copy up and running.
- Clone the repo
git clone https://github.com/snoels/earth-movers-graph-distance-metric.git
- Change your directory to the repo
cd earth-movers-graph-distance-metric/
- Create the conda environment env-edm-gdm
conda env create -f environment.yml
- Install pygraphviz (Ubuntu and Debian)
sudo apt-get install graphviz graphviz-dev
pip install pygraphviz==1.6
Ten example vertex-weighted company representations can be found in the following file: ./synthetic_data/synthetic_company_graph_data.pkl
This data is synthetic. It is inspired by the vertex-weighted representation of a balance sheet and by no means represents real company data.
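A minimal sketch for loading the example file is shown below. It assumes the file is a standard Python pickle. The type and layout of the stored object are not documented here, so the snippet only inspects what it finds. The helper name `load_company_graphs` is hypothetical. Unpickling may also require the packages from the conda environment to be installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_company_graphs(path):
    """Load and return whatever object is stored in the given pickle file."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


DATA_PATH = Path("./synthetic_data/synthetic_company_graph_data.pkl")

# Only attempt to load when run from the repository root, where the file exists.
if DATA_PATH.exists():
    companies = load_company_graphs(DATA_PATH)
    # Inspect the container before assuming anything about its structure.
    print(type(companies))
    if hasattr(companies, "__len__"):
        print(len(companies))
```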
This repository is currently maintained by me. You can reach me at sander.noels@ugent.be.