
This repository gives an overview of how to extract data from financial documents such as .txt files.

Primary language: Jupyter Notebook

Data-Extraction-from-Financial-Documents

Data extraction pulls important values out of documents for further analysis. The source can be a text document (.txt file), an image, or a PDF. The task can be done either with machine learning/deep learning algorithms or with regular expressions (regex).

This repository gives a basic overview of how to extract data from financial documents in .txt format. I use regex to extract the data and save it to a separate CSV file.
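The regex-to-CSV flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the field names (`invoice_number`, `date`, `total`), the patterns, and the sample text are all assumptions made up for the example, and the fields in the actual challenge may differ.

```python
import csv
import io
import re

# Hypothetical field patterns (assumptions for illustration only);
# the real documents may use different labels and layouts.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\w[\w-]*)", re.IGNORECASE
    ),
    "date": re.compile(
        r"Date\s*:\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*(?:Amount)?\s*:\s*\$?\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return one row (dict) with the first match of each pattern, or ''."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def write_csv(rows, out):
    """Write the extracted rows to an open text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=list(PATTERNS))
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample document text, standing in for the contents of a .txt file.
sample = """Invoice No: INV-1042
Date: 12/03/2020
Total Amount: $1,250.50
"""

rows = [extract_fields(sample)]
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To process real files, the same functions could be applied to the text of each .txt file and the rows written to a file opened with `open("output.csv", "w", newline="")`; the output file name here is also just a placeholder.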

This problem comes from the HCL ML Challenge on the HackerEarth platform. The problem statement is provided in the Problem Statement .docx file, and the dataset comes from the same challenge; see the HCL ML Challenge Dataset. A sample dataset is also included so you can check that the program works correctly.

The solution given here is a basic one that uses regex. There are other ways to solve this problem, for example with deep learning algorithms.