/oxml-comptetiton

Primary Language: Jupyter Notebook

About

OxML 2023 Financial Machine Learning competition notebooks with code.

Intro

Task 1: Given a set of PDF documents, build an ESG document classifier that takes a document as input and classifies each page as E-, S-, or G-related.

Task 2: Given a set of PDFs provided as images, build a table detector that precisely locates the position of each table on a document page.

Approach

Task 1: Fine-tune DistilBERT using the train/validation/test sets.
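
As a rough illustration of the page-level classification step, the sketch below turns per-page logits into E/S/G labels. It assumes the fine-tuned model emits three logits per page in the order E, S, G; that ordering and the helper names `softmax` and `classify_pages` are hypothetical and not taken from the notebooks.

```python
import math

# Assumed label order; the task description only names the classes E, S and G.
LABELS = ("E", "S", "G")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pages(page_logits):
    """Map each page's logits to a (label, probability) pair."""
    results = []
    for logits in page_logits:
        probs = softmax(logits)
        best = max(range(len(LABELS)), key=probs.__getitem__)
        results.append((LABELS[best], probs[best]))
    return results


if __name__ == "__main__":
    # Two made-up pages: the first leans towards E, the second towards G.
    for label, prob in classify_pages([[2.0, 0.1, -1.0], [0.0, 0.0, 3.0]]):
        print(label, round(prob, 3))
```

In practice the logits would come from the fine-tuned classifier applied to the text of each page, for example text extracted with PyMuPDF.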

Task 2: Zero-shot DETR-style Table Transformer.
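
DETR-style detectors predict boxes as normalized (center x, center y, width, height) values, so locating a table in pixels needs a small conversion step. The sketch below shows one way to do that and to keep only confident detections. The function names and the 0.9 score threshold are assumptions for illustration, and the notebooks may rely on the library's own post-processing instead.

```python
def cxcywh_to_xyxy(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height
    # Clamp to the image so boxes never extend past the page borders.
    x0, x1 = max(0.0, x0), min(float(width), x1)
    y0, y1 = max(0.0, y0), min(float(height), y1)
    return (x0, y0, x1, y1)


def keep_tables(boxes, scores, width, height, threshold=0.9):
    """Return (pixel_box, score) pairs whose score meets the threshold.

    The default threshold is a hypothetical value, not one from the source.
    """
    kept = []
    for box, score in zip(boxes, scores):
        if score >= threshold:
            kept.append((cxcywh_to_xyxy(box, width, height), score))
    return kept


if __name__ == "__main__":
    # Two made-up detections on a 200x100 page image.
    boxes = [(0.5, 0.5, 0.5, 0.5), (0.2, 0.2, 0.1, 0.1)]
    scores = [0.95, 0.30]
    print(keep_tables(boxes, scores, width=200, height=100))
```

Lowering the threshold trades precision for recall, which matters when a page contains several small tables.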

Main dependencies:

  • huggingface
  • PyMuPDF