
Dataset and code for our EMNLP 2022 Main Conference long paper, "ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts"


ECTSum: A New Benchmark Dataset For Bullet Point Summarization of Long Earnings Call Transcripts

Long Paper Accepted at the EMNLP 2022 Main Conference!

  • Paper: https://aclanthology.org/2022.emnlp-main.748/
  • Poster: https://rajdeep345.github.io/files/pdf/research/ECTSum_EMNLP2022_Poster.pdf
  • Pre-recorded Video: https://drive.google.com/file/d/1DW2i2ApgiE6V7ViiayX5zdJSRXdAEbsy/view
    Dataset

    The ECTSum dataset can be found under the data folder.

    Code

    Code and instructions for our proposed model, ECT-BPS, can be found under codes/ECT-BPS.
    Code and instructions for our baseline models can be found under codes/baselines.

    Data Preparation for ECT-BPS

    Preparing the data for training the Extractive Module

    Install dependencies

    pip install sentence-transformers
    pip install num2words
    pip install word2number
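Before running the preparation scripts, it can help to confirm that the three packages are importable. The sketch below is a small convenience check, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names are the import names that correspond to the pip packages listed above.

```python
from importlib.util import find_spec

# Import names for the three pip packages listed above
# (sentence-transformers installs as `sentence_transformers`).
REQUIRED_MODULES = ["sentence_transformers", "num2words", "word2number"]


def missing_modules(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If any module is reported as missing, rerun the corresponding `pip install` command in the same environment that will run the scripts.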

    Prepare the data

    python prepare_data_ectbps_ext.py

    Data Location

    The script saves the data to codes/ECT-BPS/ectbps_ext/data/.
    The processed data is already included in the repository at this location.
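To check whether the processed data is present before training, a quick file count is usually enough. This is a minimal sketch, assuming it is run from the repository root; the helper `count_files` is hypothetical and makes no assumption about the file names or formats inside the folder.

```python
from pathlib import Path


def count_files(data_dir):
    """Return the number of regular files under data_dir, or None if it does not exist."""
    root = Path(data_dir)
    if not root.is_dir():
        return None
    # rglob("*") walks the directory recursively; count only regular files.
    return sum(1 for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Path taken from this README; adjust it if the script is run from elsewhere.
    print(count_files("codes/ECT-BPS/ectbps_ext/data"))
```

A result of `None` means the folder was not found from the current working directory, and `0` means the folder exists but is empty.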

    Preparing the data for training the Paraphrasing Module

    Install dependencies

    pip install sentence-transformers
    pip install num2words
    pip install word2number

    Prepare the data

    python prepare_data_ectbps_para.py

    Data Location

    The script saves the data to codes/ECT-BPS/ectbps_para/data/para/.
    The processed data is already included in the repository at this location.

    Prepare the data with numerical values masked

    python prepare_data_ectbps_para_mask.py

    Data Location

    The script saves the data to codes/ECT-BPS/ectbps_para/data/para_mask/.
    The processed data is already included in the repository at this location.
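This README does not describe how prepare_data_ectbps_para_mask.py masks numerical values. As an illustration of the general idea only, the sketch below replaces each number in a sentence with a placeholder token. The token `[NUM]`, the regular expression, and the function `mask_numerals` are all assumptions made for this example and may differ from what the script actually does.

```python
import re

# Hypothetical placeholder token; the repository's script may use a different one.
MASK_TOKEN = "[NUM]"

# Matches integers, decimals, and comma-grouped numbers such as 1,250.75.
_NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def mask_numerals(sentence: str) -> str:
    """Replace every number in the sentence with MASK_TOKEN."""
    return _NUMBER.sub(MASK_TOKEN, sentence)


if __name__ == "__main__":
    print(mask_numerals("Revenue rose 12.5% to $1,250.75 million in Q3."))
```

Masking of this kind hides the exact figures from the model, which is one common motivation for preparing a separate masked copy of the data.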

    Updates

  • 1st November 2022 - ECTSum Dataset released
  • 30th November 2022 - Code and instructions released for training the Extractive Module of ECT-BPS
  • 3rd March 2023 - Prediction pipeline added for the Extractive Module
  • 5th March 2023 - Code released to prepare the data for training the Paraphrasing Module
  • 7th March 2023 - Code released to train the Paraphrasing Module of ECT-BPS
  • 8th March 2023 - Google Colab notebook released for training and testing the Paraphrasing Module