Housing Classification with Transformers

This repository classifies real estate listings as either Apartment or House. It fine-tunes and evaluates two pre-trained transformer models on a real estate dataset.


Installation

$ virtualenv venv -p python3
$ source venv/bin/activate
$ pip install -r requirements.txt
$ pip install torch --extra-index-url https://download.pytorch.org/whl/cpu

Data

Preprocessing text file

Preprocess the raw dataset file to generate the training, development, and test sets.

$ python -m src.data.make_dataset <dataset_file>

Parameters:

  • dataset_file: housing dataset in JSON format (.json) with binary labels (Apartment, House), e.g. "assessment_NLP.json".

The file must be inside:

./data/raw/

Output:

./data/processed/
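
The splitting step can be sketched as follows. This is a minimal illustration, not the code in src.data.make_dataset: it assumes the raw file is a JSON array of records, and the 80/10/10 ratios, the fixed seed, and the output file names (train.json, dev.json, test.json) are all hypothetical.

```python
import json
import random
from pathlib import Path


def split_dataset(records, dev_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle records and split them into train, dev, and test lists.

    The fractions and the seed are illustrative assumptions.
    """
    shuffled = list(records)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "train": shuffled[n_dev + n_test:],
        "dev": shuffled[:n_dev],
        "test": shuffled[n_dev:n_dev + n_test],
    }


def make_dataset(raw_path, out_dir):
    """Read a JSON array of records and write one JSON file per split."""
    records = json.loads(Path(raw_path).read_text(encoding="utf-8"))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in split_dataset(records).items():
        # Hypothetical file names: train.json, dev.json, test.json.
        (out / f"{name}.json").write_text(json.dumps(rows), encoding="utf-8")
```

Under these assumptions, make_dataset("./data/raw/assessment_NLP.json", "./data/processed/") would write the three split files into the output directory.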

Train

Fine-tune a pre-trained transformer model on the training and development sets.

$ python -m src.models.train_model <model_name>

Parameters:

  • model_name: pre-trained transformer model, e.g. "distilbert-base-uncased" or "bert-base-uncased".

Output:

./model/
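
A fine-tuning run of this kind might look like the sketch below. It assumes the Hugging Face transformers Trainer API, which this README does not confirm is what src.models.train_model uses. The label order, the single training epoch, and the names fine_tune, encode_labels, and ListingDataset are hypothetical.

```python
# Assumed label order; the actual mapping in the repository may differ.
LABEL2ID = {"Apartment": 0, "House": 1}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def encode_labels(labels):
    """Map label strings to the integer ids a classification head expects."""
    return [LABEL2ID[label] for label in labels]


def fine_tune(model_name, train_texts, train_labels, dev_texts, dev_labels,
              output_dir="./model/"):
    """Fine-tune `model_name` for binary classification (illustrative only)."""
    # Heavy dependencies are imported lazily so the helpers above stay
    # importable without torch or transformers installed.
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    class ListingDataset(torch.utils.data.Dataset):
        """Tokenized texts paired with integer labels."""

        def __init__(self, texts, labels):
            self.encodings = tokenizer(list(texts), truncation=True, padding=True)
            self.labels = encode_labels(labels)

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(LABEL2ID),
        id2label=ID2LABEL, label2id=LABEL2ID)
    # num_train_epochs=1 is a placeholder, not a tuned hyperparameter.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1)
    trainer = Trainer(model=model, args=args,
                      train_dataset=ListingDataset(train_texts, train_labels),
                      eval_dataset=ListingDataset(dev_texts, dev_labels))
    trainer.train()
    return trainer
```

With this setup, Trainer saves intermediate checkpoints under output_dir in directories named checkpoint-<step>.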

Evaluation

Evaluate a fine-tuned transformer model on the test set.

$ python -m src.models.evaluate_model <model_name>

Parameters:

  • model_name: checkpoint of a transformer model fine-tuned on <dataset_file>, e.g. "checkpoint-32924".

The checkpoint must be inside:

./model/
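
This README does not say which metrics the evaluation reports. As a hedged illustration, the sketch below computes accuracy, precision, recall, and F1 for the two labels from gold and predicted label lists. The function name binary_metrics and the choice of "House" as the positive class are assumptions.

```python
def binary_metrics(y_true, y_pred, positive="House"):
    """Compute accuracy, precision, recall, and F1 for a binary task.

    `positive` names the class treated as positive (an assumption here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    # Guard every ratio against a zero denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with gold labels ["House", "House", "Apartment", "Apartment"] and predictions ["House", "Apartment", "Apartment", "House"], all four metrics come out to 0.5.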

Experimentation

Explore the evaluation of the housing classification models on test data in Jupyter notebooks.

$ cd notebooks/
$ jupyter notebook