/qqp

Primary language: Jupyter Notebook

Beyond Duplicates: Unleashing Transitivity in Quora Data Augmentation

This study investigates the effects of transitivity-based data augmentation on the Quora Question Pairs (QQP) dataset. A dense network operating on BERT embeddings was trained on the original dataset and on nine augmented datasets. Adding duplicate augmentations yielded slight performance improvements up to a certain point, whereas adding non-duplicate augmentations consistently reduced model performance. These findings suggest that the augmentation method may inadvertently amplify inherent noise in the dataset, complicating the model’s learning process. The study highlights the complexities of data augmentation and the importance of a deliberate strategy when introducing synthetic data, so that noise already present in the dataset is not exacerbated.
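To make the idea of transitivity-based augmentation concrete, here is a minimal sketch in Python. It is not taken from the repository's notebooks, and the exact rules used in the study are not stated here, so the following are assumptions: duplicate pairs are inferred as "A duplicates B and B duplicates C, so A duplicates C", and non-duplicate pairs are inferred as "A duplicates B and B does not duplicate C, so A does not duplicate C". The function name `augment_by_transitivity` and the `(q1, q2, is_duplicate)` tuple layout are hypothetical.

```python
from itertools import combinations


def find(parent, x):
    """Return the cluster root of x in a dict-based union-find (with path compression)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def augment_by_transitivity(pairs):
    """Infer new pairs from (q1, q2, is_duplicate) triples.

    Assumed rules (not confirmed by the source):
      duplicate:     A~B and B~C   =>  A~C
      non-duplicate: A~B and B!~C  =>  A!~C
    Returns (new_duplicates, new_non_duplicates) as sets of sorted ID tuples.
    """
    pairs = list(pairs)
    parent = {}

    # Merge questions linked by a duplicate label into one cluster.
    for a, b, label in pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if label == 1:
            parent[root_a] = root_b

    # Collect the members of each cluster.
    clusters = {}
    for q in list(parent):
        clusters.setdefault(find(parent, q), set()).add(q)

    # Pairs that already exist in the data are not "new".
    seen = {tuple(sorted((a, b))) for a, b, _ in pairs}

    # Every unseen pair inside a cluster is an inferred duplicate.
    new_duplicates = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            if (a, b) not in seen:
                new_duplicates.add((a, b))

    # A non-duplicate label between two clusters extends to all cross pairs.
    new_non_duplicates = set()
    for a, b, label in pairs:
        if label != 0:
            continue
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a == root_b:
            continue  # contradictory labels inside one cluster; skip
        for x in clusters[root_a]:
            for y in clusters[root_b]:
                pair = tuple(sorted((x, y)))
                if pair not in seen:
                    new_non_duplicates.add(pair)

    return new_duplicates, new_non_duplicates


if __name__ == "__main__":
    toy = [(1, 2, 1), (2, 3, 1), (3, 4, 0)]
    print(augment_by_transitivity(toy))
```

Under these assumed rules, a single mislabeled pair is copied to every pair its clusters touch, which is one plausible way an augmentation of this kind could amplify existing label noise.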