categorizing_sexism_misogyny: A Python repository from pulkitparikh

We hereby release the code used for our research paper under review at TWEB titled "Categorizing Sexism and Misogyny through Neural Approaches". Our implementation utilizes parts of the code from [1, 2, 3, 4] and libraries Keras and Scikit-learn [5]. The following are brief descriptions of some of the contents of this repository.

main.py

The main file that needs to be run for all deep learning based methods including the proposed approach and baselines

neural_approaches.py

Training, prediction, evaluation, training data creation/transformation, loss function assignment, class imbalance correction

dl_models.py

Deep learning architectures for the proposed approach as well as baselines

load_pre_proc.py

Data loading, pre-processing, problem transformation, functions wrt our ensemble method, and other utilities

sent_enc_embed.py

Generation of sentence representations using general-purpose sentence encoding schemes

word_embed.py

Generation of distributional word representations

ling_word_feats.py

Generation of a linguistic/aspect-based word-level representation

gen_batch_keras.py

Generation of batches of inputs for training and testing

auto_encode.py

Functions related to the autoencoder-based method for using unlabeled data and the pre-training of BERT on a domain-specific corpus (esp. around data creation)

eval_measures.py

Functions related to multi-label evaluation and result reporting

traditional_ML.py

Traditional machine learning methods on ngram based and other features

doc2vec_embed.py

Creation of a vector representation of a piece of text using doc2vec

rand_approach.py

Random label assignment in accordance with normalized training frequencies of labels

rand_sample.py

Creation of a small random sample of the data for quick experimentation

split_labels.py

Label subset generation for our ensemble approach

att_visualize.py

Functions used for quantitative and qualitative analysis

config_deep_learning.txt

A sample configuration file for deep learning methods specifying multiple nested and non-nested parameter combinations

config_traditional_ML.txt

A sample configuration file for traditional machine learning methods

References:

[1] Sweta Agrawal and Amit Awekar. 2018. Deep learning for detecting cyberbullying across multiple social media platforms. In European Conference on Information Retrieval. Springer, 141–153.

[2] Richard Liao. 2017. textClassifier. https://github.com/richliao/textClassifier.

[3] Nikhil Pattisapu, Manish Gupta, Ponnurangam Kumaraguru, and Vasudeva Varma. 2017. Medical persona classification in social media. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ACM, 377–384.

[4] Pulkit Parikh, Harika Abburi, Pinkesh Badjatiya, Radhika Krishnan, Niyati Chhaya, Manish Gupta, and Vasudeva Varma. 2019. Multi-label Categorization of Accounts of Sexism using a Neural Framework. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 1642–1652.

[5] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.

pulkitparikh/categorizing_sexism_misogyny