Amazon-ML-Challenge

Our team "Elementals" approach to the Amazon ML Challenge.

Team: Elementals


Details of Files

  • amazon_ml_preprocessing.ipynb : code to preprocess the text
  • amazon_ml_translation_csv.ipynb : code to translate non-English text to English
  • amazon_ml_mode.ipynb : code to build a single submission file from multiple submission files using the mode (most frequent label) technique
  • amazon_ml_training.ipynb : code to train a classifier on the training embeddings and predict on the test embeddings
  • amazon_ml_embeddings.ipynb : code to generate embeddings from a CSV file
  • submission_top-score.csv : submission file with the top score [Accuracy: 66.85]

Details of Competition

  • Competition : Multi-class Text Classification
  • Host : HackerEarth
  • Metric : Accuracy
  • Duration : 2 days, 23 hrs, 59 min
  • Check out the competition here

Details of Data

  • Key column – PRODUCT_ID
  • Input features – TITLE, DESCRIPTION, BULLET_POINTS, BRAND
  • Target column – BROWSE_NODE_ID
  • Train dataset size – 2,903,024 rows
  • Number of classes in Train – 9,919
  • Overall Test dataset size – 110,775 rows

Data Preprocessing

Libraries used:

  • re
  • langdetect
  • deep-translator

Steps followed:

  • Removed special characters and emojis using re.
  • Translated non-English text to English using langdetect and deep-translator (a sketch of the full cleaning pipeline follows this list).
  • Removed stop-words.
  • Expanded contractions (e.g. "don't" → "do not").
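
A minimal sketch of this cleaning pipeline, assuming an illustrative regex, stop-word set and contraction map (the actual lists in amazon_ml_preprocessing.ipynb may differ):

```python
import re

from deep_translator import GoogleTranslator
from langdetect import detect, LangDetectException

# Illustrative stop-word set and contraction map; the notebook's actual lists may differ.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "of", "to", "in", "for"}
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "won't": "will not", "it's": "it is"}


def translate_if_needed(text: str) -> str:
    """Translate non-English text to English; leave English (or undetectable) text alone."""
    try:
        if detect(text) != "en":
            return GoogleTranslator(source="auto", target="en").translate(text)
    except LangDetectException:
        pass
    return text


def clean_text(text: str) -> str:
    text = text.lower()
    # Expand contractions before punctuation is stripped.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove special characters and emojis, keeping only alphanumerics and spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Drop stop-words and collapse repeated whitespace.
    return " ".join(t for t in text.split() if t not in STOP_WORDS)


# Example (the translation step needs network access):
print(clean_text(translate_if_needed("Ceci n'est pas une description de produit !! 😀")))
```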

Our Approach

Libraries used:

  • sentence-transformers
  • RAPIDS cuML

Steps followed:

  • First, the text is converted to embeddings using pre-trained models such as paraphrase-mpnet-base-v2, paraphrase-MiniLM-L6-v2 and paraphrase-MiniLM-L3-v2.
  • Dimension of the embeddings : 384
  • The training embeddings are fed into the KNeighborsClassifier from the cuML library (a sketch follows this list).
  • The trained classifier is then used to predict on the test embeddings.
  • We also used a mode (majority-vote) technique, i.e. taking the most frequently predicted label across different experiments (see the second sketch below).
  • Cross-validation was also used when training the KNN classifier.
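
A minimal sketch of the embed-then-classify pipeline. The model name, batch size, k value and hold-out split below are assumptions for illustration, and `train_texts`, `train_labels` and `test_texts` are assumed to hold the preprocessed text and labels:

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.model_selection import train_test_split
from cuml.neighbors import KNeighborsClassifier  # RAPIDS cuML, runs on the GPU

# Encode the preprocessed product text into dense sentence embeddings (384-dim for MiniLM models).
encoder = SentenceTransformer("paraphrase-MiniLM-L6-v2")
train_emb = encoder.encode(train_texts, batch_size=256, show_progress_bar=True)
test_emb = encoder.encode(test_texts, batch_size=256, show_progress_bar=True)

# Hold out part of the training embeddings for validation; n_neighbors=5 is an assumed value.
X_tr, X_val, y_tr, y_val = train_test_split(
    train_emb, np.asarray(train_labels), test_size=0.1, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("validation accuracy:", float((knn.predict(X_val) == y_val).mean()))

# Predict BROWSE_NODE_ID for the test set.
test_pred = knn.predict(test_emb)
```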
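
And a sketch of the mode ensembling done in amazon_ml_mode.ipynb, assuming each experiment wrote a submission CSV with PRODUCT_ID and BROWSE_NODE_ID columns (the file names are placeholders):

```python
import pandas as pd

# Placeholder file names; each file is one experiment's submission.
files = ["submission_mpnet.csv", "submission_minilm_l6.csv", "submission_minilm_l3.csv"]

# Collect each experiment's predictions as one column, aligned on PRODUCT_ID.
preds = pd.concat(
    [pd.read_csv(f).set_index("PRODUCT_ID")["BROWSE_NODE_ID"].rename(f) for f in files],
    axis=1,
)

# Per-row mode, i.e. the most frequently predicted label; ties are broken by
# taking the smallest label, since mode() returns ties in ascending order.
final = preds.mode(axis=1)[0].astype(int).rename("BROWSE_NODE_ID")

final.reset_index().to_csv("submission_mode.csv", index=False)
```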

Experiments

  • Tried NearestNeighbors, SVM and RandomForest classifiers, but the results were not better than those of the KNN classifier.
  • Tried embeddings of different sizes (384 and 768).
  • Compared different combinations of the input text columns, e.g. TITLE + DESCRIPTION and TITLE + DESCRIPTION + BULLET_POINTS (a sketch of how the columns are combined follows this list).
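
A small sketch of how the text columns can be concatenated before encoding; the file name and the missing-value handling are assumptions:

```python
import pandas as pd

df = pd.read_csv("train.csv")  # placeholder path


def combine(frame: pd.DataFrame, cols) -> pd.Series:
    """Concatenate the chosen text columns into one field, treating missing values as empty."""
    return frame[cols].fillna("").astype(str).agg(" ".join, axis=1).str.strip()


text_td = combine(df, ["TITLE", "DESCRIPTION"])
text_tdb = combine(df, ["TITLE", "DESCRIPTION", "BULLET_POINTS"])
```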