pip install -r requirements.txt
- GPU: None
- CPU: i9-14900K
- RAM: 128GB
- 3,000+ hand-crafted features, modeled with 5-fold LightGBM (CPU-only)
- Adversarial learning and data augmentation: a small number of outlier samples labeled 0 are also added to the training data with label 1, so each of them appears with both labels. This speeds up model convergence and improves AUC on the test data by 0.5 to 1.2%.
- Author research-field features: extract and quantify the key content of each author's research field by concatenating the content and keywords of all of that author's articles, then compute an overlap score between this profile and the current article sample.
- Embedding features: compute the difference and the cosine similarity between the current article's embedding and the mean embedding of all of the author's articles.
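The 5-fold LightGBM bullet implies an out-of-fold training scheme. Below is a minimal sketch of that scheme. It assumes stratified folds and test predictions averaged across folds, and the original states neither. `stratified_folds`, `kfold_oof` and the `fit_predict` callable are hypothetical names. The model is left pluggable so the sketch runs without LightGBM installed.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
# Hypothetical model hook: fit on (train_X, train_y), return scores for predict_X.
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def stratified_folds(y: Sequence[int], n_splits: int = 5, seed: int = 42) -> List[int]:
    """Give each sample a fold id so every fold has a similar label mix."""
    rng = random.Random(seed)
    fold_of = [0] * len(y)
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            fold_of[i] = k % n_splits  # deal samples round-robin across folds
    return fold_of


def kfold_oof(
    X: Sequence[Row],
    y: Sequence[int],
    X_test: Sequence[Row],
    fit_predict: FitPredict,
    n_splits: int = 5,
    seed: int = 42,
) -> Tuple[List[float], List[float]]:
    """Return out-of-fold scores for X and fold-averaged scores for X_test."""
    fold_of = stratified_folds(y, n_splits, seed)
    oof = [0.0] * len(X)
    test_pred = [0.0] * len(X_test)
    for fold in range(n_splits):
        train = [i for i in range(len(X)) if fold_of[i] != fold]
        valid = [i for i in range(len(X)) if fold_of[i] == fold]
        scores = fit_predict(
            [X[i] for i in train],
            [y[i] for i in train],
            [X[i] for i in valid] + list(X_test),  # score held-out fold and test together
        )
        for i, s in zip(valid, scores[: len(valid)]):
            oof[i] = s
        for j, s in enumerate(scores[len(valid):]):
            test_pred[j] += s / n_splits  # average test scores over the folds
    return oof, test_pred
```

In the actual pipeline, `fit_predict` would presumably wrap `lightgbm.LGBMClassifier`, calling `fit` on the training fold and `predict_proba(...)[:, 1]` on the held-out fold plus the test set. The out-of-fold scores are what a local AUC estimate would be computed on.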
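One reading of the augmentation bullet is that each selected label-0 outlier is kept as is and a copy labeled 1 is appended. The sketch below shows only that step. The original does not describe how adversarial learning picks the outliers, so `outlier_idx` is assumed to be supplied by the caller. `add_flipped_outliers` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def add_flipped_outliers(
    X: Sequence[Row], y: Sequence[int], outlier_idx: Sequence[int]
) -> Tuple[List[Row], List[int]]:
    """Return augmented copies of X and y.

    Each selected outlier keeps its original label-0 row and additionally
    gets a duplicate labeled 1, so it appears in training with both labels.
    """
    X_aug, y_aug = list(X), list(y)
    for i in outlier_idx:
        if y[i] != 0:
            raise ValueError(f"sample {i} is labeled {y[i]}, expected 0")
        X_aug.append(X[i])  # same features...
        y_aug.append(1)     # ...opposite label
    return X_aug, y_aug
```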
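The research-field overlap feature could look roughly like the sketch below. Several details are assumptions: the field names `title`, `abstract` and `keywords`, the lowercase alphanumeric tokenizer, and the overlap measure (the share of the current article's tokens that also occur in the author's profile). The original does not say how the score is normalized. All function names are hypothetical.

```python
import re
from typing import Iterable, Set


def paper_text(paper: dict) -> str:
    """Join the assumed text fields of one article into a single string."""
    title = str(paper.get("title", ""))
    abstract = str(paper.get("abstract", ""))
    keywords = " ".join(paper.get("keywords", []))
    return f"{title} {abstract} {keywords}"


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def author_profile(author_papers: Iterable[dict]) -> Set[str]:
    """Splice the content and keywords of all the author's articles together."""
    return tokenize(" ".join(paper_text(p) for p in author_papers))


def overlap_score(paper: dict, profile: Set[str]) -> float:
    """Fraction of the current article's tokens that appear in the author profile."""
    tokens = tokenize(paper_text(paper))
    if not tokens:
        return 0.0
    return len(tokens & profile) / len(tokens)
```

Whether the current article is excluded from its own author's profile is not stated. Excluding it would keep the score from trivially rewarding the sample being scored.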
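Below is a minimal sketch of the embedding features. It assumes embeddings are plain equal-length float vectors. It also reads "difference" as the element-wise difference to the author's mean embedding, although the original may use a scalar distance instead. `embedding_features` and its helpers are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average the author's article embeddings dimension by dimension."""
    if not embeddings:
        raise ValueError("author has no article embeddings")
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; 0.0 is an arbitrary fallback
    return dot / (norm_a * norm_b)


def embedding_features(
    article_emb: Sequence[float], author_embs: Sequence[Sequence[float]]
) -> Tuple[List[float], float]:
    """Return (element-wise difference to the author centroid, cosine similarity)."""
    centroid = mean_embedding(author_embs)
    diff = [a - c for a, c in zip(article_emb, centroid)]
    return diff, cosine_similarity(article_emb, centroid)
```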
|--requirements.txt # Prerequisites
|--data # Dataset given by organizer
|--output_data # Intermediate data generated by the code
|--model # Model weights
|--code # Code files
    |--train.py
    |--infer.py
- sh train.sh (training + inference)
- sh infer.sh (inference only)