
KDD2024-WhoIsWho Solution from Team Leo_Lu


KDD2024-WhoIsWho Solution: Team Leo_Lu

Prerequisites & Hardware

pip install -r requirements.txt
  • GPU: None
  • CPU: i9-14900K
  • RAM: 128GB

Method & Parameter count

  • 3,000+ hand-crafted features, modeled with 5-fold LightGBM (CPU only).
  • Adversarial learning and data augmentation: a small number of outlier samples (label 0) are also added to the training data a second time with label 1. This speeds up model convergence and improves AUC on the test data by 0.5 to 1.2%.
  • Research-field profiling: the content and keywords of all papers attributed to an author are concatenated to extract and quantify the author's key research topics, and an overlap score is computed between this profile and the current paper sample.
  • Embedding similarity: compute the difference and the cosine similarity between the mean embedding of all of the author's papers and the current paper's embedding.
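The augmentation and the two author-profile features above can be illustrated with a minimal, dependency-free Python sketch. It is not the team's code: the function names (augment_outliers, author_profile, overlap_score, embedding_features), the whitespace tokenizer, the set-based overlap ratio, and the way outliers are chosen (passed in as indices) are all assumptions for illustration. Embeddings are assumed to be precomputed lists of floats.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Sample = Tuple[Dict[str, float], int]  # (feature dict, label: 1 = correct, 0 = outlier)


def augment_outliers(samples: List[Sample], outlier_ids: Set[int]) -> List[Sample]:
    """Keep every sample; for each selected outlier (label 0), also append a copy labeled 1."""
    extra = [
        (dict(feats), 1)
        for i, (feats, label) in enumerate(samples)
        if i in outlier_ids and label == 0
    ]
    return samples + extra  # new list; the input list is left unchanged


def tokens(title: str, keywords: Sequence[str]) -> Set[str]:
    """Hypothetical tokenizer: lowercased whitespace tokens of the title plus the keywords."""
    return set(title.lower().split()) | {k.lower() for k in keywords}


def author_profile(papers: List[Tuple[str, List[str]]]) -> Set[str]:
    """Concatenate the titles and keywords of all of an author's papers into one token set."""
    profile: Set[str] = set()
    for title, keywords in papers:
        profile |= tokens(title, keywords)
    return profile


def overlap_score(profile: Set[str], title: str, keywords: Sequence[str]) -> float:
    """Fraction of the current paper's distinct tokens that also occur in the author profile."""
    paper = tokens(title, keywords)
    return len(paper & profile) / len(paper) if paper else 0.0


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_features(
    author_embs: List[List[float]], paper_emb: List[float]
) -> Tuple[List[float], float]:
    """Element-wise difference and cosine similarity between the paper and the author's mean embedding."""
    center = mean_vector(author_embs)
    diff = [p - c for p, c in zip(paper_emb, center)]
    return diff, cosine(center, paper_emb)
```

This sketch builds the profile and the mean embedding from all of the author's papers, including the one being scored. A leave-one-out variant would avoid that leakage; the README does not say which variant the solution uses.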

File structure & Run Code

 |--requirements.txt # Prerequisites
 |--data # Dataset given by organizer
 |--output_data # Dataset generated by codes
 |--model # Model weights
 |--code # code files
    |--train.py  
    |--infer.py 
  • sh train.sh (train+infer)
  • sh infer.sh (infer only)
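train.sh runs training followed by inference. The 5-fold scheme listed under the method can be sketched as below, with clear caveats: the actual solution trains LightGBM on 3,000+ features, whereas this standard-library example swaps in a hypothetical nearest-centroid scorer (centroid_trainer) so it runs without extra packages. The names kfold_indices, train_and_infer, and roc_auc and the seed value are illustrative, not taken from train.py or infer.py.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
Trainer = Callable[[List[List[float]], List[int]], Model]


def kfold_indices(n: int, k: int = 5, seed: int = 42) -> List[List[int]]:
    """Shuffle row indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def centroid_trainer(X: List[List[float]], y: List[int]) -> Model:
    """Hypothetical stand-in for LightGBM: score = distance to the negative centroid
    minus distance to the positive centroid (higher means more likely label 1)."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)
    return lambda row: math.dist(row, c_neg) - math.dist(row, c_pos)


def train_and_infer(
    X: List[List[float]], y: List[int], X_test: List[List[float]], trainer: Trainer, k: int = 5
) -> Tuple[List[float], List[float], float]:
    """Out-of-fold predictions for validation; fold-averaged predictions for the test set."""
    oof = [0.0] * len(X)
    test_pred = [0.0] * len(X_test)
    for fold in kfold_indices(len(X), k):
        held_out = set(fold)
        train_ids = [i for i in range(len(X)) if i not in held_out]
        model = trainer([X[i] for i in train_ids], [y[i] for i in train_ids])
        for i in fold:  # each row is scored only by the model that did not see it
            oof[i] = model(X[i])
        for j, row in enumerate(X_test):
            test_pred[j] += model(row) / k  # average of the k fold models
    return oof, test_pred, roc_auc(y, oof)
```

Because every row is predicted by a model that never trained on it, the out-of-fold AUC serves as a validation estimate, and the test prediction is the mean over the five fold models. With LightGBM installed, the trainer could instead fit a lightgbm.LGBMClassifier and return a closure over its predict_proba.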