KDD2024-WhoIsWho: A Python repository from Leo1998-Lu

KDD2024-WhoisWho Solution: Team Leo_Lu

Prerequisites & Hardware

pip install -r requirements.txt

GPU: None
CPU: i9-14900K
RAM: 128GB

Method & Parameter count

3,000+ hand-crafted features fed to a 5-fold LightGBM (LGBM) model, CPU-only.
Adversarial learning and data augmentation: a small number of outlier samples (labeled 0) are also assigned the label 1 and added to the training data, which speeds up model convergence and improves AUC on the test data by 0.5% to 1.2%.
Key content extraction and quantification of the author's research field: all article content and keywords belonging to an author are concatenated, and an overlap score is computed between that text and the current article sample.
Embedding features: the difference and the cosine similarity between the current article's embedding and the mean embedding of all of the author's articles.
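
The two per-article features described above (overlap score and embedding similarity) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names `overlap_score` and `embedding_features`, the whitespace tokenization, and the choice to normalize the overlap by the current article's token count are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def overlap_score(paper_text: str, author_texts: Sequence[str]) -> float:
    """Share of the paper's distinct tokens that also occur in the author's texts.

    Assumption: lowercase whitespace tokenization; the real pipeline may differ.
    """
    paper_tokens = set(paper_text.lower().split())
    if not paper_tokens:
        return 0.0
    # Concatenate all of the author's article text/keywords into one vocabulary.
    author_tokens = set(" ".join(author_texts).lower().split())
    return len(paper_tokens & author_tokens) / len(paper_tokens)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_features(
    paper_emb: Sequence[float], author_embs: Sequence[Sequence[float]]
) -> Tuple[List[float], float]:
    """Return (paper_emb - mean author embedding, cosine similarity to that mean)."""
    n = len(author_embs)
    centroid = [sum(col) / n for col in zip(*author_embs)]
    diff = [p - c for p, c in zip(paper_emb, centroid)]
    return diff, cosine_similarity(paper_emb, centroid)


if __name__ == "__main__":
    score = overlap_score("quantum graph", ["graph mining", "neural networks"])
    diff, cos = embedding_features([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(score, diff, cos)
```

In a setup like this, the overlap score and the cosine similarity would each become one feature column, and the difference vector would contribute one column per embedding dimension.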

File structure & Run Code

 |--requirements.txt # Prerequisites
 |--data # Dataset given by organizer
 |--output_data # Dataset generated by codes
 |--model # Model weights
 |--code # code files
    |--train.py  
    |--infer.py

sh train.sh (train+infer)
sh infer.sh (infer only)
