This is the repository for the gender differences project of AI Scholar.

AI Scholar Gender

Overview

This is the (private) repository for the AI Scholar gender project.

File Structure

.
├── README.md
├── code
│   ├── basic_scholar_profiles_1.ipynb
│   ├── basic_scholar_profiles_2.ipynb
│   ├── citation_by_academic_age.ipynb
│   ├── domain_analysis.ipynb
│   ├── gs_scholar_analysis.ipynb
│   ├── organization_clustering.ipynb
│   ├── paper_centric_analysis.ipynb
│   ├── prepare_gs_feature_data.ipynb
│   ├── stylish_title_detector.ipynb
│   └── time_series_clustering.ipynb
└── data
    ├── AIScholars78k_samp1000.csv
    └── Papers100k_samp1000.csv

code

  • prepare_gs_feature_data.ipynb: Obtain necessary data features for later analysis.
  • basic_scholar_profiles_1.ipynb and basic_scholar_profiles_2.ipynb: Analyze basic features in GS scholar profiles.
  • gs_scholar_analysis.ipynb: Analyze GS scholar features from different perspectives.
  • paper_centric_analysis.ipynb: Perform the paper-centric analysis, corresponding to the section of the same name in the paper.
  • citation_by_academic_age.ipynb: Direct analysis of GS scholars' academic age time series.
  • time_series_clustering.ipynb: Clustering analysis of GS scholars' academic age time series.
  • organization_clustering.ipynb: Clustering analysis of GS scholars' organizations.
  • domain_analysis.ipynb: Analyze GS scholars' domain tags.
  • stylish_title_detector.ipynb: Implementation of the stylish title detector, with sample stylish titles.

data

  • AIScholars78k_samp1000.csv: 1000 samples from the 78k AI scholar dataset. The full dataset is available on Google Drive.
  • Papers100k_samp1000.csv: 1000 samples from the 100k paper dataset. The full dataset is available on Google Drive.
  • Further statistics on the full data are described in the GitHub repo causalNLP/AI-Scholar.
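
Before running the notebooks, it can help to confirm that the sample files load as expected. The following is a minimal sketch, assuming the file structure shown above and that both files are standard comma-separated CSVs with a header row. `summarize_csv` is a hypothetical helper, not part of this repo, and it makes no assumption about column names.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first record is assumed to be the header
        row_count = sum(1 for _ in reader)  # remaining records are data rows
    return header, row_count


if __name__ == "__main__":
    # Paths follow the file structure above; run from the repository root.
    for name in ("AIScholars78k_samp1000.csv", "Papers100k_samp1000.csv"):
        path = Path("data") / name
        if path.exists():
            columns, n_rows = summarize_csv(path)
            print(f"{name}: {n_rows} rows, {len(columns)} columns")
        else:
            print(f"{name}: not found at {path}")
```

If the files match their descriptions, each should report about 1000 data rows.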

To download the full dataset from the command line, run the following commands:

pip install gdown
cd <path_to_store_data>
gdown "https://drive.google.com/uc?id=1sfNLH549c0IMp-hojnpmskBftsW5jB7a" # full AI scholar dataset (78k)
gdown "https://drive.google.com/uc?id=16cmOlJ-8--7vqIXY-hP0JXtRwqaPoOfh" # full paper dataset (100k)
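
The same download can also be scripted from Python, for example from inside a notebook. This is a sketch under stated assumptions: it relies on `gdown.download` from the gdown package installed above, and the file IDs are copied from the commands above. The labels in `FILE_IDS` and the helper names `drive_url` and `download_all` are hypothetical and exist only for illustration.

```python
# File IDs copied from the download commands above; labels are illustrative.
FILE_IDS = {
    "AI scholar dataset": "1sfNLH549c0IMp-hojnpmskBftsW5jB7a",
    "paper dataset": "16cmOlJ-8--7vqIXY-hP0JXtRwqaPoOfh",
}


def drive_url(file_id):
    """Build a Google Drive URL in the same form as the commands above."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_all():
    """Download each file into the current working directory."""
    import gdown  # imported lazily so this module loads without gdown installed

    for label, file_id in FILE_IDS.items():
        print(f"Downloading {label} ...")
        # With output=None, gdown keeps the file name reported by Google Drive.
        gdown.download(drive_url(file_id), output=None, quiet=False)
```

Call `download_all()` from the directory where the data should be stored. Leaving `output=None` avoids guessing the names of the full dataset files.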