The functions in utils/crawling.py crawl paper titles and their abstracts and save them as a {title: abstract} dictionary in a pickle file.
├── utils
│   └── crawling.py
└── example
    └── mlm-and-clustering.ipynb
from utils.crawling import crawling_pmlr, crawling_emnlp
To crawl papers from PMLR (here, volume 162, ICML 2022):
url = 'https://proceedings.mlr.press/v162/'
pkl_path = '../data/icml2022.pkl'
crawling_pmlr(url, pkl_path)
To crawl papers from EMNLP (here, the EMNLP 2022 main volume on the ACL Anthology):
url = "https://aclanthology.org/volumes/2022.emnlp-main/"
pkl_path = '../data/emnlp2022.pkl'
crawling_emnlp(url, pkl_path)
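
Once a crawl has finished, the saved file can be read back with the standard pickle module. The snippet below is a minimal sketch: `load_abstracts` is a hypothetical helper (it is not part of utils/crawling.py), and the sample dictionary is placeholder data written to a temporary file so the example runs without crawling anything. It assumes only what is stated above, namely that the file holds a {title: abstract} dictionary.

```python
import pickle
import tempfile
from pathlib import Path


def load_abstracts(pkl_path):
    """Load a {title: abstract} dictionary from a pickle file.

    Hypothetical helper for illustration; not part of utils/crawling.py.
    """
    with open(pkl_path, "rb") as f:
        papers = pickle.load(f)
    if not isinstance(papers, dict):
        raise TypeError(f"expected a dict, got {type(papers).__name__}")
    return papers


# Placeholder data standing in for a real crawl result.
sample = {
    "Placeholder Title One": "Placeholder abstract for the first paper.",
    "Placeholder Title Two": "Placeholder abstract for the second paper.",
}

# Write the placeholder dictionary to a temporary pickle file.
demo_path = Path(tempfile.mkdtemp()) / "demo.pkl"
with open(demo_path, "wb") as f:
    pickle.dump(sample, f)

# Read it back and print each title with the start of its abstract.
papers = load_abstracts(demo_path)
for title, abstract in papers.items():
    print(f"{title}: {abstract[:30]}...")
```

With a real crawl, `demo_path` would be replaced by a path such as '../data/icml2022.pkl'. Because unpickling can execute arbitrary code, only load pickle files you created yourself or otherwise trust.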
- Using BERT and k-means, we can cluster papers as shown in ./example/mlm-and-clustering.ipynb.
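
The k-means step can be sketched on its own, independent of how the embeddings are produced. The code below is an illustrative, standard-library-only version of Lloyd's algorithm, not the notebook's implementation: the function names, the farthest-point initialization (chosen here only to keep the result deterministic), and the 2-D toy vectors standing in for BERT embeddings are all assumptions. The notebook may well rely on a library implementation such as scikit-learn's `KMeans` instead.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def init_centroids(vectors, k):
    """Farthest-point initialization: start from the first vector, then
    repeatedly add the vector farthest from all chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, n_iter=20):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Toy 2-D vectors standing in for per-paper BERT embeddings.
titles = ["paper A", "paper B", "paper C", "paper D"]
embeddings = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]

labels, centroids = kmeans(embeddings, k=2)
for title, label in zip(titles, labels):
    print(f"{title} -> cluster {label}")
```

In practice each vector would be a sentence-level embedding of a paper's abstract, with several hundred dimensions, and the number of clusters would be chosen by inspecting the results.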