Mini package to crawl submissions from OpenReview
Set up a virtual environment of your choice. Install via GitHub, e.g., by putting openreview-crawler @ git+https://github.com/lisa-wm/openreview-crawler in your requirements.txt file.
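For example, the dependency line in requirements.txt would look like this (the repository URL is taken from above; pinning a specific commit or tag is optional but helps reproducibility):

```text
# requirements.txt
openreview-crawler @ git+https://github.com/lisa-wm/openreview-crawler
```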
from openreview_crawler.client import ORClient
from openreview_crawler.utils import get_credentials, extract_papers, flag_keyword
import re
import os
import pandas as pd
# Get credentials and instantiate client
usr, pw = get_credentials()
my_client = ORClient(usr, pw)
# Find the conference ID for, say, ICML 2023
print([x for x in my_client.get_venues() if 'ICML' in x and '2023' in x])
# ...
venue_id = 'ICML.cc/2023/Conference'
# Get accepted papers and extract relevant info
accepted = my_client.get_papers(venue_id, 'accepted')
papers = extract_papers(accepted)
# Perform a keyword search, adding a binary column for each keyword in a list. Keywords can be:
# - compositions like 'information theory': flag with 1 if the composition appears as a whole
# - OR constructions like 'NN or neural network': flag with 1 if either term appears
# - AND constructions like 'tuning and categorical': flag with 1 if both terms appear
keywords = ['information theory', 'NN or neural network', 'tuning and categorical']
for k in keywords:
    col = []
    for r in range(len(papers)):
        row = papers.iloc[r, :]
        is_match = max(flag_keyword(row['title'], k), flag_keyword(row['abstract'], k))
        col.append(is_match)
    papers[re.sub(' ', '_', k)] = col
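For reference, the matching logic can be sketched as below. This `flag_keyword` is a hypothetical stand-in mirroring the behavior described above (case-insensitive substring matching with 'or'/'and' handling), not the package's actual implementation:

```python
def flag_keyword(text: str, keyword: str) -> int:
    """Return 1 if `keyword` matches `text`, else 0 (hypothetical sketch).

    Supports plain phrases, 'A or B' (either term), and 'A and B' (both terms).
    """
    text = text.lower()
    if ' or ' in keyword:
        # OR construction: flag if any term appears
        return int(any(t.lower() in text for t in keyword.split(' or ')))
    if ' and ' in keyword:
        # AND construction: flag only if all terms appear
        return int(all(t.lower() in text for t in keyword.split(' and ')))
    # Plain composition: flag if the whole phrase appears
    return int(keyword.lower() in text)
```

If the package's helper differs (e.g., it matches on word boundaries via regular expressions), the resulting flag columns may differ accordingly.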