
openreview_crawler

Mini package to crawl submissions from OpenReview

Install

Set up a virtual environment of your choice.

Install via GitHub (e.g., by putting openreview-crawler @ git+https://github.com/lisa-wm/openreview-crawler in your requirements.txt file).
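
For a one-off install without a requirements file, the equivalent pip command should work (assuming the repository ships a standard package definition):

pip install "openreview-crawler @ git+https://github.com/lisa-wm/openreview-crawler"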

Example usage

from openreview_crawler.client import ORClient
from openreview_crawler.utils import get_credentials, extract_papers, flag_keyword
import re
import os
import pandas as pd

Get credentials and instantiate client

usr, pw = get_credentials()
my_client = ORClient(usr, pw)

Find out the conference ID for, say, ICML 2023

print([x for x in my_client.get_venues() if 'ICML' in x and '2023' in x])
# ...
venue_id = 'ICML.cc/2023/Conference'

Get accepted papers and extract relevant info

accepted = my_client.get_papers(venue_id, 'accepted')
papers = extract_papers(accepted)
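
Assuming extract_papers returns a pandas DataFrame with columns such as title and abstract (as the keyword search below implies), you can take a quick look at the result and export it; the file name here is just a placeholder:

# inspect the first rows and dump everything to CSV for later analysis
print(papers.head())
papers.to_csv('icml2023_accepted.csv', index=False)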

Perform a keyword search, adding a binary column for each keyword in a list (a rough sketch of the matching logic follows the list below). Keywords can be

  • compositions like information theory ➡️ flagged with 1 if the composition appears as a whole
  • OR constructions like NN or neural network ➡️ flagged with 1 if either term appears
  • AND constructions like tuning and categorical ➡️ flagged with 1 if both terms appear
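
For intuition, here is a minimal sketch of what such matching could look like, using case-insensitive substring checks; this is illustrative only and not necessarily how flag_keyword is implemented in the package:

def sketch_flag_keyword(text, keyword):
    # illustrative only: returns 1 if `text` matches `keyword` under the rules above, else 0
    text, keyword = text.lower(), keyword.lower()
    if ' or ' in keyword:
        # OR construction: match if any of the terms appears
        return int(any(part.strip() in text for part in keyword.split(' or ')))
    if ' and ' in keyword:
        # AND construction: match only if all terms appear
        return int(all(part.strip() in text for part in keyword.split(' and ')))
    # plain composition: match only if the whole phrase appears
    return int(keyword in text)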
keywords = ['information theory', 'NN or neural network', 'tuning and categorical']
for k in keywords:
    col = []
    for r in range(len(papers)):
        row = papers.iloc[r, :]
        # flag the paper if the keyword matches either the title or the abstract
        is_match = max(flag_keyword(row['title'], k), flag_keyword(row['abstract'], k))
        col.append(is_match)
    # store the flags in a new column named after the keyword (spaces replaced by underscores)
    papers[re.sub(' ', '_', k)] = col
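
With the binary columns in place, you can, for example, filter for papers that match a given keyword and export them; column names are the underscore-joined keywords from above, and the output file name is again just a placeholder:

# keep only papers flagged for 'information theory' and write them to CSV
it_papers = papers[papers['information_theory'] == 1]
it_papers.to_csv('icml2023_information_theory.csv', index=False)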