icml-crawler

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

Analysis of top contributors for ICML 2022

This repository analyzes recent ICML contributions. If you want to explore the dataset yourself, you can download it from the releases section of this repo.


Setup

Follow the script build_and_publish.sh for setup and report generation.

The download uses a multiprocessing architecture to crawl through all paper submissions within several minutes.
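
The crawler code itself is not reproduced in this README. As a rough sketch of the general pattern (not the repository's actual implementation), a process pool can map a per-paper fetch function over a list of paper ids. The names `fetch_paper` and `crawl`, the `pool_factory` parameter, and the stub record are hypothetical placeholders; a real version would issue an HTTP request inside `fetch_paper`.

```python
from multiprocessing import Pool


def fetch_paper(paper_id):
    """Hypothetical stand-in for downloading and parsing one paper page.

    A real crawler would perform an HTTP request here; this stub only
    returns a placeholder record so the sketch runs offline.
    """
    return {"paperid": paper_id, "title": f"paper {paper_id}"}


def crawl(paper_ids, fetch=fetch_paper, workers=4, pool_factory=Pool):
    """Fetch all papers in parallel and return the records in input order."""
    with pool_factory(workers) as pool:
        # map() splits the ids across the workers and preserves input order.
        return pool.map(fetch, paper_ids)
```

When calling `crawl` with the default process pool from a script, do so under an `if __name__ == "__main__":` guard so that worker processes do not re-run the crawl on import. Passing `multiprocessing.pool.ThreadPool` as `pool_factory` swaps in threads, which may be sufficient when the work is dominated by network waits.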

I just want to download the dataset

You can download the dataset in the releases section.

Example Analysis

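import pandas as pd
import matplotlib.pyplot as plt

# Load the crawled records (one row per paper-author pair).
df = pd.read_csv("../data/records.csv")
# Drop rows that have a missing value in any column.
df = df.dropna()
df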

| | paperid | title | author | authorid | abstract | year | institution |
|---|---|---|---|---|---|---|---|
| 11 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Ziyu Wang | 72871-17199 | Recently, MLP-like vision models have achieved... | 2022 | Tencent |
| 12 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Wenhao Jiang | 72872-17199 | Recently, MLP-like vision models have achieved... | 2022 | Tencent |
| 13 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Yiming Zhu | 72873-17199 | Recently, MLP-like vision models have achieved... | 2022 | Graduate school at ShenZhen,Tsinghua university |
| 14 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Li Yuan | 72874-17199 | Recently, MLP-like vision models have achieved... | 2022 | Peking University |
| 15 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Yibing Song | 50012-17199 | Recently, MLP-like vision models have achieved... | 2022 | Tencent AI Lab |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 21230 | 595 | Nyström Method with Kernel K-means++ Samples a... | Dino Oglic | 7757-595 | We investigate, theoretically and empirically,... | 2017 | University of Bonn |
| 21231 | 595 | Nyström Method with Kernel K-means++ Samples a... | Thomas Gaertner | 8571-595 | We investigate, theoretically and empirically,... | 2017 | The University of Nottingham |
| 21232 | 708 | Scalable Generative Models for Multi-label Lea... | Vikas Jain | 6772-708 | We present a scalable, generative framework fo... | 2017 | Indian Institute of Technology Kanpur |
| 21233 | 708 | Scalable Generative Models for Multi-label Lea... | Nirbhay Modhe | 8843-708 | We present a scalable, generative framework fo... | 2017 | Georgia Tech |
| 21234 | 708 | Scalable Generative Models for Multi-label Lea... | Piyush Rai | 8844-708 | We present a scalable, generative framework fo... | 2017 | IIT Kanpur |

17876 rows × 7 columns

Number of individual papers

df["paperid"].nunique()
4415

We can see how the conference grew over time

df.groupby("year")["paperid"].nunique().plot()
plt.ylabel("papers")
pass

[Figure: line plot of the number of distinct papers per year]

These are the authors with the most contributions

df.groupby("author")["paperid"].nunique().sort_values(ascending=False).head(20)
author
Sergey Levine             40
Masashi Sugiyama          36
Pieter Abbeel             30
Gang Niu                  26
Mihaela van der Schaar    24
Stefano Ermon             24
Michael Jordan            22
Andreas Krause            22
Shimon Whiteson           21
Tong Zhang                21
Bernhard Schölkopf        21
Chelsea Finn              21
Bo Han                    21
Jun Zhu                   20
Percy Liang               20
Yoshua Bengio             19
Steven Wu                 19
Zhaoran Wang              19
Zhuoran Yang              19
Tommi Jaakkola            18
Name: paperid, dtype: int64
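
Grouping by the `author` string merges different people who happen to share a name. The sample rows suggest that `authorid` has the form `<author number>-<paperid>`; assuming that holds for the whole dataset (an inference from the rows shown, not something documented), the prefix can serve as a more robust grouping key. The helper `top_authors_by_id` and the `person_id` column are names introduced here for illustration.

```python
import pandas as pd


def top_authors_by_id(df, n=20):
    """Rank authors by distinct papers, keyed on the authorid prefix.

    Assumes authorid looks like "<author number>-<paperid>".
    """
    out = df.copy()
    # Keep everything before the first "-" as a per-person key (assumption).
    out["person_id"] = out["authorid"].str.split("-", n=1).str[0]
    counts = (
        out.groupby("person_id")
        .agg(author=("author", "first"), papers=("paperid", "nunique"))
        .sort_values("papers", ascending=False)
    )
    return counts.head(n)


# Tiny made-up frame for illustration; not taken from records.csv.
demo = pd.DataFrame(
    {
        "paperid": [1, 2, 3, 3],
        "author": ["A. Smith", "A. Smith", "A. Smith", "B. Lee"],
        "authorid": ["10-1", "10-2", "99-3", "20-3"],
    }
)
print(top_authors_by_id(demo))
```

In the made-up frame, the two people named "A. Smith" are counted separately because their `authorid` prefixes differ.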

These are the institutions contributing the most papers, sorted by their 2022 count

df_leads = df.groupby(["institution", "year"])["paperid"].nunique().unstack().sort_values(2022, ascending=False)
df_leads.to_csv("Leading Institutions.csv")
df_leads.head(30)

| institution | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
|---|---|---|---|---|---|---|
| Carnegie Mellon University | 28.0 | 22.0 | 23.0 | 30.0 | 38.0 | 50.0 |
| Google | 14.0 | 30.0 | 44.0 | 61.0 | 54.0 | 49.0 |
| Tsinghua University | 4.0 | 10.0 | 12.0 | 18.0 | 19.0 | 44.0 |
| Stanford University | 15.0 | 27.0 | 24.0 | 47.0 | 47.0 | 41.0 |
| UC Berkeley | 18.0 | 27.0 | 30.0 | 41.0 | 45.0 | 38.0 |
| MIT | 15.0 | 21.0 | 29.0 | 52.0 | 46.0 | 36.0 |
| Peking University | 5.0 | 8.0 | 11.0 | 10.0 | 22.0 | 32.0 |
| University of Oxford | 10.0 | 15.0 | 18.0 | 25.0 | 33.0 | 30.0 |
| DeepMind | 18.0 | 27.0 | 21.0 | 42.0 | 30.0 | 26.0 |
| ETH Zurich | 8.0 | 7.0 | 14.0 | 16.0 | 19.0 | 26.0 |
| Google Brain | 17.0 | 21.0 | 28.0 | 36.0 | 31.0 | 26.0 |
| Google Research | 5.0 | 4.0 | 19.0 | 32.0 | 41.0 | 25.0 |
| University of Texas at Austin | 7.0 | 6.0 | 8.0 | 21.0 | 15.0 | 23.0 |
| Microsoft Research | 26.0 | 14.0 | 19.0 | 32.0 | 38.0 | 21.0 |
| Stanford | 6.0 | 7.0 | 6.0 | 22.0 | 18.0 | 20.0 |
| University of Cambridge | 11.0 | 9.0 | 10.0 | 13.0 | 16.0 | 19.0 |
| Massachusetts Institute of Technology | 5.0 | 7.0 | 9.0 | 22.0 | 13.0 | 18.0 |
| KAIST | 2.0 | 3.0 | 13.0 | 13.0 | 13.0 | 18.0 |
| Amazon | 5.0 | 5.0 | 3.0 | 9.0 | 14.0 | 17.0 |
| University of Washington | 5.0 | 6.0 | 10.0 | 16.0 | 21.0 | 17.0 |
| Microsoft | 8.0 | 4.0 | 7.0 | 9.0 | 23.0 | 17.0 |
| University of California, Berkeley | NaN | 6.0 | 10.0 | 21.0 | 15.0 | 17.0 |
| National University of Singapore | 2.0 | 3.0 | 4.0 | 15.0 | 14.0 | 15.0 |
| University of Wisconsin-Madison | 3.0 | 4.0 | 4.0 | 6.0 | 10.0 | 15.0 |
| Princeton University | 10.0 | 13.0 | 15.0 | 25.0 | 25.0 | 15.0 |
| Seoul National University | 1.0 | 2.0 | 8.0 | 5.0 | 7.0 | 15.0 |
| Purdue University | 2.0 | 9.0 | 5.0 | 5.0 | 7.0 | 15.0 |
| Columbia University | 6.0 | 10.0 | 9.0 | 14.0 | 11.0 | 14.0 |
| New York University | 4.0 | 7.0 | 9.0 | 8.0 | 17.0 | 14.0 |
| EPFL | 5.0 | 12.0 | 12.0 | 14.0 | 14.0 | 14.0 |
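
The table counts institution strings exactly as they appear in the data, so spelling variants such as "Stanford" and "Stanford University", or "MIT" and "Massachusetts Institute of Technology", show up as separate rows. A rough sketch of merging such variants before counting follows. The alias map is a hand-written, incomplete example, and whether entries such as "Google Brain" should be folded into "Google" is a judgment call that is deliberately left out.

```python
import pandas as pd

# Hypothetical, incomplete alias map built from variants visible in the table.
ALIASES = {
    "Stanford": "Stanford University",
    "MIT": "Massachusetts Institute of Technology",
    "University of California, Berkeley": "UC Berkeley",
}


def papers_per_institution(df, aliases=ALIASES):
    """Count distinct papers per (canonical institution, year)."""
    out = df.copy()
    # replace() maps exact matches only; unknown names pass through unchanged.
    out["institution"] = out["institution"].replace(aliases)
    return out.groupby(["institution", "year"])["paperid"].nunique().unstack()


# Made-up rows for illustration; not taken from records.csv.
demo = pd.DataFrame(
    {
        "paperid": [1, 2, 2, 3],
        "year": [2022, 2022, 2022, 2021],
        "institution": [
            "MIT",
            "Massachusetts Institute of Technology",
            "Stanford",
            "Stanford University",
        ],
    }
)
print(papers_per_institution(demo))
```

Merging before calling `nunique()` matters: simply adding the two rows of the original table would count a paper twice if its authors used both spellings.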

I am particularly interested in Northeastern, KIT, Tübingen, Munich, ETH Zürich, and RWTH

print("Tübingen", df[df["institution"].str.contains("Tübingen")]["paperid"].nunique())
print("Northeastern", df[df["institution"].str.contains("Northeastern")]["paperid"].nunique())
print("Karlsruhe", df[df["institution"].str.contains("Karlsruhe")]["paperid"].nunique())
print("Munich", df[df["institution"].str.contains("Munich")]["paperid"].nunique())
print("RWTH", df[df["institution"].str.contains("RWTH")]["paperid"].nunique())
print("ETH Zürich", df[df["institution"].str.contains("ETH")]["paperid"].nunique())
Tübingen 51
Northeastern 24
Karlsruhe 2
Munich 32
RWTH 1
ETH Zürich 101
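
The six `print` calls repeat the same filter. A more compact variant with a small helper is sketched below; `count_papers` is a name introduced here, not part of the repository. Note that `str.contains` does case-sensitive substring matching and treats the pattern as a regular expression by default, so a pattern such as "Munich" or "Northeastern" counts every institution whose name contains that substring.

```python
import pandas as pd


def count_papers(df, pattern):
    """Number of distinct papers with an author affiliation containing pattern."""
    # regex=False makes the pattern a literal substring; na=False skips missing names.
    mask = df["institution"].str.contains(pattern, regex=False, na=False)
    return df.loc[mask, "paperid"].nunique()


# Made-up rows for illustration; not taken from records.csv.
demo = pd.DataFrame(
    {
        "paperid": [1, 1, 2, 3],
        "institution": ["ETH Zurich", "University of Tübingen", "ETH Zürich", "KIT"],
    }
)
for label, pattern in {"Tübingen": "Tübingen", "ETH Zürich": "ETH"}.items():
    print(label, count_papers(demo, pattern))
```

For the patterns used in this analysis, the literal match gives the same result as the default regex match, since none of them contains regex metacharacters.
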
df[df["institution"].str.contains("Northeastern")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Huy Nguyen                 3
Robin Walters              3
Hao Wu                     2
Kaidi Xu                   2
Jung Yeon Park             2
Jonathan Ullman            2
Jan-Willem van de Meent    2
Hongyang Zhang             2
Linfeng Zhao               2
Xiaolong Ma                2
Name: paperid, dtype: int64
df[df["institution"].str.contains("Tübingen")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Bernhard Schölkopf      19
Matthias Hein            5
Ulrike von Luxburg       3
Philipp Hennig           3
Nathanael Bosch          2
Lars Mescheder           2
Nicholas Krämer          2
Erik Daxberger           2
Niki Kilbertus           2
Georgios Arvanitidis     2
Name: paperid, dtype: int64
df[df["institution"].str.contains("ETH")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Andreas Krause               22
Martin Vechev                 9
Ce Zhang                      6
Aurelien Lucchi               6
Thomas Hofmann                5
Bastian Rieck                 4
Karsten Borgwardt             4
Francesco Locatello           4
Timon Gehr                    3
Giambattista Parascandolo     3
Name: paperid, dtype: int64
df[df["institution"].str.contains("Munich")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Stephan Günnemann        9
Aleksandar Bojchevski    3
Sandra Hirche            3
Daniel Zügner            3
Thomas Frerix            2
Jonas Umlauft            2
Johannes Gasteiger       2
Stefan Feuerriegel       2
Hinrich Schuetze         2
Bertrand Charpentier     2
Name: paperid, dtype: int64
df[df["institution"].str.contains("RWTH")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Ciwan Ceylan    1
Name: paperid, dtype: int64
df[df["institution"].str.contains("Karlsruhe")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Johannes Fischer       1
Martin Frank           1
Steffen Schotthöfer    1
Tianbai Xiao           1
Name: paperid, dtype: int64
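
The per-institution rankings above all follow one pattern, which could be wrapped in a helper such as the hypothetical `top_authors` below. This is a sketch for convenience, not code from the repository.

```python
import pandas as pd


def top_authors(df, pattern, n=10):
    """Authors ranked by distinct papers among rows whose institution contains pattern."""
    mask = df["institution"].str.contains(pattern, regex=False, na=False)
    return (
        df[mask]
        .groupby("author")["paperid"]
        .nunique()
        .sort_values(ascending=False)
        .head(n)
    )


# Made-up rows for illustration; not taken from records.csv.
demo = pd.DataFrame(
    {
        "paperid": [1, 2, 2, 3],
        "author": ["A", "A", "B", "C"],
        "institution": ["RWTH Aachen", "RWTH Aachen", "RWTH Aachen", "KIT"],
    }
)
print(top_authors(demo, "RWTH"))
```

With the real data, `top_authors(df, "Tübingen")` would replace the corresponding one-liner, and a pattern that matches nothing returns an empty Series.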