Analysis of top contributors for ICML 2022

This repository analyzes recent ICML contributions. If you want to explore the dataset yourself, you can download it from the releases section of this repo.

Setup

Follow the script build_and_publish.sh for setup and report generation.

The download uses a multiprocessing architecture to crawl through all paper submissions within several minutes.
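To give an idea of what a multiprocessing crawl looks like, here is a minimal sketch of the fan-out pattern. It is not the code from this repo: `fetch_paper` and `crawl` are hypothetical names, and the fetch is a stub that returns a placeholder record instead of doing a real download, so the sketch runs offline.

```python
from multiprocessing import Pool


def fetch_paper(paper_id):
    """Hypothetical stand-in for downloading and parsing one paper page."""
    # A real crawler would issue an HTTP request here; this stub returns
    # a placeholder record so the sketch runs without network access.
    return {"paperid": paper_id, "title": f"placeholder title {paper_id}"}


def crawl(paper_ids, workers=4):
    """Fetch all papers in parallel; pool.map keeps the input order."""
    with Pool(processes=workers) as pool:
        return pool.map(fetch_paper, paper_ids)


if __name__ == "__main__":
    records = crawl(range(100, 110))
    print(len(records), "records fetched")
```

Because each page is fetched independently, the work splits cleanly across worker processes, which is what makes a full crawl feasible in minutes rather than hours.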

I just want to download the dataset

You can download the dataset in the releases section.

Example Analysis

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("../data/records.csv")
df = df.dropna()
df

	paperid	title	author	authorid	abstract	year	institution
11	17199	DynaMixer: A Vision MLP Architecture with Dyna...	Ziyu Wang	72871-17199	Recently, MLP-like vision models have achieved...	2022	Tencent
12	17199	DynaMixer: A Vision MLP Architecture with Dyna...	Wenhao Jiang	72872-17199	Recently, MLP-like vision models have achieved...	2022	Tencent
13	17199	DynaMixer: A Vision MLP Architecture with Dyna...	Yiming Zhu	72873-17199	Recently, MLP-like vision models have achieved...	2022	Graduate school at ShenZhen，Tsinghua university
14	17199	DynaMixer: A Vision MLP Architecture with Dyna...	Li Yuan	72874-17199	Recently, MLP-like vision models have achieved...	2022	Peking University
15	17199	DynaMixer: A Vision MLP Architecture with Dyna...	Yibing Song	50012-17199	Recently, MLP-like vision models have achieved...	2022	Tencent AI Lab
...	...	...	...	...	...	...	...
21230	595	Nyström Method with Kernel K-means++ Samples a...	Dino Oglic	7757-595	We investigate, theoretically and empirically,...	2017	University of Bonn
21231	595	Nyström Method with Kernel K-means++ Samples a...	Thomas Gaertner	8571-595	We investigate, theoretically and empirically,...	2017	The University of Nottingham
21232	708	Scalable Generative Models for Multi-label Lea...	Vikas Jain	6772-708	We present a scalable, generative framework fo...	2017	Indian Institute of Technology Kanpur
21233	708	Scalable Generative Models for Multi-label Lea...	Nirbhay Modhe	8843-708	We present a scalable, generative framework fo...	2017	Georgia Tech
21234	708	Scalable Generative Models for Multi-label Lea...	Piyush Rai	8844-708	We present a scalable, generative framework fo...	2017	IIT Kanpur

17876 rows × 7 columns
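Note that `df.dropna()` removes every row with a missing value in any column. The index in the output above runs up to 21234 while 17876 rows remain, which suggests a few thousand author rows were dropped, most likely for a missing field such as the institution. A minimal sketch on made-up rows (not taken from records.csv) shows the difference between dropping on all columns and dropping on selected ones:

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame(
    {
        "paperid": [1, 1, 2],
        "author": ["A", "B", "C"],
        "institution": ["Uni X", None, "Uni Y"],
    }
)

strict = toy.dropna()                    # drops author B (no institution)
lenient = toy.dropna(subset=["author"])  # only requires the author field

print(len(strict), len(lenient))
```

If the author counts matter more than the institution counts, restricting `subset` to the columns an analysis actually needs keeps more rows.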

Number of individual papers

df["paperid"].nunique()

We can see how the conference grew over time

df.groupby("year")["paperid"].nunique().plot()
plt.ylabel("papers")
plt.show()
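The plot shows the trend visually. To put a number on it, the year-over-year change can be computed from the same series. A small sketch on invented rows (the real numbers come from records.csv):

```python
import pandas as pd

# Toy (paper, author) rows: paper 1 has two authors, so it appears twice.
toy = pd.DataFrame(
    {
        "paperid": [1, 1, 2, 3, 4, 5],
        "year": [2020, 2020, 2020, 2021, 2021, 2021],
    }
)

# nunique() counts each paper once, no matter how many authors it has.
papers_per_year = toy.groupby("year")["paperid"].nunique()

# Fractional change relative to the previous year (first year is NaN).
growth = papers_per_year.pct_change()

print(papers_per_year.to_dict())
print(growth.round(2).to_dict())
```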

These are the authors with the most contributions

df.groupby("author")["paperid"].nunique().sort_values(ascending=False).head(20)

author
Sergey Levine             40
Masashi Sugiyama          36
Pieter Abbeel             30
Gang Niu                  26
Mihaela van der Schaar    24
Stefano Ermon             24
Michael Jordan            22
Andreas Krause            22
Shimon Whiteson           21
Tong Zhang                21
Bernhard Schölkopf        21
Chelsea Finn              21
Bo Han                    21
Jun Zhu                   20
Percy Liang               20
Yoshua Bengio             19
Steven Wu                 19
Zhaoran Wang              19
Zhuoran Yang              19
Tommi Jaakkola            18
Name: paperid, dtype: int64
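One caveat: grouping by name merges different people who share the same name. In the sample rows above, `authorid` looks like `<person id>-<paperid>` (for example `72871-17199` on paper 17199). Assuming that reading is correct, which I have not verified for the whole dataset, the part before the dash can serve as a more reliable key. A sketch with invented names and ids:

```python
import pandas as pd

# Toy rows: the same name appears with two different id prefixes.
toy = pd.DataFrame(
    {
        "paperid": [1, 2, 3],
        "author": ["Alex Kim", "Alex Kim", "Alex Kim"],
        "authorid": ["101-1", "101-2", "202-3"],
    }
)

# Assumption: authorid is "<person id>-<paperid>"; keep the part before the dash.
toy["personid"] = toy["authorid"].str.split("-").str[0]

by_name = toy.groupby("author")["paperid"].nunique()
by_person = toy.groupby(["personid", "author"])["paperid"].nunique()

print(by_name.to_dict())
print(by_person.to_dict())
```

Grouping by name credits one "Alex Kim" with three papers, while grouping by the id prefix splits them into two people.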

These are the institutions contributing the most papers, sorted by their 2022 count

df_leads = df.groupby(["institution", "year"])["paperid"].nunique().unstack().sort_values(2022, ascending=False)
df_leads.to_csv("Leading Institutions.csv")
df_leads.head(30)

year	2017	2018	2019	2020	2021	2022
institution
Carnegie Mellon University	28.0	22.0	23.0	30.0	38.0	50.0
Google	14.0	30.0	44.0	61.0	54.0	49.0
Tsinghua University	4.0	10.0	12.0	18.0	19.0	44.0
Stanford University	15.0	27.0	24.0	47.0	47.0	41.0
UC Berkeley	18.0	27.0	30.0	41.0	45.0	38.0
MIT	15.0	21.0	29.0	52.0	46.0	36.0
Peking University	5.0	8.0	11.0	10.0	22.0	32.0
University of Oxford	10.0	15.0	18.0	25.0	33.0	30.0
DeepMind	18.0	27.0	21.0	42.0	30.0	26.0
ETH Zurich	8.0	7.0	14.0	16.0	19.0	26.0
Google Brain	17.0	21.0	28.0	36.0	31.0	26.0
Google Research	5.0	4.0	19.0	32.0	41.0	25.0
University of Texas at Austin	7.0	6.0	8.0	21.0	15.0	23.0
Microsoft Research	26.0	14.0	19.0	32.0	38.0	21.0
Stanford	6.0	7.0	6.0	22.0	18.0	20.0
University of Cambridge	11.0	9.0	10.0	13.0	16.0	19.0
Massachusetts Institute of Technology	5.0	7.0	9.0	22.0	13.0	18.0
KAIST	2.0	3.0	13.0	13.0	13.0	18.0
Amazon	5.0	5.0	3.0	9.0	14.0	17.0
University of Washington	5.0	6.0	10.0	16.0	21.0	17.0
Microsoft	8.0	4.0	7.0	9.0	23.0	17.0
University of California, Berkeley	NaN	6.0	10.0	21.0	15.0	17.0
National University of Singapore	2.0	3.0	4.0	15.0	14.0	15.0
University of Wisconsin-Madison	3.0	4.0	4.0	6.0	10.0	15.0
Princeton University	10.0	13.0	15.0	25.0	25.0	15.0
Seoul National University	1.0	2.0	8.0	5.0	7.0	15.0
Purdue University	2.0	9.0	5.0	5.0	7.0	15.0
Columbia University	6.0	10.0	9.0	14.0	11.0	14.0
New York University	4.0	7.0	9.0	8.0	17.0	14.0
EPFL	5.0	12.0	12.0	14.0	14.0	14.0
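The table lists several institutions under more than one spelling (Stanford and Stanford University, MIT and Massachusetts Institute of Technology, UC Berkeley and University of California, Berkeley), so each spelling gets its own row. Simply adding the rows would double count papers whose coauthors use different spellings. A possible fix is to normalize the names before grouping. The alias table below is a hand-written example, not something shipped with the dataset:

```python
import pandas as pd

# Hypothetical alias table: variant spelling -> canonical name.
ALIASES = {
    "Stanford": "Stanford University",
    "Massachusetts Institute of Technology": "MIT",
    "University of California, Berkeley": "UC Berkeley",
}

# Toy rows, invented for illustration only.
toy = pd.DataFrame(
    {
        "paperid": [1, 2, 3, 4],
        "institution": [
            "Stanford",
            "Stanford University",
            "MIT",
            "Massachusetts Institute of Technology",
        ],
    }
)

# Look up each full name; names without an alias are kept unchanged.
toy["institution_norm"] = toy["institution"].map(lambda name: ALIASES.get(name, name))

counts = toy.groupby("institution_norm")["paperid"].nunique()
print(counts.to_dict())
```

Whether Google, Google Brain, and Google Research should also be merged is a judgment call, so I left them out of the example.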

I am particularly interested in Northeastern, KIT (Karlsruhe), Tübingen, Munich, ETH Zürich, and RWTH

# Map each display label to the substring searched for in the institution column.
patterns = {
    "Tübingen": "Tübingen",
    "Northeastern": "Northeastern",
    "Karlsruhe": "Karlsruhe",
    "Munich": "Munich",
    "RWTH": "RWTH",
    "ETH Zürich": "ETH",
}
for label, pattern in patterns.items():
    print(label, df[df["institution"].str.contains(pattern)]["paperid"].nunique())

Tübingen 51
Northeastern 24
Karlsruhe 2
Munich 32
RWTH 1
ETH Zürich 101
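Keep in mind that `str.contains` does a case-sensitive substring (regex) match, so a short pattern like "ETH" also matches any other institution name that happens to contain those three letters, and spellings such as "Tuebingen" are missed by "Tübingen". The sketch below uses invented institution names to show how a word-boundary pattern tightens the match. `na=False` is only needed when missing values were not dropped beforehand:

```python
import pandas as pd

# Toy rows; "BETHESDA Lab" is a made-up name that contains "ETH".
toy = pd.DataFrame(
    {
        "paperid": [1, 2, 3, 4],
        "institution": ["ETH Zurich", "ETH Zürich", "BETHESDA Lab", None],
    }
)

# Plain substring match: also hits "BETHESDA Lab".
loose = toy["institution"].str.contains("ETH", na=False)

# Word-boundary match: "ETH" must stand as its own word.
strict = toy["institution"].str.contains(r"\bETH\b", na=False)

print(toy[loose]["paperid"].tolist())
print(toy[strict]["paperid"].tolist())
```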

df[df["institution"].str.contains("Northeastern")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)

author
Huy Nguyen                 3
Robin Walters              3
Hao Wu                     2
Kaidi Xu                   2
Jung Yeon Park             2
Jonathan Ullman            2
Jan-Willem van de Meent    2
Hongyang Zhang             2
Linfeng Zhao               2
Xiaolong Ma                2
Name: paperid, dtype: int64

df[df["institution"].str.contains("Tübingen")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)

author
Bernhard Schölkopf      19
Matthias Hein            5
Ulrike von Luxburg       3
Philipp Hennig           3
Nathanael Bosch          2
Lars Mescheder           2
Nicholas Krämer          2
Erik Daxberger           2
Niki Kilbertus           2
Georgios Arvanitidis     2
Name: paperid, dtype: int64

df[df["institution"].str.contains("ETH")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)

author
Andreas Krause               22
Martin Vechev                 9
Ce Zhang                      6
Aurelien Lucchi               6
Thomas Hofmann                5
Bastian Rieck                 4
Karsten Borgwardt             4
Francesco Locatello           4
Timon Gehr                    3
Giambattista Parascandolo     3
Name: paperid, dtype: int64

df[df["institution"].str.contains("Munich")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)

author
Stephan Günnemann        9
Aleksandar Bojchevski    3
Sandra Hirche            3
Daniel Zügner            3
Thomas Frerix            2
Jonas Umlauft            2
Johannes Gasteiger       2
Stefan Feuerriegel       2
Hinrich Schuetze         2
Bertrand Charpentier     2
Name: paperid, dtype: int64

df[df["institution"].str.contains("RWTH")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)

author
Ciwan Ceylan    1
Name: paperid, dtype: int64

df[df["institution"].str.contains("Karlsruhe")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)

author
Johannes Fischer       1
Martin Frank           1
Steffen Schotthöfer    1
Tianbai Xiao           1
Name: paperid, dtype: int64
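The per-institution queries above all repeat the same chain, so they can be wrapped in a small helper. `top_authors` is a name I made up for this sketch, and the rows are toy data:

```python
import pandas as pd


def top_authors(df, pattern, n=10):
    """Return the n authors with the most distinct papers at matching institutions."""
    mask = df["institution"].str.contains(pattern, na=False)
    return (
        df[mask]
        .groupby("author")["paperid"]
        .nunique()
        .sort_values(ascending=False)
        .head(n)
    )


# Toy rows, invented for illustration only.
toy = pd.DataFrame(
    {
        "paperid": [1, 2, 2, 3],
        "author": ["A", "A", "B", "C"],
        "institution": ["Uni X", "Uni X", "Uni X", "Uni Y"],
    }
)

print(top_authors(toy, "Uni X").to_dict())
```

With the real dataset the call would look like `top_authors(df, "Tübingen")`.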

TobiasJacob/icml-crawler
