This repository analyzes recent ICML contributions. If you want to explore the dataset yourself, you can download it from the releases section of this repo.
Follow the script build_and_publish.sh for setup and report generation.
The download uses a multiprocessing architecture to crawl through all paper submissions within several minutes.
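As a rough illustration of what a multiprocessing crawl can look like, here is a minimal sketch. It is not the code used in this repository: `fetch_paper` and `crawl` are hypothetical names, and the worker is a stub that returns a placeholder record instead of downloading a page.

```python
from multiprocessing import Pool


def fetch_paper(paper_id: int) -> dict:
    """Hypothetical worker standing in for downloading and parsing one paper page."""
    # A real worker would issue an HTTP request here; this stub only echoes the id.
    return {"paperid": paper_id, "title": f"paper {paper_id}"}


def crawl(paper_ids, workers: int = 4) -> list:
    """Spread the ids over worker processes and collect the records in input order."""
    with Pool(processes=workers) as pool:
        return pool.map(fetch_paper, paper_ids)


if __name__ == "__main__":
    records = crawl(range(5), workers=2)
    print(len(records), "records fetched")
```

Because each page can be fetched independently, the work splits cleanly across processes, which is presumably what keeps the full crawl down to a few minutes.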
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("../data/records.csv")
df = df.dropna()
df
| | paperid | title | author | authorid | abstract | year | institution |
---|---|---|---|---|---|---|---|
11 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Ziyu Wang | 72871-17199 | Recently, MLP-like vision models have achieved... | 2022 | Tencent |
12 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Wenhao Jiang | 72872-17199 | Recently, MLP-like vision models have achieved... | 2022 | Tencent |
13 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Yiming Zhu | 72873-17199 | Recently, MLP-like vision models have achieved... | 2022 | Graduate school at ShenZhen,Tsinghua university |
14 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Li Yuan | 72874-17199 | Recently, MLP-like vision models have achieved... | 2022 | Peking University |
15 | 17199 | DynaMixer: A Vision MLP Architecture with Dyna... | Yibing Song | 50012-17199 | Recently, MLP-like vision models have achieved... | 2022 | Tencent AI Lab |
... | ... | ... | ... | ... | ... | ... | ... |
21230 | 595 | Nyström Method with Kernel K-means++ Samples a... | Dino Oglic | 7757-595 | We investigate, theoretically and empirically,... | 2017 | University of Bonn |
21231 | 595 | Nyström Method with Kernel K-means++ Samples a... | Thomas Gaertner | 8571-595 | We investigate, theoretically and empirically,... | 2017 | The University of Nottingham |
21232 | 708 | Scalable Generative Models for Multi-label Lea... | Vikas Jain | 6772-708 | We present a scalable, generative framework fo... | 2017 | Indian Institute of Technology Kanpur |
21233 | 708 | Scalable Generative Models for Multi-label Lea... | Nirbhay Modhe | 8843-708 | We present a scalable, generative framework fo... | 2017 | Georgia Tech |
21234 | 708 | Scalable Generative Models for Multi-label Lea... | Piyush Rai | 8844-708 | We present a scalable, generative framework fo... | 2017 | IIT Kanpur |
17876 rows × 7 columns
Number of individual papers
df["paperid"].nunique()
4415
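The dataset holds one row per (paper, author) pair, so the row count (17876) is much larger than the number of papers. The toy frame below, built from a few rows of the table above, is a small sketch of why `nunique()` is used rather than `len()`.

```python
import pandas as pd

# Toy stand-in for records.csv: one row per (paper, author) pair.
toy = pd.DataFrame({
    "paperid": [17199, 17199, 17199, 595, 595],
    "author": ["Ziyu Wang", "Wenhao Jiang", "Li Yuan", "Dino Oglic", "Thomas Gaertner"],
})

n_rows = len(toy)                    # counts author rows
n_papers = toy["paperid"].nunique()  # counts distinct papers
print(n_rows, n_papers)
```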
We can see how the conference grew over time
df.groupby("year")["paperid"].nunique().plot()
plt.ylabel("papers")
plt.show()
These are the authors with the most contributions
df.groupby("author")["paperid"].nunique().sort_values(ascending=False).head(20)
author
Sergey Levine 40
Masashi Sugiyama 36
Pieter Abbeel 30
Gang Niu 26
Mihaela van der Schaar 24
Stefano Ermon 24
Michael Jordan 22
Andreas Krause 22
Shimon Whiteson 21
Tong Zhang 21
Bernhard Schölkopf 21
Chelsea Finn 21
Bo Han 21
Jun Zhu 20
Percy Liang 20
Yoshua Bengio 19
Steven Wu 19
Zhaoran Wang 19
Zhuoran Yang 19
Tommi Jaakkola 18
Name: paperid, dtype: int64
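Grouping by name merges different people who share a name. A possible refinement is sketched below, under the assumption that the part of `authorid` before the hyphen identifies the person (the part after it matches `paperid` in the rows shown above). The names and ids in the toy frame are made up for illustration.

```python
import pandas as pd

# Made-up rows: two different people who happen to share one name.
toy = pd.DataFrame({
    "paperid": [1, 2, 3],
    "author": ["Alex Kim", "Alex Kim", "Alex Kim"],
    "authorid": ["100-1", "100-2", "200-3"],
})

# Assumption: the prefix before the hyphen is a per-person id.
toy["person"] = toy["authorid"].str.split("-").str[0]

by_name = toy.groupby("author")["paperid"].nunique()    # merges both people
by_person = toy.groupby("person")["paperid"].nunique()  # keeps them apart
print(by_name)
print(by_person)
```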
These are the institutions contributing the most papers, sorted by their 2022 count
df_leads = df.groupby(["institution", "year"])["paperid"].nunique().unstack().sort_values(2022, ascending=False)
df_leads.to_csv("Leading Institutions.csv")
df_leads.head(30)
institution | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 |
---|---|---|---|---|---|---|
Carnegie Mellon University | 28.0 | 22.0 | 23.0 | 30.0 | 38.0 | 50.0 |
| | 14.0 | 30.0 | 44.0 | 61.0 | 54.0 | 49.0 |
Tsinghua University | 4.0 | 10.0 | 12.0 | 18.0 | 19.0 | 44.0 |
Stanford University | 15.0 | 27.0 | 24.0 | 47.0 | 47.0 | 41.0 |
UC Berkeley | 18.0 | 27.0 | 30.0 | 41.0 | 45.0 | 38.0 |
MIT | 15.0 | 21.0 | 29.0 | 52.0 | 46.0 | 36.0 |
Peking University | 5.0 | 8.0 | 11.0 | 10.0 | 22.0 | 32.0 |
University of Oxford | 10.0 | 15.0 | 18.0 | 25.0 | 33.0 | 30.0 |
DeepMind | 18.0 | 27.0 | 21.0 | 42.0 | 30.0 | 26.0 |
ETH Zurich | 8.0 | 7.0 | 14.0 | 16.0 | 19.0 | 26.0 |
Google Brain | 17.0 | 21.0 | 28.0 | 36.0 | 31.0 | 26.0 |
Google Research | 5.0 | 4.0 | 19.0 | 32.0 | 41.0 | 25.0 |
University of Texas at Austin | 7.0 | 6.0 | 8.0 | 21.0 | 15.0 | 23.0 |
Microsoft Research | 26.0 | 14.0 | 19.0 | 32.0 | 38.0 | 21.0 |
Stanford | 6.0 | 7.0 | 6.0 | 22.0 | 18.0 | 20.0 |
University of Cambridge | 11.0 | 9.0 | 10.0 | 13.0 | 16.0 | 19.0 |
Massachusetts Institute of Technology | 5.0 | 7.0 | 9.0 | 22.0 | 13.0 | 18.0 |
KAIST | 2.0 | 3.0 | 13.0 | 13.0 | 13.0 | 18.0 |
Amazon | 5.0 | 5.0 | 3.0 | 9.0 | 14.0 | 17.0 |
University of Washington | 5.0 | 6.0 | 10.0 | 16.0 | 21.0 | 17.0 |
Microsoft | 8.0 | 4.0 | 7.0 | 9.0 | 23.0 | 17.0 |
University of California, Berkeley | NaN | 6.0 | 10.0 | 21.0 | 15.0 | 17.0 |
National University of Singapore | 2.0 | 3.0 | 4.0 | 15.0 | 14.0 | 15.0 |
University of Wisconsin-Madison | 3.0 | 4.0 | 4.0 | 6.0 | 10.0 | 15.0 |
Princeton University | 10.0 | 13.0 | 15.0 | 25.0 | 25.0 | 15.0 |
Seoul National University | 1.0 | 2.0 | 8.0 | 5.0 | 7.0 | 15.0 |
Purdue University | 2.0 | 9.0 | 5.0 | 5.0 | 7.0 | 15.0 |
Columbia University | 6.0 | 10.0 | 9.0 | 14.0 | 11.0 | 14.0 |
New York University | 4.0 | 7.0 | 9.0 | 8.0 | 17.0 | 14.0 |
EPFL | 5.0 | 12.0 | 12.0 | 14.0 | 14.0 | 14.0 |
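The table lists some institutions under more than one spelling, for example "Stanford" and "Stanford University", or "MIT" and "Massachusetts Institute of Technology", which splits their counts across rows. One way to merge them is sketched below. The alias map and the helper `normalize_institutions` are hypothetical and not part of this repository, and treating these spellings as the same institution is an assumption.

```python
import pandas as pd

# Hypothetical alias map covering only spellings visible in the table above.
ALIASES = {
    "Stanford": "Stanford University",
    "Massachusetts Institute of Technology": "MIT",
    "University of California, Berkeley": "UC Berkeley",
}


def normalize_institutions(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with known aliases mapped to one canonical name."""
    out = frame.copy()
    out["institution"] = out["institution"].str.strip().replace(ALIASES)
    return out


toy = pd.DataFrame({
    "paperid": [1, 2, 3, 3],
    "institution": [
        "Stanford",
        "Stanford University",
        "MIT",
        "Massachusetts Institute of Technology",
    ],
})
counts = normalize_institutions(toy).groupby("institution")["paperid"].nunique()
print(counts)
```

Note that paper 3 is counted once for MIT even though its two author rows used different spellings.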
I am particularly interested in Northeastern, KIT (Karlsruhe), Tübingen, Munich, ETH Zürich, and RWTH.
print("Tübingen", df[df["institution"].str.contains("Tübingen")]["paperid"].nunique())
print("Northeastern", df[df["institution"].str.contains("Northeastern")]["paperid"].nunique())
print("Karlsruhe", df[df["institution"].str.contains("Karlsruhe")]["paperid"].nunique())
print("Munich", df[df["institution"].str.contains("Munich")]["paperid"].nunique())
print("RWTH", df[df["institution"].str.contains("RWTH")]["paperid"].nunique())
print("ETH Zürich", df[df["institution"].str.contains("ETH")]["paperid"].nunique())
Tübingen 51
Northeastern 24
Karlsruhe 2
Munich 32
RWTH 1
ETH Zürich 101
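These counts rely on substring matching, so they depend on how each institution is spelled in the data. The sketch below wraps the lookup in a hypothetical helper, `papers_matching`, and uses a made-up toy frame to show one pitfall: the match is case-sensitive by default, which matters for a short keyword such as "ETH" (a case-insensitive search would also hit "Netherlands").

```python
import pandas as pd


def papers_matching(frame: pd.DataFrame, keyword: str) -> int:
    """Count distinct papers whose institution contains the keyword literally."""
    # regex=False treats the keyword as plain text; matching stays case-sensitive.
    mask = frame["institution"].str.contains(keyword, regex=False)
    return frame.loc[mask, "paperid"].nunique()


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "paperid": [1, 2, 3, 4],
    "institution": [
        "ETH Zurich",
        "ETH Zürich",
        "University of Tübingen",
        "A university in the Netherlands",
    ],
})

for keyword in ["ETH", "Tübingen"]:
    print(keyword, papers_matching(toy, keyword))
```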
df[df["institution"].str.contains("Northeastern")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Huy Nguyen 3
Robin Walters 3
Hao Wu 2
Kaidi Xu 2
Jung Yeon Park 2
Jonathan Ullman 2
Jan-Willem van de Meent 2
Hongyang Zhang 2
Linfeng Zhao 2
Xiaolong Ma 2
Name: paperid, dtype: int64
df[df["institution"].str.contains("Tübingen")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Bernhard Schölkopf 19
Matthias Hein 5
Ulrike von Luxburg 3
Philipp Hennig 3
Nathanael Bosch 2
Lars Mescheder 2
Nicholas Krämer 2
Erik Daxberger 2
Niki Kilbertus 2
Georgios Arvanitidis 2
Name: paperid, dtype: int64
df[df["institution"].str.contains("ETH")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Andreas Krause 22
Martin Vechev 9
Ce Zhang 6
Aurelien Lucchi 6
Thomas Hofmann 5
Bastian Rieck 4
Karsten Borgwardt 4
Francesco Locatello 4
Timon Gehr 3
Giambattista Parascandolo 3
Name: paperid, dtype: int64
df[df["institution"].str.contains("Munich")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Stephan Günnemann 9
Aleksandar Bojchevski 3
Sandra Hirche 3
Daniel Zügner 3
Thomas Frerix 2
Jonas Umlauft 2
Johannes Gasteiger 2
Stefan Feuerriegel 2
Hinrich Schuetze 2
Bertrand Charpentier 2
Name: paperid, dtype: int64
df[df["institution"].str.contains("RWTH")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Ciwan Ceylan 1
Name: paperid, dtype: int64
df[df["institution"].str.contains("Karlsruhe")].groupby("author")["paperid"].nunique().sort_values(ascending=False).head(10)
author
Johannes Fischer 1
Martin Frank 1
Steffen Schotthöfer 1
Tianbai Xiao 1
Name: paperid, dtype: int64
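The per-institution rankings above all repeat the same expression with a different keyword. A small helper could factor that out; `top_authors` is a hypothetical name, and the toy frame contains made-up rows.

```python
import pandas as pd


def top_authors(frame: pd.DataFrame, keyword: str, n: int = 10) -> pd.Series:
    """Rank authors at institutions containing the keyword by distinct papers."""
    subset = frame[frame["institution"].str.contains(keyword, regex=False)]
    return (
        subset.groupby("author")["paperid"]
        .nunique()
        .sort_values(ascending=False)
        .head(n)
    )


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "paperid": [1, 2, 2, 3],
    "author": ["Author A", "Author A", "Author B", "Author C"],
    "institution": ["RWTH Aachen", "RWTH Aachen", "RWTH Aachen", "Elsewhere"],
})
ranking = top_authors(toy, "RWTH")
print(ranking)
```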