Graph-Foundation-Models-Are-Already-Here


Accompanying repository for our ICML 2024 spotlight paper "Position: Graph Foundation Models Are Already Here", collecting papers, datasets, and theory relevant to graph foundation models (GFMs).

Existing GFM papers

Domain-specific GFMs

  • (BioRxiv '22) [Protein] Language models of protein sequences at the scale of evolution enable accurate structure prediction [Paper]
  • (Arxiv '24) [Molecule] A foundation model for atomistic materials chemistry [Paper]
  • (ICLR '24) [Molecule] From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction [Paper]
  • (Arxiv '23) [Molecule] Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [Paper]
  • (Arxiv '23) [Molecule] DPA-2: Towards a universal large atomic model for molecular and material simulation [Paper]
  • (Arxiv '24) [Molecule] MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures [Paper]
  • (ICLR '23) [Molecule] Mole-BERT: Rethinking Pre-training Graph Neural Networks for Molecules [Paper]
  • (Arxiv '24) [Molecule] On the Scalability of GNNs for Molecular Graphs [Paper]
  • (Arxiv '24) [Molecule] MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning [Paper]
  • (ICML '24)[Molecule] A Graph is Worth K Words: Euclideanizing Graph using Pure Transformer [Paper]

Task-specific GFMs

  • (Arxiv '24)[Knowledge Graph Reasoning] Zero-shot Logical Query Reasoning on any Knowledge Graph [Paper]
  • (ICLR '24)[Knowledge Graph Reasoning] Towards Foundation Models for Knowledge Graph Reasoning [Paper]
  • (Arxiv '24)[Graph Reasoning] Let Your Graph Do the Talking: Encoding Structured Data for LLMs [Paper]
  • (Arxiv '24)[Node Classification] GraphFM: A Scalable Framework for Multi-Graph Pretraining [Paper]
  • (Arxiv '24)[Node Classification] GraphAny: A Foundation Model for Node Classification on Any Graph [Paper]
  • (Arxiv '23)[Node Classification] GraphText: Graph Reasoning in Text Space [Paper]
  • (LoG '22) [Algorithmic Reasoning] A Generalist Neural Algorithmic Learner [Paper]

Primitive GFMs

  • (Arxiv '24) GOFA: A Generative One-For-All Model for Joint Graph Language Modeling [Paper]
  • (Arxiv '24) Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models [Paper]
  • (ICLR '24) One For All: Towards Training One Graph Model For All Classification Tasks [Paper]
  • (NeurIPS '23) PRODIGY: Enabling In-context Learning Over Graphs [Paper]

Scaling Law

  • (Arxiv '24) Towards Neural Scaling Laws for Foundation Models on Temporal Graphs [Paper]
  • (Arxiv '24) Neural Scaling Laws on Graphs [Paper]
  • (NeurIPS '23) Uncovering neural scaling laws in molecular representation learning [Paper]
  • (NMI '23) Neural scaling of deep chemical models [Paper]

Gathering more real-world data

Below is a collection of large-scale real-world graph datasets.

Name | URL | Description
--- | --- | ---
TU-Dataset | https://chrsmrrs.github.io/datasets/ | A collection of graph-level prediction datasets
NetworkRepository | https://networkrepository.com/ | A large repository of graph datasets from 30+ different domains
Open Graph Benchmark | https://ogb.stanford.edu/ | A collection of large-scale benchmark datasets for node-, link-, and graph-level tasks
PyG | https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html | Official datasets shipped with PyG, including popular benchmark datasets
SNAP | https://snap.stanford.edu/data/ | Mainly focuses on social networks
Aminer | https://www.aminer.cn/data/ | A collection of academic graphs
OAG | https://www.aminer.cn/open-academic-graph | A large-scale academic graph
MalNet | https://www.mal-net.org/#home | A large-scale collection of function call graphs for malware detection
ScholKG | https://scholkg.kmi.open.ac.uk/ | A large-scale scholarly knowledge graph
Graphium | https://github.com/datamol-io/graphium | Massive datasets for molecular property prediction
Live Graph Lab | https://livegraphlab.github.io/ | A large-scale temporal graph of NFT transactions
Temporal Graph Benchmark | https://docs.tgb.complexdatalab.com/ | A large-scale benchmark for temporal graph learning
MoleculeNet | https://moleculenet.org/ | A benchmark for molecular machine learning
RecSys data | https://cseweb.ucsd.edu/~jmcauley/datasets.html | A collection of datasets for recommender systems
LINKX | https://github.com/CUAI/Non-Homophily-Large-Scale | A collection of large-scale non-homophilous graphs
GADBench | https://github.com/squareRoot3/GADBench | A collection of datasets for graph-based anomaly detection
Text-attributed datasets | https://github.com/CurryTang/TSGFM | A collection of text-attributed graph datasets
TAGLAS | https://arxiv.org/abs/2406.14683 | A collection of text-attributed graph datasets

Theoretical background for GFM development

A curated list of papers on the theoretical foundations of GFMs.

General Theory

  • (Book) Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges [Book]
  • (Arxiv '24) Understanding Transformer Reasoning Capabilities via Graph Algorithms [Paper]
  • (Arxiv '24) Future Directions in Foundations of Graph Machine Learning [Paper]
  • (ICML '23) On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology [Paper]
  • (ICLR '22) Understanding over-squashing and bottlenecks on graphs via curvature [Paper]
  • (ICLR '21) How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [Paper]
  • (ICMLW '20) A Note on Over-Smoothing for Graph Neural Networks [Paper]
  • (ICLR '20) What Can Neural Networks Reason About? [Paper]
  • (ICLR '20) The Logical Expressiveness of Graph Neural Networks [Paper]

Node-level tasks (Node classification)

  • (Arxiv '24) Understanding Heterophily for Graph Neural Networks [Paper]
  • (NeurIPS '23) Demystifying Structural Disparity in Graph Neural Networks: Can One Size Fit All? [Paper]
  • (NeurIPS '23) When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability [Paper]
  • (NeurIPS '23) Demystifying Oversmoothing in Attention-Based Graph Neural Networks [Paper]
  • (ICLR '23) Graph domain adaptation via theory-grounded spectral regularization [Paper]
  • (NeurIPS '22) Revisiting Heterophily For Graph Neural Networks [Paper]
  • (NeurIPS '22) Revisiting Graph Contrastive Learning from the Perspective of Graph Spectrum [Paper]
  • (ICLR '22) Is Homophily a Necessity for Graph Neural Networks? [Paper]
  • (ICLR '20) Graph Neural Networks Exponentially Lose Expressive Power for Node Classification [Paper]
  • (Annual Review of Sociology '01) Birds of a Feather: Homophily in Social Networks [Paper]

Link-level tasks (Link prediction)

  • (ICLR '24) Revisiting Link Prediction: A Data Perspective [Paper]
  • (NeurIPS '23) A Theory of Link Prediction via Relational Weisfeiler-Leman on Knowledge Graphs [Paper]
  • (NeurIPSW '23) A Multi-Task Perspective for Link Prediction with New Relation Types and Nodes [Paper]
  • (NeurIPSW '23) Double Equivariance for Inductive Link Prediction for Both New Nodes and New Relation Types [Paper]
  • (ACL '23) Are Message Passing Neural Networks Really Helpful for Knowledge Graph Completion? [Paper]
  • (TMLR '23) You Only Transfer What You Share: Intersection-Induced Graph Transfer Learning for Link Prediction [Paper]
  • (ICLR '22) Equivariant and Stable Positional Encoding for More Powerful Graph Neural Networks [Paper]
  • (NeurIPS '21) Labeling Trick: A Theory of Using Graph Neural Networks for Multi-Node Representation Learning [Paper]
  • (ICLR '20) On the Equivalence between Positional Node Embeddings and Structural Graph Representations [Paper]
  • (Scientific Reports '19) Structural transition in social networks: The role of homophily [Paper]
  • (ICML '19) Position-aware Graph Neural Networks [Paper]
  • (NeurIPS '18) Link Prediction Based on Graph Neural Networks [Paper]
  • (TKDE '15) Triadic closure pattern analysis and prediction in social networks [Paper]
  • (KDD '02) SimRank: A Measure of Structural-Context Similarity [Paper]
  • (Social Networks '03) Friends and neighbors on the Web [Paper]
  • (Psychometrika '53) A new status index derived from sociometric analysis [Paper]

Graph-level tasks (Graph classification)

  • (ICLR '24) On the Stability of Expressive Positional Encodings for Graph Neural Networks [Paper]
  • (ICLR '24) Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness [Paper]
  • (NeurIPS '23) Fine-grained Expressivity of Graph Neural Networks [Paper]
  • (JMLR '23) Weisfeiler and Leman go Machine Learning: The Story so far [Paper]
  • (ICML '23) WL meet VC [Paper]
  • (ICLR '23) Sign and Basis Invariant Networks for Spectral Graph Representation Learning [Paper]
  • (ICLR '23) Rethinking the Expressive Power of GNNs via Graph Biconnectivity [Paper]
  • (NeurIPS '22) Tree Mover’s Distance: Bridging Graph Metrics and Stability of Graph Neural Networks [Paper]
  • (NeurIPS '21) Rethinking Graph Transformers with Spectral Attention [Paper]
  • (Physics Reports '20) Networks beyond pairwise interactions: structure and dynamics [Paper]
  • (Science '16) Higher-order organization of complex networks [Paper]
  • (JMLR '10) Graph Kernels [Paper]
  • (Science '02) Network Motifs: Simple Building Blocks of Complex Networks [Paper]

Transferring across tasks (practical techniques)

  • (Arxiv '23) Graph Prompt Learning: A Comprehensive Survey and Beyond [Paper]
  • (NeurIPS '23) Universal Prompt Tuning for Graph Neural Networks [Paper]
  • (KDD '23) All in One: Multi-Task Prompting for Graph Neural Networks [Paper]
  • (WWW '23) GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks [Paper]
  • (KDD '22) GPPT: Graph Pre-training and Prompt Tuning to Generalize Graph Neural Networks [Paper]

GFM development: A data perspective

Handling feature heterogeneity

  • (ICMLW '24) Zero-Shot Generalization of GNNs over Distinct Attribute Domains [Paper]
  • (Arxiv '24) GraphAlign: Pretraining One Graph Neural Network on Multiple Graphs via Feature Alignment [Paper]
  • (Arxiv '24) Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights [Paper]
  • (ICLR '24) One For All: Towards Training One Graph Model For All Classification Tasks [Paper]
  • (Arxiv '23) GraphText: Graph Reasoning in Text Space [Paper]
  • (ICML '23) GRAFENNE: Learning on Graphs with Heterogeneous and Dynamic Feature Sets [Paper]
  • (CVPR '23) Deep graph reprogramming [Paper]
  • (ICLR '23) Confidence-Based Feature Imputation for Graphs with Partially Known Features [Paper]
  • (Future Generation Computer Systems '21) Graph convolutional networks for graphs containing missing features [Paper]

Synthetic data generation

  • (Arxiv '24) Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models [Paper]
  • (NeurIPS '23) Data-Centric Learning from Unlabeled Graphs with Diffusion Model [Paper]
  • (ICLR '23) DiGress: Discrete Denoising Diffusion for Graph Generation [Paper]
  • (ICML '22) Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations [Paper]
  • (KDD '22) GraphWorld: Fake Graphs Bring Real Insights for GNNs [Paper]
  • (ICML '21) GraphDF: A Discrete Flow Model for Molecular Graph Generation [Paper]
  • (ICML '20) Hierarchical Generation of Molecular Graphs using Structural Motifs [Paper]
  • (JMLR '10) Kronecker Graphs: An Approach to Modeling Networks [Paper]
  • (NIPS '08) Mixed membership stochastic blockmodels [Paper]
  • (Social Networks '07) An introduction to exponential random graph (p*) models for social networks [Paper]
  • (Reviews of Modern Physics '02) Statistical mechanics of complex networks [Paper]

GFM development: Backbone models

Message-passing NN & Graph transformers

A paper list of graph transformers [Awesome graph transformers]

  • (TMLR '23) Graph Neural Networks Designed for Different Graph Types: A Survey [Paper]
  • (Arxiv '23) GraphGPT: Graph Learning with Generative Pre-trained Transformers [Paper]
  • (Arxiv '23) Attending to Graph Transformers [Paper]
  • (Arxiv '22) GPS++: An Optimised Hybrid MPNN/Transformer for Molecular Property Prediction [Paper]

LLM for Graph

  • (Arxiv '24) Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks? [Paper]
  • (Arxiv '24) STARK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [Paper]
  • (Arxiv '24) G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [Paper]
  • (Arxiv '24) Let Your Graph Do the Talking: Encoding Structured Data for LLMs [Paper]
  • (Arxiv '24) InstructGraph: Boosting Large Language Models via Graph-centric Instruction Tuning and Preference Alignment [Paper]
  • (ICML '24) LLaGA: Large Language and Graph Assistant [Paper]
  • (ICLR '24) Talk like a Graph: Encoding Graphs for Large Language Models [Paper]
  • (WWW '24) GraphGPT: Graph Instruction Tuning for Large Language Models [Paper]