Deep Learning for Theorem Proving (DL4TP)

Welcome to our repository! This is a curated collection of resources related to deep learning for theorem proving.

We categorize papers primarily by how they apply deep learning models, organizing them into five task-specific categories and two dataset categories. A paper may appear in more than one category when it is relevant to several tasks or datasets. Each paper is also labeled with the theorem prover, proof calculus, or problem domain it uses, so readers can quickly find the resources that match their interests. For example, papers that use or generate theorems or proofs in natural language are labeled with [NL].

For more details, please refer to our survey paper: A Survey on Deep Learning for Theorem Proving (COLM 2024).

Surveys
Tutorials
Tasks
Datasets
- Data Collection
- Data Generation
Related Surveys
Citation

Surveys

Towards the Automatic Mathematician CADE 2021 [paper]

Rabe, Markus N and Szegedy, Christian
Theorem Proving in Artificial Neural Networks: New Frontiers in Mathematical AI EJPS 2024 [paper]

Pantsar, Markus
Learning Guided Automated Reasoning: A Brief Survey arXiv 2024 [paper]

Blaauwbroek, Lasse and Cerna, David and Gauthier, Thibault and Jakubův, Jan and Kaliszyk, Cezary and Suda, Martin and Urban, Josef
A Survey on Deep Learning for Theorem Proving COLM 2024 [paper]

Li, Zhaoyu and Sun, Jialiang and Murphy, Logan and Su, Qidong and Li, Zenan and Zhang, Xian and Yang, Kaiyu and Si, Xujie

Tutorials

A Tutorial on Neural Theorem Proving IJCAI 2023 Tutorial [link] [Lean, Isabelle]

Welleck, Sean
Tutorial on Machine Learning for Theorem Proving NeurIPS 2023 Tutorial [link] [Coq, Isabelle, Lean]

First, Emily and Jiang, Albert and Yang, Kaiyu

Tasks

Autoformalization

First Experiments with Neural Translation of Informal to Formal Mathematics CICM 2018 [paper] [NL, Mizar]

Wang, Qingxiang and Kaliszyk, Cezary and Urban, Josef
Exploration of Neural Machine Translation in Autoformalization of Mathematics in Mizar CPP 2020 [paper] [NL, Mizar]

Wang, Qingxiang and Brown, Chad and Kaliszyk, Cezary and Urban, Josef
Learning Alignment between Formal & Informal Mathematics AITP 2020 [paper] [NL, HOL Light]

Bansal, Kshitij and Szegedy, Christian
A Promising Path Towards Autoformalization and General Artificial Intelligence CICM 2020 [paper]

Szegedy, Christian
Autoformalization with Large Language Models NeurIPS 2022 [paper] [NL, Isabelle]

Wu, Yuhuai and Jiang, Albert Qiaochu and Li, Wenda and Rabe, Markus and Staats, Charles and Jamnik, Mateja and Szegedy, Christian
Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs EMNLP 2022 MathNLP Workshop [paper] [NL, Coq]

Cunningham, Garett and Bunescu, Razvan C and Juedes, David
Towards a Mathematics Formalisation Assistant using Large Language Models arXiv 2022 [paper] [NL, Lean]

Agrawal, Ayush and Gadgil, Siddhartha and Goyal, Navin and Narayanan, Ashvni and Tadipatri, Anand
Towards Automating Formalisation of Theorem Statements using Large Language Models NeurIPS 2022 MATH-AI Workshop [paper] [NL, Lean]

Gadgil, Siddhartha and Tadipatri, Anand Rao and Agrawal, Ayush and Narayanan, Ashvni and Goyal, Navin
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs ICLR 2023 [paper] [NL, Isabelle]

Jiang, Albert Q and Welleck, Sean and Zhou, Jin Peng and Li, Wenda and Liu, Jiacheng and Jamnik, Mateja and Lacroix, Timothée and Wu, Yuhuai and Lample, Guillaume
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning EMNLP 2023 Findings [paper] [NL, Prover9, Z3]

Pan, Liangming and Albalak, Alon and Wang, Xinyi and Wang, William Yang
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers EMNLP 2023 [paper] [NL, Prover9]

Olausson, Theo X and Gu, Alex and Lipkin, Benjamin and Zhang, Cedegao E and Solar-Lezama, Armando and Tenenbaum, Joshua B and Levy, Roger
SATLM: Satisfiability-Aided Language Models Using Declarative Prompting NeurIPS 2023 [paper] [NL, Z3]

Ye, Xi and Chen, Qiaochu and Dillig, Isil and Durrett, Greg
FIMO: A Challenge Formal Dataset for Automated Theorem Proving arXiv 2023 [paper] [NL, Lean]

Liu, Chengwu and Shen, Jianhao and Xin, Huajian and Liu, Zhengying and Yuan, Ye and Wang, Haiming and Ju, Wei and Zheng, Chuanyang and Yin, Yichun and Li, Lin and Zhang, Ming and Liu, Qun
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving arXiv 2023 [paper] [NL, Isabelle]

Zhao, Xueliang and Li, Wenda and Kong, Lingpeng
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics arXiv 2023 [paper] [NL, Lean]

Azerbayev, Zhangir and Piotrowski, Bartosz and Schoelkopf, Hailey and Ayers, Edward W and Radev, Dragomir and Avigad, Jeremy
Multilingual Mathematical Autoformalization arXiv 2023 [paper] [NL, Isabelle, Lean]

Jiang, Albert Q and Li, Wenda and Jamnik, Mateja
Lyra: Orchestrating Dual Correction in Automated Theorem Proving arXiv 2023 [paper] [NL, Isabelle]

Zheng, Chuanyang and Wang, Haiming and Xie, Enze and Liu, Zhengying and Sun, Jiankai and Xin, Huajian and Shen, Jianhao and Li, Zhenguo and Li, Yu
A New Approach Towards Autoformalization arXiv 2023 [paper] [NL, Lean]

Patel, Nilay and Flanigan, Jeffrey and Saha, Rahul
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data ICLR 2024 [paper] [NL, Lean]

Huang, Yinya and Lin, Xiaohan and Liu, Zhengying and Cao, Qingxing and Xin, Huajian and Wang, Haiming and Li, Zhenguo and Song, Linqi and Liang, Xiaodan
Don't Trust: Verify - Grounding LLM Quantitative Reasoning with Autoformalization ICLR 2024 [paper] [NL, Isabelle]

Zhou, Jin Peng and Staats, Charles E and Li, Wenda and Szegedy, Christian and Weinberger, Kilian Q and Wu, Yuhuai
LEGO-Prover: Neural Theorem Proving with Growing Libraries ICLR 2024 [paper] [NL, Isabelle]

Wang, Haiming and Xin, Huajian and Zheng, Chuanyang and Li, Lin and Liu, Zhengying and Cao, Qingxing and Huang, Yinya and Xiong, Jing and Shi, Han and Xie, Enze and Yin, Jian and Li, Zhenguo and Liao, Heng and Liang, Xiaodan
Llemma: An Open Language Model for Mathematics ICLR 2024 [paper] [NL, Isabelle, Lean]

Azerbayev, Zhangir and Schoelkopf, Hailey and Paster, Keiran and Santos, Marco Dos and McAleer, Stephen and Jiang, Albert Q and Deng, Jia and Biderman, Stella and Welleck, Sean
LeanReasoner: Boosting Complex Logical Reasoning with Lean NAACL 2024 [paper] [NL, Lean]

Jiang, Dongwei and Fonseca, Marcio and Cohen, Shay B
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models arXiv 2024 [paper] [NL, Isabelle]

Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Zhang, Mingchuan and Li, YK and Wu, Y and Guo, Daya
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning arXiv 2024 [paper] [NL, Lean]

Ying, Huaiyuan and Zhang, Shuo and Li, Linyang and Zhou, Zhejian and Shao, Yunfan and Fei, Zhaoye and Ma, Yichuan and Hong, Jiawei and Liu, Kuikun and Wang, Ziyi and Wang, Yudong and Wu, Zijian and Li, Shuaibin and Zhou, Fengzhe and Liu, Hongwei and Zhang, Songyang and Zhang, Wenwei and Yan, Hang and Qiu, Xipeng and Wang, Jiayu and Chen, Kai and Lin, Dahua
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving arXiv 2024 [paper] [NL, Isabelle]

Quan, Xin and Valentino, Marco and Dennis, Louise A and Freitas, André
Process-Driven Autoformalization in Lean 4 arXiv 2024 [paper] [NL, Lean]

Lu, Jianqiao and Liu, Zhengying and Wan, Yingjia and Huang, Yinya and Wang, Haiming and Yang, Zhicheng and Tang, Jing and Guo, Zhijiang
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data arXiv 2024 [paper] [NL, Lean]

Xin, Huajian and Guo, Daya and Shao, Zhihong and Ren, Zhizhou and Zhu, Qihao and Liu, Bo and Ruan, Chong and Li, Wenda and Liang, Xiaodan
Improving Autoformalization using Type Checking arXiv 2024 [paper] [NL, Lean]

Poiroux, Auguste and Weiss, Gail and Kunčak, Viktor and Bosselut, Antoine
Lean Workbook: A Large-Scale Lean Problem Set Formalized from Natural Language Math Problems arXiv 2024 [paper] [NL, Lean]

Ying, Huaiyuan and Wu, Zijian and Geng, Yihan and Wang, Jiayu and Lin, Dahua and Chen, Kai

Premise Selection

DeepMath - Deep Sequence Models for Premise Selection NeurIPS 2016 [paper] [Mizar]

Irving, Geoffrey and Szegedy, Christian and Alemi, Alexander A and Eén, Niklas and Chollet, François and Urban, Josef
HolStep: A Machine Learning Dataset for Higher-Order Logic Theorem Proving ICLR 2017 [paper] [HOL Light]

Kaliszyk, Cezary and Chollet, François and Szegedy, Christian
Tree-structure CNN for Automated Theorem Proving ICONIP 2017 [paper] [HOL Light]

Peng, Kebin and Ma, Dianfu
Premise Selection for Theorem Proving by Deep Graph Embedding NeurIPS 2017 [paper] [HOL Light]

Wang, Mingzhe and Tang, Yihe and Wang, Jian and Deng, Jia
Premise Selection with Neural Networks and Distributed Representation of Features arXiv 2018 [paper] [Mizar]

Kucik, Andrzej Stanisław and Korovin, Konstantin
HOList: An Environment for Machine Learning of Higher-Order Logic Theorem Proving ICML 2019 [paper] [HOL Light]

Bansal, Kshitij and Loos, Sarah M and Rabe, Markus Norman and Szegedy, Christian and Wilcox, Stewart
Learning Representations of Logical Formulae using Graph Neural Networks NeurIPS 2019 GRL Workshop [paper] [Mizar]

Glorot, Xavier and Anand, Ankit and Aygun, Eser and Mourad, Shibl and Kohli, Pushmeet and Precup, Doina
Property Invariant Embedding for Automated Reasoning ECAI 2019 [paper] [Mizar]

Olšák, Miroslav and Kaliszyk, Cezary and Urban, Josef
Usefulness of Lemmas via Graph Neural Networks AITP 2019 [paper] [Mizar]

Goertzel, Zarathustra and Urban, Josef
Improving Graph Neural Network Representations of Logical Formulae with Subgraph Pooling arXiv 2019 [paper] [Mizar, HOL Light]

Crouse, Maxwell and Abdelaziz, Ibrahim and Cornelio, Cristina and Thost, Veronika and Wu, Lingfei and Forbus, Kenneth and Fokoue, Achille
Directed Graph Networks for Logical Reasoning (Extended Abstract) PAAR 2020 [paper] [Mizar]

Rawson, Michael and Reger, Giles
Stateful Premise Selection by Recurrent Neural Networks LPAR 2020 [paper] [Mizar]

Piotrowski, Bartosz and Urban, Josef
Premise Selection in Natural Language Mathematical Texts ACL 2020 [paper] [NL]

Ferreira, Deborah and Freitas, André
Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text LREC 2020 [paper] [NL]

Ferreira, Deborah and Freitas, André
TextGraphs 2022 Shared Task on Natural Language Premise Selection TextGraphs 2022 [paper] [NL]

Valentino, Marco and Ferreira, Deborah and Thayaparan, Mokanarangan and Freitas, André and Ustalov, Dmitry
IJS at TextGraphs-16 Natural Language Premise Selection Task: Will Contextual Information Improve Natural Language Premise Selection? TextGraphs 2022 [paper] [NL]

Tran, Thi Hong Hanh and Martinc, Matej and Doucet, Antoine and Pollak, Senja
UNLPS at TextGraphs-16 Natural Language Premise Selection Task: Unsupervised Natural Language Premise Selection in Mathematical Text using Sentence-MPNet TextGraphs 2022 [paper] [NL]

Trust, Paul and Kadusabe, Provia and Younis, Haseeb and Minghim, Rosane and Milios, Evangelos and Zahran, Ahmed
Keyword-based Natural Language Premise Selection for an Automatic Mathematical Statement Proving TextGraphs 2022 [paper] [NL]

Dastgheib, Doratossadat and Asgari, Ehsaneddin
TextGraphs-16 Natural Language Premise Selection Task: Zero-Shot Premise Selection with Prompting Generative Language Models TextGraphs 2022 [paper] [NL]

Kovriguina, Liubov and Teucher, Roman and Wardenga, Robert
Attention Recurrent Cross-Graph Neural Network for Selecting Premises IJMLC 2021 [paper] [Mizar]

Liu, Qinghua and Xu, Yang and He, Xingxing
Graph Representations for Higher-Order Logic and Theorem Proving AAAI 2020 [paper] [HOL Light]

Paliwal, Aditya and Loos, Sarah and Rabe, Markus and Bansal, Kshitij and Szegedy, Christian
Improving Stateful Premise Selection with Transformers CICM 2021 [paper] [Mizar]

Proroković, Krsto and Wand, Michael and Schmidhuber, Jürgen
Contrastive Graph Representations for Logical Formulas Embedding TKDE 2021 [paper] [Mizar]

Lin, Qika and Liu, Jun and Zhang, Lingling and Pan, Yudai and Hu, Xin and Xu, Fangzhi and Zeng, Hongwei
Graph Contrastive Pre-training for Effective Theorem Reasoning ICML 2021 SSL Workshop [paper] [Coq]

Li, Zhaoyu and Chen, Binghong and Si, Xujie
NaturalProofs: Mathematical Theorem Proving in Natural Language NeurIPS 2021 [paper] [NL]

Welleck, Sean and Liu, Jiacheng and Bras, Ronan Le and Hajishirzi, Hannaneh and Choi, Yejin and Cho, Kyunghyun
Contrastive Finetuning of Generative Language Models for Informal Premise Selection AITP 2021 [paper] [NL]

Han, Jesse Michael and Xu, Tao and Polu, Stanislas and Neelakantan, Arvind and Radford, Alec
STAR: Cross-modal [STA]tement [R]epresentation for Selecting Relevant Mathematical Premises EACL 2021 [paper] [NL]

Ferreira, Deborah and Freitas, André
A Study of Continuous Vector Representations for Theorem Proving Journal of Logic and Computation 2021 [paper] [Mizar]

Purgał, Stanisław and Parsert, Julian and Kaliszyk, Cezary
Proof Artifact Co-Training for Theorem Proving with Language Models ICLR 2022 [paper] [Lean]

Han, Jesse Michael and Rute, Jason and Wu, Yuhuai and Ayers, Edward W and Polu, Stanislas
The Isabelle ENIGMA ITP 2022 [paper] [Isabelle]

Goertzel, Zarathustra A and Jakubův, Jan and Kaliszyk, Cezary and Olšák, Miroslav and Piepenbrock, Jelle and Urban, Josef
Formal Premise Selection with Language Models AITP 2022 [paper] [Isabelle]

Tworkowski, Szymon and Mikuła, Maciej and Odrzygóżdż, Tomasz and Czechowski, Konrad and Antoniak, Szymon and Jiang, Albert and Szegedy, Christian and Kuciński, Łukasz and Miłoś, Piotr and Wu, Yuhuai
MizAR 60 for Mizar 50 ITP 2023 [paper] [Mizar]

Jakubův, Jan and Chvalovský, Karel and Goertzel, Zarathustra and Kaliszyk, Cezary and Olšák, Mirek and Piotrowski, Bartosz and Schulz, Stephan and Suda, Martin and Urban, Josef
Graph Sequence Learning for Premise Selection AITP 2023 [paper] [Mizar]

Holden, Edvard K and Korovin, Konstantin
CoProver: A Recommender System for Proof Construction CICM 2023 [paper] [PVS]

Yeh, Eric and Hitaj, Briland and Owre, Sam and Quemener, Maena and Shankar, Natarajan
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models NeurIPS 2023 [paper] [Lean]

Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima
MLFMF: Data Sets for Machine Learning for Mathematical Formalization NeurIPS 2023 [paper] [Agda, Lean]

Bauer, Andrej and Petković, Matej and Todorovski, Ljupco
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving AAAI 2024 [paper]

Lamont, Sean and Norrish, Michael and Dezfouli, Amir and Walder, Christian and Montague, Paul
Magnushammer: A Transformer-Based Approach to Premise Selection ICLR 2024 [paper] [Isabelle]

Mikuła, Maciej and Antoniak, Szymon and Tworkowski, Szymon and Jiang, Albert Qiaochu and Zhou, Jin Peng and Szegedy, Christian and Kuciński, Łukasz and Miłoś, Piotr and Wu, Yuhuai

Proofstep Generation

Holophrasm: A Neural Automated Theorem Prover for Higher-Order Logic arXiv 2016 [paper] [Metamath]

Whalen, Daniel
GamePad: A Learning Environment for Theorem Proving ICLR 2019 [paper] [Coq]

Huang, Daniel and Dhariwal, Prafulla and Song, Dawn and Sutskever, Ilya
HOList: An Environment for Machine Learning of Higher-Order Logic Theorem Proving ICML 2019 [paper] [HOL Light]

Bansal, Kshitij and Loos, Sarah M and Rabe, Markus Norman and Szegedy, Christian and Wilcox, Stewart
Learning to Prove Theorems via Interacting with Proof Assistants ICML 2019 [paper] [Coq]

Yang, Kaiyu and Deng, Jia
Graph Representations for Higher-Order Logic and Theorem Proving AAAI 2020 [paper] [HOL Light]

Paliwal, Aditya and Loos, Sarah and Rabe, Markus and Bansal, Kshitij and Szegedy, Christian
Learning to Prove Theorems by Learning to Generate Theorems NeurIPS 2020 [paper] [Metamath]

Wang, Mingzhe and Deng, Jia
Generating Correctness Proofs with Neural Networks MAPL 2020 [paper] [Coq]

Sanchez-Stern, Alex and Alhessi, Yousef and Saul, Lawrence and Lerner, Sorin
TacTok: Semantics-Aware Proof Synthesis OOPSLA 2020 [paper] [Coq]

First, Emily and Brun, Yuriy and Guha, Arjun
Automated Theorem Proving via Interacting with Proof Assistants by Dynamic Strategies BigCom 2020 [paper] [Coq]

Mo, Guangshuai and Xiong, Yan and Huang, Wenchao and Ma, Lu
Generative Language Modeling for Automated Theorem Proving arXiv 2020 [paper] [Metamath]

Polu, Stanislas and Sutskever, Ilya
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving ICLR 2021 [paper] [inequality]

Wu, Yuhuai and Jiang, Albert Qiaochu and Ba, Jimmy and Grosse, Roger
TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning NeurIPS 2021 [paper] [HOL4]

Wu, Minchao and Norrish, Michael and Walder, Christian and Dezfouli, Amir
LISA: Language Models of Isabelle Proofs AITP 2021 [paper] [Isabelle]

Jiang, Albert Qiaochu and Li, Wenda and Han, Jesse Michael and Wu, Yuhuai
Retrieval-Augmented Proof Step Synthesis AITP 2021 [paper] [HOL Light]

Szegedy, Christian and Rabe, Markus and Michalewski, Henryk
Proof Artifact Co-Training for Theorem Proving with Language Models ICLR 2022 [paper] [Lean]

Han, Jesse Michael and Rute, Jason and Wu, Yuhuai and Ayers, Edward W and Polu, Stanislas
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression EMNLP 2022 [paper] [geometry]

Chen, Jiaqi and Li, Tong and Qin, Jinghui and Lu, Pan and Lin, Liang and Chen, Chongyu and Liang, Xiaodan
Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs EMNLP 2022 MathNLP Workshop [paper] [NL, Coq]

Cunningham, Garett and Bunescu, Razvan C and Juedes, David
HyperTree Proof Search for Neural Theorem Proving NeurIPS 2022 [paper] [Metamath, Lean]

Lample, Guillaume and Lacroix, Timothée and Lachaux, Marie-Anne and Rodriguez, Aurelien and Hayat, Amaury and Lavril, Thibaut and Ebner, Gabriel and Martinet, Xavier
NaturalProver: Grounded Mathematical Proof Generation with Language Models NeurIPS 2022 [paper] [NL]

Welleck, Sean and Liu, Jiacheng and Lu, Ximing and Hajishirzi, Hannaneh and Choi, Yejin
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers NeurIPS 2022 [paper] [Isabelle]

Jiang, Albert Qiaochu and Li, Wenda and Tworkowski, Szymon and Czechowski, Konrad and Odrzygóżdż, Tomasz and Miłoś, Piotr and Wu, Yuhuai and Jamnik, Mateja
Diversity-Driven Automated Formal Verification ICSE 2022 [paper] [Coq]

First, Emily and Brun, Yuriy
Formal Premise Selection with Language Models AITP 2022 [paper] [Isabelle]

Tworkowski, Szymon and Mikuła, Maciej and Odrzygóżdż, Tomasz and Czechowski, Konrad and Antoniak, Szymon and Jiang, Albert and Szegedy, Christian and Kuciński, Łukasz and Miłoś, Piotr and Wu, Yuhuai
Passport: Improving Automated Formal Verification Using Identifiers TOPLAS 2023 [paper] [Coq]

Sanchez-Stern, Alex and First, Emily and Zhou, Timothy and Kaufman, Zhanna and Brun, Yuriy and Ringer, Talia
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs ICLR 2023 [paper] [NL, Isabelle]

Jiang, Albert Q and Welleck, Sean and Zhou, Jin Peng and Li, Wenda and Liu, Jiacheng and Jamnik, Mateja and Lacroix, Timothée and Wu, Yuhuai and Lample, Guillaume
Formal Mathematics Statement Curriculum Learning ICLR 2023 [paper] [Lean]

Polu, Stanislas and Han, Jesse Michael and Zheng, Kunhao and Baksys, Mantas and Babuschkin, Igor and Sutskever, Ilya
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving arXiv 2023 [paper] [NL, Isabelle]

Zhao, Xueliang and Li, Wenda and Kong, Lingpeng
CoProver: A Recommender System for Proof Construction CICM 2023 [paper] [PVS]

Yeh, Eric and Hitaj, Briland and Owre, Sam and Quemener, Maena and Shankar, Natarajan
Baldur: Whole-Proof Generation and Repair with Large Language Models ESEC/FSE 2023 [paper] [Isabelle]

First, Emily and Rabe, Markus and Ringer, Talia and Brun, Yuriy
Peano: Learning Formal Mathematical Reasoning Philosophical Transactions of the Royal Society A 2023 [paper] [Peano]

Poesia, Gabriel and Goodman, Noah D
Mathematical Capabilities of ChatGPT NeurIPS 2023 [paper] [NL]

Frieder, Simon and Pinchetti, Luca and Chevalier, Alexis and Griffiths, Ryan-Rhys and Salvatori, Tommaso and Lukasiewicz, Thomas and Petersen, Philipp Christian and Berner, Julius
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models NeurIPS 2023 [paper] [Lean]

Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima
LLM vs ITP NeurIPS 2023 MATH-AI Workshop [paper] [NL]

Frieder, Simon and Trimmel, Martin and Alawadhi, Rashid and Gy, Klaus
LLMSTEP: LLM Proofstep Suggestions in Lean NeurIPS 2023 MATH-AI Workshop [paper] [Lean]

Welleck, Sean and Saha, Rahul
Towards Large Language Models as Copilots for Theorem Proving in Lean NeurIPS 2023 MATH-AI Workshop [paper] [Lean]

Song, Peiyang and Yang, Kaiyu and Anandkumar, Anima
Temperature-Scaled Large Language Models for Lean Proofstep Prediction NeurIPS 2023 MATH-AI Workshop [paper] [Lean]

Gloeckle, Fabian and Roziere, Baptiste and Hayat, Amaury and Synnaeve, Gabriel
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-Level Value Function ACL 2023 [paper] [Isabelle, Lean]

Wang, Haiming and Yuan, Ye and Liu, Zhengying and Shen, Jianhao and Yin, Yichun and Xiong, Jing and Xie, Enze and Shi, Han and Li, Yujun and Li, Lin and Yin, Jian and Li, Zhenguo and Liang, Xiaodan
Getting More out of Large Language Models for Proofs AITP 2023 [paper] [Coq]

Zhang, Shizhuo Dylan and Ringer, Talia and First, Emily
UniMath: A Foundational and Multimodal Mathematical Reasoner EMNLP 2023 [paper] [geometry]

Liang, Zhenwen and Yang, Tianyu and Zhang, Jipeng and Zhang, Xiangliang
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models EMNLP 2023 [paper] [Lean]

Xiong, Jing and Shen, Jianhao and Yuan, Ye and Wang, Haiming and Yin, Yichun and Liu, Zhengying and Li, Lin and Guo, Zhijiang and Cao, Qingxing and Huang, Yinya and Zheng, Chuanyang and Liang, Xiaodan and Zhang, Ming and Liu, Qun
Learning Proof Transformations and Its Applications in Interactive Theorem Proving FroCoS 2023 [paper] [Coq]

Zhang, Liao and Blaauwbroek, Lasse and Kaliszyk, Cezary and Urban, Josef
Evaluating Language Models for Mathematics through Interactions arXiv 2023 [paper] [NL]

Collins, Katherine M and Jiang, Albert Q and Frieder, Simon and Wong, Lionel and Zilka, Miri and Bhatt, Umang and Lukasiewicz, Thomas and Wu, Yuhuai and Tenenbaum, Joshua B and Hart, William and Gowers, Timothy and Li, Wenda and Weller, Adrian and Jamnik, Mateja
Large Language Models for Mathematicians arXiv 2023 [paper] [NL]

Frieder, Simon and Berner, Julius and Petersen, Philipp and Lukasiewicz, Thomas
Experimental Results from Applying GPT-4 to An Unpublished Formal Language arXiv 2023 [paper] [Axiotome]

Scheidt, Gregor vom
Large Language Models' Understanding of Math: Source Criticism and Extrapolation arXiv 2023 [paper] [Lean]

Yousefzadeh, Roozbeh and Cao, Xuenan
Lyra: Orchestrating Dual Correction in Automated Theorem Proving arXiv 2023 [paper] [NL, Isabelle]

Zheng, Chuanyang and Wang, Haiming and Xie, Enze and Liu, Zhengying and Sun, Jiankai and Xin, Huajian and Shen, Jianhao and Li, Zhenguo and Li, Yu
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method arXiv 2023 [paper] [Lean]

Vishwakarma, Rahul and Mishra, Subhankar
An In-Context Learning Agent for Formal Theorem-Proving arXiv 2023 [paper] [Lean, Coq]

Thakur, Amitayush and Tsoukalas, George and Wen, Yeming and Xin, Jimmy and Chaudhuri, Swarat
LEGO-Prover: Neural Theorem Proving with Growing Libraries ICLR 2024 [paper] [NL, Isabelle]

Wang, Haiming and Xin, Huajian and Zheng, Chuanyang and Li, Lin and Liu, Zhengying and Cao, Qingxing and Huang, Yinya and Xiong, Jing and Shi, Han and Xie, Enze and Yin, Jian and Li, Zhenguo and Liao, Heng and Liang, Xiaodan
Llemma: An Open Language Model for Mathematics ICLR 2024 [paper] [NL, Isabelle, Lean]

Azerbayev, Zhangir and Schoelkopf, Hailey and Paster, Keiran and Santos, Marco Dos and McAleer, Stephen and Jiang, Albert Q and Deng, Jia and Biderman, Stella and Welleck, Sean
Solving Proof Block Problems Using Large Language Models SIGCSE 2024 [paper] [NL]

Poulsen, Seth and Sarsa, Sami and Prather, James and Leinonen, Juho and Becker, Brett A and Hellas, Arto and Denny, Paul and Reeves, Brent N
Solving Olympiad Geometry without Human Demonstrations Nature 2024 [paper] [geometry]

Trinh, Trieu H and Wu, Yuhuai and Le, Quoc V and He, He and Luong, Thang
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving AAAI 2024 [paper]

Lamont, Sean and Norrish, Michael and Dezfouli, Amir and Walder, Christian and Montague, Paul
Graph2Tac: Learning Hierarchical Representations of Math Concepts in Theorem Proving arXiv 2024 [paper] [Coq]

Rute, Jason and Olšák, Miroslav and Blaauwbroek, Lasse and Massolo, Fidel Ivan Schaposnik and Piepenbrock, Jelle and Pestun, Vasily
Selene: Pioneering Automated Proof in Software Verification ACL 2024 [paper] [Isabelle]

Zhang, Lichen and Lu, Shuai and Duan, Nan
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models arXiv 2024 [paper] [NL, Isabelle]

Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Zhang, Mingchuan and Li, YK and Wu, Y and Guo, Daya
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning arXiv 2024 [paper] [NL, Lean]

Ying, Huaiyuan and Zhang, Shuo and Li, Linyang and Zhou, Zhejian and Shao, Yunfan and Fei, Zhaoye and Ma, Yichuan and Hong, Jiawei and Liu, Kuikun and Wang, Ziyi and Wang, Yudong and Wu, Zijian and Li, Shuaibin and Zhou, Fengzhe and Liu, Hongwei and Zhang, Songyang and Zhang, Wenwei and Yan, Hang and Qiu, Xipeng and Wang, Jiayu and Chen, Kai and Lin, Dahua
Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search arXiv 2024 [paper] [Dafny, Coq, Lean]

Brandfonbrener, David and Raja, Sibi and Prasad, Tarun and Loughridge, Chloe and Yang, Jianang and Henniger, Simon and Byrd, William E and Zinkov, Robert and Amin, Nada
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data arXiv 2024 [paper] [NL, Lean]

Xin, Huajian and Guo, Daya and Shao, Zhihong and Ren, Zhizhou and Zhu, Qihao and Liu, Bo and Ruan, Chong and Li, Wenda and Liang, Xiaodan
Proving Theorems Recursively arXiv 2024 [paper] [Isabelle]

Wang, Haiming and Xin, Huajian and Liu, Zhengying and Li, Wenda and Huang, Yinya and Lu, Jianqiao and Yang, Zhicheng and Tang, Jing and Yin, Jian and Li, Zhenguo and Liang, Xiaodan
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving arXiv 2024 [paper] [Isabelle]

Lin, Xiaohan and Cao, Qingxing and Huang, Yinya and Wang, Haiming and Lu, Jianqiao and Liu, Zhengying and Song, Linqi and Liang, Xiaodan
Lean-STaR: Learning to Interleave Thinking and Proving arXiv 2024 [paper] [Lean]

Lin, Haohan and Sun, Zhiqing and Yang, Yiming and Welleck, Sean

Proof Search

Deep Network Guided Proof Search LPAR 2017 [paper] [E]

Loos, Sarah and Irving, Geoffrey and Szegedy, Christian and Kaliszyk, Cezary
Automated Theorem Proving in Intuitionistic Propositional Logic by Deep Reinforcement Learning arXiv 2018 [paper] [IPL]

Kusumoto, Mitsuru and Yahata, Keisuke and Sakai, Masahiro
ENIGMA-NG: Efficient Neural and Gradient-Boosted Inference Guidance for E CADE 2019 [paper] [ENIGMA]

Chvalovský, Karel and Jakubův, Jan and Suda, Martin and Urban, Josef
Hammering Mizar by Learning Clause Guidance ITP 2019 [paper] [ENIGMA]

Jakubův, Jan and Urban, Josef
Learning Dynamic Polynomial Proofs NeurIPS 2019 [paper] [polynomial inequality]

Fawzi, Alhussein and Malinowski, Mateusz and Fawzi, Hamza and Fawzi, Omar
A Neurally-Guided, Parallel Theorem Prover FroCoS 2019 [paper] [Z3]

Rawson, Michael and Reger, Giles
Property Invariant Embedding for Automated Reasoning ECAI 2019 [paper] [leanCoP]

Olšák, Miroslav and Kaliszyk, Cezary and Urban, Josef
Automated Theorem Proving via Interacting with Proof Assistants by Dynamic Strategies BigCom 2020 [paper] [Coq]

Mo, Guangshuai and Xiong, Yan and Huang, Wenchao and Ma, Lu
Guiding Inferences in Connection Tableau by Recurrent Neural Networks CICM 2020 [paper] [connection tableau]

Piotrowski, Bartosz and Urban, Josef
ENIGMA Anonymous: Symbol-Independent Inference Guiding Machine (System Description) IJCAR 2020 [paper] [ENIGMA]

Jakubův, Jan and Chvalovský, Karel and Olšák, Miroslav and Piotrowski, Bartosz and Suda, Martin and Urban, Josef
Deep Reinforcement Learning for Synthesizing Functions in Higher-Order Logic LPAR 2020 [paper] [HOL4]

Gauthier, Thibault
Generative Language Modeling for Automated Theorem Proving arXiv 2020 [paper] [Metamath]

Polu, Stanislas and Sutskever, Ilya
Learning to Prove from Synthetic Theorems arXiv 2020 [paper] [saturation]

Aygün, Eser and Ahmed, Zafarali and Anand, Ankit and Firoiu, Vlad and Glorot, Xavier and Orseau, Laurent and Precup, Doina and Mourad, Shibl
An Experimental Study of Formula Embeddings for Automated Theorem Proving in First-Order Logic arXiv 2020 [paper] [TRAIL]

Abdelaziz, Ibrahim and Thost, Veronika and Crouse, Maxwell and Fokoue, Achille
Training a First-Order Theorem Prover from Synthetic Data ICLR 2021 MATH-AI Workshop [paper] [saturation]

Firoiu, Vlad and Aygün, Eser and Anand, Ankit and Ahmed, Zafarali and Glorot, Xavier and Orseau, Laurent and Zhang, Lei and Precup, Doina and Mourad, Shibl
Learned Provability Likelihood for Tactical Search SCSS 2021 [paper] [HOL4]

Gauthier, Thibault
TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning NeurIPS 2021 [paper] [HOL4]

Wu, Minchao and Norrish, Michael and Walder, Christian and Dezfouli, Amir
Vampire with a Brain Is a Good ITP Hammer FroCoS 2021 [paper] [Vampire]

Suda, Martin
Fast and Slow Enigmas and Parental Guidance FroCoS 2021 [paper] [ENIGMA]

Goertzel, Zarathustra A and Chvalovský, Karel and Jakubův, Jan and Olšák, Miroslav and Urban, Josef
Towards Finding Longer Proofs TABLEAUX 2021 [paper] [leanCoP]

Zombori, Zsolt and Csiszárik, Adrián and Michalewski, Henryk and Kaliszyk, Cezary and Urban, Josef
lazyCoP: Lazy Paramodulation Meets Neurally Guided Search TABLEAUX 2021 [paper] [leanCoP]

Rawson, Michael and Reger, Giles
The Role of Entropy in Guiding a Connection Prover TABLEAUX 2021 [paper] [leanCoP]

Zombori, Zsolt and Urban, Josef and Olšák, Miroslav
Learning Theorem Proving Components TABLEAUX 2021 [paper] [ENIGMA]

Chvalovský, Karel and Jakubův, Jan and Olšák, Miroslav and Urban, Josef
A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving AAAI 2021 [paper] [TRAIL]

Crouse, Maxwell and Abdelaziz, Ibrahim and Makni, Bassem and Whitehead, Spencer and Cornelio, Cristina and Kapanipathi, Pavan and Srinivas, Kavitha and Thost, Veronika and Witbrock, Michael and Fokoue, Achille
Improving ENIGMA-Style Clause Selection While Learning From History CADE 2021 [paper] [Vampire]

Suda, Martin
Proving Theorems using Incremental Learning and Hindsight Experience Replay ICML 2022 [paper] [saturation]

Aygün, Eser and Anand, Ankit and Orseau, Laurent and Glorot, Xavier and McAleer, Stephen M and Firoiu, Vlad and Zhang, Lei M and Precup, Doina and Mourad, Shibl
HyperTree Proof Search for Neural Theorem Proving NeurIPS 2022 [paper] [Metamath, Lean]

Lample, Guillaume and Lacroix, Timothée and Lachaux, Marie-Anne and Rodriguez, Aurelien and Hayat, Amaury and Lavril, Thibaut and Ebner, Gabriel and Martinet, Xavier
Learning to Prove Trigonometric Identities arXiv 2022 [paper] [trigonometric identity]

Liu, Zhou and Li, Yujun and Liu, Zhengying and Li, Lin and Li, Zhenguo
Learning to Guide a Saturation-Based Theorem Prover TPAMI 2022 [paper] [TRAIL]

Abdelaziz, Ibrahim and Crouse, Maxwell and Makni, Bassem and Austel, Vernon and Cornelio, Cristina and Ikbal, Shajith and Kapanipathi, Pavan and Makondo, Ndivhuwo and Srinivas, Kavitha and Witbrock, Michael and Fokoue, Achille
Machine Learning Meets the Herbrand Universe arXiv 2022 [paper] [instantiation]

Piepenbrock, Jelle and Urban, Josef and Korovin, Konstantin and Olšák, Miroslav and Heskes, Tom and Janota, Mikoláš
Guiding An Automated Theorem Prover with Neural Rewriting IJCAR 2022 [paper] [Prover9]

Piepenbrock, Jelle and Heskes, Tom and Janota, Mikoláš and Urban, Josef
The Isabelle ENIGMA ITP 2022 [paper] [ENIGMA]

Goertzel, Zarathustra A and Jakubův, Jan and Kaliszyk, Cezary and Olšák, Miroslav and Piepenbrock, Jelle and Urban, Josef
Formal Mathematics Statement Curriculum Learning ICLR 2023 [paper] [Lean]

Polu, Stanislas and Han, Jesse Michael and Zheng, Kunhao and Baksys, Mantas and Babuschkin, Igor and Sutskever, Ilya
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations IJCAI 2023 [paper] [TRAIL]

Fokoue, Achille and Abdelaziz, Ibrahim and Crouse, Maxwell and Ikbal, Shajith and Kishimoto, Akihiro and Lima, Guilherme and Makondo, Ndivhuwo and Marinescu, Radu
Peano: Learning Formal Mathematical Reasoning Philosophical Transactions of the Royal Society A 2023 [paper] [Peano]

Poesia, Gabriel and Goodman, Noah D
MizAR 60 for Mizar 50 ITP 2023 [paper] [ENIGMA]

Jakubův, Jan and Chvalovský, Karel and Goertzel, Zarathustra and Kaliszyk, Cezary and Olšák, Mirek and Piotrowski, Bartosz and Schulz, Stephan and Suda, Martin and Urban, Josef
Reinforcement Learning for Guiding the E Theorem Prover FLAIRS 2023 [paper] [E]

McKeown, Jack and Sutcliffe, Geoff
Guiding an Instantiation Prover with Graph Neural Networks LPAR 2023 [paper] [iProver]

Chvalovský, Karel and Korovin, Konstantin and Piepenbrock, Jelle and Urban, Josef
How Much Should This Symbol Weigh? A GNN-Advised Clause Selection LPAR 2023 [paper] [Vampire]

Bártěk, Filip and Suda, Martin
gym-saturation: Gymnasium Environments for Saturation Provers (System Description) TABLEAUX 2023 [paper] [Vampire, iProver]

Shminke, Boris
DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-Level Value Function ACL 2023 [paper] [Isabelle, Lean]

Wang, Haiming and Yuan, Ye and Liu, Zhengying and Shen, Jianhao and Yin, Yichun and Xiong, Jing and Xie, Enze and Shi, Han and Li, Yujun and Li, Lin and Yin, Jian and Li, Zhenguo and Liang, Xiaodan
An In-Context Learning Agent for Formal Theorem-Proving arXiv 2023 [paper] [Lean, Coq]

Thakur, Amitayush and Tsoukalas, George and Wen, Yeming and Xin, Jimmy and Chaudhuri, Swarat
Verified Multi-Step Synthesis using Large Language Models and Monte Carlo Tree Search arXiv 2024 [paper] [Dafny, Coq, Lean]

Brandfonbrener, David and Raja, Sibi and Prasad, Tarun and Loughridge, Chloe and Yang, Jianang and Henniger, Simon and Byrd, William E and Zinkov, Robert and Amin, Nada
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving arXiv 2024 [paper] [Lean]

An, Chenyang and Chen, Zhibo and Ye, Qihao and First, Emily and Peng, Letian and Zhang, Jiayun and Wang, Zihan and Lerner, Sorin and Shang, Jingbo
Proving Olympiad Algebraic Inequalities without Human Demonstrations arXiv 2024 [paper] [inequality]

Wei, Chenrui and Sun, Mengzhou and Wang, Wei

Other Tasks

Transformers as Soft Reasoners over Language IJCAI 2020 [paper] [NL]

Clark, Peter and Tafjord, Oyvind and Richardson, Kyle
PRover: Proof Generation for Interpretable Reasoning over Rules EMNLP 2020 [paper] [NL]

Saha, Swarnadeep and Ghosh, Sayan and Srivastava, Shashank and Bansal, Mohit
First Neural Conjecturing Datasets and Experiments CICM 2020 [paper] [Mizar]

Urban, Josef and Jakubův, Jan
Guiding Inferences in Connection Tableau by Recurrent Neural Networks CICM 2020 [paper] [connection tableau]

Piotrowski, Bartosz and Urban, Josef
Mathematical Reasoning in Latent Space ICLR 2020 [paper] [HOL Light]

Lee, Dennis and Szegedy, Christian and Rabe, Markus N and Loos, Sarah M and Bansal, Kshitij
Explaining Answers with Entailment Trees EMNLP 2021 [paper] [NL]

Dalvi, Bhavana and Jansen, Peter and Tafjord, Oyvind and Xie, Zhengnan and Smith, Hannah and Pipatanangkura, Leighanna and Clark, Peter
ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language ACL 2021 [paper] [NL]

Tafjord, Oyvind and Mishra, Bhavana Dalvi and Clark, Peter
Mathematical Reasoning via Self-supervised Skip-tree Training ICLR 2021 [paper] [HOL Light]

Rabe, Markus N and Lee, Dennis and Bansal, Kshitij and Szegedy, Christian
Latent Action Space for Efficient Planning in Theorem Proving AITP 2021 [paper] [inequality]

Wu, Minchao and Wu, Yuhuai
IsarStep: A Benchmark for High-Level Mathematical Reasoning ICLR 2021 [paper] [Isabelle]

Li, Wenda and Yu, Lei and Wu, Yuhuai and Paulson, Lawrence C
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning ICML 2021 [paper] [Isabelle, HOL Light, Metamath, Lean]

Wu, Yuhuai and Rabe, Markus N and Li, Wenda and Ba, Jimmy and Grosse, Roger B and Szegedy, Christian
FaiRR: Faithful and Robust Deductive Reasoning over Natural Language ACL 2022 [paper] [NL]

Sanyal, Soumya and Singh, Harman and Ren, Xiang
Natural Language Deduction through Search over Statement Compositions EMNLP 2022 [paper] [NL]

Bostrom, Kaj and Sprague, Zayne and Chaudhuri, Swarat and Durrett, Greg
MetGen: A Module-Based Entailment Tree Generation Framework for Answer Explanation NAACL 2022 [paper] [NL]

Hong, Ruixin and Zhang, Hongming and Yu, Xintong and Zhang, Changshui
Towards Teachable Reasoning Systems: Using a Dynamic Memory of User Feedback for Continual System Improvement arXiv 2022 [paper] [NL]

Mishra, Bhavana Dalvi and Tafjord, Oyvind and Clark, Peter
Generating Natural Language Proofs with Verifier-Guided Search EMNLP 2022 [paper] [NL]

Yang, Kaiyu and Deng, Jia and Chen, Danqi
Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning EMNLP 2022 [paper] [NL]

Tafjord, Oyvind and Mishra, Bhavana Dalvi and Clark, Peter
Proof Artifact Co-Training for Theorem Proving with Language Models ICLR 2022 [paper] [Lean]

Han, Jesse Michael and Rute, Jason and Wu, Yuhuai and Ayers, Edward W and Polu, Stanislas
Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic ICML 2023 [paper] [NL]

Morishita, Terufumi and Morio, Gaku and Yamaguchi, Atsuki and Sogawa, Yasuhiro
Language Models are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought ICLR 2023 [paper] [NL]

Saparov, Abulhair and He, He
Exploring Mathematical Conjecturing with Large Language Models NeSy 2023 [paper] [Isabelle]

Johansson, Moa and Smallbone, Nicholas
BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs EACL 2023 [paper] [NL]

Li, Weixian Waylon and Ziser, Yftah and Coavoux, Maximin and Cohen, Shay B
REFACTOR: Learning to Extract Theorems from Proofs ICLR 2024 [paper] [Metamath]

Zhou, Jin Peng and Wu, Yuhuai and Li, Qiyang and Grosse, Roger
Machine Learning and Information Theory Concepts Towards an AI Mathematician AMS 2024 [paper]

Bengio, Yoshua and Malkin, Nikolay
Learning Formal Mathematics From Intrinsic Motivation arXiv 2024 [paper]

Poesia, Gabriel and Broman, David and Haber, Nick and Goodman, Noah D

Datasets

Data Collection

Premise Selection for Mathematics by Corpus Analysis and Kernel Methods Journal of Automated Reasoning 2014 [paper] [Mizar]

Alama, Jesse and Heskes, Tom and Kühlwein, Daniel and Tsivtsivadze, Evgeni and Urban, Josef
MizAR 40 for Mizar 40 Journal of Automated Reasoning 2015 [paper] [Mizar]

Kaliszyk, Cezary and Urban, Josef
The TPTP Problem Library and Associated Infrastructure Journal of Automated Reasoning 2017 [paper] [TPTP]

Sutcliffe, Geoff
HolStep: A Machine Learning Dataset for Higher-Order Logic Theorem Proving ICLR 2017 [paper] [HOL Light]

Kaliszyk, Cezary and Chollet, François and Szegedy, Christian
Reinforcement Learning of Theorem Proving NeurIPS 2018 [paper] [Mizar]

Kaliszyk, Cezary and Urban, Josef and Michalewski, Henryk and Olšák, Miroslav
GamePad: A Learning Environment for Theorem Proving ICLR 2019 [paper] [Coq]

Huang, Daniel and Dhariwal, Prafulla and Song, Dawn and Sutskever, Ilya
HOList: An Environment for Machine Learning of Higher-Order Logic Theorem Proving ICML 2019 [paper] [HOL Light]

Bansal, Kshitij and Loos, Sarah M and Rabe, Markus N and Szegedy, Christian and Wilcox, Stewart
Learning to Prove Theorems via Interacting with Proof Assistants ICML 2019 [paper] [Coq]

Yang, Kaiyu and Deng, Jia
TacticToe: Learning to Prove with Tactics Journal of Automated Reasoning 2020 [paper] [HOL4]

Gauthier, Thibault and Kaliszyk, Cezary and Urban, Josef
Generative Language Modeling for Automated Theorem Proving arXiv 2020 [paper] [pretraining, Metamath]

Polu, Stanislas and Sutskever, Ilya
The Tactician: A Seamless, Interactive Tactic Learner and Prover for Coq CICM 2020 [paper] [Coq]

Blaauwbroek, Lasse and Urban, Josef and Geuvers, Herman
Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text LREC 2020 [paper] [NL]

Ferreira, Deborah and Freitas, André
IsarStep: A Benchmark for High-Level Mathematical Reasoning ICLR 2021 [paper] [Isabelle]

Li, Wenda and Yu, Lei and Wu, Yuhuai and Paulson, Lawrence C
NaturalProofs: Mathematical Theorem Proving in Natural Language NeurIPS 2021 [paper] [NL]

Welleck, Sean and Liu, Jiacheng and Bras, Ronan Le and Hajishirzi, Hannaneh and Choi, Yejin and Cho, Kyunghyun
LISA: Language Models of Isabelle Proofs AITP 2021 [paper] [Isabelle]

Jiang, Albert Qiaochu and Li, Wenda and Han, Jesse Michael and Wu, Yuhuai
Proof Artifact Co-Training for Theorem Proving with Language Models ICLR 2022 [paper] [Lean]

Han, Jesse Michael and Rute, Jason and Wu, Yuhuai and Ayers, Edward W and Polu, Stanislas
MiniF2F: A Cross-System Benchmark for Formal Olympiad-Level Mathematics ICLR 2022 [paper] [Metamath, Lean, Isabelle, HOL Light]

Zheng, Kunhao and Han, Jesse Michael and Polu, Stanislas
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression EMNLP 2022 [paper] [geometry]

Chen, Jiaqi and Li, Tong and Qin, Jinghui and Lu, Pan and Lin, Liang and Chen, Chongyu and Liang, Xiaodan
NaturalProver: Grounded Mathematical Proof Generation with Language Models NeurIPS 2022 [paper] [NL]

Welleck, Sean and Liu, Jiacheng and Lu, Ximing and Hajishirzi, Hannaneh and Choi, Yejin
A Parallel Corpus of Natural Language and Isabelle Artefacts AITP 2022 [paper] [NL, Isabelle]

Bordg, Anthony and Stathopoulos, Yiannos A and Paulson, Lawrence C
Proof Repair Infrastructure for Supervised Models: Building a Large Proof Repair Dataset ITP 2023 [paper] [Coq]

Reichel, Tom and Henderson, R and Touchet, Andrew and Gardner, Andrew and Ringer, Talia
BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs EACL 2023 [paper] [NL]

Li, Weixian Waylon and Ziser, Yftah and Coavoux, Maximin and Cohen, Shay B
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models NeurIPS 2023 [paper] [Lean]

Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima
MLFMF: Data Sets for Machine Learning for Mathematical Formalization NeurIPS 2023 [paper] [Agda, Lean]

Bauer, Andrej and Petković, Matej and Todorovski, Ljupco
Mathematical Capabilities of ChatGPT NeurIPS 2023 [paper] [NL]

Frieder, Simon and Pinchetti, Luca and Chevalier, Alexis and Griffiths, Ryan-Rhys and Salvatori, Tommaso and Lukasiewicz, Thomas and Petersen, Philipp Christian and Berner, Julius
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models EMNLP 2023 [paper] [Lean]

Xiong, Jing and Shen, Jianhao and Yuan, Ye and Wang, Haiming and Yin, Yichun and Liu, Zhengying and Li, Lin and Guo, Zhijiang and Cao, Qingxing and Huang, Yinya and Zheng, Chuanyang and Liang, Xiaodan and Zhang, Ming and Liu, Qun
Generative AI for Math: Part I - MathPile: A Billion-Token-Scale Pretraining Corpus for Math arXiv 2023 [paper] [pretraining]

Wang, Zengzhi and Xia, Rui and Liu, Pengfei
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics arXiv 2023 [paper] [pretraining, NL, Lean]

Azerbayev, Zhangir and Piotrowski, Bartosz and Schoelkopf, Hailey and Ayers, Edward W and Radev, Dragomir and Avigad, Jeremy
FIMO: A Challenge Formal Dataset for Automated Theorem Proving arXiv 2023 [paper] [NL, Lean]

Liu, Chengwu and Shen, Jianhao and Xin, Huajian and Liu, Zhengying and Yuan, Ye and Wang, Haiming and Ju, Wei and Zheng, Chuanyang and Yin, Yichun and Li, Lin and Zhang, Ming and Liu, Qun
Llemma: An Open Language Model for Mathematics ICLR 2024 [paper] [pretraining]

Azerbayev, Zhangir and Schoelkopf, Hailey and Paster, Keiran and Santos, Marco Dos and McAleer, Stephen and Jiang, Albert Q and Deng, Jia and Biderman, Stella and Welleck, Sean
Magnushammer: A Transformer-Based Approach to Premise Selection ICLR 2024 [paper] [Isabelle]

Mikuła, Maciej and Antoniak, Szymon and Tworkowski, Szymon and Jiang, Albert Qiaochu and Zhou, Jin Peng and Szegedy, Christian and Kuciński, Łukasz and Miłoś, Piotr and Wu, Yuhuai
OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text ICLR 2024 [paper] [pretraining]

Paster, Keiran and Santos, Marco Dos and Azerbayev, Zhangir and Ba, Jimmy
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models arXiv 2024 [paper] [pretraining]

Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Zhang, Mingchuan and Li, YK and Wu, Y and Guo, Daya
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning arXiv 2024 [paper] [pretraining]

Ying, Huaiyuan and Zhang, Shuo and Li, Linyang and Zhou, Zhejian and Shao, Yunfan and Fei, Zhaoye and Ma, Yichuan and Hong, Jiawei and Liu, Kuikun and Wang, Ziyi and Wang, Yudong and Wu, Zijian and Li, Shuaibin and Zhou, Fengzhe and Liu, Hongwei and Zhang, Songyang and Zhang, Wenwei and Yan, Hang and Qiu, Xipeng and Wang, Jiayu and Chen, Kai and Lin, Dahua
Selene: Pioneering Automated Proof in Software Verification ACL 2024 [paper] [Isabelle]

Zhang, Lichen and Lu, Shuai and Duan, Nan
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving arXiv 2024 [paper] [Isabelle]

Lin, Xiaohan and Cao, Qingxing and Huang, Yinya and Wang, Haiming and Lu, Jianqiao and Liu, Zhengying and Song, Linqi and Liang, Xiaodan
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition arXiv 2024 [paper] [Isabelle, Lean]

Tsoukalas, George and Lee, Jasper and Jennings, John and Xin, Jimmy and Ding, Michelle and Jennings, Michael and Thakur, Amitayush and Chaudhuri, Swarat
miniCodeProps: a Minimal Benchmark for Proving Code Properties arXiv 2024 [paper] [Lean]

Lohn, Evan and Welleck, Sean

Data Generation

Learning to Reason in Large Theories without Imitation arXiv 2019 [paper] [HOL Light]

Bansal, Kshitij and Szegedy, Christian and Rabe, Markus N and Loos, Sarah M and Toman, Viktor
Learning to Prove Theorems by Learning to Generate Theorems NeurIPS 2020 [paper] [Metamath]

Wang, Mingzhe and Deng, Jia
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving ICLR 2021 [paper] [inequality]

Wu, Yuhuai and Jiang, Albert Qiaochu and Ba, Jimmy and Grosse, Roger
Training a First-Order Theorem Prover from Synthetic Data ICLR 2021 MATH-AI Workshop [paper] [saturation]

Firoiu, Vlad and Aygün, Eser and Anand, Ankit and Ahmed, Zafarali and Glorot, Xavier and Orseau, Laurent and Zhang, Lei and Precup, Doina and Mourad, Shibl
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning ICML 2021 [paper] [reasoning primitives]

Wu, Yuhuai and Rabe, Markus N and Li, Wenda and Ba, Jimmy and Grosse, Roger B and Szegedy, Christian
Proving Theorems using Incremental Learning and Hindsight Experience Replay ICML 2022 [paper] [saturation]

Aygün, Eser and Anand, Ankit and Orseau, Laurent and Glorot, Xavier and McAleer, Stephen M and Firoiu, Vlad and Zhang, Lei M and Precup, Doina and Mourad, Shibl
Synthetic Proof Term Data Augmentation for Theorem Proving with Language Models AITP 2022 [paper] [Lean]

Palermo, Joseph and Ye, Johnny and Han, Jesse Michael
Learning to Prove Trigonometric Identities arXiv 2022 [paper] [trigonometric identity]

Liu, Zhou and Li, Yujun and Liu, Zhengying and Li, Lin and Li, Zhenguo
Formal Mathematics Statement Curriculum Learning ICLR 2023 [paper] [Lean]

Polu, Stanislas and Han, Jesse Michael and Zheng, Kunhao and Baksys, Mantas and Babuschkin, Igor and Sutskever, Ilya
Multilingual Mathematical Autoformalization arXiv 2023 [paper] [NL, Isabelle, Lean]

Jiang, Albert Q and Li, Wenda and Jamnik, Mateja
Solving Olympiad Geometry without Human Demonstrations Nature 2024 [paper] [geometry]

Trinh, Trieu H and Wu, Yuhuai and Le, Quoc V and He, He and Luong, Thang
LEGO-Prover: Neural Theorem Proving with Growing Libraries ICLR 2024 [paper] [Isabelle]

Wang, Haiming and Xin, Huajian and Zheng, Chuanyang and Li, Lin and Liu, Zhengying and Cao, Qingxing and Huang, Yinya and Xiong, Jing and Shi, Han and Xie, Enze and Yin, Jian and Li, Zhenguo and Liao, Heng and Liang, Xiaodan
REFACTOR: Learning to Extract Theorems from Proofs ICLR 2024 [paper] [Metamath]

Zhou, Jin Peng and Wu, Yuhuai and Li, Qiyang and Grosse, Roger
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data ICLR 2024 [paper] [NL, Lean]

Huang, Yinya and Lin, Xiaohan and Liu, Zhengying and Cao, Qingxing and Xin, Huajian and Wang, Haiming and Li, Zhenguo and Song, Linqi and Liang, Xiaodan
ATG: Benchmarking Automated Theorem Generation for Generative Language Models NAACL 2024 Findings [paper] [Metamath]

Lin, Xiaohan and Cao, Qingxing and Huang, Yinya and Yang, Zhicheng and Liu, Zhengying and Li, Zhenguo and Liang, Xiaodan
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving arXiv 2024 [paper] [Lean]

An, Chenyang and Chen, Zhibo and Ye, Qihao and First, Emily and Peng, Letian and Zhang, Jiayun and Wang, Zihan and Lerner, Sorin and Shang, Jingbo
Process-Driven Autoformalization in Lean 4 arXiv 2024 [paper] [NL, Lean]

Lu, Jianqiao and Liu, Zhengying and Wan, Yingjia and Huang, Yinya and Wang, Haiming and Yang, Zhicheng and Tang, Jing and Guo, Zhijiang
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data arXiv 2024 [paper] [NL, Lean]

Xin, Huajian and Guo, Daya and Shao, Zhihong and Ren, Zhizhou and Zhu, Qihao and Liu, Bo and Ruan, Chong and Li, Wenda and Liang, Xiaodan
Lean Workbook: A Large-Scale Lean Problem Set Formalized from Natural Language Math Problems arXiv 2024 [paper] [NL, Lean]

Ying, Huaiyuan and Wu, Zijian and Geng, Yihan and Wang, Jiayu and Lin, Dahua and Chen, Kai
Proving Olympiad Algebraic Inequalities without Human Demonstrations arXiv 2024 [paper] [inequality]

Wei, Chenrui and Sun, Mengzhou and Wang, Wei

Related Surveys

QED at Large: A Survey of Engineering of Formally Verified Software Foundations and Trends® in Programming Languages 2019 [paper]

Ringer, Talia and Palmskog, Karl and Sergey, Ilya and Gligoric, Milos and Tatlock, Zachary
A Survey of Deep Learning for Mathematical Reasoning ACL 2023 [paper]

Lu, Pan and Qiu, Liang and Yu, Wenhao and Welleck, Sean and Chang, Kai-Wei
Mathematical Language Models: A Survey arXiv 2023 [paper]

Liu, Wentao and Hu, Hanglei and Zhou, Jie and Ding, Yuyang and Li, Junsong and Zeng, Jiayi and He, Mengliang and Chen, Qin and Jiang, Bo and Zhou, Aimin and He, Liang
Large Language Models for Mathematical Reasoning: Progresses and Challenges EACL 2024 [paper]

Ahn, Janice and Verma, Rishu and Lou, Renze and Liu, Di and Zhang, Rui and Yin, Wenpeng
AI for Math Resources Google Sheet [link]

Available to Everyone

Citation

If you find this repository useful, please consider citing our survey paper:

@inproceedings{li2024dl4tp,
   title={A Survey on Deep Learning for Theorem Proving}, 
   author={Li, Zhaoyu and Sun, Jialiang and Murphy, Logan and Su, Qidong and Li, Zenan and Zhang, Xian and Yang, Kaiyu and Si, Xujie},
   booktitle={First Conference on Language Modeling},
   year={2024},
   url={https://openreview.net/forum?id=zlw6AHwukB}
}

zhaoyu-li/DL4TP