Isabelle github data
Closed this issue · 1 comments
wellecks commented
Add Isabelle to the process_github.py
script. Some considerations:
- github contains multiple copies of the AFP
- the PISA evaluation set is built from the AFP
wellecks commented
Working on this in https://github.com/EleutherAI/math-lm/tree/isabelle-github
For 1: we keep only mirror-afp-devel
and filter other copies of the AFP out
For 2: we remove a theorem and its proof when the theorem name occurs in the universal_test_theorems
set of PISA (https://github.com/albertqjiang/Portal-to-ISAbelle/blob/main/universal_test_theorems.tar.gz)
- specifically, given a document, we check if a theorem name is present using exact string match. We remove the theorem name and all subsequent tokens up to
\n\n
as a heuristic for where the proof ends. If\n\n
isn't present, we remove the remainder of the document.