/tex-submission-purify

Remove comments and unused images to prepare an arXiv submission.


Tex Submission Purifier

Prepares a LaTeX article for submission to arxiv.org. Inspired by https://github.com/google-research/arxiv-latex-cleaner. It removes comments from .tex files and copies only the referenced resources to the output directory.

Features:

  • recursive descent into files imported with \input
  • remove inline comments starting with % as well as \comment blocks
  • remove other specified commands
  • short-circuit commands (for example Hello \kl{world}! -> Hello world!)
  • copy only the files referenced by \input and \includegraphics
  • notification about unused files

Right now the images are not altered.
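
To make the "copy only referenced files" and "notification about unused files" features concrete, here is a minimal sketch of the idea. It is not the tool's implementation: the tool parses with TexSoup, while this sketch uses a plain regular expression, and the names REF_PATTERN, referenced_files and unused_files are hypothetical.

```python
import re

# Assumption for illustration: references look like \input{path} or
# \includegraphics[options]{path}. The real tool parses with TexSoup.
REF_PATTERN = re.compile(
    r"\\(input|includegraphics)\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)


def referenced_files(tex_source):
    """Return the set of paths named by \\input and \\includegraphics."""
    return {m.group(2).strip() for m in REF_PATTERN.finditer(tex_source)}


def unused_files(all_files, referenced):
    """Return the files that exist in the source tree but are never referenced."""
    return sorted(set(all_files) - set(referenced))


source = r"\input{sections/intro} \includegraphics[width=\linewidth]{fig/a.pdf}"
refs = referenced_files(source)
leftover = unused_files(["fig/a.pdf", "fig/b.pdf"], refs)
```

The sketch ignores details a real run has to handle, such as \input paths written without the .tex extension and references that sit inside comments.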

Dependencies

We use TexSoup for TeX parsing and click for command-line arguments.

pip install TexSoup click

CLI

The program can be invoked from the command line in the following way:

python3 tex_submission_purify.py src_root_document.tex output_directory

Configuration options:

--remove-comments-completely
By default we leave the % in place of a comment, to prevent empty lines from confusing LaTeX. This option removes the whole comment, including the %.
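
The difference between the two modes can be sketched as follows. This is an illustration only, not the tool's code: strip_comment and its remove_completely parameter are hypothetical names, and the regular expression is a simplification that does not handle verbatim environments or a % preceded by an escaped backslash.

```python
import re

# A % that is not preceded by a backslash starts a comment;
# \% is a literal percent sign and must be left alone.
COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comment(line, remove_completely=False):
    """Drop the comment text, optionally keeping the bare % as a placeholder."""
    replacement = "" if remove_completely else "%"
    return COMMENT.sub(replacement, line)


kept = strip_comment("Hello % note to self")
removed = strip_comment("Hello % note to self", remove_completely=True)
escaped = strip_comment(r"50\% done")
```

Keeping the bare % matters for lines that contain nothing but a comment: if the line became empty, LaTeX would read it as a paragraph break.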

--remove-cmd cmdA,cmdB,cmdC
All invocations of cmdA, cmdB and cmdC are removed together with their arguments (for example \cmdA{something}). Separate the command names with commas only, without spaces.

--short-circuit-cmd cmdA,cmdB
Invocations of cmdA and cmdB are replaced with their contents. For example, "This \cmdA{word} is special" is transformed to "This word is special". Separate the command names with commas only, without spaces.
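
The intended effect of --remove-cmd and --short-circuit-cmd can be sketched with regular expressions. This is a simplified illustration under stated assumptions, not the tool's TexSoup-based implementation: the function names are hypothetical, and the pattern only handles a single argument without nested braces.

```python
import re


def parse_cmd_list(option_value):
    """Split a comma-separated option value such as 'cmdA,cmdB' into names."""
    return [name for name in option_value.split(",") if name]


def _cmd_pattern(names):
    alternatives = "|".join(re.escape(n) for n in names)
    # (?![A-Za-z]) stops \kl from matching the start of a longer name like \klx.
    return re.compile(r"\\(?:%s)(?![A-Za-z])\{([^{}]*)\}" % alternatives)


def remove_commands(text, names):
    """Delete each invocation together with its argument."""
    return _cmd_pattern(names).sub("", text)


def short_circuit_commands(text, names):
    """Replace each invocation with the contents of its argument."""
    return _cmd_pattern(names).sub(lambda m: m.group(1), text)


short = short_circuit_commands(r"Hello \kl{world}!", parse_cmd_list("kl"))
gone = remove_commands(r"This \cmdA{something} stays", parse_cmd_list("cmdA"))
```

Matching is case-sensitive, so KL and kl are treated as different commands.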

--keep-file some_file.txt
The file will be copied to the output directory even if it is not referenced. Multiple files can be specified by repeating this option.

--out-root-doc-name out_name
By default the output root document is renamed to ms.tex. You can specify a different name here.

--clear-out-dir
The output directory is deleted and re-created before running.
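
In plain Python, "deleted and re-created" amounts to roughly the following. The helper name reset_output_directory is hypothetical and the sketch makes no claim about how the tool does it internally. Note that everything already in the directory is lost.

```python
import shutil
from pathlib import Path


def reset_output_directory(out_dir):
    """Delete out_dir with all its contents, then create it again empty."""
    out_dir = Path(out_dir)
    if out_dir.exists():
        shutil.rmtree(out_dir)
    out_dir.mkdir(parents=True)
    return out_dir
```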

Example command for the test file:

python3 tex_submission_purify.py \
	test_src/comment_parsing_test.tex \
	/tmp/tex_submission_test_out \
	--remove-cmd KL \
	--short-circuit-cmd kl \
	--out-root-doc-name out_doc \
	--clear-out-dir \
	--keep-empty-comments

Python interface

from tex_submission_purify import TexSubmissionCleaner

c = TexSubmissionCleaner(
	'test_src/comment_parsing_test.tex',
	'/tmp/tex_submission_test_out',
)

c.clear_out_dir()

c.keep_empty_comments = True

c.commands_to_remove('KL')
c.commands_to_short_circuit('kl')

c.additional_files_to_keep(
	'ieee.bst',
)
c.run()

c.notify_about_unused_files()

c.print_statistics()