This repository contains the main components of the prototype implementation of Beta Writer, the algorithmic author of the first machine-generated research book published by Springer Nature, developed by Niko Schenk, Samuel Rönnqvist and other members of the Applied Computational Linguistics Lab.
brew install python3
pip3 install numpy
pip3 install scikit-learn
pip3 install scipy
pip3 install matplotlib
pip3 install gensim
Install Mate tools and place libraries and models into the /mate directory.
See mate/README.txt for more details.
Download StanfordCoreNLP and citeproc-java
Ideally, open beta_writer as a NetBeans project, link the downloaded .jar files to the project, and build beta_writer.jar.
The executable .jar should appear in beta_writer/dist/.
The script pipeline.sh contains all modules for end-to-end book generation.
Please point PYTHON to your local Python installation (change line 32 in pipeline.sh).
sh pipeline.sh CORPUS_DIR gen/
where CORPUS_DIR = path to your A++ files and gen/ = directory containing all generated files.
Inspect the generated book.html in the gen/ folder.
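For scripted runs, the invocation above can be wrapped in a few lines of Python. This is only a sketch: the function names build_pipeline_command and run_pipeline are hypothetical and not part of this repository, and the check for book.html simply mirrors the inspection step described above.

```python
import subprocess
from pathlib import Path


def build_pipeline_command(corpus_dir, out_dir="gen/", script="pipeline.sh"):
    """Return the argv list equivalent to: sh pipeline.sh CORPUS_DIR gen/"""
    return ["sh", str(script), str(corpus_dir), str(out_dir)]


def run_pipeline(corpus_dir, out_dir="gen/", script="pipeline.sh"):
    """Run the pipeline and return the path to the generated book.html.

    Hypothetical helper: it assumes it is started from the repository
    root, where pipeline.sh lives.
    """
    if not Path(corpus_dir).is_dir():
        raise FileNotFoundError(f"corpus directory not found: {corpus_dir}")

    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(build_pipeline_command(corpus_dir, out_dir, script), check=True)

    book = Path(out_dir) / "book.html"
    if not book.is_file():
        raise RuntimeError(f"pipeline finished but {book} was not created")
    return book
```

Calling run_pipeline("CORPUS_DIR") from the repository root would then correspond to the shell command above, with the output directory defaulting to gen/.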
Note that Beta Writer was originally tailored to consume and process Springer's custom document format (A++) and does not (yet) support generic PDF.
We currently provide the scripts for the major text processing tasks including:
- Preprocessing (e.g., entity masking of chemical compounds with mask_entities.py)
- Book structure generation (mkstructure_html.py) and visualization (plot.py)
- Syntactic restructuring/paraphrasing (restructuring.py)
- Synonym generation (synonyms.py)
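To illustrate the idea behind entity masking, the sketch below replaces formula-like tokens with placeholders before further text processing and restores them afterwards. It is not the logic of mask_entities.py: the regular expression, the ENTITY_n placeholder scheme, and the function names are assumptions made for this example, and the pattern is deliberately crude (it only catches tokens built from element-like symbols that contain at least one digit).

```python
import re

# Assumed pattern: two or more element-like units (capital letter, optional
# lowercase letter, optional count), with at least one digit in the token.
FORMULA = re.compile(r"\b(?=[A-Za-z]*\d)(?:[A-Z][a-z]?\d*){2,}\b")
PLACEHOLDER = re.compile(r"ENTITY_\d+")


def mask_entities(text):
    """Replace formula-like tokens with placeholders; return text and mapping."""
    mapping = {}   # placeholder -> original token
    seen = {}      # original token -> placeholder (reuse for repeats)

    def replace(match):
        token = match.group(0)
        if token not in seen:
            seen[token] = f"ENTITY_{len(seen)}"
            mapping[seen[token]] = token
        return seen[token]

    return FORMULA.sub(replace, text), mapping


def unmask_entities(text, mapping):
    """Restore the original tokens from the placeholder mapping."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


masked, mapping = mask_entities("LiCoO2 and TiO2 are used in H2O-based cells.")
restored = unmask_entities(masked, mapping)
```

Masking of this kind keeps later steps such as paraphrasing or synonym replacement from altering compound names, since they only ever see the opaque placeholders.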
The current release uses TextRank for extractive summarization.
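As a rough illustration of how TextRank-style extractive summarization works, the following standard-library sketch ranks sentences by running a PageRank-like iteration over a word-overlap similarity graph and keeps the top-scoring ones. It is not the code used in this repository: the sentence splitter, the set-based similarity measure, and the parameter defaults are simplifying assumptions.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Shared words, normalized by the log sizes of both word sets."""
    a, b = set(words_a), set(words_b)
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence from a weighted PageRank iteration."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the rest of the text receives only the baseline score of 1 - damping, so it is the last candidate to be selected for the summary.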
For more implementation details, please refer to our system pipeline description in Section 2.3.
This project is open source software and released under the MIT license.