Verbaendeliste-Bundestag Extractor

Use pdftohtml (shipped with the Poppler utilities) to convert the PDF into an XML file:

pdftohtml -xml input.pdf output.xml

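Then run the extractor with the first and last relevant page numbers as arguments. It reads the XML from standard input and writes the parsed JSON to standard output, so redirect the file produced in the previous step (named lobbylist.xml in this example) into it: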

python extract_lobby.py 4 690 < lobbylist.xml > lobbylist.json
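
To see what the page range selects, here is a minimal sketch of filtering pdftohtml's XML by page number. It is not the code of extract_lobby.py. The function name text_lines_by_page and the sample data are made up for illustration, and it assumes only the usual pdftohtml -xml layout of <page number="..."> elements containing <text> elements.

```python
import io
import json
import xml.etree.ElementTree as ET


def text_lines_by_page(xml_source, first_page, last_page):
    """Collect the text of every <text> element on pages first_page..last_page.

    Hypothetical helper for illustration; not part of extract_lobby.py.
    """
    pages = {}
    tree = ET.parse(xml_source)
    for page in tree.getroot().iter("page"):
        number = int(page.get("number"))
        if first_page <= number <= last_page:
            # itertext() also picks up text nested in inline tags such as <b> or <i>.
            pages[number] = [
                "".join(text.itertext()).strip() for text in page.iter("text")
            ]
    return pages


# Made-up sample in the shape pdftohtml -xml produces (heavily shortened).
sample = io.StringIO(
    "<pdf2xml>"
    '<page number="1"><text top="10" left="5" font="0">Cover</text></page>'
    '<page number="2"><text top="10" left="5" font="0"><b>Example</b> e.V.</text></page>'
    '<page number="3"><text top="10" left="5" font="0">Index</text></page>'
    "</pdf2xml>"
)

# Keep only page 2, the way "4 690" keeps pages 4 through 690 above.
result = text_lines_by_page(sample, 2, 2)
print(json.dumps(result))
```

Pages outside the range (cover, table of contents, index) are skipped, which is why the first and last relevant page numbers have to be looked up in the PDF before running the extractor.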

License: MIT

stefanw/verbaendeliste-bundestag