
Primary language: Python. License: MIT.

ucompanies

  • downloading PDF forms
  • extracting text (including check marks) from the PDF forms using pdfplumber
  • structuring the extracted text into CSV and DTA files
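
The extraction and structuring steps listed above could look roughly like the following. This is a minimal sketch, not the code in `pdfs2txts2csv.py`: the function names, the label-based parsing, and the assumption that check marks show up in the extracted text as box or tick characters are all hypothetical. Only `pdfplumber.open`, `pdf.pages` and `page.extract_text()` are actual pdfplumber calls.

```python
import csv

# Assumption: check boxes appear in the extracted text as these characters.
CHECKED = {"☒", "☑", "✓", "✔"}
UNCHECKED = {"☐"}


def extract_text(pdf_path):
    """Return the text of all pages of one PDF, joined by newlines."""
    import pdfplumber  # third-party; imported lazily so parsing works without it

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None for empty pages in some versions
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_form(text, labels):
    """Map each (hypothetical) field label to the value that follows it on its line.

    Check marks are stored as 1 (checked) or 0 (unchecked); labels that are
    not found keep an empty string.
    """
    record = {label: "" for label in labels}
    for line in text.splitlines():
        line = line.strip()
        for label in labels:
            if line.startswith(label):
                value = line[len(label):].strip(" :")
                if value in CHECKED:
                    record[label] = 1
                elif value in UNCHECKED:
                    record[label] = 0
                else:
                    record[label] = value
                break
    return record


def write_csv(records, csv_path, labels):
    """Write one row per form, one column per label."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=labels)
        writer.writeheader()
        writer.writerows(records)
```

Under these assumptions, each file in `data/raw_pdfs` would be passed through `extract_text` and `parse_form`, and the resulting records written with `write_csv`. A DTA file could then be produced from the same records with pandas, for example `pandas.DataFrame(records).to_stata(path, write_index=False)`, provided the column names are first renamed to valid Stata variable names.
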

ucompanies/
├── README.md
├── code
│   ├── download_pdfs.py
│   └── pdfs2txts2csv.py
└── data
    ├── Beneficiaries2022_FDZ.dta
    ├── extracted_structured_data
    │   ├── 328_applforms.csv
    │   └── 328_applforms.dta
    └── raw_pdfs
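
The layout above suggests that downloaded forms are stored in `data/raw_pdfs`. A minimal sketch of such a download step, using only the standard library, might look like this. It is not the code in `download_pdfs.py`: the function names, the URL in the usage note, and the rule for deriving the file name from the URL are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# Target directory taken from the repository layout.
RAW_DIR = Path("data/raw_pdfs")


def target_path(url, out_dir=RAW_DIR):
    """Derive the local file path from the last segment of the URL path."""
    name = Path(urlparse(url).path).name
    return Path(out_dir) / name


def download_pdf(url, out_dir=RAW_DIR, timeout=30):
    """Download one PDF unless it already exists locally; return its path."""
    dest = target_path(url, out_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        with urlopen(url, timeout=timeout) as resp:
            dest.write_bytes(resp.read())
    return dest
```

A call such as `download_pdf("https://example.org/forms/form_001.pdf")` (a placeholder URL) would save the file as `data/raw_pdfs/form_001.pdf`. Skipping files that already exist keeps repeated runs from fetching the same form twice.
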