PDF Liberation
A commons for the work of liberating data from PDF files
Washington, DC; San Francisco; New York; Chicago
Pinned Repositories
amnestydata
Amnesty International Torture data
financial_disclosure_scraping
(DC team) experimenting with available options for extracting info from PDFs
Jersey-City-Budget-PDF-Liberation
This project will liberate data from PDF files found at http://www.cityofjerseycity.com/pub-info.aspx?id=2430 and will create .csv and .json files to be uploaded to https://data.openjerseycity.org/dataset/jersey-city-2013-budget-adopted-spending
knowledge
A place to collect and share knowledge about liberating data from PDFs
OCRToolkit
Tools for working with Optical Character Recognition output
pdf-hackathon
Resources related to the PDF Liberation hackathon
pdf_table_extraction
experimenting with pdf2text and python pdf-table-extract
pdfHarvester
python-hocrgeo
Python tool for converting hOCR files to geographic file formats
whatwordwhere
Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the hOCR standard.
PDF Liberation's Repositories
pdfliberation/knowledge
A place to collect and share knowledge about liberating data from PDFs
pdfliberation/whatwordwhere
Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the hOCR standard.
pdfliberation/pdf-hackathon
Resources related to the PDF Liberation hackathon
pdfliberation/pdf_table_extraction
experimenting with pdf2text and python pdf-table-extract
pdfliberation/Jersey-City-Budget-PDF-Liberation
This project will liberate data from PDF files found at http://www.cityofjerseycity.com/pub-info.aspx?id=2430 and will create .csv and .json files to be uploaded to https://data.openjerseycity.org/dataset/jersey-city-2013-budget-adopted-spending
pdfliberation/amnestydata
Amnesty International Torture data
pdfliberation/financial_disclosure_scraping
(DC team) experimenting with available options for extracting info from PDFs
pdfliberation/OCRToolkit
Tools for working with Optical Character Recognition output
pdfliberation/python-hocrgeo
Python tool for converting hOCR files to geographic file formats
pdfliberation/pdfHarvester
pdfliberation/pdfliberation.github.io
Homepage for this organization
pdfliberation/assembly
A forum of sorts. Where we gather to discuss Issues.
pdfliberation/NYCEDCprosedatascraper
This uses regular expressions (in PHP, but any language would work) to get data from the NYC EDC newsletters
pdfliberation/pdf-hacks-2014
PDF Liberation Hackathon - http://pdfliberation.wordpress.com/
pdfliberation/python-popplergeo
Package to convert pdftotext bbox XHTML output to GeoJSON
pdfliberation/USAID-DEC
Data from the United States Agency for International Development (USAID) Development Experience Clearinghouse (DEC).
pdfliberation/crime-stats-utah
Crime Statistics for the State of Utah
pdfliberation/housedisc
pdfliberation/NYC_Economic_Unemployment
pdfliberation/pdf-liberation-examples
Examples displaying various PDF liberation tools at the PDF Liberation Hackathon