Code for parsing and analytics of Dataquick assessor and sales data
This repository documents data parsing and analysis performed in 2014-15. None of the database servers referenced are live anymore.
Additional documentation
• Indicators
• Data processing procedures
=========
Contacts
• Miriam Zuk, project lead – mzuk@berkeley.edu
• Sam Maurer, code – maurer@berkeley.edu
Documentation from the data vendor
• "Layout" spreadsheets with database schema information
• "Totals Report" spreadsheets with record counts by county
• "Catalog/Data Dictionary" spreadsheet with more detailed field definitions
=========
Assessor
• one record per legal property
• most recent snapshot, from 2013 or 2014 depending on the county
• provided as one large fixed-width text file (dated 5-28-2014)
• identical data provided as sequence of smaller tab-delimited files (dated 7-11-2014)
• do not use intermediate tab-delimited version from June, which had formatting errors
Assessor history
• yearly snapshots of records from each county (frequency varies beginning in 2012)
• data covers 2004–2013 for all counties, plus 2014 for some counties
• provided as sequence of large fixed-width text files (dated 7-20 or 7-21-2014)
• do not use earlier May/June version of the files, which didn't match the schema
Sales
• one record per transaction
• covers 1988 to mid-2014, plus earlier transactions for certain counties
• provided as one large fixed-width text file (dated 5-28-2014)
• equivalent data provided as sequence of smaller tab-delimited files (dated 7-9-2014), including some additional transaction types
• do not use intermediate tab-delimited version from June, which had formatting errors
Foreclosure
• provided as one large fixed-width text file (dated 5-28-2014)
• same data also provided as a tab-delimited file (dated 6-16-2014)