cfpb/HMDA_Data_Science_Kit

Create Analysis Notebook 3


Create example code and instructions for segmenting single-family products using the following filters:

  • single-family
  • first-lien
  • owner-occupied
  • conventional
  • home purchase
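
As a starting point, the five filters above could be expressed as code values and rendered into a WHERE clause. This is only a sketch: the column names and the code value `1` for each filter are assumptions based on the pre-2018 public HMDA LAR layout, and should be checked against the actual table before use. `PRODUCT_FILTERS` and `base_where_clause` are hypothetical names.

```python
# Assumed column names and codes (pre-2018 HMDA LAR layout); verify against
# the actual table schema before relying on them.
PRODUCT_FILTERS = {
    "property_type": 1,    # one-to-four family (single-family) dwelling
    "lien_status": 1,      # first lien
    "owner_occupancy": 1,  # owner-occupied as a principal dwelling
    "loan_type": 1,        # conventional
    "loan_purpose": 1,     # home purchase
}


def base_where_clause(filters=PRODUCT_FILTERS):
    """Render the product filters as a single SQL WHERE clause."""
    conditions = [f"{column} = {int(code)}" for column, code in filters.items()]
    return "WHERE " + " AND ".join(conditions)


print(base_where_clause())
```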

This code should be provided as a function that accepts:

  • extensions to the WHERE clause (for example: geography, action type, lender)
  • table name
  • database name
  • schema name
  • host name

Each WHERE clause extension should accept either a single value or a list-like input.
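
One possible shape for the query-building part of such a function is sketched below. It normalizes single values and list-like inputs to lists, then appends them to the base filters as `IN (...)` conditions. The names `as_list` and `build_query`, the example table and schema names, and the `%s` placeholder style (used by drivers such as psycopg2) are all assumptions, not part of this issue. Database name and host name would be passed to the connection rather than to the query, so they are left out here.

```python
# Base product filters; column names and codes are assumptions
# (pre-2018 HMDA LAR layout).
BASE_CONDITIONS = [
    "property_type = 1",
    "lien_status = 1",
    "owner_occupancy = 1",
    "loan_type = 1",
    "loan_purpose = 1",
]


def as_list(value):
    """Wrap a single value in a list; convert list-like inputs to a list."""
    if isinstance(value, (str, bytes)) or not hasattr(value, "__iter__"):
        return [value]
    return list(value)


def build_query(table, schema, extra_filters=None):
    """Return (sql, params) for the segmented product query.

    extra_filters maps a column name to a single value or a list-like of
    values. Values are passed as query parameters, never formatted into SQL.
    """
    extra_filters = extra_filters or {}
    # Identifiers cannot be parameterized, so reject anything suspicious.
    for name in (schema, table, *extra_filters):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    conditions = list(BASE_CONDITIONS)
    params = []
    for column, value in extra_filters.items():
        values = as_list(value)
        placeholders = ", ".join(["%s"] * len(values))
        conditions.append(f"{column} IN ({placeholders})")
        params.extend(values)

    sql = f"SELECT * FROM {schema}.{table} WHERE " + " AND ".join(conditions)
    return sql, params


# Hypothetical table, schema, and filter values, for illustration only.
sql, params = build_query(
    "lar_2017", "hmda_public", {"state_code": ["06", "48"], "action_taken": 1}
)
print(sql)
print(params)
```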

This function should have an option that allows the user to write the query results to a pipe-delimited file with a .txt extension.
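
The file-writing option might look something like the sketch below, which uses only the standard library `csv` module. `write_pipe_delimited` is a hypothetical helper, and the header and rows are made-up sample values rather than real HMDA records; in the notebook the rows would come from the query cursor.

```python
import csv
import os
import tempfile


def write_pipe_delimited(path, header, rows):
    """Write header and rows to a pipe-delimited file with a .txt extension."""
    if not path.lower().endswith(".txt"):
        path += ".txt"
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="|")
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Illustrative usage with made-up rows and assumed column names.
out_dir = tempfile.mkdtemp()
out_path = write_pipe_delimited(
    os.path.join(out_dir, "hmda_sample"),
    ["as_of_year", "state_code", "loan_amount_000s"],
    [(2017, "06", 350), (2017, "48", 210)],
)
with open(out_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```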

In the instructions inside the Jupyter notebook, discuss:

  • what these filters mean and how they affect the mortgage product
  • why a homogeneous product is important to analysis
  • the presence of action type in the HMDA data and how that affects analysis

Produce the following outputs:

  • flat file with a pipe delimiter and a .txt extension
  • pandas DataFrame (shown inline)
  • SQL script (located in the SQL folder)
  • analysis of a subset of HMDA data comparing product types in two different states over time. The comparison should use 2004-2017 data that has been written to a file and reloaded. The analysis should account for action taken type and use pandas to compute an aggregate measure of the data.
  • one or more example visualizations of the data, for example average originated loan amounts for several MSAs from 2004-2017.
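
The reload-and-aggregate step could be sketched roughly as follows. The in-memory text stands in for the pipe-delimited file on disk, the rows are made-up values rather than real HMDA records, and the column names and the `action_taken == 1` code for originated loans are assumptions based on the pre-2018 public HMDA layout.

```python
import io

import pandas as pd

# Stand-in for a pipe-delimited .txt file previously written to disk.
sample = io.StringIO(
    "as_of_year|state_code|action_taken|loan_amount_000s\n"
    "2016|06|1|400\n"
    "2016|06|3|250\n"
    "2016|48|1|200\n"
    "2017|06|1|420\n"
    "2017|06|1|380\n"
    "2017|48|1|220\n"
)

# Read state codes as strings so leading zeros are preserved.
loans = pd.read_csv(sample, sep="|", dtype={"state_code": str})

# Keep originated loans only (assumed code 1) so that denials and other
# action types do not distort the average.
originated = loans[loans["action_taken"] == 1]

# Average loan amount per year, with one column per state.
avg_amount = (
    originated.groupby(["as_of_year", "state_code"])["loan_amount_000s"]
    .mean()
    .unstack("state_code")
)
print(avg_amount)
```

With matplotlib installed, calling `avg_amount.plot()` on the result would be one way to draw a line per state over time.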

The goal of this example is to demonstrate how to get a dataset of a homogeneous mortgage product, save the dataset to disk, load the data into pandas, produce aggregate metrics, and graph them in a meaningful way.

This can probably be combined with the issue for creating a function library.