sfbrigade/datasci-housing-pipeline

Create Data Model for Analysis

qwwqwwq opened this issue · 3 comments

We need to create a data model that will support our analyses. The current form will not support meaningful analysis very well.

It would be nice to collapse each project into a single row, which would contain the date of each transition in the pipeline.
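As a rough sketch of what that collapsed form could look like, here is a minimal example in plain Python. The field names (`project_id`, `status`, `snapshot`) and the status values are assumptions made up for illustration, not the actual column names in the pipeline data. It keeps the earliest snapshot date at which each project was seen in each status, which is only one possible way to define a "transition" date.

```python
from collections import defaultdict
from datetime import date

# Hypothetical long-format records: one row per project per pipeline snapshot.
# Field names and status values are illustrative assumptions only.
records = [
    {"project_id": "A", "status": "filed", "snapshot": date(2014, 1, 1)},
    {"project_id": "A", "status": "filed", "snapshot": date(2014, 4, 1)},
    {"project_id": "A", "status": "approved", "snapshot": date(2014, 7, 1)},
    {"project_id": "B", "status": "filed", "snapshot": date(2014, 4, 1)},
    {"project_id": "A", "status": "construction", "snapshot": date(2015, 1, 1)},
]


def collapse_projects(rows):
    """Return {project_id: {status: earliest snapshot date seen in that status}}."""
    wide = defaultdict(dict)
    for row in rows:
        transitions = wide[row["project_id"]]
        status, seen = row["status"], row["snapshot"]
        # Keep only the first date on which the project appeared in this status.
        if status not in transitions or seen < transitions[status]:
            transitions[status] = seen
    return dict(wide)


projects = collapse_projects(records)
for project_id, transitions in sorted(projects.items()):
    print(project_id, transitions)
```

Each project ends up as one entry keyed by status, so a project that has not reached a given stage simply has no date for it.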

Have you been able to speak with Jason Lally yet? I know he is trying to work with the housing department to streamline their data, and it might be good for you to get an idea of that direction so what you come up with is as similar as possible.

@qwwqwwq Jeff, if you are able to elaborate on this, I'd love to make sure we don't lose track of this issue. Can you help me understand which dimensions of analysis we haven't been able to support with the existing data model?

Hey Clare,

This was referring to doing ETL and feature engineering so that the dataset will be amenable to summary statistics, visualization, and machine learning. This involves recoding variables, etc.

We got pretty far with this when I was working on the project, if I recall. We never actually tried running any sort of machine learning algorithm on the dataset, though we did do some light exploration in R and Jupyter, which is checked into the repository.

Let me know if you have any other questions,

Jeff