This project is a part of the Data Science Working Group at Code for San Francisco. Other DSWG projects can be found at the main GitHub repo.
The purpose of this project is to analyze the impact of zoning laws on the development of housing in San Francisco. We use datasets provided by the city of San Francisco to analyze the initiation, progress, and completion of entitlements for the construction of housing. We want to empower policymakers and citizens with the facts and data that help explain the potential impact of housing policy decisions.
See our Doc for more info and specific objectives for the project.
- Inferential Statistics
- Data Visualization
- Predictive Modeling
- Python
- Pandas, Jupyter
We focus on applying our data science skill set to the SF Planning Department’s pipeline data to a) develop quantitative facts about the state of the pipeline of planned housing in SF, and b) use predictive methods to model the potential impact of policies. That impact could take multiple forms, such as units built, free-market prices, and affordable-unit availability.
- Accessible San Francisco -- We are invested in providing facts that allow other parties, some political, to make data-informed decisions that enable San Francisco to become more inclusive and therefore healthier.
- Science is Apolitical -- While we do believe in an accessible San Francisco, we are not a policy advocate. Our role is to provide the most rigorous data analysis to establish facts, and to use sound statistical methods and scientific investigation to predict how those facts may shift given a policy change.
- Correlation is not causation -- Hand-in-hand with being apolitical is the responsibility to clearly articulate the findings and limitations of our analysis. Most analysis and prediction will only be able to leverage correlative relationships, and will likely be unable to demonstrate causative relationships. Natural experiments that would give us this kind of increased conviction about causative factors are rare.
- Equal Access to Facts -- Our work will be shared through blog format to be accessible to all stakeholders at the same time.
- Open to Everyone -- We are an all-volunteer organization, and our team welcomes people of all skill levels and backgrounds. There is always a way for you to contribute!
The best place to get started is our list of Issues on GitHub. We have workstreams spread across these skill sets:
- Data modeling and analysis toward a better understanding of housing pipeline changes over time.
- Creating data cleanup tools and methods for parsing or scraping new datasets.
- Modeling market incentives and assumptions, enabling prediction of impacts from policy changes.
- Managing the volunteer network, communicating with stakeholders, and networking with new people to learn about new needs.
- Helping create visualizations of facts and predictions with the data scientists.
- Helping us understand the policy strategies being discussed, and working with the PMs on whether we should investigate them.
- Helping us publicize our work and build a stronger community overall.
- Raw data is kept here within this repo (see the loading sketch below).
- Data processing/transformation scripts are kept here.
- The Jupyter notebook contains prior analyses.
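For orientation, here is a minimal sketch of how a notebook might load one of the raw pipeline files with pandas. The file name and folder layout below are assumptions for illustration; adjust the path to match what is actually in the raw data folder.

```python
import pandas as pd

# Hypothetical path: point this at one of the quarterly pipeline snapshots
# that actually lives in the repo's raw data folder.
RAW_PIPELINE_CSV = "data/raw/sf_development_pipeline_2018_q1.csv"

# Load the snapshot and take a quick look at its shape and columns before
# diving into any analysis.
pipeline = pd.read_csv(RAW_PIPELINE_CSV)
print(pipeline.shape)
print(pipeline.columns.tolist())
```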
- Navigate to a folder where you want the project folder to be located
- Clone the repo with the following command:
git clone git@github.com:sfbrigade/datasci-housing-pipeline.git
(For help, see this tutorial.)
- Navigate into your newly created project folder:
cd datasci-housing-pipeline
We use Pipenv for environment management; follow the installation guides below if you don't have it.
Install all project and development dependencies:
pipenv install --dev
pipenv run python -m ipykernel install --user --name="$(basename "$(pipenv --venv)")"
Launch Jupyter and select the datasci-housing-pipeline kernel once the notebook interface opens:
jupyter notebook
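If you are not sure whether the notebook picked up the Pipenv environment, a quick sanity check (a minimal sketch) from a notebook cell is:

```python
import sys

# The executable should point inside the project's virtualenv, not the
# system Python, and the version should be the Python 3 you installed.
print(sys.executable)
print(sys.version)
```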
Check your currently installed version of Python 3.
python3 --version
If you don't have Python 3.7, installation from the Pipfile will fail. Install Python 3 using Homebrew:
brew install python3
Or upgrade Python 3 from an earlier minor version (such as 3.6) using Homebrew:
brew upgrade python3
Run this in your terminal:
brew install pipenv
NOTE: Dependencies will only be available within the Pipenv virtualenv. Enter the virtualenv with `pipenv shell`, or run a single command with `pipenv run my-cool-command`.
Run this in your terminal:
brew install postgresql
Postgres should start automatically. If you run into trouble, refer to this guide.
My personal recommendation is Cmder.
Install Chocolatey.
Check your currently installed version of Python.
python --version
If you don't have Python version 3.7, install or upgrade to Python 3 using Chocolatey:
choco install python
Python 3 should install pip automatically, but check for updates with the following command:
python -m pip install -U pip
Now install pipenv with a User installation:
pip install --user pipenv
NOTE: If pipenv isn't available in your console after installing and running `refreshenv`, you will need to add the user base's binary directory to your PATH. This is relatively simple; read the Yellow Box on this tutorial page.
NOTE 2: Dependencies will only be available within the Pipenv virtualenv. Enter the virtualenv with `pipenv shell`, or run a single command with `pipenv run my-cool-command`.
Postgres requires a password parameter, so run the following command, substituting your own password to be assigned to the postgres user:
choco install postgresql10 --params '/Password:YOURPASSWORDHERE' --params-global
Postgres should start automatically. If you run into trouble, refer to the Postgres website.
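Once Postgres is running (on either platform), you can sanity-check the connection from Python. This is a minimal sketch: it assumes the psycopg2 (or psycopg2-binary) package is available in the virtualenv (check the Pipfile) and uses default local connection settings, so adjust the user, password, and database to match your setup.

```python
import psycopg2

# Hypothetical connection settings: the defaults from the Homebrew or
# Chocolatey install, with the password you chose during setup on Windows.
conn = psycopg2.connect(
    host="localhost",
    dbname="postgres",
    user="postgres",
    password="YOURPASSWORDHERE",
)

with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])

conn.close()
```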
- Rocio Ng - @rocio
- Shantanu Bala - @Shantanu Bala
- Anders Engnell - @Anders Engnell
- Andrew Roberts - @Andrew Roberts
Name | Slack Handle |
---|---|
Andrew Roberts | @Andrew Roberts |
- If you haven't joined the SF Brigade Slack, you can do that here.
- Our Slack channel is `#datasci-projectname`.
- Feel free to contact team leads with any questions or if you are interested in contributing!