Data analytics
This Jupyter Notebook was based on a dataset found at: https://archive.ics.uci.edu/ml/datasets/Housing
The dataset concerns housing values in the suburbs of Boston and contains the following attributes:
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT: % lower status of the population
- MEDV: Median value of owner-occupied homes in $1000's
The analysis looks at:
- The relationship between median housing values and home age
- The correlation between crime rate and the pupil-teacher ratio at local schools
- The proportion of black citizens within a suburb and the distance from that suburb to employment centers
- The connection between home values, nitric oxides concentration, and proximity to industrial centers
- The relationship between crime rate and proportion of non-retail businesses
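Relationships like these can be examined with a pairwise Pearson correlation. The following is a minimal sketch, assuming the data has been loaded into a Pandas DataFrame whose columns use the attribute names listed above. The helper `pairwise_correlation` is a hypothetical name, and the rows are made-up values for illustration only, not taken from the dataset.

```python
import pandas as pd

# Made-up illustrative rows, NOT taken from the real dataset.
# Column names follow the attribute list above.
sample = pd.DataFrame({
    "AGE": [20.0, 45.0, 60.0, 85.0, 95.0],
    "MEDV": [30.0, 26.0, 24.0, 18.0, 15.0],
    "CRIM": [0.1, 0.3, 0.5, 2.0, 4.0],
    "PTRATIO": [15.0, 16.5, 18.0, 20.0, 21.0],
})


def pairwise_correlation(df, col_a, col_b):
    """Return the Pearson correlation coefficient between two columns."""
    # Series.corr defaults to the Pearson method.
    return df[col_a].corr(df[col_b])


# Hypothetical helper applied to two of the questions listed above.
age_vs_value = pairwise_correlation(sample, "AGE", "MEDV")
crime_vs_ptratio = pairwise_correlation(sample, "CRIM", "PTRATIO")
```

A coefficient near -1 or 1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. It does not by itself indicate causation.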
This Jupyter Notebook was based on a dataset from the Seattle Police Department, outlining Incident Reports in the city of Seattle, as of January 28, 2017.
The dataset can be found here: https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Incident/7ais-f98f
It contains the following columns:
- CAD CDW ID
- CAD Event Number
- General Offense Number
- Event Clearance Code
- Event Clearance Description
- Event Clearance SubGroup
- Event Clearance Group
- Event Clearance Date
- Hundred Block Location
- District/Sector
- Zone/Beat
- Census Tract
- Longitude
- Latitude
- Incident Location
- Initial Type Description
- Initial Type Subgroup
- Initial Type Group
- At Scene Time
The source data is a comma-separated file: each row is one record, and the first row holds the column names. To access this information, I loaded the file into a Pandas DataFrame, specifying the comma as the delimiter and the first row as the header row. I set low_memory to False because the dataset includes a mix of data types.
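The loading step described above might look roughly like the sketch below. It assumes `pandas.read_csv` was used. The inline text is a made-up stand-in for the downloaded file (the real file name is not given here), and only a few of the columns are shown.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded CSV; the values are illustrative only.
csv_text = (
    "Event Clearance Description,District/Sector,Longitude,Latitude\n"
    "EXAMPLE DESCRIPTION A,X,-122.30,47.60\n"
    "EXAMPLE DESCRIPTION B,Y,-122.35,47.65\n"
)

incidents = pd.read_csv(
    io.StringIO(csv_text),  # in practice, the path to the downloaded file
    sep=",",                # comma as the delimiter
    header=0,               # first row holds the column names
    low_memory=False,       # read the file in one pass before inferring dtypes
)
```

With `low_memory=False`, Pandas infers each column's type from the whole file rather than chunk by chunk, which avoids mixed-type warnings at the cost of more memory.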
Because these column names were multi-word strings, often with spaces between the words, I renamed them to short, simple strings (e.g., "DESC"). This made it easier to use the names to access information.
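A minimal sketch of this renaming, assuming `DataFrame.rename` with a column mapping. Only "DESC" comes from the text above, and pairing it with "Event Clearance Description" is a guess; the other short name and the sample rows are hypothetical.

```python
import pandas as pd

# Made-up one-row frame using two of the original column names.
incidents = pd.DataFrame({
    "Event Clearance Description": ["EXAMPLE DESCRIPTION A"],
    "District/Sector": ["X"],
})

short_names = {
    "Event Clearance Description": "DESC",  # "DESC" is from the text; the pairing is assumed
    "District/Sector": "SECTOR",            # hypothetical short name
}

# rename returns a new DataFrame; columns not in the mapping are left unchanged.
incidents = incidents.rename(columns=short_names)

# Short names make column access more convenient.
descriptions = incidents["DESC"]
```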