
final-project-rooibos created by GitHub Classroom

Primary language: Jupyter Notebook

final-project-rooibos

This is the final project of team rooibos. We examined how the occurrence of gender-specific crimes in China relates to socioeconomic factors. We scraped gender-specific crime data from a 'bot' webpage that documents such incidents and matched most of the incidents to the cities where they occurred. We also gathered socioeconomic data on these cities and conducted quantitative analyses to reach our conclusions.

Web-scraping

The libraries/packages used in this part include:

The code for this part was written entirely in Jupyter Notebook; to run it, open the "neat version" file and run each cell. Thanks =)

Socioeconomic data collection and matching

After obtaining the crime data by web scraping, we collected socioeconomic data. The data source is the EPS (Easy Professional Superior) data platform, an information service platform that provides access to many databases.

Web: https://www.epsnet.com.cn/index.html#/Home

The original data comprise 21 variables from two databases on the EPS platform. The last three come from the Chinese Regional Economic Database; the others come from the Chinese City Database. For each variable, we collected panel data for 2014-2018; the number of observations (cities) varies across variables. In the final version of the socioeconomic data, the value of each variable for each city is the mean of that variable's values over 2014-2018. Finally, we matched the socioeconomic data with the crime data, keeping only the cities for which we have both crime data and socioeconomic data.
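The averaging and matching steps described above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the column names (`city`, `year`, `gdp_per_capita`, `crime_count`), the toy values, and the helper `average_panel` are hypothetical and are not taken from the project's notebooks or files.

```python
import pandas as pd

# Hypothetical long-format panel: one row per city and year (toy values).
panel = pd.DataFrame({
    "city": ["A", "A", "B", "B", "C"],
    "year": [2014, 2015, 2014, 2015, 2014],
    "gdp_per_capita": [10.0, 12.0, 20.0, None, 30.0],
})

# Hypothetical crime table: one row per city.
crime = pd.DataFrame({"city": ["A", "B", "D"], "crime_count": [3, 5, 1]})


def average_panel(df, id_col="city", year_col="year", start=2014, end=2018):
    """Return one row per city holding the mean of each variable over the period."""
    in_range = df[df[year_col].between(start, end)]
    # mean() skips missing values, so a city with gaps is averaged over its available years.
    return in_range.drop(columns=[year_col]).groupby(id_col, as_index=False).mean()


city_means = average_panel(panel)

# An inner join keeps only cities present in both tables.
matched = city_means.merge(crime, on="city", how="inner")
print(matched)
```

In this toy example, cities C and D are dropped by the inner join because each appears in only one of the two tables.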

The libraries/packages used in this part include:

  • pandas version: 1.2.0
  • csv version: 1.0

Files:

  • socioeco_variables_final.csv:
  • total_data_impute.csv: The data after imputation
  • data_reg.csv: The data selected after checking correlations (multicollinearity among predictors)
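One common way to screen predictors for multicollinearity is to inspect their pairwise correlations and drop one variable from each highly correlated pair. The sketch below illustrates that idea only; the 0.8 threshold, the helper `drop_correlated`, and the toy columns `x1`, `x2`, `x3` are assumptions, and the procedure actually used to produce data_reg.csv may differ.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.8):
    """Drop the later column of every pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy predictors (hypothetical): x2 is almost a multiple of x1.
predictors = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
})

reduced, dropped = drop_correlated(predictors)
print("dropped:", dropped)
print("kept:", list(reduced.columns))
```

Which member of a correlated pair to keep is a judgment call; this sketch simply keeps the column that appears first.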

Visualization

Besides packages covered above, additional packages & data sets in this part include:

Two interactive maps (NumberOnMap & WordCloudOnMap) are included in the repository. Learn more about our results by browsing the maps!
