/BigDataProject

BigData project

Primary Language: Python

Authors: Anshal Savla, Drumil Bakhai, Tannavee Watts

Project: BigData Part1

Data: 311 complaints - https://data.cityofnewyork.us/Social-Services/311/wpe2-h2i5

Subdirectories:

-CleaningScripts This directory contains 52 scripts that find data quality issues in each of the 52 columns in the dataset and generate a txt file for each column indicating the [base type, semantic type, label]. The output looks like:

    ===========================================================================
    |          BaseType         |     Semantic Type        |      Label       |
    |---------------------------|--------------------------|------------------|
    |int,float, string, datetime|Key, Address, Phonenumber,|Valid, Invalid,   |
    |                           |Zipcode, Location,DateTime|N/A               |
    ===========================================================================

Details on running the cleaning scripts are in the README.md file in the CleaningScripts directory.
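As a rough illustration of the [base type, semantic type, label] triple described above, here is a minimal sketch for a zip code column. It is not taken from the CleaningScripts directory: the function names (`base_type`, `classify_zip`), the regular expression, and the set of strings treated as missing are all assumptions made for this example, and the datetime base type is left out for brevity.

```python
import re
from typing import Tuple

# Assumption: a valid zip code is 5 digits, optionally followed by -4 digits.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")
# Assumption: these strings (case-insensitive) mean the value is missing.
MISSING = {"", "n/a", "na", "unspecified", "null"}


def base_type(value: str) -> str:
    """Guess the base type of a raw string value (int, float, or string)."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"


def classify_zip(value: str) -> Tuple[str, str, str]:
    """Return a hypothetical (base type, semantic type, label) triple."""
    text = value.strip()
    if text.lower() in MISSING:
        return (base_type(text), "Zipcode", "N/A")
    label = "Valid" if ZIP_RE.match(text) else "Invalid"
    return (base_type(text), "Zipcode", label)


if __name__ == "__main__":
    for raw in ["10001", "10001-1234", "1234", "N/A"]:
        print(raw, classify_zip(raw))
```

A per-column script along these lines would apply the classifier to every value in its column and write the resulting triples to that column's txt file.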

-AnalysisScripts This directory contains scripts that extract information from the large dataset and help summarize the data.
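To give a sense of what a summarizing script might do, the sketch below counts how often each value occurs in one column. It is an assumption-laden example, not code from the AnalysisScripts directory: the helper `count_by_column` is hypothetical, the column name `Complaint Type` is assumed, and the three sample rows are made up to stand in for the real 311 export.

```python
import csv
import io
from collections import Counter


def count_by_column(rows, column):
    """Count how often each value of `column` occurs in an iterable of dict rows."""
    return Counter(row[column] for row in rows)


# Made-up sample rows; the real dataset would be read from the downloaded CSV file.
SAMPLE = """Unique Key,Complaint Type
1,Noise
2,Heating
3,Noise
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
counts = count_by_column(reader, "Complaint Type")

# Print values from most to least frequent.
for value, n in counts.most_common():
    print(value, n)
```

For the full dataset, the same function can be fed a `csv.DictReader` opened on the file, which reads rows one at a time instead of loading everything into memory.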