Wrangling a dataset that contains transactional retail data from an online electronics store (DigiCO) in Melbourne, Australia.

Primary language: Jupyter Notebook

Data-Wrangling

Data cleansing is an iterative process. This repository comprises three data frames that come from the same source, each with anomalies that must be fixed. The dataset contains transactional retail data from an online electronics store (DigiCO) located in Melbourne, Australia. The store operates exclusively online and has three warehouses around Melbourne from which goods are delivered to customers. At a high level, the data anomalies fall into three categories, and they must be found and corrected in each data frame:

Syntactic Anomalies: Describe characteristics concerning the format and values used for representation of the entities.

  • Lexical errors
  • Domain format errors
  • Syntactical errors
  • Irregularities
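
As an illustration of how syntactic anomalies might be repaired, here is a minimal sketch in plain Python (so it runs without extra dependencies; the same checks translate directly to data frame operations). The field names `date` and `nearest_warehouse`, the warehouse names, and the list of date layouts are all hypothetical placeholders, not taken from the actual dataset.

```python
from datetime import datetime
from difflib import get_close_matches

# Hypothetical controlled vocabulary; the real warehouse names are not given here.
WAREHOUSES = ["North", "East", "West"]
# Hypothetical date layouts to try, in order; the target format is YYYY-MM-DD.
DATE_FORMATS = ["%Y-%m-%d", "%d-%m-%Y", "%m/%d/%Y"]


def normalise_warehouse(value):
    """Fix case irregularities and small misspellings against the vocabulary."""
    cleaned = value.strip().title()
    matches = get_close_matches(cleaned, WAREHOUSES, n=1, cutoff=0.6)
    return matches[0] if matches else None  # None flags the value for manual review


def normalise_date(value):
    """Return the date as YYYY-MM-DD, or None if no known layout fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


rows = [
    {"date": "2019-03-14", "nearest_warehouse": "north"},   # irregular casing
    {"date": "14-03-2019", "nearest_warehouse": "Nroth"},   # format + lexical error
    {"date": "2019-14-03", "nearest_warehouse": "East "},   # no listed layout fits
]

cleaned_rows = [
    {
        "date": normalise_date(row["date"]),
        "nearest_warehouse": normalise_warehouse(row["nearest_warehouse"]),
    }
    for row in rows
]
```

Values that cannot be repaired automatically come back as `None`, so they can be inspected by hand rather than silently guessed.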

Semantic Anomalies: Hinder the data collection from being a comprehensive and non-redundant representation of the mini-world.

  • Integrity constraint violations
  • Contradictions
  • Duplicates
  • Invalid tuples
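
Two of the semantic checks above, duplicates and integrity constraint violations, can be sketched as follows. The field names and the pricing rule (`order_total` equals the discounted `order_price` plus `delivery_charges`) are assumptions made for illustration only; the real constraints have to be derived from the data itself.

```python
import math

# Hypothetical records; field names and values are illustrative only.
orders = [
    {"order_id": "ORD001", "order_price": 1000.0, "coupon_discount": 10,
     "delivery_charges": 50.0, "order_total": 950.0},
    {"order_id": "ORD002", "order_price": 200.0, "coupon_discount": 0,
     "delivery_charges": 20.0, "order_total": 250.0},   # breaks the assumed rule
    {"order_id": "ORD001", "order_price": 1000.0, "coupon_discount": 10,
     "delivery_charges": 50.0, "order_total": 950.0},   # duplicate of the first row
]


def expected_total(row):
    """Assumed integrity rule: discounted price plus delivery charges."""
    discounted = row["order_price"] * (1 - row["coupon_discount"] / 100)
    return discounted + row["delivery_charges"]


def drop_duplicates(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


unique_orders = drop_duplicates(orders, "order_id")

# Compare with a small tolerance to avoid false alarms from float rounding.
violations = [
    row["order_id"]
    for row in unique_orders
    if not math.isclose(row["order_total"], expected_total(row), abs_tol=0.01)
]
```

A flagged row only says that the fields disagree with each other; deciding which field is wrong usually needs further evidence from the rest of the data.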

Coverage Anomalies: Decrease the number of entities and entity properties from the mini-world that are represented in the data collection.

  • Missing values
  • Missing tuples