cleaning-joining-and-debugging-files-with-python

This project is a Python script that cleans many files with different attributes in order to homogenize the information. The script was written in a Python 3 notebook in Google Colab.

Data Set

The files to clean are in the data folder of this repository. We work with freely accessible data from the INE (www.ine.es), specifically from the INEBASE section, which must be cleaned appropriately before it can be structured in a database.

The data set contains a txt file with information about the "Censo Agrario de España" and several Excel files with information about the "Padron Municipal" and "Padron Continuo".

Cleaning and debugging files

The first step is to join all the Excel files into one, which makes the later steps easier, and to delete unnecessary data such as the headers.
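
The merge step described above could be sketched as follows. This is only a minimal illustration, not the notebook's actual code: the function name merge_excel_files, the header_rows parameter, and the choice to read only the active sheet are assumptions, and the files are assumed to be in .xlsx format (openpyxl does not read the older .xls format).

```python
from openpyxl import Workbook, load_workbook


def merge_excel_files(input_paths, output_path, header_rows=1):
    """Append the data rows of every input workbook to one new workbook.

    header_rows is an assumed number of leading rows to skip in each file;
    the real files may need a different value.
    Returns the number of data rows written.
    """
    merged = Workbook()
    out_sheet = merged.active
    rows_written = 0
    for path in input_paths:
        # data_only=True returns cached cell values instead of formulas.
        workbook = load_workbook(path, data_only=True)
        sheet = workbook.active
        # min_row is 1-based, so header_rows + 1 is the first data row.
        for row in sheet.iter_rows(min_row=header_rows + 1, values_only=True):
            # Skip rows in which every cell is empty.
            if any(cell is not None for cell in row):
                out_sheet.append(row)
                rows_written += 1
        workbook.close()
    merged.save(output_path)
    return rows_written
```

A call might look like merge_excel_files(sorted(glob.glob("data/*.xlsx")), "merged.xlsx"), where the paths are hypothetical and glob would need to be imported.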

Next, the txt file is cleaned so that it keeps only the data for one specific province, in this case the province with the code "05". For this cleanup we wrote a Python script, which does the cleaning and debugging.

The final step is to join all the files into a single file with clean data.
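
The province filter could look something like the sketch below. The layout of the txt file is not described here, so the sketch assumes that the province code occupies fixed character positions at the start of each line; the names filter_by_province and filter_file and the start, end and encoding parameters are hypothetical and would need adjusting to the real file.

```python
def filter_by_province(lines, province_code="05", start=0, end=2):
    """Yield only the lines whose province field equals province_code.

    Assumption: the province code sits at character positions [start, end)
    of each line. The code is compared as text so that the leading zero
    in "05" is preserved.
    """
    for line in lines:
        if line[start:end] == province_code:
            yield line


def filter_file(input_path, output_path, province_code="05", encoding="utf-8"):
    """Copy the matching lines to output_path and return how many were kept."""
    kept = 0
    with open(input_path, encoding=encoding) as src, \
            open(output_path, "w", encoding=encoding) as dst:
        for line in filter_by_province(src, province_code):
            dst.write(line)
            kept += 1
    return kept
```

Reading the file line by line keeps memory use low even if the census file is large. The encoding default is a guess; a different encoding may be needed for the real file.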

Python script

For this script we used the openpyxl Python package to work with Excel files.

!pip install openpyxl