The project consists of two parts: scraping a website and clustering its information in different ways, and using hash functions to find duplicates in a large set of passwords.
The scraping part of the project was based on this link on the Immobiliare.it site.
This data serves as the corpus for our analysis.
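As a rough illustration of the scraping step, the sketch below parses listing titles and prices out of HTML with Python's standard-library `HTMLParser`. The markup (class names `listing`, `title`, `price`) is invented for the example; the real Immobiliare.it pages use different structure, and the actual code in `scraping.py` may rely on different tools.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real Immobiliare.it pages differ; this only
# sketches the idea of pulling listing fields out of HTML.
SAMPLE_PAGE = """
<ul>
  <li class="listing"><span class="title">Trilocale via Roma</span>
      <span class="price">250.000</span></li>
  <li class="listing"><span class="title">Bilocale centro</span>
      <span class="price">180.000</span></li>
</ul>
"""

class ListingParser(HTMLParser):
    """Collect {title, price} records from <span class="title|price"> tags."""
    def __init__(self):
        super().__init__()
        self.records = []   # one dict per listing
        self._field = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") in ("title", "price"):
            self._field = attrs["class"]
            if self._field == "title":
                self.records.append({})  # a title opens a new record

    def handle_data(self, data):
        if self._field and data.strip():
            self.records[-1][self._field] = data.strip()
            self._field = None

parser = ListingParser()
parser.feed(SAMPLE_PAGE)
print(parser.records)
```

In practice each parsed record would become one row of `data.csv`.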
- Homework_4.ipynb: This notebook provides the code of our analysis. Some plots may not display clearly, so the notebook is also shown here for correct visualization.
- data.csv: This file contains all the information about the announcements taken from Immobiliare.it.
- Files containing all the functions of the different parts of the homework:
  - first_part_functions.py: This file contains all the functions used in part 1 of the homework.
  - scraping.py: This file contains all the functions used for the scraping part.
  - Hash_functions.py: This file contains all the functions used in part 2 of the homework.
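The duplicate-detection idea behind the second part can be sketched as follows: hash each password and group by hash value, so a repeated hash flags a candidate duplicate. The polynomial hash and the helper names (`poly_hash`, `find_duplicates`) are illustrative assumptions, not the functions actually defined in `Hash_functions.py`.

```python
def poly_hash(s: str, p: int = 1_000_000_007, base: int = 257) -> int:
    """Polynomial rolling hash reduced mod a large prime.
    A sketch only -- the hash used in Hash_functions.py may differ."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % p
    return h

def find_duplicates(passwords):
    """Group passwords by hash; a repeated hash flags a candidate duplicate."""
    seen = {}          # hash value -> first password seen with that hash
    duplicates = set()
    for pw in passwords:
        h = poly_hash(pw)
        if h in seen:
            duplicates.add(pw)  # candidate duplicate (or a rare collision)
        else:
            seen[h] = pw
    return duplicates

print(find_duplicates(["hunter2", "qwerty", "hunter2", "letmein", "qwerty"]))
```

Because distinct strings can collide, a full implementation would confirm each candidate by comparing the strings themselves.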
- KMeans.py and KMeans_map_reduce.py: These files contain the classes implementing the KMeans algorithm with and without map-reduce.
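To show the shape of the algorithm those two files implement, here is a minimal Lloyd-style KMeans on 2-D points, with the assignment pass commented as the "map" step and the centroid recomputation as the "reduce" step. It is a sketch under assumed interfaces, not the `KMeans` classes from the repository, and it uses a simple deterministic initialization (the first `k` points) rather than whatever initialization those classes use.

```python
def kmeans(points, k, iters=20):
    """Minimal Lloyd's algorithm on 2-D points; a sketch only --
    the classes in KMeans.py / KMeans_map_reduce.py may differ."""
    centroids = list(points[:k])  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # "map" step: assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2
                                  + (y - centroids[c][1]) ** 2)
            clusters[i].append((x, y))
        # "reduce" step: recompute each centroid as its cluster's mean
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if the cluster went empty
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

# Two well-separated groups of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```

In the map-reduce variant the assignment step is distributed over the data, and the reduce step aggregates per-centroid sums and counts before dividing.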