BDM project

Project structure

The project is implemented in Java. Below is an overview of the file tree with a brief description of each file's purpose:

pom.xml                             # Maven project dependencies
import2mongo.sh                     # Script to import data into MongoDB
src/
└── main/
   └── java/
      ├── data/                     # Helper classes to parse rows
      │  ├── Incident.java          # Parser for incident information data
      │  ├── IncomeInfo.java        # Parser for opendata bcn income information
      │  └── RentInformation.java   # Parser for idealista rent information
      ├── Exploitation.java         # KPI calculation
      ├── Formatted.java            # Data cleaning and consolidation
      ├── Main.java                 # Program entrypoint
      ├── Model.java                # ML model
      └── Streaming.java            # Kafka stream

Running the project

To run the project, use any Java IDE together with Maven. The main class is Main, and it accepts the name of the stage you want to run as its first argument.

There are 4 stages corresponding to the different parts of the project:

  1. formatted
  2. exploitation
  3. model
  4. streaming

Each stage depends on the files generated by the previous stages, so you should run them in the order shown above.
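
As an illustration of the stage argument and the ordering constraint, here is a minimal sketch of how a main class could map its first argument to a stage. It is not the actual Main.java: the class name StageDispatchSketch, the Stage enum and the helper methods are hypothetical, and only the four stage names are taken from the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: the real Main.java may parse its arguments differently.
final class StageDispatchSketch {

    // The four stages, declared in the order in which they must be run.
    enum Stage { FORMATTED, EXPLOITATION, MODEL, STREAMING }

    // Maps the first command-line argument to a stage, rejecting unknown names.
    static Stage parseStage(String name) {
        try {
            return Stage.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown stage '" + name + "', expected one of "
                    + Arrays.toString(Stage.values()));
        }
    }

    // Stages whose output must already exist before the given stage can run.
    static List<Stage> prerequisites(Stage stage) {
        return Arrays.asList(Stage.values()).subList(0, stage.ordinal());
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: <formatted|exploitation|model|streaming>");
            System.exit(1);
        }
        Stage stage = parseStage(args[0]);
        System.out.println("Stage " + stage + " needs the output of " + prerequisites(stage));
    }
}
```

Declaring the stages in run order lets the prerequisites of a stage be read directly off the enum, so a typo in the stage name fails early instead of starting a run with missing input files.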

Additionally, the formatted stage needs a running MongoDB server with the appropriate collections in order to obtain its data. To that end, we have included a small bash script (./import2mongo.sh) that takes care of running the appropriate mongoimport commands. You may need to modify the environment variable DATA_DIR to point to the folder containing P2_data and our additional data source, incidents. The source-data folder can be found on Google Drive, publicly available to upc.edu accounts.