The project is implemented in Java. Below is an overview of the file tree, with a brief description of each file's purpose:
```
pom.xml                               # Maven project dependencies
import2mongo.sh                       # Script to import data into MongoDB
src/
└── main/
    └── java/
        ├── data/                     # Helper classes to parse rows
        │   ├── Incident.java         # Parser for incident information data
        │   ├── IncomeInfo.java       # Parser for Open Data BCN income information
        │   └── RentInformation.java  # Parser for idealista rent information
        ├── Exploitation.java         # KPI calculation
        ├── Formatted.java            # Data cleaning and consolidation
        ├── Main.java                 # Program entry point
        ├── Model.java                # ML model
        └── Streaming.java            # Kafka stream
```
To run the project, use any Java IDE together with Maven. The main class is `Main`, and it takes as its first argument the name of the stage you want to run.
There are 4 stages corresponding to the different parts of the project:
- formatted
- exploitation
- model
- streaming
Each stage depends on the files generated by the previous stages, so the stages must be run in the order shown above.
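As an illustration of how a stage name passed as the first argument could be mapped to a stage, and how the required ordering can be made explicit, here is a minimal sketch. The `StageDispatchSketch` class, its `Stage` enum and its helper methods are hypothetical and are not taken from the project's `Main.java`; the case bodies only point at the files described in the tree above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of a stage dispatcher; the project's real Main.java may differ.
 */
public class StageDispatchSketch {

    // Stages in the order they must run; each one consumes the outputs of the previous ones.
    enum Stage { FORMATTED, EXPLOITATION, MODEL, STREAMING }

    /** Parses the first program argument into a Stage (case-insensitive). */
    static Stage parseStage(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException(
                "Usage: Main <formatted|exploitation|model|streaming>");
        }
        // Enum.valueOf throws IllegalArgumentException for unknown stage names.
        return Stage.valueOf(args[0].toUpperCase(Locale.ROOT));
    }

    /** Returns the stages whose outputs must already exist before the given stage runs. */
    static List<Stage> prerequisites(Stage stage) {
        return Arrays.asList(Stage.values()).subList(0, stage.ordinal());
    }

    public static void main(String[] args) {
        Stage stage = parseStage(args);
        System.out.println("Running stage " + stage
            + " (requires outputs of " + prerequisites(stage) + ")");
        switch (stage) {
            case FORMATTED:    /* data cleaning and consolidation, see Formatted.java */ break;
            case EXPLOITATION: /* KPI calculation, see Exploitation.java */ break;
            case MODEL:        /* ML model, see Model.java */ break;
            case STREAMING:    /* Kafka stream, see Streaming.java */ break;
        }
    }
}
```

Under these assumptions, running the sketch with the argument `exploitation` prints `Running stage EXPLOITATION (requires outputs of [FORMATTED])`. Deriving the prerequisites from the enum's declaration order keeps the required run order in a single place.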
Additionally, a MongoDB server with the appropriate collections must be running for the `formatted` stage to obtain its data. To that end, we have included a small bash script (`./import2mongo.sh`) that takes care of running the appropriate `mongoimport` commands. You may need to modify the environment variable `DATA_DIR` so that it points to the folder containing `P2_data` and our additional data source, `incidents`. The source-data folder can be found on Google Drive and is accessible to any upc.edu account.
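Before running the import script, it can help to check that `DATA_DIR` really points at the expected folder. The following is a hypothetical pre-flight check and not part of the project: the `DataDirCheck` class is illustrative, and it assumes that `DATA_DIR` is exported in the environment and that `P2_data` and `incidents` are sub-folders directly under it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical pre-flight check: verifies that DATA_DIR contains the P2_data and
 * incidents folders before import2mongo.sh is run. Not part of the project.
 */
public class DataDirCheck {

    // Folder names taken from the README; assumed to sit directly under DATA_DIR.
    static final List<String> REQUIRED = Arrays.asList("P2_data", "incidents");

    /** Returns the required sub-folders that are missing from dataDir. */
    static List<String> missingFolders(Path dataDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isDirectory(dataDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumes DATA_DIR is exported in the shell, not only set inside the script.
        String env = System.getenv("DATA_DIR");
        if (env == null) {
            System.err.println("DATA_DIR is not set");
            System.exit(1);
        }
        List<String> missing = missingFolders(Paths.get(env));
        if (!missing.isEmpty()) {
            System.err.println("DATA_DIR is missing: " + missing);
            System.exit(1);
        }
        System.out.println("DATA_DIR looks complete: " + env);
    }
}
```

A check like this only inspects the folder layout; it says nothing about whether the MongoDB server is up or whether the collections were imported correctly.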