Homogeneous Ensemble learning, implementing Hoeffding Trees (Random Forest) under Concept Drift Conditions
Notes:
/** How many Hoeffding Trees we have in our Ensemble **/
int number_of_hoeffding_trees = Integer.parseInt(params.get("number_of_HT"));
/** Combination function can take 3 values.
* 1 => Majority Voting
* 2 => Weighted Voting with threshold-θ => ε (0,1)
* 3 => Weighted Voting with top-k => ε (1,number_of_hoeffding_trees).
**/
int combination_function = Integer.parseInt(params.get("combination_function"));
/* Given combination_function = 2, weighted voting option can take 2 values. 1 => threshold-θ, 2 => top-k */
int weighted_voting_option = Integer.parseInt(params.get("weighted_voting_option"));
/** Age_of_maturity indicates after how many instances seen, we will start accepting testing instances.
* Aka how long our model is in a stable phase.
**/
int age_of_maturity = Integer.parseInt(params.get("age_of_maturity"));
/* Drift Detection Method used
* If drift_detection_method_id = 0 then No Concept Drift
* If drift_detection_method_id = 1 then DDM
* If drift_detection_method_id = 2 then EDDM
*/
int drift_detection_method_id = Integer.parseInt(params.get("drift_detection_method_id"));
@info: Reading from Kafka Source the input stream. For more information: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/kafka.html
@param input_stream:
The input stream is structured as follows (field_1,...,field_n,true_label,purpose_id, instance_id). The number of fields in a particular has to be the same across the input data and it is irrelevant to the Population Function. But, the number of times each incoming instance will be populated is user-defined and therefore the program has to work with any value.
Print the execution Plan: https://flink.apache.org/visualizer/