Ensemble learning related books, papers, videos, and toolboxes
Awesome Ensemble Learning
Ensemble learning (also known as ensembling) is an exciting yet challenging field.
Ensembling combines multiple base models to achieve predictive performance
that is often better than that of any constituent model alone [19].
It has proven critical in many practical applications and data science
competitions [4], e.g., Kaggle.
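As a rough illustration of why combining models can help, the sketch below computes how often a majority vote is correct. It is not taken from any of the resources listed here, and it rests on strong assumptions: the task is binary classification, every base model is correct with the same probability `p`, and the models make errors independently. The function names `majority_vote` and `majority_vote_accuracy` are hypothetical helpers introduced only for this example.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the most common label among the base models' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def majority_vote_accuracy(n_models, p):
    """Probability that a strict majority of n_models base models is correct.

    Assumes binary classification and independent models that are each
    correct with probability p (an idealised setting).
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


if __name__ == "__main__":
    # Combine three label predictions for a single sample.
    print(majority_vote(["cat", "dog", "cat"]))
    # Accuracy of the vote as the number of 70%-accurate models grows.
    for n in (1, 3, 5, 11):
        print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

Under these assumptions, three models that are each 70% accurate yield a majority vote that is correct 0.7^3 + 3 x 0.7^2 x 0.3 = 78.4% of the time. In practice base models are correlated, so the gain is usually smaller, which is why ensemble diversity matters.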
To promote the learning of ensembling, we created this repository with:
Books & Academic Papers
Online Courses and Videos
Open-source and Commercial Libraries/Toolboxes and Datasets
Key Conferences & Journals
More items will be added to the repository.
Please feel free to suggest other key resources by opening an issue,
submitting a pull request, or sending me an email (zhaoy@cmu.edu).
Enjoy reading!
Ensemble Machine Learning: Methods and Applications
edited by Cha Zhang and Yunqian Ma [28]: Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques,
including various contributions from researchers in leading industrial research labs.
Applications of Supervised and Unsupervised Ensemble Methods
edited by Oleg Okun [17]: This book contains the extended papers presented at the 2nd Workshop on Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA),
held in conjunction with ECAI 2008.
Data Mining and Knowledge Discovery Handbook, Chapter 45 (Ensemble Methods for Classifiers)
by Lior Rokach [22]: This chapter provides an overview of ensemble methods in classification tasks. It presents all important types of ensemble methods, including boosting and bagging.
Combining methods and modeling issues such as ensemble diversity and ensemble size are also discussed.
[Python] combo: combo is a comprehensive Python toolbox for combining machine learning (ML) models and scores for various tasks, including classification, clustering, and anomaly detection. It supports the combination of ML models from core libraries such as scikit-learn and xgboost (documentation).
[Python] pycobra: A Python library implementing ensemble methods for regression and classification, along with visualisation tools including Voronoi tessellations.
[Python] DESlib: A Python library for dynamic classifier and ensemble selection.
[Python] imbalanced-learn: A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning (documentation).
3.2. Datasets
As a subfield of machine learning, ensemble learning is usually tested against
general machine learning benchmark datasets. Some helpful links can be found below:
Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham.
Chen, T. and Guestrin, C., 2016, August. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). ACM.
Chen, J., Sathe, S., Aggarwal, C. and Turaga, D., 2017, June. Outlier detection with autoencoder ensembles. In SIAM International Conference on Data Mining (pp. 90-98). Society for Industrial and Applied Mathematics.
Dietterich, T.G., 2000, June. Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer, Berlin, Heidelberg.
Freund, Y. and Schapire, R.E., 1997. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), pp.119-139.
Gao, J., Fan, W. and Han, J., 2010. On the power of ensemble: Supervised and unsupervised methods reconciled. In Tutorial on SIAM Data Mining Conference (SDM), Columbus, OH.
Gomes, H.M., Barddal, J.P., Enembreck, F. and Bifet, A., 2017. A survey on ensemble learning for data stream classification. ACM Computing Surveys (CSUR), 50(2), p.23.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y., 2017. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3146-3154).
Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J. and Woźniak, M., 2017. Ensemble learning for data stream analysis: A survey. Information Fusion, 37, pp.132-156.
Olson, R.S., La Cava, W., Orzechowski, P., Urbanowicz, R.J. and Moore, J.H., 2017. PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData mining, 10(1), p.36.
[19] Opitz, D. and Maclin, R., 1999. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, pp.169-198.
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V. and Gulin, A., 2018. CatBoost: unbiased boosting with categorical features. In Advances in Neural Information Processing Systems (pp. 6638-6648).
Vega-Pons, S. and Ruiz-Shulcloper, J., 2011. A survey of clustering ensemble algorithms. International Journal of Pattern Recognition and Artificial Intelligence, 25(03), pp.337-372.
Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining (SDM) (pp. 585-593). Society for Industrial and Applied Mathematics.
Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: Challenges and research questions [position paper]. ACM SIGKDD Explorations Newsletter, 15(1), pp.11-22.