The purpose of this repository is to provide a curated list of state-of-the-art works in the field of ML/DL evaluation metrics.
In general, metrics are measures of quantitative assessment, commonly used for comparing and tracking performance or production.
Performance evaluation is likewise an important step in ML/DL development. During evaluation, metrics are used to measure the quality (including performance) of a machine learning or deep learning model, so that its various characteristics and quality factors can be quantified.
Most evaluation metrics are tied to machine learning tasks. The choice of evaluation metric completely depends on the type of ML/DL model and the implementation plan of the ML/DL model. There are different metrics for the tasks of classification, regression, ranking, clustering, topic modeling, etc. Some metrics, such as precision-recall, can be useful for multiple tasks.
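As a rough illustration of how one metric can serve several tasks, the sketch below applies the same precision and recall computation to a classification result and to a retrieval result. The function name `precision_recall` and all sample data are made up for this example.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of a predicted set against a ground-truth set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Classification: compare the samples predicted positive with the truly positive ones.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
predicted_positive = [i for i, y in enumerate(y_pred) if y == 1]
actually_positive = [i for i, y in enumerate(y_true) if y == 1]
clf_precision, clf_recall = precision_recall(predicted_positive, actually_positive)

# Retrieval: compare the documents returned by a search with the relevant ones.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d7"]
ret_precision, ret_recall = precision_recall(retrieved_docs, relevant_docs)

print(clf_precision, clf_recall)
print(ret_precision, ret_recall)
```

Only the definition of "predicted" and "relevant" changes between the two tasks; the formula stays the same.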
Contributions and comments are always welcome. Please contact us at hollobit@etri.re.kr or send a pull request. You can add links through pull requests, or create an issue to point out something I missed or to start a discussion.
- 1. General
- 2. Classification
- 3. Prediction
- 4. Segmentation
- 5. Deep Generative Model
- 6. Detection
- 7. Regression Metrics
- 8. Ranking Metrics
- 9. Statistical Metrics
- 10. Computer Vision Metrics
- 11. NLP Metrics
- 12. Super resolution
- Appendix : Bias
- [PDF] Proben1: A set of neural network benchmark problems and benchmarking rules L Prechelt - 1994 - Citeseer (Scholar) (Semantic) (Connected)
- [BOOK] Combining pattern classifiers: methods and algorithms LI Kuncheva - 2014 - books.google.com (Scholar) (Semantic) (Connected)
- 20 Popular Machine Learning Metrics. Part 1: Classification & Regression Evaluation Metrics
- 20 Popular Machine Learning Metrics. Part 2: Ranking & Statistical Metrics
(accuracy, precision, recall, F1-score, ROC, AUC, …)
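A minimal sketch of three of these metrics for a binary classifier, in plain Python. The helper names and the sample scores are illustrative assumptions. AUC is computed here through its pairwise interpretation (the probability that a random positive is scored above a random negative, with ties counted as one half), which is simple but quadratic in the number of samples.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, f1, auc)
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and summarizes ranking quality across all thresholds.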
- ISO/IEC 4213:2022 Information technology — Artificial intelligence — Assessment of machine learning classification performance
- 24 Evaluation Metrics for Binary Classification (And When to Use Them)
- An experimental comparison of performance measures for classification C Ferri, J Hernández-Orallo, R Modroiu - Pattern Recognition Letters 30 (1), 27-38 (Scholar) (Semantic) (Connected)
- [BOOK] Evaluating learning algorithms: a classification perspective N Japkowicz, M Shah - 2011 - books.google.com (Scholar) (Semantic) (Connected)
- [HTML] A systematic analysis of performance measures for classification tasks M Sokolova, G Lapalme - Information processing & management, 2009 - Elsevier (Scholar) (Semantic) (Connected)
- [PDF] A review on evaluation metrics for data classification evaluations M Hossin, MN Sulaiman - International Journal of Data Mining & …, 2015 - academia.edu (Scholar) (Semantic) (Connected)
- Evaluation of performance measures for classifiers comparison V Labatut, H Cherifi - arXiv preprint arXiv:1112.4133, 2011 - arxiv.org - (Scholar) (Semantic) (Connected)
- A survey of predictive modeling on imbalanced domains P Branco, L Torgo, RP Ribeiro - ACM Computing Surveys (CSUR), 2016 - dl.acm.org - (Scholar) (Semantic) (Connected)
- Multi-label learning by exploiting label dependency ML Zhang, K Zhang - Proceedings of the 16th ACM SIGKDD …, 2010 - dl.acm.org - (Scholar) (Semantic) (Connected)
- [PDF] Classifier chains for multi-label classification J Read, B Pfahringer, G Holmes, E Frank - Machine learning, 2011 - Springer (Scholar) (Semantic) (Connected)
- A review on multi-label learning algorithms ML Zhang, ZH Zhou - IEEE transactions on knowledge and …, 2013 - ieeexplore.ieee.org (Scholar) (Semantic) (Connected)
- A comparison of MCC and CEN error measures in multi-class prediction G Jurman, S Riccadonna, C Furlanello - PloS one, 2012 - journals.plos.org - (Scholar) (Semantic) (Connected)
- ISO/IEC DIS 16466 Information Technology - 3D Printing and scanning - Assessment methods of 3D scanned data for 3D printing model
- http://www.visceral.eu/resources/evaluatesegmentation-software/
- [HTML] Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool AA Taha, A Hanbury - BMC medical imaging, 2015 - Springer - (Scholar) (Semantic) (Connected)
- Review of Evaluation Metrics for 3D Medical Image Segmentation (3차원 의료 영상 분할 평가 지표에 관한 고찰) Jang-Woo Kim, Jong-Hyo Kim - Journal of the Korean Society of Imaging Informatics in Medicine (대한의학영상정보학회지), 2017, vol. 23, no. 1, pp. 14-20
- A review of recent evaluation methods for image segmentation YJ Zhang - Proceedings of the Sixth International Symposium …, 2001 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- [PDF] An overview of current evaluation methods used in medical image segmentation V Yeghiazaryan, I Voiculescu - Department of Computer Science …, 2015 - cs.ox.ac.uk - (Scholar) (Semantic) (Connected)
- [HTML] Blood vessel segmentation algorithms—review of methods, datasets and evaluation metrics S Moccia, E De Momi, S El Hadji, LS Mattos - Computer methods and …, 2018 - Elsevier - (Scholar) (Semantic) (Connected)
- Current methods in medical image segmentation DL Pham, C Xu, JL Prince - Annual review of biomedical …, 2000 - annualreviews.org - (Scholar) (Semantic) (Connected)
- [HTML] Image segmentation evaluation: A survey of unsupervised methods H Zhang, JE Fritts, SA Goldman - Computer Vision and Image Understanding, 2008 - Elsevier - (Scholar) (Semantic) (Connected)
- A benchmark for 3D mesh segmentation X Chen, A Golovinskiy, T Funkhouser - ACM Transactions on Graphics …, 2009 - dl.acm.org - (Scholar) (Semantic) (Connected)
- A review on deep learning techniques applied to semantic segmentation A Garcia-Garcia, S Orts-Escolano, S Oprea… - arXiv preprint arXiv …, 2017 - arxiv.org - (Scholar) (Semantic) (Connected)
- [HTML] Unsupervised image segmentation evaluation and refinement using a multi-scale approach B Johnson, Z Xie - ISPRS Journal of Photogrammetry and Remote …, 2011 - Elsevier - (Scholar) (Semantic) (Connected)
- [HTML] A comparative evaluation of interactive segmentation algorithms K McGuinness, NE O'Connor - Pattern Recognition, 2010 - Elsevier - https://scholar.google.com/scholar?cites=12481616241604244476&as_sdt=2005&sciodt=0,5&hl=de (Semantic) (Connected)
- Comparison and evaluation of methods for liver segmentation from CT datasets T Heimann, B Van Ginneken, MA Styner… - … on medical imaging, 2009 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- Medical image segmentation methods, algorithms, and applications A Norouzi, MSM Rahim, A Altameem, T Saba… - IETE Technical …, 2014 - Taylor & Francis - (Scholar) (Semantic) (Connected)
(Inception score, Frechet Inception distance)
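For orientation, the two metrics named above are commonly written as follows. Here p(y | x) is the label distribution that a pretrained Inception classifier predicts for a generated image x, p_g is the generator's distribution, and (μ_r, Σ_r) and (μ_g, Σ_g) are the mean and covariance of Inception features for real and generated images. Implementations differ in details such as the number of samples and splits, so the formulas are a summary and not a full specification.

```latex
\mathrm{IS}(G) = \exp\!\Big( \mathbb{E}_{x \sim p_g}\big[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \big] \Big),
\qquad p(y) = \mathbb{E}_{x \sim p_g}\big[ p(y \mid x) \big]

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\big( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \big)
```

A higher Inception score and a lower Frechet Inception distance are both read as better sample quality.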
- Progressive growing of gans for improved quality, stability, and variation T Karras, T Aila, S Laine, J Lehtinen - arXiv preprint arXiv:1710.10196, 2017 - arxiv.org - (Scholar) (Semantic) (Connected)
- Analyzing and improving the image quality of stylegan T Karras, S Laine, M Aittala, J Hellsten… - Proceedings of the …, 2020 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
- How good is my GAN? K Shmelkov, C Schmid… - Proceedings of the …, 2018 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
- [HTML] Pros and cons of gan evaluation measures A Borji - Computer Vision and Image Understanding, 2019 - Elsevier - (Scholar) (Semantic) (Connected)
- A note on the inception score S Barratt, R Sharma - arXiv preprint arXiv:1801.01973, 2018 - arxiv.org - (Scholar) (Semantic) (Connected)
- An empirical study on evaluation metrics of generative adversarial networks Q Xu, G Huang, Y Yuan, C Guo, Y Sun, F Wu… - arXiv preprint arXiv …, 2018 - arxiv.org - (Scholar) (Semantic) (Connected)
- Metrics for deep generative models N Chen, A Klushyn, R Kurle, X Jiang… - International …, 2018 - proceedings.mlr.press - (Scholar) (Semantic) (Connected)
- Assessing generative models via precision and recall MSM Sajjadi, O Bachem, M Lucic… - Advances in Neural …, 2018 - papers.nips.cc - (Scholar) (Semantic) (Connected)
- Improved precision and recall metric for assessing generative models T Kynkäänniemi, T Karras, S Laine… - Advances in Neural …, 2019 - papers.nips.cc - (Scholar) (Semantic) (Connected)
- What makes for effective detection proposals? J Hosang, R Benenson, P Dollár… - IEEE transactions on …, 2015 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- A survey on performance metrics for object-detection algorithms R Padilla, SL Netto, EAB da Silva - … International Conference on …, 2020 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
(MSE, MAE)
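A minimal sketch of both regression metrics in plain Python. The function names and the sample values are illustrative assumptions.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals; penalizes large errors more heavily."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute residuals; less sensitive to outliers than MSE."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [3.0, -0.5, 2.0, 7.0]
predictions = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(targets, predictions)   # 0.375
mae = mean_absolute_error(targets, predictions)  # 0.5
print(mse, mae)
```

MSE is expressed in squared units of the target, so its square root (RMSE) is often reported when a value on the original scale is needed.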
(MRR, Precision@K, DCG & NDCG, MAP, Kendall’s tau, Spearman’s rho)
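The sketch below shows three of these ranking metrics on lists of relevance labels given in ranked order. The function names and data are illustrative assumptions. DCG is written here with a linear gain (`rel / log2(rank + 1)`); another common variant uses `2**rel - 1` as the gain. The ideal ranking for NDCG is taken from the same list, which assumes the list contains every judged item.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def reciprocal_rank(relevance):
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, r in enumerate(relevance, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(relevance_lists):
    return sum(reciprocal_rank(r) for r in relevance_lists) / len(relevance_lists)


def dcg_at_k(relevance, k):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance, k):
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Binary relevance per query, in the order the system ranked the results.
queries = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
mrr = mean_reciprocal_rank(queries)
p_at_2 = precision_at_k(queries[0], 2)

# Graded relevance for a single query.
graded = [3, 2, 3, 0, 1, 2]
ndcg = ndcg_at_k(graded, 6)
print(mrr, p_at_2, ndcg)
```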
- A short introduction to learning to rank H Li - IEICE TRANSACTIONS on Information and Systems, 2011 - search.ieice.org - (Scholar) (Semantic) (Connected)
(Correlation)
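As a small illustration, the sketch below computes the Pearson correlation and derives Spearman's rank correlation from it by correlating ranks. The function names are illustrative, and the ranking helper assumes there are no tied values (ties would need averaged ranks).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def ranks(values):
    """Rank positions 1..n in ascending order; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(r_pearson, r_spearman)
```

Because `y` grows monotonically with `x`, the rank correlation is essentially 1 while the Pearson coefficient stays slightly below 1.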
(PSNR, SSIM, IoU)
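A minimal sketch of PSNR and IoU in plain Python. Images are represented as nested lists of pixel values and boxes as `(x1, y1, x2, y2)` tuples; these representations, the function names, and the 8-bit `max_value` default are assumptions for illustration. SSIM is left out because it needs windowed local statistics and is better taken from an image-processing library.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref_pixels = [p for row in reference for p in row]
    dist_pixels = [p for row in distorted for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref_pixels, dist_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


reference_image = [[100, 100], [100, 100]]
distorted_image = [[110, 90], [100, 100]]
psnr_db = psnr(reference_image, distorted_image)

iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
print(psnr_db, iou)
```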
- A Quick Overview of Methods to Measure the Similarity Between Images
- Image quality assessment: from error visibility to structural similarity Z Wang, AC Bovik, HR Sheikh… - IEEE transactions on …, 2004 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- Image quality metrics: PSNR vs. SSIM A Hore, D Ziou - 2010 20th international conference on pattern …, 2010 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- Seven challenges in image quality assessment: past, present, and future research DM Chandler - International Scholarly Research Notices, 2013 - hindawi.com - (Scholar) (Semantic) (Connected)
- Full reference image quality assessment based on saliency map analysis Y Tong, H Konik, F Cheikh… - Journal of Imaging …, 2010 - ingentaconnect.com - (Scholar) (Semantic) (Connected)
- [PDF] Metrics performance comparison for color image database N Ponomarenko, F Battisti, K Egiazarian… - … workshop on video …, 2009 - comlab.uniroma3.it - (Scholar) (Semantic) (Connected)
- [PDF] IEM: a new image enhancement metric for contrast and sharpness measurements VL Jaya, R Gopikakumari - International Journal of Computer Applications, 2013 - Citeseer - (Scholar) (Semantic) (Connected)
- Predicting deeper into the future of semantic segmentation P Luc, N Neverova, C Couprie… - Proceedings of the …, 2017 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
- A survey on deep learning techniques for image and video semantic segmentation A Garcia-Garcia, S Orts-Escolano, S Oprea… - Applied Soft …, 2018 - Elsevier - (Scholar) (Semantic) (Connected)
- Fsrnet: End-to-end learning face super-resolution with facial priors Y Chen, Y Tai, X Liu, C Shen… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com - (Scholar) (Semantic) (Connected)
(Perplexity, BLEU score)
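A rough sketch of the ideas behind these two metrics. Perplexity is computed as the exponential of the average negative log-likelihood of the observed tokens. For BLEU, only the clipped (modified) n-gram precision is shown; full BLEU additionally combines several n-gram orders with a geometric mean and applies a brevity penalty. The function names and the sample sentences are illustrative assumptions.

```python
import math
from collections import Counter


def perplexity(token_probabilities):
    """exp of the mean negative log-probability assigned to each observed token."""
    mean_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(mean_nll)


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    candidate_counts = Counter(ngrams(candidate, n))
    reference_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, reference_counts[gram])
                  for gram, count in candidate_counts.items())
    total = sum(candidate_counts.values())
    return clipped / total if total else 0.0


# A model that assigns probability 0.25 to every observed token
# is as uncertain as a uniform choice among 4 options.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])

candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
unigram_precision = modified_precision(candidate, reference, 1)
print(ppl, unigram_precision)
```

Without clipping, the degenerate candidate above would reach a unigram precision of 1.0; clipping limits "the" to the two occurrences found in the reference.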
- ISO/IEC AWI 23282 Artificial Intelligence - Evaluation methods for accurate natural language processing systems
- Deep learning for image super-resolution: A survey Z Wang, J Chen, SCH Hoi - IEEE Transactions on Pattern …, 2020 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- Single-image super-resolution: A benchmark CY Yang, C Ma, MH Yang - European Conference on Computer Vision, 2014 - Springer - (Scholar) (Semantic) (Connected)
- Image super-resolution using deep convolutional networks C Dong, CC Loy, K He, X Tang - IEEE transactions on pattern …, 2015 - ieeexplore.ieee.org - (Scholar) (Semantic) (Connected)
- [PDF] On over-fitting in model selection and subsequent selection bias in performance evaluation GC Cawley, NLC Talbot - The Journal of Machine Learning Research, 2010 - jmlr.org - (Scholar) (Semantic) (Connected)