In large-scale IT and telco networks, preventive maintenance (PM) with minimal repair at failure is needed to slow the degradation of the live system and reduce its impact on end-user quality of experience. Network nodes fail stochastically, and their failures are related to the alarms and health-check status observed beforehand: the more major or critical alarms a node generates, the higher its probability of failure. To predict failures and reduce financial and non-financial losses, a suitable approach and model are needed to prioritize likely failures for preventive maintenance.
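As a rough illustration of the idea above, the following minimal sketch maps a node's alarm and health-check counts to a failure probability with a logistic function and ranks nodes for preventive maintenance. It is not the repository's actual model: the `NodeStatus` fields, the `WEIGHTS` and `BIAS` values, and the node names are all hypothetical, and in practice the coefficients would be fitted from historical alarm and failure data.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one network node's recent alarm history."""
    name: str
    critical_alarms: int
    major_alarms: int
    failed_health_checks: int


# Illustrative coefficients only (assumption); real values would be learned from data.
WEIGHTS = {"critical": 0.9, "major": 0.4, "health": 0.6}
BIAS = -4.0


def failure_probability(node: NodeStatus) -> float:
    """Logistic model: more critical/major alarms give a higher failure probability."""
    score = (
        BIAS
        + WEIGHTS["critical"] * node.critical_alarms
        + WEIGHTS["major"] * node.major_alarms
        + WEIGHTS["health"] * node.failed_health_checks
    )
    return 1.0 / (1.0 + math.exp(-score))


def prioritize(nodes: list[NodeStatus]) -> list[NodeStatus]:
    """Order nodes so the most failure-prone ones are maintained first."""
    return sorted(nodes, key=failure_probability, reverse=True)


# Example usage with made-up nodes.
nodes = [
    NodeStatus("core-1", critical_alarms=0, major_alarms=0, failed_health_checks=0),
    NodeStatus("edge-7", critical_alarms=5, major_alarms=2, failed_health_checks=1),
    NodeStatus("agg-3", critical_alarms=1, major_alarms=3, failed_health_checks=0),
]
ranked = prioritize(nodes)
for node in ranked:
    print(f"{node.name}: {failure_probability(node):.3f}")
```

A logistic form is used here only because it keeps the output in the range (0, 1) and increases monotonically with each alarm count; any calibrated classifier could fill the same role.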
SadeghKrmi/codedive2022
How to model stochastic behavior of failures in telco or IT systems using machine learning
Python · Apache-2.0