Primary language: Jupyter Notebook. License: MIT.

RCA-BSc-thesis-project-

In modern computer systems, logs serve as important records that offer insight into how a system behaves and help explain problems. As the use of computer systems grows, so does the volume of logs they generate. Logs containing errors are particularly significant, because they let developers and maintainers pinpoint and fix issues. Automated solutions in this area, which fall under Artificial Intelligence for IT Operations (AIOps), are valuable for problems that are latent or whose error messages are not produced by the component that actually caused the failure. Root cause analysis (RCA), a subdomain of AIOps, is dedicated to analyzing data and identifying the underlying issues. This project presents a machine-learning solution that fine-tunes the BERT language model to detect root causes in logs collected from the Hadoop framework. Despite challenges arising from the volume of log content and the imbalanced distribution of samples across classes in our dataset, the model shows promising results: it achieves an accuracy of 70% on the test set and 76% on the validation set.
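
Class imbalance of the kind described above is often handled by weighting each class inversely to its frequency when computing the training loss. The following is a minimal sketch of that idea, not a description of what this project does: the function name `inverse_frequency_weights` and the example labels are hypothetical, and the formula is the common "balanced" heuristic, `n_samples / (n_classes * class_count)`.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors count more in a
    weighted loss. This is an illustrative heuristic, not the project's method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical root-cause labels, not taken from the project's dataset.
example_labels = ["normal"] * 6 + ["disk_full"] * 3 + ["network"] * 1
weights = inverse_frequency_weights(example_labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

Weights like these could be passed to a weighted cross-entropy loss during fine-tuning. Under imbalance it may also be worth reporting per-class metrics alongside overall accuracy, since accuracy alone can hide poor performance on rare classes.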