Handling-Missing-Records

Fixing inconsistent data records

This data set was originally obtained from Kaggle: https://www.kaggle.com/datasets/fedesoriano/covid19-effect-on-liver-cancer-prediction-dataset. Its original objective is to assess the impact of the COVID-19 pandemic on patients with newly diagnosed liver cancer. A quick glance at the data, however, showed many missing records, so I decided to handle them. Some fields had so many missing or inconsistent values that they had to be removed. In a real-world case, such inconsistencies and missing records would need to be raised with the appropriate stakeholders in order to understand why the data set was presented for analysis in this state.
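
The approach described above (drop fields that are mostly missing, then fill the remaining gaps) can be sketched roughly as follows. This is only an illustration, not the exact code used in this repository: the column names, the sample rows, the 50% threshold, the median fill for numeric fields, and the "Unknown" label for text fields are all assumptions made for the example.

```python
from statistics import median

# Hypothetical sample rows; these are NOT the real column names or values.
rows = [
    {"age": 64, "cancer_type": "HCC", "tumour_size_mm": None},
    {"age": None, "cancer_type": "ICC", "tumour_size_mm": None},
    {"age": 71, "cancer_type": "", "tumour_size_mm": 32},
    {"age": 58, "cancer_type": "HCC", "tumour_size_mm": None},
]


def is_missing(value):
    """Treat None and blank/NA-like strings as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip() in {"", "NA", "NaN"}


def missing_fraction(rows, column):
    """Share of rows in which `column` is missing."""
    return sum(is_missing(r.get(column)) for r in rows) / len(rows)


def clean(rows, max_missing=0.5):
    """Drop columns above the missing threshold, then fill the rest.

    max_missing=0.5 is an assumed cut-off, chosen only for illustration.
    """
    columns = list(rows[0].keys())
    kept = [c for c in columns if missing_fraction(rows, c) <= max_missing]
    dropped = [c for c in columns if c not in kept]
    cleaned = [{c: r.get(c) for c in kept} for r in rows]

    for c in kept:
        present = [r[c] for r in cleaned if not is_missing(r[c])]
        if present and all(isinstance(v, (int, float)) for v in present):
            fill = median(present)  # numeric field: fill with the median
        else:
            fill = "Unknown"  # text field: fill with an explicit label
        for r in cleaned:
            if is_missing(r[c]):
                r[c] = fill
    return cleaned, dropped


cleaned, dropped = clean(rows)
print("Dropped columns:", dropped)
for row in cleaned:
    print(row)
```

Whether to drop or fill a field is a judgment call, and a clinical data set may call for a different threshold or fill strategy than the ones assumed here.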

According to the source, the data was prospectively collected on all patients referred to the Newcastle-upon-Tyne NHS Foundation Trust (NUTH) hepatopancreatobiliary multidisciplinary team (HPB MDT) in the first 12 months of the pandemic (March 2020-February 2021) and compared with a retrospective observational cohort of consecutive patients presenting in the 12 months immediately preceding it (March 2019-February 2020). All new cases with a diagnosis of hepatocellular carcinoma (HCC) or intrahepatic cholangiocarcinoma (ICC) confirmed radiologically or histologically, following international guidelines, were included.

Full details can be found on the Kaggle page: https://www.kaggle.com/datasets/fedesoriano/covid19-effect-on-liver-cancer-prediction-dataset.