Ken Cheligeer

Data Scientist, Machine Learning Engineer, Researcher

Technical Skills: Python, Snowflake, SQL, Machine Learning, Deep Learning, Natural Language Processing

Education

Ph.D., System Engineering | Concordia University (Aug 2022)
M.S., Information Technology | Monash University (December 2015)
M.S., Business Information Systems | Monash University (December 2015)
B.S., Computer Science | Lanzhou University (July 2012)

Work Experience

Data Scientist @ Alberta Health Services (Sept 2022 - Present)

Led multiple initiatives applying deep neural networks and large language models (LLMs) to analyze electronic medical records (EMRs) for downstream tasks such as disease identification, adverse event detection, clinical note summarization, and automated medical code generation.
Consulted on machine learning, deep learning, and NLP strategies, helping integrate state-of-the-art AI technologies into healthcare analytics practices to improve system-wide operational efficiency and support data-driven innovation.
Spearheaded the development and implementation of machine learning and deep learning algorithms, specializing in the extraction and interpretation of complex information from electronic medical record (EMR) text data, with a focus on improving patient care and treatment outcomes.
Conducted comprehensive analysis of large-scale hospital administrative and cancer registry datasets, employing advanced machine learning techniques to uncover patterns, trends, and insights that inform clinical decision-making and healthcare policy development.

Data Scientist @ Center for Health Informatics, University of Calgary (Sept 2021 - Present)

Conduct data collection, processing, and analysis for novel studies evaluating the validity of EMR note data and LLMs for identifying hospital adverse events.
Develop and implement machine learning and deep learning algorithms to extract information from electronic medical record (EMR) text data.
Analyze large hospital administrative datasets using machine learning techniques to identify patterns and trends in outcomes of interest.
Collaborate with healthcare professionals (epidemiologists) to understand healthcare domain requirements and challenges and translate them into feasible machine learning problems.

Faculty Member @ Hohhot Minzu University, Inner Mongolia, China (Sept 2016 - Sept 2018)

Taught courses in Data Structure and Algorithm Analysis, Website Development and Front-end Programming, and Practical Computer Programming.
Participated in a project to train Mongolian Language Models using artificial neural networks.
Managed the university's high-performance computing lab and helped researchers deploy their algorithms.

Projects

Data-Driven Surgical-site Infection Identification with XGBoost

Developed a pipeline to identify surgical-site infections from EMRs, using data labeled by the Provincial Infection Control Group. EMR data was extracted via SQL, cleaned with Python, and preprocessed using NLP techniques. Features were extracted with Scikit-Learn, class imbalance was addressed with undersampling and SMOTE, and an XGBoost model was trained on the resulting data. The model showed promising results across key evaluation metrics.

Using LLMs for Interpretation and Reasoning of Pathology Reports

Pioneered the use of Large Language Models (LLMs) such as GPT-3.5/4 to interpret pathology reports, significantly improving the identification of pathologic complete response (pCR) and thus the accuracy of cancer treatment assessments.
Developed a secure framework for processing de-identified reports and implemented an additional in-house pipeline for analyzing original pathology reports with a locally deployed LLM, addressing patient and physician privacy concerns.
Employed the Low-Rank Adaptation (LoRA) technique to fine-tune transformer-based models, achieving a substantial increase in model performance with 91% Sensitivity and 95% Positive Predictive Value (PPV) on a chart review dataset.

Estimating Cancer Diagnostic Interval Using a Deep Learning Method

Orchestrated the design and implementation of a Convolutional Neural Network (CNN) to analyze care patterns, achieving effective classification of four major cancer types and advancing early detection efforts.
Tailored the CNN to identify nuanced patterns in patient visits, integrating visit type and timing to uncover early indicators of cancer, thereby contributing to potential improvements in diagnostic timeliness.

Skills & Abilities

Machine Learning: In-depth knowledge of machine learning and deep learning principles, with practical experience in applying these techniques to healthcare data analysis and predictive modeling.
Natural Language Processing (NLP): Expert in NLP, with a focus on extracting clinical insights from unstructured healthcare data, including EMR text analysis and medical literature review.
Programming & Software Development: Expert in Python and its data-centric frameworks, with a robust foundation in object-oriented programming. Proficient in SQL and NoSQL databases, with hands-on experience in data warehousing solutions like Snowflake.
Communication: Strong skills in delivering clear presentations, writing concise reports, and data visualization. Adept at collaborating with cross-functional teams and facilitating effective communication channels.

Selected Publications

Cheligeer, C., Wu, G., Lee, S., et al. (2023). Using a Neural Network-based Language Understanding Model to Identify Inpatient Falls from Electronic Medical Records. JMIR Medical Informatics. Preprint.
Cheligeer, C., Huang, J., Wu, G., Bhuiyan, N., Xu, Y., & Zeng, Y. (2022). Machine learning in requirements elicitation: a literature review. AI EDAM, 36, e32.
Cheligeer, C., Wu, G., Xie, J., Chen, E., Quan, M. L., Cheung, W., & Xu, Y. (2023). The use of new primary cancer screening among stage IV cancer patients in Alberta: A population-based real-world study. Paper presented at Applied Research in Cancer Control, 2023.
Cheligeer, C., Quan, M. L., Cheung, W., & Xu, Y. (2023). Estimating cancer diagnostic interval using deep learning method based on population-based real-world health data. Paper presented at the Canadian Cancer Research Conference.
Cheligeer, C., Yang, J., Bayatpour, A., Miklin, A., Dufresne, S., Lin, L., ... & Zeng, Y. (2023). A Hybrid Semantic Networks Construction Framework for Engineering Design. Journal of Mechanical Design, 145(4), 041405.
Cheligeer, C., Yang, L., Nandi, T., Doktorchik, C., Quan, H., Zeng, Y., & Singh, S. (2023). Natural language processing (NLP) aided qualitative method in health research. Journal of Integrated Design and Process Science, 27(1), 41-58.
Wu, G., Cheligeer, C., Southern, D. A., Martin, E. A., Xu, Y., Leal, J., ... & Eastwood, C. A. (2023). Development of machine learning models for the detection of surgical site infections following total hip and knee arthroplasty: a multicenter cohort study. Antimicrobial Resistance & Infection Control, 12(1), 88.
Wu, G., Cheligeer, C., Brisson, A. M., Quan, M. L., Cheung, W. Y., Brenner, D., ... & Xu, Y. (2023). A New Method of Identifying Pathologic Complete Response After Neoadjuvant Chemotherapy for Breast Cancer Patients Using a Population-Based Electronic Medical Record System. Annals of Surgical Oncology, 30(4), 2095-2103.
Wu, G., Khair, S., Yang, F., Cheligeer, C., Southern, D., Zhang, Z., ... & Eastwood, C. A. (2022). Performance of machine learning algorithms for surgical site infection case detection and prediction: A systematic review and meta-analysis. Annals of Medicine and Surgery, 104956.
Sandhu, N., Whittle, S., Southern, D., Li, B., Youngson, E., Bakal, J., Mcleod, C., Hilderman, L., Williamson, T., Cheligeer, C. and Walker, R., (2023). Health Data Governance for Research Use in Alberta. International Journal of Population Data Science, 8(4).
