imputation

Investigating the use of machine learning methods for imputation

A consideration of the use of XGBoost as a replacement technique for imputation of missing values in official statistics.

Background

This work was originally undertaken as part of a Data Science Academy project: a scheme internal to the UK's Office for National Statistics (ONS) that allows its staff to carry out short (two-week) projects applying machine learning techniques to domains they already know, supervised by members of @datasciencecampus.

This project was created for @Vinayak-NZ.

Contents

Techniques compared include:

  • XGBoost
  • CANCEIS
  • RBEIS
  • Mixed methods
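As a rough illustration of the XGBoost approach listed above, the sketch below fits a gradient-boosted regression on the records where a variable is observed and uses it to predict that variable for the records where it is missing. It is a minimal sketch, not the code in this repository: it assumes the xgboost R package is installed, that the predictors are numeric and that the target is continuous. The helper impute_with_xgboost, its hyperparameters and the simulated data are hypothetical.

```r
# Minimal sketch of model-based imputation with XGBoost (illustrative only;
# not this repository's implementation). Assumes the 'xgboost' package is
# installed and that all predictor columns are numeric.
library(xgboost)

# Hypothetical helper: impute missing values of `target` in `df`
# using a model trained on `predictors`.
impute_with_xgboost <- function(df, target, predictors, nrounds = 50) {
  missing <- is.na(df[[target]])
  if (!any(missing)) return(df)

  # Train only on records where the target was observed
  x_obs <- as.matrix(df[!missing, predictors, drop = FALSE])
  dtrain <- xgb.DMatrix(data = x_obs, label = df[[target]][!missing])

  model <- xgb.train(
    params = list(objective = "reg:squarederror", max_depth = 3,
                  eta = 0.1, nthread = 1),
    data = dtrain,
    nrounds = nrounds,
    verbose = 0
  )

  # Replace each missing target value with the model's prediction
  x_mis <- as.matrix(df[missing, predictors, drop = FALSE])
  df[[target]][missing] <- predict(model, x_mis)
  df
}

# Toy example on simulated data (hypothetical variables)
set.seed(1)
n <- 200
df <- data.frame(age = runif(n, 18, 80), hours = runif(n, 0, 60))
df$income <- 500 * df$hours + 100 * df$age + rnorm(n, sd = 1000)
df$income[sample(n, 20)] <- NA

imputed <- impute_with_xgboost(df, "income", c("age", "hours"))
```

Because each missing value is replaced by a single model prediction, this kind of deterministic imputation tends to understate variability compared with approaches that draw observed values from donor records; multiple imputation is one common way to address this.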

Further work on other approaches, including multiple imputation and more sophisticated techniques (e.g. autoencoders), is ongoing outside this repository.

A separate workstream is considering the use of genetic algorithms for this type of problem; this is an Academy project being undertaken by another member of the imputation methodology team.

Future plans include a methodological assessment of how suitable these techniques are in practice, and a more abstract consideration of which mechanisms would be most suitable for imputation.

The project write-up is available on the gh-pages branch, and the presentation given on completion of the work is on the presentation branch.