Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity
Marta Pineda-Moncusí, Freya Allery, Antonella Delmestri, Thomas Bolton, John Nolan, Johan H. Thygesen, Alex Handy, Amitava Banerjee, Spiros Denaxas, Christopher Tomlinson, Alastair K. Denniston, Cathie Sudlow, Ashley Akbari, Angela Wood, Gary S. Collins, Irene Petersen, Laura C. Coates, Kamlesh Khunti, Daniel Prieto-Alhambra & Sara Khalid, on behalf of the CVD-COVID-UK/COVID-IMPACT Consortium
The COVID-19 pandemic has highlighted inequality in health: people from ethnically diverse backgrounds were disproportionately affected. However, inequity is not limited to the pandemic; it is a long-standing, multi-faceted issue.
One example is technology for predicting a person's future health risks. Routinely collected health information is fed into a computer model, which produces a health risk score for a patient, and doctors use that score to decide on patient care. If there is bias in the data or in the model, doctors can make wrong decisions and patients can receive the wrong care or no care. Some groups of patients might then be inappropriately prioritised over others for booster vaccines, hospital beds or life-saving treatments, which in turn can affect patient and public trust and cost the NHS.
This proposal aims to improve existing technology for predicting personalised future risk of health conditions, particularly for overlooked groups of patients. We aim to do so by a) improving the way recorded ethnicity is used in research (results shown in this repository), b) investigating women's health and ethnic disparities in mortality and cardiovascular disease (results shown in sub-project CCU037_02), and c) improving the modelling process to build risk prediction models tailored to ethnic groups, and therefore more reliable in practice (results shown in sub-project CCU037_05).
To demonstrate our approach, we propose to develop a calculator that predicts cardiovascular disease (CVD) in COVID-19 patients, an area where ethnic biases are known to exist. Because inequity in data and models affects all disease areas, our approach is sustainable and can be applied to other clinical areas in the NHS. The calculator can be used by the public to guide lifestyle choices, and by doctors to provide better care.
The publicly available algorithm for improving ethnicity information can be used by researchers nationwide who conduct health research involving ethnicity.
This work will be based on anonymised health information that represents almost everyone currently living in England. By extending to Scotland, Wales and Northern Ireland in future, we hope that this work will help to make health equal and fair for everyone in the UK.
Pineda-Moncusí, M., Allery, F., Delmestri, A. et al. Ethnicity data resource in population-wide health records: completeness, coverage and granularity of diversity. Sci Data 11, 221 (2024). https://doi.org/10.1038/s41597-024-02958-1
- View the analysis code used in NHS England's SDE for England
- View the phenotyping algorithms and codelists used in NHS England's SDE for England
This is a sub-project of project CCU037 approved by the CVD-COVID-UK / COVID-IMPACT Approvals & Oversight Board (sub-project: CCU037_01).
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.