/youlldie

A web app that statistically predicts your life expectancy based on your inherited risk factors and lifestyle choices, leveraging data from peer-reviewed literature and public databases

Primary language: R. License: GNU Affero General Public License v3.0 (AGPL-3.0)

YOULLDIE

Executive Summary

youlldie is an open-source, data-driven, AI-powered app that statistically predicts life expectancy. Specifically, it determines the risk of death from different causes based on inherited risk factors, environment and lifestyle choices. The app enables efficient decision-making and solves several immediate problems in the healthcare, life insurance, financial planning and wealth management industries. The app also aims to encourage users to adopt healthy lifestyles. This project makes the app’s source code fully transparent and aims to gather input from the open-source community to improve the algorithm's accuracy. This project also aims to build a data lakehouse comprising mortality data from global healthcare systems to make the app fully data-driven, accurate, and relevant to the world's population.

How the app works

The app is currently available at https://youlldie.com

Users enter risk factors into the app. Without any input, the app provides the life expectancy of the world's population. Every input refines the life expectancy prediction to a more specific population, down to the individual level.

Mission

This project’s mission is to improve global life expectancy by optimizing health-related decision-making with data-driven solutions. The goal is to create a global mortality data lakehouse and develop an app that can predict life expectancy based on the data compiled in the lakehouse. The idea behind the project is that metrics are powerful enablers of improvements, and life expectancy can be improved if it can be predicted accurately.

Problems Addressed

Inaccurate life expectancy prediction model

Predicting life expectancy is complex. It must consider many factors to be accurate and useful. For example, gender, race, world region, education, income, alcohol use, tobacco use, physical activity, sleep, blood pressure, body mass index, medical history and family medical history are important factors that impact life expectancy. Currently, no life expectancy model accounts for all of these risk factors, so the accuracy of existing models is questionable. Moreover, no life expectancy algorithm is kept up to date with current data. As such, existing models fail to account for the ever-changing global context, including social behaviours, epidemics, climates, industrial hazards, and geopolitical risks.

Ineffective life expectancy prediction tool for the general public

No accurate life expectancy prediction tool is available to the general public. People must thus rely on guesswork riddled with cognitive biases to assess their life expectancy. As such, the general public can hardly estimate their life expectancy and make informed decisions about their future when planning their finances and managing their wealth. Furthermore, health authorities have never effectively conveyed the impact of different risk factors on life expectancy to the general public. Public health warnings are often undermined by cognitive bias in individuals who are misinformed or consider themselves an exception to the rule. For example, many consider alcohol harmless or even good in moderation, whereas the scientific community agrees that there is no safe amount of alcohol.

Unavailability of structured worldwide mortality data for research

The multifactorial nature of life expectancy cannot be fully understood without holistically assessing the impact of all risk factors on all causes of death. Unfortunately, no standardized worldwide mortality data pool exists to allow for this. As such, solid conclusions about global life expectancy can hardly be drawn.

Solution Offered

The youlldie app can account for an unlimited number of risk factors and their interactions. Also, with the development of a global mortality data lakehouse, the app can be data-driven by the most current global mortality data. As such, the app has the potential to be the most accurate life expectancy prediction tool available. It can foster a realistic understanding of risks to life expectancy and contribute to better decision-making related to life choices. Indeed, it can lead to a better appreciation of life in general. Moreover, by leveraging a global mortality data lakehouse, the app has the potential to make predictions beyond current scientific knowledge and act as a beacon to orient further research.

Emerging Opportunity

Global mortality data holds invaluable information that can be used to improve global health. Moreover, modern data communication and warehousing capabilities allow the pooling of global mortality data into a structured data lakehouse. Such a data lakehouse would make it possible to train a statistical model that can predict life expectancy with unprecedented accuracy and roll it out as a decision-support tool. The potential of such a data lakehouse and of life expectancy prediction tools is great and spans many industries. Specifically, an accurate life expectancy prediction tool can support health-related decisions made by healthcare professionals, researchers, insurance providers, financial advisors, and the general public. As such, developing a life expectancy prediction tool capable of leveraging global mortality data represents an opportunity to make the world population generally healthier.

Market Segmentation

Healthcare

Knowing a patient's potential life expectancy can help healthcare professionals tailor personalized interventions and preventive care. Specifically, a life expectancy prediction tool can help healthcare providers provide targeted interventions for conditions that most impact patients' life expectancies.

Moreover, a life expectancy prediction tool can optimize the implementation of public health programs. Specifically, it can improve the effectiveness of resource allocation and ensure that individuals with potentially shorter life expectancies receive appropriate care. Finally, accurate life expectancy predictions can raise public awareness of health risks and reduce early death and associated costs. As such, a life expectancy prediction tool can improve healthcare systems' efficiency.

Health Research

For health and socio-economic researchers, a data lakehouse comprising global mortality data represents a valuable source of information for guiding research. Moreover, a life expectancy prediction tool can help researchers better understand population health trends, disparities and factors influencing longevity.

Life Insurance

For the life insurance industry, an accurate life expectancy prediction tool improves risk assessments and allows companies to offer fair and balanced insurance products. This confers a competitive advantage as it allows insurance products to be provided to a broader audience while reducing losses. This can translate into improving global health as more people get coverage for the treatment and care they need while reducing the burden on the payers.

Financial Planning and Wealth Management

Knowing how much time one has left is of great value for any long-term commitment. Specifically, for financial planning and wealth management, a life expectancy prediction tool can help answer the question "When do you plan to retire?" which is central in determining how much one needs to save and invest to ensure sufficient funds throughout retirement. Furthermore, for estate planning, predicting one’s life expectancy allows one to plan responsibly and ensure the well-being of loved ones. As such, a life expectancy prediction tool conveys a competitive advantage to financial planning and wealth management companies as it can serve as a crystal ball to help determine when major life milestones, such as starting a family and retiring, should occur. This allows financial planning and wealth management companies to assess their clients' financial needs more accurately and make their offers more attractive.

Wellness

For wellness management, a life expectancy prediction tool can provide a realistic perspective of death that can improve lifestyle choices. It can, for example, present incentives for adopting healthy lifestyles. This, in turn, can improve quality of life and reduce the risk of preventable deaths. As such, a life expectancy prediction tool can help users fully appreciate the value of healthy habits.

Objectives & Milestones

The algorithm behind the app was built based on information gathered from peer-reviewed literature and public databases. As such, the app's output aligns with the current scientific knowledge. The app was also built to be data-driven and be able to "learn" from current mortality data.

The next step is to build a data lakehouse comprising global mortality data. The accumulation of this data will serve to gradually improve the app's accuracy and relevance to the world's population.

For the first year, the goal is to establish a proof of concept that mortality data can be used to train an algorithm to predict life expectancy. For the second year, the goal is to demonstrate that the algorithm has value for strategic decision-making about patient care, insurance coverage, financial planning, wealth management and lifestyle choice. As such, the following key milestones are set over the first two years:

Year 1

  • Objective 1: Gain access to mortality data from a major healthcare system.
  • Objective 2: Establish a standardized framework for acquiring data from global health providers.
  • Objective 3: Execute machine learning to improve the app’s predictive power and usefulness.
  • Objective 4: Validate the app's ability to predict life expectancy with 90% accuracy.
  • Objective 5 / Milestone: Publish a proof of concept (POC) article in a peer-reviewed scientific journal describing the app and the outcome of one year of machine learning.

Year 2

  • Objective 6: Gain access to mortality data in major economies (USA, UK, EU, Japan).
  • Objective 7: Gain endorsement from high-potential companies (e.g., healthcare providers, insurance companies, financial planning and wealth management companies, wellness companies).

Risks and Contingencies

Regulatory Framework Related to Personal Health Information

There are significant regulatory challenges related to the collection, use and sharing of Personal Health Information (PHI). Those regulations stem from the universal need to protect individuals' privacy. Indeed, protecting privacy is crucial and must always take precedence when dealing with health data. In many countries, there are stringent regulations based on privacy laws that govern the use of PHI. Compliance with those regulations is crucial to avoid legal consequences (see HIPAA in the United States and GDPR in the European Union).

To comply with PHI-related regulations, a de-identification scheme must be implemented during data acquisition and before data storage to ensure that the data stored in the data lakehouse contains no PHI and is very difficult to link back to a deceased person and their living family. Robust security measures to prevent and respond to breaches of the data lakehouse must also be implemented.

The use of a deceased individual's personal health information is subject to privacy laws and regulations, which can vary by jurisdiction. In many cases, privacy laws, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States or the General Data Protection Regulation (GDPR) in the European Union, primarily protect the privacy and confidentiality of living individuals. The privacy protections that apply during an individual's lifetime may no longer be as stringent after death. However, the specifics can vary, and there might still be some restrictions on the use and disclosure of deceased individuals' health information.

The regulatory frameworks concerning the use of PHI are continuously evolving. Therefore, contributors to this project must be aware and stay informed about changes in legislation and privacy policies to adjust practices accordingly and ensure regulatory compliance and public trust.

Ethical Considerations

There are significant ethical considerations surrounding the concept of life expectancy predictions. Indeed, life expectancy predictions can have adverse psychological effects on individuals. As such, care must be taken to ensure the app is rolled out responsibly. The web app's header states, "This app is not a death sentence generator. It can’t determine when you’ll die for sure. It is just a statistical analysis model that illustrates what the numbers look like. And you’re not a number". This is an important statement. The public impression of the app will be monitored using pop-up surveys to ensure that the app’s output does not harm users' well-being.

Moreover, healthcare institutions adhere to ethical standards and internal policies that guide the respectful and responsible use of post-mortem PHI. As such, contributors to this project must adhere to global ethical standards for handling post-mortem PHI and recognize local regulations and policies when negotiating access to PHI with individual healthcare institutions.

Public Opinion

The app's development must be transparent to avoid public scorn. That is why the algorithm behind the app is shared with the public as an open-source project, which ensures complete transparency. Using this approach, the app is always open for discussion and can be modified in a controlled manner by anyone with a valid argument.

Mortality Data Lakehouse

Contributors to the mortality data lakehouse share only what they choose to share. They are asked to provide discrete mortality datasets through a secure File Transfer Protocol (SFTP) server.

Data Transfer Specifications

The lakehouse variables are listed below. A Data Transfer Specification (DTS) should be executed to document how the data provided by a contributor maps to the lakehouse variables. The lakehouse mapping step specifically aims to standardize the data provided by contributors and drop personally identifiable information such as names. If contributors wish to exclude certain data from the lakehouse, this can be specified in the DTS.

  • Death ID
  • Primary cause of death
  • Gender
  • Race
  • Age of Death
  • World Bank Region
  • Financial Status
  • Highest level of Schooling
  • Number of drinks per week
  • Number of smokes per week
  • Number of moderate-intensity physical activity minutes per week
  • Number of vigorous-intensity physical activity minutes per week
  • Number of hours of sleep per day
  • Systolic Blood Pressure
  • Body Mass Index
  • High Blood Cholesterol (Yes/No)
  • Cardiovascular Disease (Yes/No)
  • Chronic Obstructive Pulmonary Disease (Yes/No)
  • Diabetes (Yes/No)
  • Depression (Yes/No)
  • Cancer (Yes/No)
  • Alzheimer's Disease (Yes/No)
  • Family History of Cardiovascular Disease (Yes/No)
  • Family History of Chronic Obstructive Pulmonary Disease (Yes/No)
  • Family History of Diabetes (Yes/No)
  • Family History of Depression (Yes/No)
  • Family History of Cancer (Yes/No)
  • Family History of Alzheimer's Disease (Yes/No)
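
As an illustration of the mapping step, here is a minimal Python sketch (the app itself is written in R, and this is not code from the project). It assumes a DTS can be expressed as a dictionary from contributor column names to lakehouse variable names. The column names, the sample record, the `map_record` helper and the random Death ID are all hypothetical.

```python
import uuid

# Hypothetical DTS for one contributor: contributor column -> lakehouse variable.
DTS_MAPPING = {
    "cause": "Primary cause of death",
    "sex": "Gender",
    "age_at_death": "Age of Death",
    "drinks_wk": "Number of drinks per week",
}


def map_record(record, mapping, excluded=()):
    """Rename mapped columns, drop everything else, and attach a fresh Death ID."""
    mapped = {
        mapping[column]: value
        for column, value in record.items()
        if column in mapping and mapping[column] not in excluded
    }
    # Assumption: the Death ID is a random identifier unrelated to any source key.
    mapped["Death ID"] = uuid.uuid4().hex
    return mapped


# Made-up contributor record containing identifiers that must not reach the lakehouse.
raw = {
    "name": "Jane Doe",
    "patient_no": "A-1029",
    "cause": "Cancer",
    "sex": "Female",
    "age_at_death": 81,
    "drinks_wk": 2,
}

# The contributor chose to exclude alcohol data on the DTS.
clean = map_record(raw, DTS_MAPPING, excluded={"Number of drinks per week"})
print(clean)
```

Because only columns named in the mapping are kept, identifiers such as names are dropped by default rather than by an explicit deny-list. A real DTS would also need to cover unit conversions and value coding, which this sketch leaves out.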

The Code

The code behind the app follows the standard Shiny application structure. Please see the following for more information on Shiny apps: https://shiny.rstudio.com/tutorial/.

Essentially, the code has a "ui" component that serves to control the webpage display and a "server" component that provides the instructions to the server to create the output.

As such, all the code contained within "ui <- fluidPage()" serves to present input fields to users. Those input fields correspond to risk factors that have an impact on the age, risk and population that dies from each cause of death. "ui <- fluidPage()" also contains some code to display the outputs, but the actual calculations take place within the "server <- function(input, output){}" portion of the code.

The Calculation

The algorithm works as follows:

  1. A dataframe is built within the app to tabulate the most common causes of death and their baseline age of death (AGE), risk of death (RISK) and rate of death (RATE).

    • The baseline AGE corresponds to the average age of death associated with each cause of death
    • The baseline RISK corresponds to the death rate associated with each cause of death (n / 100,000) divided by the sum of the death rates associated with all causes of death. As such, RISK is the probability that death from a given cause will happen. It is a value confined between 0 and 1.
    • The baseline RATE corresponds to the total population dying from each cause of death per year. It is also known as the Crude Death Rate.
  2. Two other dataframes, which are external to the app, tabulate the values of continuous and discrete risk-factor parameters and their impacts on the baseline AGE, RISK and RATE for each cause of death. The impact values act as multipliers of the baseline AGE, RISK and RATE: parameters with impact values >1 increase the baseline values, whereas parameters with impact values <1 decrease them. Parameters that increase AGE and decrease RISK and RATE are beneficial; parameters that decrease AGE and increase RISK and RATE are detrimental. For example, "male" and "female" are the two parameters of the risk factor "sex". The impact of the "male" parameter on AGE is <1 for the cause of death "cardiovascular diseases" because males die from cardiovascular diseases at a younger age than the average population.
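
The two steps above can be sketched in a few lines of Python. This is an illustration, not the app's R code. All numbers are invented placeholders, and `build_baseline` and `apply_impacts` are hypothetical helpers. It also assumes a single per-100,000 death rate feeds both RISK and RATE, and that multipliers are applied one after another without renormalizing RISK, which the description above does not specify.

```python
# Placeholder inputs (NOT values from the app): death rate per 100,000 and mean age at death.
RATES_PER_100K = {"cardiovascular diseases": 230.0, "cancer": 130.0, "accidents": 40.0}
MEAN_AGE = {"cardiovascular diseases": 78.0, "cancer": 72.0, "accidents": 50.0}

# Placeholder impact table: (risk factor, parameter) -> cause -> multipliers.
IMPACTS = {
    ("sex", "male"): {
        "cardiovascular diseases": {"AGE": 0.95, "RISK": 1.10, "RATE": 1.10},
    },
}


def build_baseline(rates_per_100k, mean_age):
    """Step 1: tabulate baseline AGE, RISK and RATE for each cause of death."""
    total = sum(rates_per_100k.values())
    return {
        cause: {"AGE": mean_age[cause], "RISK": rate / total, "RATE": rate}
        for cause, rate in rates_per_100k.items()
    }


def apply_impacts(table, selected, impacts):
    """Step 2: multiply the baseline values by the impact of each selected parameter."""
    updated = {cause: dict(row) for cause, row in table.items()}
    for parameter in selected:
        for cause, multipliers in impacts.get(parameter, {}).items():
            for metric, factor in multipliers.items():
                updated[cause][metric] *= factor
    return updated


baseline = build_baseline(RATES_PER_100K, MEAN_AGE)
unchanged = apply_impacts(baseline, [], IMPACTS)            # no input: baseline population
male = apply_impacts(baseline, [("sex", "male")], IMPACTS)  # one selected parameter

print(baseline["cardiovascular diseases"])
print(male["cardiovascular diseases"])
```

With no parameter selected the table is returned unchanged, which matches the idea that the app starts from the baseline population and each input narrows it. In this sketch the "male" parameter lowers AGE and raises RISK and RATE for cardiovascular diseases only.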

The values currently present in the dataframes have been determined from an extensive review of publicly available databases and peer-reviewed articles. Those are available on the app's References page.

One of the project’s objectives is to update those values with real-world data. Specifically, the data compiled in the global mortality data lakehouse should be used to continually update the values present in those dataframes, so that the app remains accurate and consistently reflects the current global risks of death.

The Input

The web app presents input fields to the user which correspond to risk factors’ parameters. The user selects risk factors’ parameters that correspond to the life expectancy profile of interest.

The Output

The calculations performed above yield an updated dataframe which is plotted as a bubble plot with the AGE of Death on the x-axis and the RISK of Death on the y-axis. The size of the bubbles corresponds to the RATE of death aka the Crude Death Rate.
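
To make the plot layout concrete, here is a hedged Python sketch of how such a bubble plot could be assembled (the app itself renders its plot in R). The table values are placeholders, `bubble_spec` and `draw` are hypothetical helpers, and the linear scaling of bubble area to RATE is an assumption. `draw` needs matplotlib and is not called here.

```python
# Placeholder updated table (NOT values from the app).
TABLE = {
    "cardiovascular diseases": {"AGE": 74.0, "RISK": 0.60, "RATE": 200.0},
    "cancer": {"AGE": 72.0, "RISK": 0.30, "RATE": 100.0},
    "accidents": {"AGE": 50.0, "RISK": 0.10, "RATE": 50.0},
}


def bubble_spec(table, max_area=2000.0):
    """Turn the table into x (AGE), y (RISK) and bubble areas proportional to RATE."""
    causes = list(table)
    top_rate = max(table[cause]["RATE"] for cause in causes)
    return {
        "labels": causes,
        "x": [table[cause]["AGE"] for cause in causes],
        "y": [table[cause]["RISK"] for cause in causes],
        "s": [max_area * table[cause]["RATE"] / top_rate for cause in causes],
    }


def draw(spec, path="bubbles.png"):
    """Render the bubble plot to a file; matplotlib is imported only when called."""
    import matplotlib
    matplotlib.use("Agg")  # file output only, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(spec["x"], spec["y"], s=spec["s"], alpha=0.5)
    for label, x, y in zip(spec["labels"], spec["x"], spec["y"]):
        ax.annotate(label, (x, y))
    ax.set_xlabel("AGE of death (years)")
    ax.set_ylabel("RISK of death")
    fig.savefig(path)


spec = bubble_spec(TABLE)
print(spec["s"])
# draw(spec)  # uncomment to write bubbles.png (requires matplotlib)
```

Scaling the marker area rather than the radius keeps the visual size of each bubble proportional to RATE.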

Final Note

Everybody should contemplate their position in the stream of life. Everybody should, sometime in their lifetime, consider death. Observe skulls and skeletons and wonder what it will be like to go to sleep and never wake up. Never! That is a very gloomy thing for contemplation. But it’s just like manure. Just as manure fertilizes the plants and so on, so the contemplation of death and the acceptance of death is very highly generative of creating life. You’ll get wonderful things out of that. Death is important to think about. It must not be swept under the carpet. Thinking about and accepting death brings a trust in life. It incites one to let go. Stop clinging to constantly changing things that cannot be clung to. Recognize oneself as an unstable particle in the constantly changing flux of eternal life. Acknowledge the union and inseparability from everything else that there is. The contemplation of death allows one to change point of view and to find oneself. It awakens the senses. Thinking about death gives the opportunity to understand what life is all about and see what this universe is for. It is conducive to liberation. Understanding that everything is in the right place is the opportunity presented by the contemplation of death. -Alan Watts