
ETC5512-Assignment3

"Covid-19 had a significant impact on the fiscal, home and work experiences of Melbourne residents during 2020. To understand the impact of changes in 2020, researchers conducted a survey of 1000 Melbourne residents to explore the difference between their 2019 and 2020 experience with regards to mental health, financial stability and home life.

The survey was collected online through a survey platform called Qualtrics in the month of January, 2021. Respondents were invited to participate through a mailed invitation, from where they were encouraged to scan a QR code and complete the survey. Respondents first gave informed consent for the collection and use of their data, and the eventual distribution of de-identified data. At the conclusion of the survey, respondents gave an email address to which a Coles giftcard worth $20 was sent in compensation for their time.

This survey is a hypothetical example of data that might have been collected during this period. It does not have ethics approval, nor was it actually sent to participants. The data you have been given is simulated for teaching purposes and is not intended for analysis or interpretation." ~ Kennedy, L.A. (2021), Data Custodian. Simulated Data Survey.

The published data is de-identified with respect to participant demographics, both as an ethical responsibility and to comply with the Australian Privacy Principles (APPs) in the Privacy Act 1988. Ethics is a balance of risk and benefit: in this case, the risk of identification is balanced against the utility of the data for studying mental health, financial stability and home life. The indicators of financial stability are income and work schedule stability, which show whether the participant earned well and had a stable work schedule from week to week. Additionally, working hours and working from home capture how work differed between 2019 and 2020. In fact, each indicator can be compared across 2019 and 2020 to analyze the impact of the coronavirus pandemic. Home life is indicated by the home life description, which might be related to the number of adults and children in the participant's household and to income. Lastly, mental health is indicated by the mental health description, which may in turn depend on home life. Further analysis of this data can be used to test such associations.

An important property of this published data is that it conforms to the tidy data definition: each variable is in its own column, each observation is in its own row and each value is in its own cell. As a result, multiple records exist for the same participant. For each participant, the income occurs for 3 years (2019, 2020 and 2021); the combination of "workfromhome", "hrperweek", "wrkschedulestability" and "mentalhealth" occurs twice, once each for 2019 and 2020; "wrktimechoice" occurs ten times (five for each year, 2019 and 2020); and "homelife" occurs twelve times (six for each year, 2019 and 2020). Therefore there are 356, 90 records for each participant, and this redundancy must be considered accordingly. Additionally, "wrktimechoiceeoption" and "homelife" indicate whether that option was selected by the participant for "wrktimechoice" and "homelife" respectively: if an option was selected, the value is the option number; otherwise it is 0. When filtering records, note that some participants did not respond to any of the options, so for that particular year all of the options (1 to 5 for "wrktimechoice" and 1 to 6 for "homelife") are 0. This is equivalent to NA, that is, the question was not answered by the participant. Conversely, a participant answered the question if at least one of the options is not 0.

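The rule above for telling an unanswered question from an answered one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout and the keys `id`, `year`, `option` and `selected` are hypothetical stand-ins, not the actual column names of the published data, and the sample rows are made up. Collecting the selections in a set also absorbs the repeated records that the tidy layout produces.

```python
from collections import defaultdict

# Hypothetical long-format rows (not the real column names): one row per
# participant, year and option. "selected" holds the option number if the
# option was chosen, otherwise 0.
rows = [
    {"id": 1, "year": 2019, "option": 1, "selected": 1},
    {"id": 1, "year": 2019, "option": 2, "selected": 0},
    {"id": 1, "year": 2019, "option": 1, "selected": 1},  # repeated record
    {"id": 1, "year": 2020, "option": 1, "selected": 0},
    {"id": 1, "year": 2020, "option": 2, "selected": 0},
    {"id": 2, "year": 2019, "option": 1, "selected": 0},
    {"id": 2, "year": 2019, "option": 2, "selected": 2},
]


def selected_options(rows):
    """Map (id, year) to the sorted list of selected option numbers."""
    chosen = defaultdict(set)
    for row in rows:
        key = (row["id"], row["year"])
        chosen[key]  # create the key so all-zero questions are kept as empty
        if row["selected"] != 0:
            chosen[key].add(row["selected"])
    return {key: sorted(values) for key, values in chosen.items()}


def answered(rows):
    """Map (id, year) to True if at least one option is not 0."""
    return {key: bool(values) for key, values in selected_options(rows).items()}


status = answered(rows)
```

Here `status[(1, 2020)]` is `False` because every option for that participant and year is 0, which the text treats as NA, while `status[(1, 2019)]` and `status[(2, 2019)]` are `True`.
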
For further details, refer to the data dictionary, which gives the meaning of each variable and related information, and to the codebook, which lists the factor levels and, for each level, the number of participants who chose it. The questionnaire is provided to help understand the survey.

This data is licensed under Apache License, Version 2.0, January 2004, http://www.apache.org/licenses/