Creation of the 2021/2 Output Area Classification
Metadata.R
- creating reference tables and glossaries for Census data.Downloading_data.R
– downloading 2011 and 2021 Census data from NOMIS usingnomisr
package.Comparing_Censuses.R
- data cleaning and amalgamation.Transforming_Census_data.R
– data manipulation and transformation for the classification.NI.R
– modelling 2021 data for Northern Ireland.Correlation.R
- testing correlation between the variables.Pre-clustering.R
- preparing data for clustering.Clustering.R
- clustering of the data.Post-clustering.R
- creating maps and plots of the cluster solution.Testing_clustering.R
Clustergrams.ipynb
– creating Clustergrams in Python. (Credits: Prof Alex Singleton)Industry.R
- loading Industry dataIndustry_classification.R
- creating geodemographic classification with Industry variablesGraph_comparisons.R
- comparing data with graphs
List of folders (subfolders & files) in the project:
- API - Census data downloaded and saved with use of
nomisr
package. - Clean – amalgamated data ready for the analysis. Downloadable via this link.
- Raw_counts - datasets with raw counts
- Percentages - datasets transformed into percentages
- Transformed - datasets transformed with IHS (analysis-ready)
- Final_variables - datasets with OAC variables only
- All_data_clustering - results of the clustering for all investigated datasets.
- Clustering - datasets with cluster assignment for the UK and centroids.
- Lookups - reference tables for 2011 and 2021 Census variables.
- NISRA 2021 - 2021 Census data at LGD level for Northern Ireland.
- Objects - R objects created and stored to ensure consistency of the results or load big files
- SIR – contingency tables on disability counts by age, utilised for calculation of Standardised Illness Ratio.
Downloadable via this link.
- Bar_plots - Comparison of clusters to the UK (as well as Supergroup and Group averages)
- Clustergrams – plots used to establish number of clusters at each classification level.
Downloadable via this link.