- Retrieving compounds with EDC average score > 0.9
- Using RDKit to calculate physicochemical descriptors
- Removing compounds with constant descriptors
- Removing descriptors with constant values
- Removing compounds with NA descriptor values
- Retrieving pathway NES scores across all 15 layers
- Removing constant pathways across all chemicals for each data layer
- Scaling the descriptors between 0 and 1
- Calculating the Euclidean distance matrix for the descriptor matrix
- Performing agglomerative nesting (AGNES) hierarchical clustering with Ward's method on the physicochemical descriptor (chemical) space
- Calculating the Euclidean distance for the pathway scores in each data layer
- Performing AGNES with Ward's method on the transcriptome space for each data layer
- Performing multi-view clustering using Ward's method and Euclidean distance on the toxicogenomics data space
- Preprocessing of RDKit descriptors and removing pathways with constant values
- Tuning lambda for each pair of toxicogenomics data layer and physicochemical properties by grid search with M-fold cross-validation (folds = 5)
- Fitting the final r-CCA using the optimized parameters for each pair from step 2
- Heatmap visualization of r-CCA results with descriptors in terms of the correlation distance matrix for each data layer
- Chord diagram of the top 5% of descriptors most correlated with pathway scores for each data layer
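
The descriptor preprocessing steps listed above (dropping constant descriptors, dropping compounds with missing values, scaling to the range 0 to 1, and computing Euclidean distances) can be illustrated with a minimal sketch. This is not the pipeline's own code: it uses only the Python standard library, the function names are hypothetical, and the small matrix is made-up example data rather than real RDKit output. Missing values are assumed to be encoded as `None` or NaN.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (assumed encoding of NA values)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_constant_columns(matrix):
    """Drop descriptors (columns) whose non-missing values are all identical."""
    if not matrix:
        return [], []
    keep = []
    for j in range(len(matrix[0])):
        observed = {row[j] for row in matrix if not is_missing(row[j])}
        if len(observed) > 1:
            keep.append(j)
    return [[row[j] for j in keep] for row in matrix], keep


def drop_incomplete_rows(matrix):
    """Drop compounds (rows) that contain any missing descriptor value."""
    return [row for row in matrix if not any(is_missing(v) for v in row)]


def min_max_scale(matrix):
    """Scale every column to [0, 1]; a zero-range column is mapped to 0.0."""
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span > 0 else 0.0
         for v, lo, span in zip(row, lows, spans)]
        for row in matrix
    ]


def euclidean_distance_matrix(matrix):
    """Pairwise Euclidean distances between rows (compounds)."""
    return [[math.dist(a, b) for b in matrix] for a in matrix]


# Hypothetical descriptor matrix: rows are compounds, columns are descriptors.
descriptors = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, None],  # compound with a missing descriptor value
    [3.0, 5.0, 0.8],
    [5.0, 5.0, 0.4],
]

filtered, kept_columns = drop_constant_columns(descriptors)
complete = drop_incomplete_rows(filtered)
scaled = min_max_scale(complete)
distances = euclidean_distance_matrix(scaled)
```

The resulting `distances` matrix is the kind of input a Ward-linkage hierarchical clustering would take. The clustering, the multi-view step, and the r-CCA are not reproduced in this sketch.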