GSK_analysis

Xiaonan Wang

26 May 2020

Summary: Data analysis of the GSK project

Introduction

The analysis was done following the strategy below:

[Figure: schematic of the analysis strategy]

Two reference datasets were used for data projection:

  1. Cord blood dataset (Unpublished)
  2. Kuffman dataset (Unpublished)

Notebooks

This folder contains all Jupyter notebooks used for the data analysis. To run the notebooks, first install the Smart-Seq2 preprocessing package:

pip install smqpp
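
A quick way to confirm the installation before opening the notebooks is to check that the module can be found. The sketch below assumes the import name matches the pip name (`smqpp`); the helper `is_installed` is hypothetical and not part of the package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the package is imported under the same name it is installed with.
if not is_installed("smqpp"):
    print("smqpp not found; install it with: pip install smqpp")
```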

The main analyses include:

  • CBdata_all: Analysis of Cord blood data to generate visualisation layouts as references for projection.
  • Kuffman_all: Analysis of Kuffman data to generate visualisation layouts as references for projection of 0hr data.
  • MPB1234_all: Analysis of all MPB cells together. For better batch correction, the data were split by day.
  • MPB_scGen: Batch correction and prediction of perturbation using scGen for MPB 0hr and 62hr NT and GFP+ cells.
  • MPB1234_Day0: Analysis of MPB 0hr cells
  • MPB1234_Day3: Analysis of MPB 62hr cells
  • BM789_all: Analysis of all BM cells together. For better batch correction, the data were split by day.
  • BM_scGen: Batch correction and prediction of perturbation using scGen for BM 0hr and 62hr NT and GFP+ cells.
  • BM789_Day0: Analysis of BM 0hr cells
  • BM789_Day3: Analysis of BM 62hr cells

DPT analysis:

  • CB test: Check different data integration and projection methods for CB data Batch 1 and Batch 2
  • Kuffman test: Check different data integration and projection methods for Kuffman data as reference and MPB Day0 and BM Day0
  • DI Method1: Data integration and projection with CB data as the reference for MPB and BM data, using ComBat for batch correction (BC) followed by the destiny R package to calculate the diffusion map (DM), the corresponding projection (Proj) and diffusion pseudotime (DPT)
  • DI Method2: Data integration and projection with CB data as the reference for MPB and BM data, using ComBat for BC followed by scanpy to calculate the DM, the corresponding Proj and DPT
  • DI Method3: Data integration and projection with CB data as the reference for MPB and BM data, using scanpy to calculate the PCA, which was then corrected with reducedMNN from the batchelor R package. The corrected PCA was used in scanpy to calculate the DM, the corresponding Proj and DPT
  • Method comparison: Comparing the three methods used for data integration
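
For orientation, the sequence of steps described for DI Method2 (batch correction, then diffusion map, then diffusion pseudotime) can be sketched with standard scanpy calls. This is a minimal illustration and not the code from the notebooks: the function name `dpt_after_combat`, the `batch` column name, the root cell index and the component counts are all assumptions, it assumes scanpy's own `sc.pp.combat` is the ComBat implementation, and the projection step is omitted because the README does not say how it was done.

```python
def dpt_after_combat(adata, batch_key="batch", root_cell=0, n_pcs=50, n_dcs=15):
    """Hypothetical sketch: ComBat -> PCA -> kNN graph -> diffusion map -> DPT.

    `adata` is an AnnData object with log-normalised expression in `adata.X`
    and a batch label in `adata.obs[batch_key]` (column name is an assumption).
    """
    # Imported lazily so this function can be defined without scanpy installed.
    import scanpy as sc

    sc.pp.combat(adata, key=batch_key)      # batch correction, modifies adata.X in place
    sc.pp.pca(adata, n_comps=n_pcs)         # PCA on the corrected expression values
    sc.pp.neighbors(adata, n_pcs=n_pcs)     # kNN graph built on the PCA space
    sc.tl.diffmap(adata, n_comps=n_dcs)     # diffusion map, stored in adata.obsm["X_diffmap"]
    adata.uns["iroot"] = root_cell          # index of the cell used as the pseudotime root
    sc.tl.dpt(adata)                        # pseudotime, stored in adata.obs["dpt_pseudotime"]
    return adata
```

The root cell strongly affects the resulting pseudotime, so in practice it would be chosen from a biologically motivated starting population rather than left at index 0.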