alzTE-HHV

Primary language: R. License: MIT.

Data and code for the paper "Transposable element activation drives human herpesvirus-mediated Alzheimer’s disease".

Data Description

This repository contains scripts for analyzing transposable element (TE) expression in bulk and single-cell RNA sequencing (scRNA-seq) datasets. The scripts are organized into four workflows, described below.

1. HHV RNA screening using bulk RNA-seq data

Software requirement:

Input files

  • Sample list: includes all the samples to be analyzed
  • Raw fastq file directory

How to run

  • Single-end RNA-seq data: nohup sh work_VIRTUS.SE.sh sample.lst fastq_dir > log 2>&1 &
  • Paired-end RNA-seq data: nohup sh work_VIRTUS.PE.sh sample.lst fastq_dir > log 2>&1 &

2. HHV RNA screening using scRNA-seq data

Software requirement:

Input files

  • Sample list: includes all the samples to be analyzed
  • Raw fastq file directory
  • Parameters.txt: a list of rows, each giving the name of a variable, followed by an equals sign, followed by the value of the parameter
  • Target_file.txt: the list of paths to the files generated by umi_tools. The files can be either .fastq or .fastq.gz files
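
The two text inputs above follow simple conventions, so they can be checked before launching a long run. The following is a minimal sketch, not part of the repository: the helper names `parse_parameters` and `list_fastq_targets` are hypothetical, and the parameter names in the example (`N_thread`, `Output_directory`) are illustrative assumptions rather than a confirmed list of what Viral_Track_scanning.R expects. It also assumes that the umi_tools output files sit directly in one directory.

```python
from pathlib import Path


def parse_parameters(text):
    """Parse rows of the form `name=value` into a dict of strings."""
    params = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # skip blank rows
        if "=" not in line:
            raise ValueError(f"row {line_number} has no '=': {raw_line!r}")
        # Split on the first '=' only, so values may themselves contain '='.
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def list_fastq_targets(directory):
    """Return sorted paths of the .fastq and .fastq.gz files in a directory."""
    root = Path(directory)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.name.endswith((".fastq", ".fastq.gz"))
    )


# Illustrative content only; the parameter names are assumptions.
example_parameters = "N_thread=8\nOutput_directory=./viral_track_out\n"
print(parse_parameters(example_parameters))
```

Writing the result of `list_fastq_targets` to a file, one path per line, would give a file in the shape described for Target_file.txt.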

How to run

  • nohup sh 01_umi_tools_work.sh sample.lst fastq_dir > log 2>&1 &
  • nohup Rscript Viral_Track_scanning.R ./Parameters.txt ./Target_file.txt > log 2>&1 &

3. Transposable element expression at single-cell level

Software requirement:

Input files

  • Sample list: includes all the samples to be analyzed
  • Raw bam file directory
  • scTE-generated csv.gz files, used as input files for scanpy

How to run

  • nohup sh 0.scTE.sh sample.lst > log 2>&1 &
  • nohup python 1.pack.py > log 2>&1 &
  • nohup python 2.norm_and_learn.py > log 2>&1 &
  • nohup python 3.diffexp_01.py > log1 2>&1 &
  • nohup python 3.diffexp_02.py > log2 2>&1 &

4. WGCNA (weighted gene co-expression network analysis)

Software requirement:

Input files

  • locusTE_gene_FPKM.txt: the expression matrix for each brain biobank. Each row corresponds to a gene or a locus TE, and each column to a sample
  • Metadata.csv: metadata for the samples. Each row corresponds to a sample, and each column to a clinical-pathologic parameter
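
Because samples are columns in the expression matrix but rows in the metadata table, a mismatch between the two is easy to miss. The following is a minimal sketch of a consistency check, not part of the repository. It assumes that locusTE_gene_FPKM.txt is tab-delimited with the feature identifier in the first column, and that the first column of Metadata.csv holds the sample identifier. Neither assumption is stated in this README, and the function names are hypothetical.

```python
import csv
import io


def matrix_samples(expression_text, delimiter="\t"):
    """Return sample names from the header row of the expression matrix."""
    reader = csv.reader(io.StringIO(expression_text), delimiter=delimiter)
    header = next(reader)
    return header[1:]  # assumption: first column is the gene / locus TE ID


def metadata_samples(metadata_text):
    """Return sample names from the first column of the metadata table."""
    reader = csv.reader(io.StringIO(metadata_text))
    next(reader)  # skip the header row of parameter names
    return [row[0] for row in reader if row]


def compare_samples(expression_text, metadata_text):
    """Return (samples missing from metadata, samples missing from matrix)."""
    in_matrix = set(matrix_samples(expression_text))
    in_metadata = set(metadata_samples(metadata_text))
    return sorted(in_matrix - in_metadata), sorted(in_metadata - in_matrix)


# Toy data for illustration only; identifiers and values are made up.
toy_expression = "feature\tS1\tS2\tS3\ngeneA\t1.0\t2.0\t0.5\n"
toy_metadata = "sample,age\nS1,80\nS2,75\nS4,90\n"
only_in_matrix, only_in_metadata = compare_samples(toy_expression, toy_metadata)
print("missing from metadata:", only_in_matrix)
print("missing from matrix:", only_in_metadata)
```

With real files, the text would be read with `open(path).read()`. Two empty lists indicate that the two inputs describe the same set of samples.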

How to run

  • nohup Rscript wgcna_pipeline.R > log 2>&1 &