/tcga-embedding

Using a shallow neural network layer (embedding) to infer gene-gene/sample relationships from gene expression data.

Primary language: Jupyter Notebook. License: MIT.

Embedding (TCGA RNASeq)

Source code for applying embeddings to TCGA RNASeqV2 RSEM-normalized data.

Link

Web Interactive Embedding Projector (powered by TensorFlow)

Gene Embedding Matrix from:

Source Code

Handy Python scripts for loading data (load_data.py) and functions for handling embeddings (util.py) are included.
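
The functions in util.py are not reproduced here. As a rough illustration of the kind of operation an embedding matrix supports, the sketch below ranks entries by cosine similarity to a query. It assumes each gene maps to a vector of floats; the function names, gene names, and values are all hypothetical and are not taken from util.py or from the files under emb/. It uses only the standard library so it runs anywhere; with the dependencies listed in this README, numpy or pandas would be the more natural choice.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, query, k=2):
    """Return the k entries most similar to `query`, most similar first."""
    scores = [
        (name, cosine_similarity(embeddings[query], vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Hypothetical toy embeddings (made-up names and values, for illustration only).
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.5],
    "GENE_B": [0.9, 0.1, 0.4],
    "GENE_C": [-1.0, 0.2, 0.0],
}
neighbors = nearest_neighbors(toy_embeddings, "GENE_A", k=2)
```

Here GENE_B ranks first because its vector points in nearly the same direction as GENE_A, while GENE_C points the opposite way.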

Dependencies

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • networkx
  • scipy
  • scikit-learn (imported as sklearn)
  • fastai
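
Before running anything, it can help to confirm that the packages above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module names are simply the import names of the packages listed above.

```python
import importlib.util

# Import names of the dependencies listed in this README.
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "networkx", "scipy", "sklearn", "fastai",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```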

Usage

  1. Clone the repo locally.
  2. Change directory to the local directory.
  3. Run python train.py --data $YOUR_INPUT_DATA --out-prefix $OUT --out-dir $OUTPUT_PATH.
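
Step 3 can also be driven from Python, for example when launching several runs. The sketch below only assembles and optionally executes the command shown above; the three flags come from this README, while the helper names and the placeholder file and directory names are hypothetical.

```python
import subprocess
import sys


def build_train_command(data, out_prefix, out_dir):
    """Assemble the argument list for the train.py call shown in step 3."""
    return [
        sys.executable, "train.py",
        "--data", str(data),
        "--out-prefix", str(out_prefix),
        "--out-dir", str(out_dir),
    ]


def run_training(data, out_prefix, out_dir, repo_dir="."):
    """Run train.py from the repository directory; raises if it exits non-zero."""
    subprocess.run(
        build_train_command(data, out_prefix, out_dir),
        cwd=repo_dir,
        check=True,
    )


# Hypothetical placeholder values; substitute your own input file and paths.
command = build_train_command("my_expression.csv", "run1", "results")
print(" ".join(command[1:]))
# prints: train.py --data my_expression.csv --out-prefix run1 --out-dir results
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.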

Note: train.py can only be run on a CUDA-enabled machine. Input data must be a .csv file with one observation per row and must have an ID column.
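
The input layout can be checked before starting a training run. The sketch below is one possible pre-flight check, not part of train.py: it verifies that an ID column is present and that every row has as many fields as the header. The header name "ID" is an assumption (the exact name train.py expects is not stated here), as are the function name and the toy data.

```python
import csv
import io


def check_input_csv(handle, id_column="ID"):
    """Return the number of observations (data rows) after basic layout checks.

    Raises ValueError if the ID column is missing or a row's field count
    differs from the header's.
    """
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None or id_column not in header:
        raise ValueError(f"missing required column: {id_column!r}")
    n_rows = 0
    for row in reader:
        if len(row) != len(header):
            raise ValueError(
                f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
            )
        n_rows += 1
    return n_rows


# Hypothetical toy input: two observations (rows) with two features each.
toy = io.StringIO("ID,FEATURE_1,FEATURE_2\nS1,0.5,1.2\nS2,0.0,3.4\n")
n_observations = check_input_csv(toy)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `check_input_csv`.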

Folder Structure

tcga-embedding
|   LICENSE
|   README.rst
|   load_data.py
|   train.py
|   util.py
└───emb
    |   gemb_bias_CN.csv
    |   gemb_bias_normal.csv
    |   gemb_CN.csv
    |   gemb_normal.csv
    |   semb_bias_CN.csv
    |   semb_bias_normal.csv
    |   semb_CN.csv
    |   semb_normal.csv
    └───geneSCF
        |   gemb_d17_top_GO_BP.tsv
        |   gemb_d22_top_GO_BP.tsv
        |   gemb_d25_top_GO_BP.tsv
        |   gemb_d35_bottom_GO_BP.tsv
        |   gemb_d43_bottom_GO_BP.tsv
        |   gemb_d46_bottom_GO_BP.tsv
           
└───ipynb
    |   tcga_emb_dist.ipynb
    |   tcga_emb_pca.ipynb
    |   tcga_emb_subtyping.ipynb
    |   tcga_ioresponse.ipynb
    |   tcga_plot_emb_som_pca_heatmap.ipynb
    |   tcga_plot_gsea_compare.ipynb
    |   tcga_som.ipynb
    |   tcga_training_CN.ipynb
    |   tcga_training_normal.ipynb

└───ref
    |   genes_gids.tsv
    |   sid_ca.csv