Set2Gaussian: Embedding Gene Sets as Gaussian Distributions for Large-scale Gene Set Analysis
Set2Gaussian is a network-based embedding approach that takes a collection of gene sets as input and outputs an embedding for each gene set. It can embed more than 10,000 gene sets. Instead of embedding each gene set as a single point, Set2Gaussian embeds it as a Gaussian distribution in order to model the diverse functions within the gene set.
Set2Gaussian: Embedding Gene Sets as Gaussian Distributions for Large-scale Gene Set Analysis. Sheng Wang, Emily Flynn, Russ B. Altman.
We provide the dataset and embeddings of 13,886 gene sets from NCI, Reactome, and MSigDB on figshare. A sample dataset is in the data folder. network.txt is the network in the following format:
node1 node2 weight
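As a quick check that a network file matches this format, the following is a minimal sketch of a parser. It is not part of the Set2Gaussian code: the function name `read_network`, the gene names, whitespace as the field separator, and treating edges as undirected are all assumptions made for illustration.

```python
import io
from collections import defaultdict


def read_network(handle):
    """Parse 'node1 node2 weight' lines into a nested dict {node: {neighbor: weight}}.

    Hypothetical helper, not the loader used by Set2Gaussian.
    """
    adjacency = defaultdict(dict)
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()  # assumption: fields are whitespace-separated
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                "line %d: expected 3 fields, got %d" % (line_number, len(fields))
            )
        node1, node2 = fields[0], fields[1]
        weight = float(fields[2])
        adjacency[node1][node2] = weight
        adjacency[node2][node1] = weight  # assumption: the network is undirected
    return dict(adjacency)


# Made-up sample in place of data/network.txt
sample = io.StringIO(u"geneA geneB 0.8\ngeneB geneC 0.5\n")
network = read_network(sample)
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `read_network(open("data/network.txt"))`.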
node_set.txt is the node set in the following format:
set1 node1
set1 node2
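The node set file lists one set membership per line, so a set with many genes spans many lines. The sketch below groups those lines back into sets. It is illustrative only: `read_node_sets`, the sample names, and the assumption that set and node names contain no whitespace are not taken from the Set2Gaussian code.

```python
import io


def read_node_sets(handle):
    """Parse 'set node' lines into a dict {set_name: set of nodes}.

    Hypothetical helper, not the loader used by Set2Gaussian.
    """
    node_sets = {}
    for line_number, line in enumerate(handle, start=1):
        fields = line.split()  # assumption: names contain no whitespace
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 fields, got %d" % (line_number, len(fields))
            )
        set_name, node = fields
        node_sets.setdefault(set_name, set()).add(node)
    return node_sets


# Made-up sample in place of data/node_set.txt
sample = io.StringIO(u"set1 geneA\nset1 geneB\nset2 geneC\n")
node_sets = read_node_sets(sample)
```

A useful sanity check before training is that every node in `node_sets` also appears in the network file, since a gene absent from the network has no network context to embed.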
An example is in src/. It takes data/network.txt and data/node_set.txt as input. To run it on your own data, first replace these two files with your network and gene sets.
cd src
The embeddings will be saved in output_embed.
- Python 2.7 (with slight modifications, the tool can also be run with Python 3.6)
- Python packages (numpy 1.14+, scipy 1.1+, networkx 2.3+, tensorflow 1.14.0)
For questions about the data and code, please contact the authors. We will do our best to provide support and address any issues. We appreciate your feedback!