LegNet: solving the sequence-to-expression problem with SOTA convolutional networks

Here we present a convolutional network for predicting gene expression and sequence variant effects based on data obtained by large-scale parallel reporter assays.

Our approach secured 1st place in the recent DREAM 2022 challenge in predicting gene expression from millions of promoter sequences. To achieve top performance, we drew inspiration from EfficientNetV2, a recent state-of-the-art architecture for image analysis, and reformulated the initial sequence-to-expression regression problem as a soft-classification task. Within the DREAM challenge, our model outperformed both attention-based transformers and recurrent neural networks.
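The soft-classification reformulation mentioned above can be illustrated with a minimal sketch. It is not taken from the repository code: the number of bins (`N_BINS`), the Gaussian width (`SIGMA`), and all function names are assumptions chosen for illustration. The idea shown is that a scalar expression value is spread over neighbouring integer bins to form a probability vector, the network is trained to match that vector, and a scalar prediction is recovered as the expected bin index.

```python
import math
import numpy as np

N_BINS = 18   # assumed number of expression bins (illustrative)
SIGMA = 0.5   # assumed width of the Gaussian smoothing (illustrative)

def soft_labels(y, n_bins=N_BINS, sigma=SIGMA):
    """Spread a scalar target y over integer bins 0..n_bins-1.

    Bin k receives the Gaussian probability mass on [k - 0.5, k + 0.5).
    """
    edges = np.arange(n_bins + 1) - 0.5
    cdf = np.array([0.5 * (1.0 + math.erf((e - y) / (sigma * math.sqrt(2.0))))
                    for e in edges])
    probs = np.diff(cdf)
    return probs / probs.sum()  # renormalise mass lost outside the bin range

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def expected_expression(probs):
    """Decode a probability vector back to a scalar: the expected bin index."""
    return float(np.dot(probs, np.arange(len(probs))))

def cross_entropy(target_probs, logits):
    """Cross-entropy between soft targets and the model's softmax output."""
    log_pred = np.log(softmax(logits) + 1e-12)
    return float(-(target_probs * log_pred).sum())

# Example: a measured expression of 7.3 becomes a distribution peaked at bin 7.
target = soft_labels(7.3)
print(target.argmax(), round(expected_expression(target), 2))
```

In this sketch the model would output `N_BINS` logits per sequence, be trained with `cross_entropy` against `soft_labels(y)`, and be evaluated on `expected_expression(softmax(logits))`. The tutorial notebook and the manuscript describe the scheme LegNet actually uses.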

This repository provides several resources:

A tutorial Jupyter notebook demonstrating how LegNet can be practically used with the data from yeast gigantic parallel reporter assays.
Scripts to reproduce the analysis presented in the LegNet manuscript, based on the public GPRA data of Vaishnav et al. (available on Zenodo).
Scripts to reproduce the autosome.org solution for the DREAM 2022 promoter expression challenge.
