
Accurate prediction of promoter activity and variant effects from massively parallel reporter assays

Primary language: Python. License: MIT.

LegNet: solving the sequence-to-expression problem with SOTA convolutional networks

Here we present a convolutional network for predicting gene expression and sequence variant effects from massively parallel reporter assay (MPRA) data.

Our approach secured 1st place in the recent DREAM 2022 challenge on predicting gene expression from millions of promoter sequences. To achieve top performance, we drew inspiration from EfficientNetV2, a recent state-of-the-art architecture for image analysis, and reformulated the initial sequence-to-expression regression problem as a soft-classification task. Within the DREAM challenge, our model outperformed both attention-based transformers and recurrent neural networks.
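To illustrate what reformulating regression as soft classification can look like, here is a minimal sketch in plain Python. It is not taken from the repository code. The function names, the 18 integer bins, and the Gaussian spread of 0.5 are all assumptions chosen for illustration. The idea: a continuous expression value is turned into a probability distribution over bins, a model is trained to predict that distribution, and a scalar prediction is recovered as the expected bin index.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_labels(expression, n_bins=18, std=0.5):
    """Spread one continuous expression value over n_bins integer bins.

    Bin k covers [k - 0.5, k + 0.5); the first and last bins are
    open-ended so the probabilities always sum to 1.
    n_bins and std are illustrative assumptions, not repository values.
    """
    probs = []
    for k in range(n_bins):
        lower = -math.inf if k == 0 else k - 0.5
        upper = math.inf if k == n_bins - 1 else k + 0.5
        probs.append(
            normal_cdf(upper, expression, std) - normal_cdf(lower, expression, std)
        )
    return probs


def expected_expression(probs):
    """Collapse a distribution over bins back into a scalar prediction."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical measurement: expression value 7.3
probs = soft_labels(7.3)
most_likely_bin = max(range(len(probs)), key=probs.__getitem__)
recovered = expected_expression(probs)
```

In a setup like this, the network would output one score per bin and be trained with a distribution-matching loss (for example cross-entropy or KL divergence) against the soft labels, rather than with a squared-error loss on the raw value.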

This repository provides several resources: