DNABERT-Z

Modifications by Garratt

Installation

Prerequisites

A native system or virtual machine with roughly 30 GB of free disk space.

The system must provide the following commands:

  • mkdir
  • cp

Dependencies

NixOS

On NixOS, all dependencies are already listed in the file shell.nix, so no manual installation is required.

For NVIDIA CUDA support, edit the file shell.nix and remove the leading # from the line containing # cudaPackages.cudatoolkit.
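
The edit can also be scripted. The sketch below, in Python, uncomments any commented-out line that mentions cudaPackages.cudatoolkit. The function name `enable_cuda` and the sample excerpt are hypothetical; the actual layout of shell.nix may differ, so review the result before using it.

```python
def enable_cuda(nix_text, marker="cudaPackages.cudatoolkit"):
    """Uncomment every commented-out line of `nix_text` that mentions `marker`."""
    result = []
    for line in nix_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if marker in line and stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            # Drop the leading '#' and the spaces right after it, keep the indent.
            line = indent + stripped[1:].lstrip(" ")
        result.append(line)
    return "".join(result)


# Hypothetical excerpt for illustration; not copied from the real shell.nix.
sample = "buildInputs = [\n  python310\n  # cudaPackages.cudatoolkit\n];\n"
print(enable_cuda(sample))
```

To apply it to the real file, read shell.nix as text, pass the content to `enable_cuda`, and write the result back after keeping a backup copy.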

Debian / Ubuntu

For Debian / Ubuntu, the following two commands should suffice:

sudo apt install gcc curl python3.10 python3-pip
python3.10 -m pip install --user virtualenv

Other

On other systems, dependencies may need to be installed manually. Refer to the install commands for Debian / Ubuntu above and to the shell.nix file for NixOS to derive the required setup for your system.

Usage

Starting JupyterLab

The first start may take a while, because the Python dependencies required to run the notebook are installed at that point.

NixOS

To start JupyterLab on NixOS, run the following command:

nix-shell

Debian / Ubuntu / Other

To start JupyterLab on Debian / Ubuntu, and potentially on other systems, run the following command:

bash ./run.sh

Using the Notebook

Once JupyterLab has opened, open the notebook ZDNA-prediction.local.ipynb in JupyterLab.

The large blue link at the very top, titled "Jump to Run Section", jumps directly to the "Run" section, so you do not have to scroll down.

The "Run" section contains further information on how to use the notebook.

Original README

This repository contains code and data for the article "Z-Flipon Variants reveal the many roles of Z-DNA and Z-RNA in health and disease".

The full genome predictions for human and mouse genomes can be downloaded here

To predict Z-DNA flipons on new data, please use this Colab notebook

The fine-tuned DNABERT weights can be downloaded from Google Drive:

Files in this repository

1_HG_chipseq.ipynb - Generate data splits for HG data with ChIP-seq labels. Train the models. Generate full genome predictions.

1_HG_kousine.ipynb - Generate data splits for HG data with Kouzine labels. Train the models. Generate full genome predictions.

1_MM_curax.ipynb - Generate data splits for MM data with Curax labels. Train the models.

1_MM_kousine.ipynb - Generate data splits for MM data with Kouzine labels. Train the models.

2_Generate_stats_hg_chipseq.ipynb - Calculate most frequently attended k-mers for HG data with ChIP-seq labels.

2_Generate_stats_hg_kouzine.ipynb - Calculate most frequently attended k-mers for HG data with Kouzine labels.

2_Generate_stats_mm_curax.ipynb - Generate full genome predictions for MM data with Curax labels. Calculate most frequently attended k-mers.

2_Generate_stats_mm_kouzine.ipynb - Generate full genome predictions for MM data with Kouzine labels. Calculate most frequently attended k-mers.

README.md - This file

ZDNA-prediction.ipynb - Standalone notebook for prediction of Z-DNA. Intended to be run in the Colab environment via: https://colab.research.google.com/github/mitiau/Z-DNABERT/blob/main/ZDNA-prediction.ipynb
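
Several of the notebooks listed above rank k-mers by how often the model attends to them. As background, the sketch below shows how a DNA sequence can be split into overlapping k-mers and how their occurrences can be tallied. It is not taken from the notebooks: the function names, the example sequence, and the choice of k = 6 are assumptions for illustration, and it counts plain occurrences rather than attention scores.

```python
from collections import Counter


def kmers(sequence, k=6):
    """Split `sequence` into overlapping k-mers with a stride of 1."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def most_common_kmers(sequences, k=6, top=5):
    """Return the `top` most frequent k-mers across all `sequences`."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers(seq, k))
    return counts.most_common(top)


# Made-up sequence, used only to demonstrate the calls.
example = "CGCGCGCGTATA"
print(most_common_kmers([example], k=6, top=3))
```

A sequence shorter than k yields no k-mers, so very short inputs simply do not contribute to the counts.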