/Research-notes-non-coding-Rna-

Research notes for ML projects of diffentating b/w coding and non-coding rna

Primary LanguageJupyter Notebook

Finding Coding RNA using Machine Learning ๐Ÿงฌ ๐Ÿงช ๐Ÿ’ป

This repository contains the code and data for a project aimed at finding whether a given DNA sequence is coding RNA or non-coding RNA using machine learning models. The project uses various features of the DNA sequence to train the models

Features ๐Ÿ”

  • ORF length
  • protein coding potential
  • G-C content
  • K-mer periodicity
  • Minimal Free Energy
  • Transcript length
  • CpG islands
  • BLAST similarity
  • Conservation Score
  • Nucleotide Frequencies
  • Hexamer score
  • Fickett Score

Dataset ๐Ÿ“Š

The dataset used in this project contains both coding and non-coding RNA sequences. The sequences are in FASTA format and have been pre-processed to extract various features such as ORF length, protein coding potential, etc. The dataset is split into training and testing sets, with 80% of the data used for training and the remaining 20% used for testing.

Models ๐Ÿค–

The following machine learning models have been implemented for this project:

  • Logistic Regression
  • Support Vector Machine (SVM)
  • Naive Bayes
  • Random Forest

These models have been trained using the features extracted from the dataset and are used to predict whether a given DNA sequence is coding RNA or non-coding RNA.

Requirements ๐Ÿ› ๏ธ

To run the code in this repository, you will need:

  • Python 3.x
  • NumPy
  • Pandas
  • Scikit-learn
  • Matplotlib
  • Biopython

Usage ๐Ÿš€

To use the code in this repository, follow these steps:

  • Clone the repository to your local machine.
  • Install the required dependencies using pip.
  • Use the trained models to predict whether a given DNA sequence is coding RNA or non-coding RNA.

Conclusion ๐Ÿ“

This project shows that machine learning models can be used to accurately classify DNA sequences as coding or non-coding RNA based on various features. The models implemented in this project can be further optimized and used in various biological applications, such as identifying potential drug targets and understanding gene expression. So, let's go and explore the world of RNA with Machine Learning!