MOL3022-bioinformatics-project

This repository belongs to a project in the course MOL3022 Bioinformatics - Method Oriented Project. It contains the code for the backend application and for training and testing the machine learning model.

The model training and testing can be found in the notebooks training.ipynb and testing.ipynb, respectively.

The frontend code can be found at https://github.com/Senja20/mol-3022-front-end-application.

The model card for the machine learning model can be found at https://huggingface.co/andreas122001/mol3022-signal-peptide-prediction.

Data format

The program expects data in FASTA format, where the first field of each header is the kingdom (organism group). The kingdom must be one of "EUKARYA", "ARCHAEA", "POSITIVE" or "NEGATIVE". For example:

>EUKARYA|other_header_items
MSGYSPLSSGPADVHIGKAGFFSSVINLANTILGAGILSLPNAFTKTGLLFGCLTIVFSAFASFLGLYFV

Example data in the correct format is provided in data/examples_small.fasta and data/examples.fasta.
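
To check a file against the header convention above before running the model, a few lines of Python are enough. This is a minimal sketch and not the project's own parser: the function name `parse_fasta` is hypothetical, and it assumes that header fields are separated by `|`, as in the example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Kingdom labels listed in this README.
VALID_KINGDOMS = {"EUKARYA", "ARCHAEA", "POSITIVE", "NEGATIVE"}


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (kingdom, sequence) pairs from an iterable of FASTA lines.

    Assumption: header fields are separated by "|" and the kingdom is first.
    Raises ValueError on an unknown kingdom or on sequence data with no header.
    """
    kingdom: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if kingdom is not None:
                yield kingdom, "".join(chunks)
            kingdom = line[1:].split("|", 1)[0].strip()
            if kingdom not in VALID_KINGDOMS:
                raise ValueError(f"unknown kingdom in header: {line!r}")
            chunks = []
        elif kingdom is None:
            raise ValueError("sequence data found before any header")
        else:
            # Sequences may be wrapped over several lines; join them later.
            chunks.append(line)
    if kingdom is not None:
        yield kingdom, "".join(chunks)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `with open("data/examples_small.fasta") as fh: records = list(parse_fasta(fh))`.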

Usage

Requirements

This project assumes you have Python installed on your machine.

To install the requirements, run:

pip install -r requirements.txt

Run the project either by (1) hosting the backend and frontend servers, or (2) running the CLI:

(1) Host the backend

To run the backend:

python src/api.py

This will automatically download the machine learning model, if it is not already on your machine, and run it on the backend server.

See the frontend repository for instructions on how to run the frontend.

(2) Run as CLI

Alternatively, run the CLI like this:

python src/cli.py --file path/to/file.fasta

# E.g. test it on the example file like this:
python src/cli.py --file data/examples_small.fasta

This will also automatically download the machine learning model, but will only run it on the provided dataset. You can test it out using the example data.
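
To try the CLI on your own sequences, they first have to be saved in the format described under Data format. The following is a minimal sketch under the same assumption of `|`-separated header fields. The helper name `write_fasta` and the output path `my_sequences.fasta` are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_fasta(records: Iterable[Tuple[str, str, str]], path: str) -> None:
    """Write (kingdom, identifier, sequence) records to a FASTA file.

    The kingdom is placed first in the header, followed by "|" and an
    identifier chosen by the caller (the separator is an assumption).
    """
    with Path(path).open("w", encoding="ascii") as fh:
        for kingdom, identifier, sequence in records:
            fh.write(f">{kingdom}|{identifier}\n{sequence}\n")


if __name__ == "__main__":
    # Sequence taken from the example in this README.
    write_fasta(
        [(
            "EUKARYA",
            "my_protein",
            "MSGYSPLSSGPADVHIGKAGFFSSVINLANTILGAGILSLPNAFTKTGLLFGCLTIVFSAFASFLGLYFV",
        )],
        "my_sequences.fasta",
    )
```

The resulting file can then be passed to the CLI with `--file my_sequences.fasta`.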

You can also see the video guide:

CLI-example-usage.mp4

Usage of the CLI can be seen below: