Tono Amboro
This repository was created as part of an assignment for the Big Data Analysis course. It implements several methods for analyzing DNA sequences using Python 3.
- README.md (this file).
- Script.ipynb. A Jupyter notebook created in JupyterLab. I use it as a trial-and-error environment to try out code before writing it into the .py files.
- assg_script.py. The main script for this assignment. It contains the functions I wrote to achieve the goals of the assignment.
- test_assg_script.py. A pytest file that tests the functions in the main script to ensure they produce the expected output.
- nd2.fasta. A DNA sample (in .fasta format) that contains many DNA sequences from many species. Our instructor provided this file.
- dna_sample.fasta. A DNA sample (in .fasta format) that I created for developing the main script. It contains fewer sequences, so the script runs faster.
- dna_false_sample.fasta. A DNA sample (in .fasta format) that I created for developing the main script. It also contains fewer sequences, so the script runs faster. However, this file contains invalid sequence characters, similar to the nd2.fasta file. It is important for testing whether the script can distinguish a DNA sequence (ATCG) from other sequences and detect when something is wrong with a DNA sequence.
- output folder. This folder contains a tables folder and a graphs folder, which hold the output of the main script. The tables folder contains several .csv files, each holding a table. The graphs folder contains two graphs generated by the main script. I generated these output files (tables and graphs) from dna_sample.fasta as an example of running the main script.
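The kind of check that dna_false_sample.fasta is meant to exercise can be sketched as follows. This is only a minimal illustration, not the code in assg_script.py: the names `read_fasta`, `invalid_bases`, and `VALID_BASES` are hypothetical, and the sketch assumes that only the four letters A, T, C, and G count as valid.

```python
from io import StringIO

# Assumption: only these four bases are treated as valid DNA characters.
VALID_BASES = set("ATCG")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def invalid_bases(sequence):
    """Return a sorted list of characters that are not A, T, C, or G."""
    return sorted(set(sequence.upper()) - VALID_BASES)


# In-memory stand-in for a small .fasta file (hypothetical data).
sample = StringIO(">seq1\nATCGATCG\n>seq2\nATNNCG-T\n")
for name, seq in read_fasta(sample):
    bad = invalid_bases(seq)
    if bad:
        print(f"warning: {name} contains non-DNA characters: {bad}")
```

To run the same check on a real file, replace the `StringIO` object with `open("dna_false_sample.fasta")`.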
Before you run the script:
- Ensure your data file is in .fasta format.
- Place the script and the data in the same directory.
- Create an output folder in the same directory, then create a tables folder and a graphs folder inside the output folder.
- Although you will be warned when your DNA sequence is incorrect, it is recommended to always make sure your DNA sequence is in the right format.
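If you prefer not to create the folders by hand, the folder setup described above can be sketched with Python's standard `pathlib` module. This is an optional helper, not part of assg_script.py: the function name `prepare_output_dirs` is hypothetical, and the lowercase folder names `output`, `tables`, and `graphs` are assumed from the description in this README.

```python
from pathlib import Path


def prepare_output_dirs(base_dir="."):
    """Create output/tables and output/graphs under base_dir if missing."""
    base = Path(base_dir)
    tables = base / "output" / "tables"
    graphs = base / "output" / "graphs"
    for folder in (tables, graphs):
        # parents=True also creates the output folder itself;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
    return tables, graphs


if __name__ == "__main__":
    tables_dir, graphs_dir = prepare_output_dirs()
    print(f"tables folder: {tables_dir}")
    print(f"graphs folder: {graphs_dir}")
```

Run it from the directory that holds the script and the data so that the folders are created next to them.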