This repository contains resources used to perform ETL (extract, transform, load) on NCBI GenBank files.

NCBI Genbank ETL

About The Project

The purpose of this project was to:

  1. Access the NCBI Nucleotide database and download relevant files using its API.
  2. Extract relevant data from each GenBank file.
  3. Move the extracted data to storage.
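
The repository's scripts are not reproduced in this README. The three steps above can be sketched roughly as follows, assuming Biopython's `Entrez.efetch` for the download, `SeqIO.parse` for reading GenBank records, and a CSV file written with pandas as the storage target. The function names (`download_genbank`, `extract_records`, `load_to_csv`) and the extracted fields are hypothetical choices for illustration, not the project's actual code, and the README does not say what kind of storage is used.

```python
from pathlib import Path

import pandas as pd
from Bio import Entrez, SeqIO


def download_genbank(accessions: list[str], email: str, out_path: Path) -> Path:
    """Step 1 (hypothetical helper): fetch GenBank flat files from NCBI Nucleotide."""
    Entrez.email = email  # NCBI asks E-utilities clients to identify themselves
    handle = Entrez.efetch(
        db="nucleotide",
        id=",".join(accessions),
        rettype="gb",  # GenBank flat-file format
        retmode="text",
    )
    try:
        data = handle.read()
    finally:
        handle.close()
    if isinstance(data, bytes):  # be defensive about the handle's mode
        data = data.decode("utf-8")
    out_path.write_text(data)
    return out_path


def extract_records(genbank_path: Path) -> pd.DataFrame:
    """Step 2 (hypothetical helper): pull a few summary fields from every record."""
    rows = []
    for record in SeqIO.parse(genbank_path, "genbank"):
        rows.append(
            {
                "accession": record.id,  # accession.version when a VERSION line exists
                "locus": record.name,  # name from the LOCUS line
                "definition": record.description,  # DEFINITION line
                "organism": record.annotations.get("organism", ""),
                "length_bp": len(record.seq),
                "n_genes": sum(1 for feature in record.features if feature.type == "gene"),
            }
        )
    return pd.DataFrame(rows)


def load_to_csv(table: pd.DataFrame, out_csv: Path) -> Path:
    """Step 3 (hypothetical helper): write the table to CSV, an assumed storage target."""
    out_csv.parent.mkdir(parents=True, exist_ok=True)
    table.to_csv(out_csv, index=False)
    return out_csv
```

For example, `download_genbank(["NC_005816"], email="you@example.com", out_path=Path("records.gb"))` followed by `load_to_csv(extract_records(Path("records.gb")), Path("output/records.csv"))` would fetch one record and write a one-row CSV. Keeping the three steps as separate functions lets the extraction be rerun on already-downloaded files without contacting NCBI again, which also helps stay within NCBI's request-rate limits.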

Getting Started

The scripts are compatible with Python 3.9.14+, pandas 2.0+, and Biopython 1.80+.

Prerequisites

  • Python
  • pandas
  • NumPy
  • Biopython
  • natsort

Installation

conda create -n env_name python=3.9.14
conda activate env_name
conda install pandas
conda install numpy
conda install -c conda-forge biopython
conda install -c anaconda natsort
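
Once the environment is set up, the minimum versions listed under Getting Started can be checked from Python. The following is a minimal sketch, assuming the packages are visible to `importlib.metadata` under their PyPI distribution names (`pandas`, `biopython`); the `MINIMUMS` table and the helper names are illustrative and not part of the repository. Its simple version parser ignores pre-release tags, so it is coarser than a full version comparison.

```python
import platform
import re
import sys
from importlib.metadata import PackageNotFoundError, version as installed_version

# Minimum versions as stated in this README (illustrative table).
MIN_PYTHON = (3, 9, 14)
MINIMUMS = {"pandas": (2, 0, 0), "biopython": (1, 80, 0)}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.0.3' into (2, 0, 3); trailing tags such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad to three components so '2.0' compares equal to (2, 0, 0).
    return tuple(parts + [0] * (3 - len(parts)))


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means every stated minimum is met."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append(f"Python {platform.python_version()} is older than 3.9.14")
    for dist, minimum in MINIMUMS.items():
        try:
            found = installed_version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if parse_version(found) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{dist} {found} is older than {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment meets the stated minimum versions.")
```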