The GRPM (Gene-Rsid-Pmid-Mesh) system is designed to facilitate the integration and analysis of genetic polymorphism data associated with nutrition. The system consists of five modules, each serving a specific purpose and requiring different Python modules. Here's an overview of each module:
Module 01 retrieves data from the LitVar and PubMed databases and merges it into CSV (comma-separated values) format. It uses the following Python modules:
requests: for making HTTP requests to retrieve data from the databases
pandas: for data manipulation and CSV creation
biopython: for handling bibliographic information
nbib: for parsing bibliographic records
beautifulsoup: for parsing HTML data
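The merge step described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the GRPM implementation: the two in-memory tables stand in for data already retrieved from LitVar and PubMed, the PMIDs are placeholders, and the column names (`gene`, `rsid`, `pmid`, `mesh`) and the helper `merge_to_csv` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical LitVar-style table: which variant (rsid) is mentioned in which paper (pmid).
litvar = pd.DataFrame({
    "gene": ["FTO", "FTO", "MTHFR"],
    "rsid": ["rs9939609", "rs9939609", "rs1801133"],
    "pmid": [111, 222, 333],  # placeholder PMIDs
})

# Hypothetical PubMed-style table: MeSH terms attached to each paper.
pubmed = pd.DataFrame({
    "pmid": [111, 222, 222, 444],
    "mesh": ["Obesity", "Body Mass Index", "Diet", "Neoplasms"],
})


def merge_to_csv(litvar_df, pubmed_df):
    """Join the two tables on pmid and return the result as CSV text."""
    # An inner join keeps only papers present in both sources.
    merged = litvar_df.merge(pubmed_df, on="pmid", how="inner")
    return merged.to_csv(index=False)


csv_text = merge_to_csv(litvar, pubmed)

# Read the CSV back to check that it round-trips into a gene-rsid-pmid-mesh table.
grpm = pd.read_csv(io.StringIO(csv_text))
```

In practice the CSV text would be written to a file, for example by passing a path to `DataFrame.to_csv` instead of returning a string.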
Module 02 creates a coherent MeSH term list for exploring the database. It uses the ChatGPT language model to extract MeSH terms from the complete MeSH dataset. The required Python modules are:
pandas: for data manipulation
openai: for interacting with the ChatGPT API
Module 03 incorporates the reference MeSH list generated by Module 02 into the database and extracts a survey from the integrated data. The only Python module it requires is:
pandas: for data manipulation
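One way to picture the survey extraction is as a filter followed by a per-gene summary. The sketch below assumes a GRPM table with `gene`, `rsid`, `pmid`, and `mesh` columns and a reference list of MeSH terms; the data values, the column names, and the helper `survey_by_gene` are all hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical GRPM rows (gene, rsid, pmid, mesh); PMIDs are placeholders.
grpm = pd.DataFrame({
    "gene": ["FTO", "FTO", "FTO", "MTHFR", "MTHFR"],
    "rsid": ["rs9939609", "rs9939609", "rs9939609", "rs1801133", "rs1801133"],
    "pmid": [111, 222, 222, 333, 444],
    "mesh": ["Obesity", "Body Mass Index", "Diet", "Folic Acid", "Neoplasms"],
})

# Hypothetical reference list of nutrition-related MeSH terms.
reference_mesh = ["Obesity", "Diet", "Folic Acid"]


def survey_by_gene(grpm_df, mesh_terms):
    """Keep rows whose MeSH term is in the reference list, then summarize per gene."""
    matched = grpm_df[grpm_df["mesh"].isin(mesh_terms)]
    return (
        matched.groupby("gene")
        .agg(
            matching_pmids=("pmid", "nunique"),
            matching_rsids=("rsid", "nunique"),
            matching_mesh=("mesh", "nunique"),
        )
        .reset_index()
    )


survey = survey_by_gene(grpm, reference_mesh)
```

Counting distinct PMIDs rather than rows avoids inflating the totals for papers that carry several matching MeSH terms.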
The purpose of Module 04 is to analyze the reports generated by Module 01 and Module 03, as well as the GRPM association data. It utilizes the following Python modules:
pandas: for data manipulation
matplotlib: for data visualization
seaborn: for enhanced data visualization
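As an illustration of the kind of report analysis this module performs, the sketch below plots per-gene publication counts as a bar chart. It uses matplotlib only, and it assumes a survey table with hypothetical `gene` and `matching_pmids` columns; the counts are made-up placeholders and `plot_survey` is not a function from the GRPM code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical survey table; the counts are placeholders, not real results.
survey = pd.DataFrame({
    "gene": ["MTHFR", "FTO", "APOE"],
    "matching_pmids": [7, 12, 3],
})


def plot_survey(survey_df):
    """Draw a bar chart of matching publications per gene, largest first."""
    ordered = survey_df.sort_values("matching_pmids", ascending=False)
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(ordered["gene"], ordered["matching_pmids"])
    ax.set_xlabel("Gene")
    ax.set_ylabel("Matching publications")
    fig.tight_layout()
    return fig, ax


fig, ax = plot_survey(survey)
```

Calling `fig.savefig("survey_counts.png")` afterwards would write the chart to disk; the file name here is only an example.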
Module 05 incorporates GWAS (Genome-Wide Association Study) data, extracted from the complete GWAS Catalog dataset, into the GRPM surveys. It associates GWAS phenotypes and possible risk/effect alleles with the GRPM relationships. The required Python module is:
pandas: for data manipulation
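The association of GWAS phenotypes and alleles with GRPM relationships can be thought of as a join on the variant identifier. The sketch below is an assumption-laden illustration: the column names (`rsid`, `risk_allele`, `phenotype`) are hypothetical rather than the GWAS Catalog's real headers, the rows are placeholders, and `attach_gwas` is not part of the GRPM code.

```python
import pandas as pd

# Hypothetical GRPM survey rows.
grpm_survey = pd.DataFrame({
    "gene": ["FTO", "MTHFR"],
    "rsid": ["rs9939609", "rs1801133"],
})

# Hypothetical excerpt of GWAS data; values are placeholders for illustration.
gwas = pd.DataFrame({
    "rsid": ["rs9939609", "rs9939609", "rs0000001"],
    "risk_allele": ["A", "A", "T"],
    "phenotype": ["Body mass index", "Obesity", "Height"],
})


def attach_gwas(survey_df, gwas_df):
    """Annotate each GRPM variant with any GWAS phenotype and risk allele found for it."""
    # A left join keeps every GRPM row; variants without GWAS hits get NaN annotations.
    return survey_df.merge(gwas_df, on="rsid", how="left")


annotated = attach_gwas(grpm_survey, gwas)
```

A left join is used here so that variants with no GWAS record remain visible in the survey; an inner join would silently drop them.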