Collection of tools and scripts to manage GWAS Catalog related infrastructure.

gwas-utils

This repository is a collection of scripts and small applications we use in the day-to-day running of the GWAS Catalog.

For a detailed description of the contents of this repository, see the individual readme files within each folder or the documentation on Confluence.

Installation/execution

Note first that many of the utils have a hard dependency on the curation database. This makes those utils hard to port: they cannot be run off the network (i.e. locally).

With Docker

docker run -it ebispot/gwas-utils <entry_point> [options]

e.g.

docker run -it ebispot/gwas-utils python /catalogPlots/gwas_cat_plus_ss.py

With conda

git clone git@github.com:EBISPOT/gwas-utils.git
cd gwas-utils
conda env create -f conda_env.yml
conda activate gwas-utils
pip install .

With virtualenv

git clone git@github.com:EBISPOT/gwas-utils.git
cd gwas-utils
python3 -m venv .venv
source .venv/bin/activate
pip install .

User/system wide

git clone git@github.com:EBISPOT/gwas-utils.git
cd gwas-utils
pip install .

Contents

After installation (above), the tools below will be available. Usage, entry points and further documentation for each utility are given at the following links:

A collection of scripts we use to generate plots and statistics of the GWAS Catalog.

Historic curator scripts (merged in from https://github.com/EBISPOT/gwas-curation-utils)

A tool to add, change or remove curator users in the database.

Tool to compare databases and Solr as part of the quality control process. This script is called during the data release process.

A script to perform the data export task of the data release plan. It generates all downloadable files, names them properly, then generates a release-specific readme for the FTP folder.

A tool to solve issues with diagram generation: when the pussycat application is called, this script keeps checking the process and the generation of the diagram. It also performs certain checks. This script is also part of the data release process.

EPMC API querying tool

Tool to release summary stats folders to the FTP. This script is called during the data release process.

Application for flagging peak associations in a distance-based fashion (merged in from https://github.com/EBISPOT/gwas-associationFilter)
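To make "distance-based" concrete, here is a minimal sketch of one common way to flag peaks. It is not taken from the tool itself: the `Association` record, the `flag_peaks` function, the greedy strategy and the 100 kb default window are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Association:
    # Hypothetical record layout, for illustration only.
    rsid: str
    chromosome: str
    position: int
    p_value: float


def flag_peaks(associations, window=100_000):
    """Return the rsIDs flagged as peaks.

    Greedy sketch: visit associations from most to least significant and
    flag one as a peak unless an already flagged peak lies on the same
    chromosome within `window` base pairs. The window size is an assumption.
    """
    peaks = []
    for assoc in sorted(associations, key=lambda a: a.p_value):
        near_existing_peak = any(
            p.chromosome == assoc.chromosome
            and abs(p.position - assoc.position) <= window
            for p in peaks
        )
        if not near_existing_peak:
            peaks.append(assoc)
    return {p.rsid for p in peaks}


# Made-up example data.
example = [
    Association("rs1", "1", 1_000_000, 1e-12),
    Association("rs2", "1", 1_050_000, 1e-9),   # within 100 kb of rs1
    Association("rs3", "1", 1_500_000, 1e-8),   # far from rs1
    Association("rs4", "2", 1_020_000, 1e-10),  # different chromosome
]
flagged = flag_peaks(example)
print(sorted(flagged))
```

In this made-up example `rs2` is not flagged because the more significant `rs1` sits 50 kb away on the same chromosome. See the tool's own readme for the rules it actually applies.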

Scripts to analyse site access logs to generate statistics on user behaviour.

Upon every new release of Ensembl, the full GWAS Catalog data has to be remapped to the new release. This tool helps the remapping process by automating the step that triggers remapping.

To generate site access stats it is useful to know what users are searching for. This script classifies search terms parsed out from site access statistics.
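As a rough idea of what classifying search terms can look like, the sketch below sorts terms into categories with regular expressions. The category names, the patterns and the `classify_term` function are assumptions for illustration and may differ from what the script actually does.

```python
import re
from collections import Counter

# Hypothetical categories and patterns, checked in this order.
PATTERNS = [
    ("variant", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("study_accession", re.compile(r"^GCST\d+$", re.IGNORECASE)),
    ("pubmed_id", re.compile(r"^\d{6,8}$")),
    ("region", re.compile(r"^(chr)?(\d{1,2}|X|Y|MT?):\d+-\d+$", re.IGNORECASE)),
    ("ontology_id", re.compile(r"^EFO[_:]\d+$", re.IGNORECASE)),
]


def classify_term(term):
    """Return the first matching category, or 'free_text' if none matches."""
    cleaned = term.strip()
    for label, pattern in PATTERNS:
        if pattern.match(cleaned):
            return label
    return "free_text"


# Made-up search terms, as they might be parsed from access logs.
terms = ["rs7329174", "GCST000028", "breast cancer", " 25673413 ", "6:16000000-25000000"]
counts = Counter(classify_term(t) for t in terms)
print(counts)
```

Anything that matches no pattern falls through to `free_text`, which in practice would cover trait names, gene symbols and author names.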

This small Python module makes it easy to query, update and refresh the specified Solr instance/core.

Scripts to control summary statistics file release to the FTP.

Scripts to control data flow from the submission app to the harmonisation pipeline.
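For orientation, querying a Solr core comes down to an HTTP request against its `select` handler. The sketch below uses only the standard library and is not the module's own API: `build_select_url`, `select`, the core name `gwas` and the field `resourcename` are hypothetical names chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, core, query="*:*", rows=10, fields=None):
    """Build the URL for a query against a Solr core's select handler."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # 'fl' restricts the fields returned for each document.
        params["fl"] = ",".join(fields)
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def select(base_url, core, **kwargs):
    """Run the query and return the parsed JSON response (needs a live Solr)."""
    with urlopen(build_select_url(base_url, core, **kwargs)) as response:
        return json.load(response)


# Hypothetical host, core and field names; only the URL is built here.
url = build_select_url(
    "http://localhost:8983/solr", "gwas", query="resourcename:study", rows=0
)
print(url)
```

With `rows=0` Solr returns no documents, only the match count under `response.numFound`, which is a cheap way to count documents of one type.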