The Gene Wiki Project

Background

The Gene Wiki project on Wikipedia is an initiative to create a comprehensive review article for every notable human gene. There are currently over 10,400 human genes in the Gene Wiki, and more are added at a steady rate.

We have developed a number of tools to analyze, expand, and maintain the Gene Wiki project. Initial development was largely in Java, but much of the core code is actively being ported to Python to make it easier to use, particularly in scripts.

Projects

The following projects all fall under the Gene Wiki umbrella:

  • pygenewiki: code to update and expand Gene Wiki pages and resources. Includes ProteinBoxBot, the GeneWiki API, and GeneWiki Generator (a BioGPS plugin).

  • ProteinBoxBot (this project): a Wikidata bot that uploads gene data to Wikidata.

  • mediawiki-sync: a Java daemon that copies changes from one MediaWiki installation to another, created to support the Gene Wiki mirror at GeneWiki+.

  • genewiki-miner: code related to information extraction and parsing for many of the papers and analyses we've done on the Gene Wiki.

  • genewiki-commons: common code used across the Java projects (required as a Maven dependency).

  • genewiki-generator: the previous version of this project, written in Java. Provides ProteinBoxBot and GeneWiki Generator (a BioGPS plugin).

ProteinBoxBot

ProteinBoxBot (PBB) is a Wikidata bot for maintaining human (and mouse) gene (and protein) items on Wikidata. PBB retrieves information about genes through MyGene.info and creates, updates, and maintains gene items on Wikidata (e.g., the Reelin Wikidata item). In due course, the Protein Box templates of Gene Wiki articles on Wikipedia (e.g., Reelin) will source their information from these Wikidata gene items.

Installation

The bot requires only the Pywikibot framework. The detailed installation steps for the framework are here. Running the tests additionally requires pytest.

Quick Start Guide

The bot runs in two modes:

  • Normal sequential mode: the bot retrieves the set of Entrez IDs from GeneWiki+ (GW+) and processes them in the order in which GW+ returns them.

Command -- sudo python bot.py

  • Specified mode: specify a text file containing a list of Entrez IDs. The bot runs for these Entrez IDs only.

Command -- sudo python bot.py --only /path/to/file

The file contents should be in the following format:

only=[<list of Entrez IDs>]

Example: only=[5649,362]
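To check such a file before handing it to the bot, something along these lines may help. This is only a sketch: the bot's own parsing code is not shown here, and `parse_only_line` is a hypothetical helper name, not part of the project. It assumes the file holds a single line of the form shown above.

```python
import ast


def parse_only_line(text):
    """Parse a line such as 'only=[5649,362]' into a list of integer IDs.

    Hypothetical helper for validating the file; not the bot's own parser.
    """
    key, sep, value = text.strip().partition("=")
    if sep != "=" or key.strip() != "only":
        raise ValueError("expected a line of the form only=[id1,id2,...]")
    try:
        # literal_eval only accepts Python literals, so arbitrary code
        # in the file is rejected rather than executed.
        ids = ast.literal_eval(value.strip())
    except (SyntaxError, ValueError) as exc:
        raise ValueError("could not read the list after 'only='") from exc
    if not isinstance(ids, list) or not all(type(i) is int for i in ids):
        raise ValueError("the value after 'only=' must be a list of integers")
    return ids


if __name__ == "__main__":
    print(parse_only_line("only=[5649,362]"))
```

Running the script on the example line prints the two Entrez IDs as a Python list; a malformed line raises ValueError instead of being passed on to the bot.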