This project acts as an intermediate layer between the Genesys PGR API and an application that utilises its data. It is intended both to simplify query setup and to automate a large part of the response collection.
The project runs on Python 2.7+ and Python 3.2+.
Libraries required:
- requests (pip install requests)
The module as well as its dependencies can also be installed with python setup.py install.
Before doing this, the basic configuration must be provided (see step 1 below).
The file main.py showcases the use of the GenesysParser class. The steps to follow are outlined below.
A template file titled config.py-example can be edited for this purpose, to produce config.py. The parameters that must be specified are:
- url: the base address of the site to be queried. There are currently two possibilities for this: the official Genesys site (https://www.genesys-pgr.org/) and the Genesys sandbox site (https://sandbox.genesys-pgr.org/). The trailing / is optional.
- clientID and clientSecret: the string credentials as given by the site. Credentials are not interchangeable between the two sites, so they must be the ones issued by the site being queried.
In config.py-example, edit the strings as necessary without altering the variable names.
Then rename the file to config.py, and leave it in the same directory.
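As a rough illustration of the step above, an edited config.py might look like the sketch below. Only the three variable names come from the description above; the credential values are placeholders, and the real template may contain further comments or settings.

```python
# config.py (sketch; the values below are placeholders, not real credentials)

# Base address of the site to query. The sandbox is used here; the
# trailing slash is optional.
url = 'https://sandbox.genesys-pgr.org/'

# Credentials issued by the same site as `url`. Sandbox credentials do
# not work on the official site, and vice versa.
clientID = 'your-client-id'
clientSecret = 'your-client-secret'
```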
First, the query parameters need to be defined. This is done by means of a dictionary, whose valid keys are:
crops
sampStat
storage
coll.collMissId
available
art15
mlsStatus
sgsv
alias
acceNumb
institute.code
geo.elevation
geo.longitude
geo.latitude
orgCty.iso3
taxonomy.sciName
taxonomy.species
taxonomy.genus
taxonomy.subtaxa
id
duplSite
donorCode
institute.country.iso3
institute.country.iso2
institute.networks
inSgsv
taxonomy.genusSpecies
historic
seqNo
lists
Further information about these fields can be found here. Additionally, the institute codes, among other details, can be found on the WIEWS homepage.
Don't forget to include the class:
from GenesysParser import *
Using the above, the query parameters for a few types of tomato held at the institutes listed below would be:
query_params = {
    'institute.code': [
        'NLD037',  # CGN
        'USA003',  # Geneva
        'DEU146',  # Gatersleben
        'USA176',  # Tomato Genetics Resource Center
    ],
    'taxonomy.genus': ['Solanum', 'Lycopersicon'],
    'taxonomy.species': ['lycopersicum', 'esculentum', 'sp.',
                         'pimpinellifolium', 'peruvianum'],
}
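Since the keys must match the field names listed earlier, a small check before submitting can catch typos. The sketch below is not part of GenesysParser; the names KNOWN_KEYS and unknown_keys are hypothetical helpers introduced here for illustration.

```python
# Hypothetical helper, not part of GenesysParser: flag dictionary keys
# that do not appear in the list of query fields given above.
KNOWN_KEYS = frozenset([
    'crops', 'sampStat', 'storage', 'coll.collMissId', 'available',
    'art15', 'mlsStatus', 'sgsv', 'alias', 'acceNumb', 'institute.code',
    'geo.elevation', 'geo.longitude', 'geo.latitude', 'orgCty.iso3',
    'taxonomy.sciName', 'taxonomy.species', 'taxonomy.genus',
    'taxonomy.subtaxa', 'id', 'duplSite', 'donorCode',
    'institute.country.iso3', 'institute.country.iso2',
    'institute.networks', 'inSgsv', 'taxonomy.genusSpecies', 'historic',
    'seqNo', 'lists',
])


def unknown_keys(query_params):
    """Return the keys of query_params that are not known query fields."""
    return sorted(key for key in query_params if key not in KNOWN_KEYS)
```

Calling unknown_keys(query_params) on the dictionary above returns an empty list, whereas a misspelt key such as 'genus' (instead of 'taxonomy.genus') would be reported.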
Following the definition of the query parameters, instantiation of the class is simple:
r = GenesysParser(query_params)
Once the query parameters have been specified, the query may be submitted to the site. There are two ways to do this, depending on one's needs.
r.submitReq() will only fetch one page of results. The function takes two parameters, r.submitReq(page, size), set by default to 1 (the first page of results) and 50 respectively.
50 is also the maximum number of result items the site allows per query, which means that queries with a larger number of results must be submitted as multiple requests to the site.
Example uses: r.submitReq(), r.submitReq(1), r.submitReq(size=10), r.submitReq(page=1, size=10).
The first and second examples both fetch the first page of results, with a maximum of 50 items.
The third and fourth examples also fetch the first page, but with a maximum result size of 10 items.
As many query results might be composed of multiple pages, the fetchAll() function is provided.
It will fetch as many pages as necessary to exhaust all results. As it is intended to provide a complete list of results without an unnecessary number of queries, this function takes no page or size parameters.
Example use: r.fetchAll().
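To make the relationship between single-page and exhaustive fetching concrete, the sketch below pages through results by hand. It is not necessarily how fetchAll() is implemented. It assumes a callable that takes (page, size) and returns the list of items for that page, and it assumes that a page shorter than size marks the end of the results. The names fetch_all_pages and fake_submit are hypothetical.

```python
def fetch_all_pages(submit, size=50):
    """Collect every result by requesting consecutive pages.

    `submit` is any callable taking (page, size) and returning a list.
    Pages are numbered from 1, and a short page ends the loop.
    """
    results = []
    page = 1
    while True:
        items = submit(page, size)
        results.extend(items)
        if len(items) < size:
            break  # fewer items than requested: this was the last page
        page += 1
    return results


def fake_submit(page, size):
    """Stand-in for a real request: serves slices of 120 dummy items."""
    data = list(range(120))
    start = (page - 1) * size
    return data[start:start + size]


all_items = fetch_all_pages(fake_submit)
```

With the real class, fetch_all_pages(r.submitReq) would be the analogous call, provided submitReq returns the list of items for the requested page.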
The result of a query submitted in the way described above is a list of ItemGenesys objects.
The full field of this class holds the MCPD of each result item in its entirety, and its sub-fields are accessible as parts of a Python dictionary.
The most commonly used fields are also accessible directly, as class attributes. These are:
genesysUUID, acqDate, accessionID, collectionDate, genus, species, collSite, instituteCode, aliases
It is worth mentioning that the acqDate, as a class attribute, undergoes some processing and is available as a datetime.date object. In particular, as Genesys uses the '00' or '--' notation when a field of (year, month, day) is unavailable, the class substitutes these with the defaults of (0001, 01, 01) respectively.
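The substitution described above can be sketched as follows. This is an illustration rather than the class's actual code: the function name parse_mcpd_date is hypothetical, and the input is assumed to be an eight-character YYYYMMDD string in which an unavailable part is filled entirely with '0' or '-' characters.

```python
import datetime


def parse_mcpd_date(raw):
    """Convert a YYYYMMDD string to datetime.date.

    A part made up only of '0' and '-' characters is treated as
    unavailable and replaced by 1, giving defaults of (0001, 01, 01).
    """
    parts = (raw[0:4], raw[4:6], raw[6:8])
    values = []
    for part in parts:
        if part.strip('0-') == '':
            values.append(1)  # unavailable year, month or day
        else:
            values.append(int(part))
    return datetime.date(*values)
```

For instance, parse_mcpd_date('1932----') yields datetime.date(1932, 1, 1), so a date with an unknown month and day cannot be told apart from a genuine 1 January once converted.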
For example, it is possible to print all accession IDs of the results in the two following ways:
for result_item in r.fetchAll():
    print(result_item.full['acceNumb'])
or
for result_item in r.fetchAll():
    print(result_item.accessionID)
It is also possible to print a result item, summarised by its most commonly used fields, as follows:
print(result_item)
which results in this description for PI 100697:
ItemGenesys(accessionID=PI 100697, collectionDate=1932-06-28, otherNames=[321], genus=Solanum, species=lycopersicum, instituteCode=USA003, collectionSite=None)
Further examples can be found at the end of GenesysParser.py.
Several logging parameters can be adjusted in logging.json. By default, all messages are set to show as well as be recorded in the debug.log and errors.log files.
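The contents of logging.json are not reproduced here, so the following is only a guess at the kind of setup described: a dictionary in the format accepted by Python's standard logging.config.dictConfig, with one console handler and two file handlers. The function name build_logging_config, the handler names and the message format are all assumptions.

```python
import os


def build_logging_config(log_dir='.'):
    """Return a dictConfig-style dictionary (assumed layout).

    All messages go to the console and to debug.log; only errors and
    above are additionally written to errors.log.
    """
    def file_handler(level, name):
        return {
            'class': 'logging.FileHandler',
            'level': level,
            'formatter': 'simple',
            'filename': os.path.join(log_dir, name),
        }

    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'simple': {
                'format': '%(asctime)s %(name)s %(levelname)s %(message)s',
            },
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'DEBUG',
                'formatter': 'simple',
            },
            'debug_file': file_handler('DEBUG', 'debug.log'),
            'error_file': file_handler('ERROR', 'errors.log'),
        },
        'root': {
            'level': 'DEBUG',
            'handlers': ['console', 'debug_file', 'error_file'],
        },
    }
```

A dictionary of this shape can be applied with logging.config.dictConfig(build_logging_config()). Raising the 'level' of a handler is the usual way to make it less verbose.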
The curl command to directly query the Genesys API using a clientID and clientSecret for the accession number PI 100697 would look like:
curl -H 'Content-Type: application/json' -H 'Referer: http://ecpgr.cgn.wur.nl/eupotato/test.html' -X POST -d '{"filter": "{\"acceNumb\":[\"PI 100697\"]}"}' 'https://sandbox.genesys-pgr.org/webapi/v0/acn/filter?client_id=clientID&client_secret=clientSecret'
and produce a result like this.
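One detail of the request above is easy to miss: the value of "filter" is itself a JSON-encoded string, so the filter is encoded twice. The sketch below builds the same URL, header and body in Python without sending anything. The function name build_filter_request is hypothetical, the endpoint path is copied from the curl command, and the Referer header is left out; it can be added to the headers if the site requires it.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7


def build_filter_request(base_url, client_id, client_secret, filters):
    """Return (url, headers, body) mirroring the curl command above."""
    query = urlencode([('client_id', client_id),
                       ('client_secret', client_secret)])
    # rstrip makes the trailing slash of the base address optional
    url = base_url.rstrip('/') + '/webapi/v0/acn/filter?' + query
    headers = {'Content-Type': 'application/json'}
    # The filter is serialised once on its own, then embedded as a
    # string in the outer JSON document.
    body = json.dumps({'filter': json.dumps(filters)})
    return url, headers, body


url, headers, body = build_filter_request(
    'https://sandbox.genesys-pgr.org/', 'clientID', 'clientSecret',
    {'acceNumb': ['PI 100697']})
```

The three values can then be passed to the requests library, for example requests.post(url, headers=headers, data=body).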