iDigBio/idb-backend

download api mishandles request when POSTing json and fails to see rq/mq

Opened this issue · 3 comments

POSTing JSON to the download API returns this error:

$ curl -X POST -H "Content-Type: application/json" --data @query.json http://192.168.0.168:19197/v2/download/
{
  "error": "Please supply at least one query paramter (rq,mq)"
}

$ cat query.json 
{"rq":{"scientificname":["shortia"]}}

This exact same json chunk works just fine when sent as a GET request.

This happens for two reasons.

https://github.com/iDigBio/idb-backend/blob/master/idb/data_api/v2_download.py#L53

Reason 1: when the code type-checks o[k] for k == 'rq', the value is a dict, so both the list check and the string check fail. The loop then exits without finding a query and the handler returns the message above.
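To make reason 1 concrete, here is a minimal Python 3 sketch of that kind of type check. It is a simplified reconstruction based on the description above, not the actual code in v2_download.py; the function name find_query and its structure are hypothetical.

```python
import json

def find_query(o, keys=("rq", "mq")):
    """Hypothetical stand-in for the parameter loop described above."""
    for k in keys:
        if k not in o:
            continue
        v = o[k]
        if isinstance(v, list):
            # Form parsers can hand back a list of values; take the first.
            v = v[0]
        if isinstance(v, str):
            # Form-encoded input: the query arrives as a JSON string.
            return k, json.loads(v)
        # A dict (an already-parsed JSON body) matches neither check and is skipped.
    return None

as_form = {"rq": '{"scientificname": ["shortia"]}'}
as_json = {"rq": {"scientificname": ["shortia"]}}

print(find_query(as_form))  # the query is found
print(find_query(as_json))  # None, which leads to the "at least one query" error
```

With the dict input the loop falls through and returns None, which matches the behavior reported for the JSON POST.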

Reason 2 applies when you POST the data without the Content-Type: application/json header. Flask (including the older v0.12 that we're using) behaves differently if that header is not present:

https://flask.palletsprojects.com/en/0.12.x/api/

get_json(force=False, silent=False, cache=True)
Parses the incoming JSON request data and returns it. By default this function will return None if the mimetype is not application/json but this can be overridden by the force parameter. If parsing fails the on_json_loading_failed() method on the request object will be invoked.

So if you leave out the application/json header, get_json() returns None, as documented. The code then falls back to request.form, and the "if k in o" check is false for rq. In this path o is still a dict as in problem 1, but the whole JSON string has somehow been shoved in as a single key with an empty UTF-8 value. So "if k in o" fails and the code never gets to the point of trying to convert the value (which means it doesn't hit problem 1 above).

(I couldn't get a good textual output of this, so here's an image)
[image: debug-o-with-json-string-as-key]

(versus how o is represented in the path of problem 1)

{u'rq': {u'scientificname': [...]}}
special variables:
function variables:
u'rq': {u'scientificname': [u'shortia']}
len(): 1
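Since the debugger image does not reproduce as text, the following sketch approximates what it shows using only the Python standard library. It assumes the raw JSON body is parsed as if it were URL-encoded form data, which is the content type curl's --data sends by default. Flask/Werkzeug's own parser is not called here, so treat the exact types as an approximation.

```python
from urllib.parse import parse_qsl

# The same JSON body as query.json, sent without the application/json header.
body = '{"rq":{"scientificname":["shortia"]}}'

# The body contains no "&" or "=", so the whole string becomes one key
# with an empty value.
o = dict(parse_qsl(body, keep_blank_values=True))

print(o)
print("rq" in o)  # False: the key is the entire JSON string, not "rq"
```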

That interface doesn't accept JSON; it uses form data as input.

rq=%7B%22genus%22%3A%22hadronyche%22%7D&email=wilsotc%40ufl.edu

curl 'https://api.idigbio.org/v2/download' \
  -H 'Connection: keep-alive' \
  -H 'Accept: */*' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36' \
  -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' \
  -H 'Origin: http://portal.idigbio.org' \
  -H 'Sec-Fetch-Site: cross-site' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: http://portal.idigbio.org/' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  --data-raw 'rq=%7B%22genus%22%3A%22hadronyche%22%7D&email=wilsotc%40ufl.edu' \
  --compressed
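A form-encoded body like the one above can be built from a query dict as in this sketch, which uses only the Python standard library. The variable names are illustrative.

```python
import json
from urllib.parse import urlencode

rq = {"genus": "hadronyche"}

# The query is serialized to a compact JSON string first, then
# percent-encoded as an ordinary form field alongside the email address.
body = urlencode({
    "rq": json.dumps(rq, separators=(",", ":")),
    "email": "wilsotc@ufl.edu",
})

print(body)
```

The key point is that rq travels as a JSON string inside a form field, not as a JSON request body.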

Thanks for checking. I didn't have time to get back to it yet and I was hoping I was missing something simple.

It might be possible to do a quick fix so it accepts JSON too. Otherwise we could check for JSON and return a 4xx error with a message insisting that the user submit form-encoded data.
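One possible shape for the "accept JSON too" option is sketched below. This is only an illustration under assumptions: the helper name extract_queries and the QUERY_KEYS tuple are hypothetical, and the real handler in v2_download.py may be structured differently. It is written as a plain function so it runs without Flask.

```python
import json

QUERY_KEYS = ("rq", "mq")  # hypothetical constant for the accepted query keys

def extract_queries(json_body, form):
    """Return the queries from a parsed JSON body or from form fields."""
    # Prefer the parsed JSON body when there is one, else fall back to the form.
    source = json_body if json_body is not None else form
    queries = {}
    for k in QUERY_KEYS:
        if k not in source:
            continue
        v = source[k]
        if isinstance(v, list):
            v = v[0]
        if isinstance(v, str):
            # Form path: the value is a JSON string that still needs decoding.
            v = json.loads(v)
        if isinstance(v, dict):
            # JSON path (or decoded form value): accept the dict as-is.
            queries[k] = v
    return queries

print(extract_queries({"rq": {"scientificname": ["shortia"]}}, {}))
print(extract_queries(None, {"rq": '{"genus": "hadronyche"}'}))
```

In a Flask view the two arguments would presumably come from request.get_json(silent=True) and request.form. An empty result would then map to the existing "at least one query" error.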