CottageLabs/OpenArticleGauge

Unicode characters cause 500 error


When the following batch is sent as the body of a POST, the service returns a 500. The cause is the identifier at position 8 (counting from 0), "10.5824/1309\u20101581.2013.1.002.x", which contains the Unicode character U+2010 (HYPHEN) where an ASCII hyphen would normally appear.

You can reproduce this by putting the string below in a variable called data and running:

requests.post("http://howopenisit.org/lookup", headers={'Accept':'application/json'}, data=data)

'["10.4404/hystrix-19.1-4412", "10.1590/S1806-66902011000400029", "10.2298/TSCI120109120P", "10.1590/S0004-28032003000400008", "10.1186/1532-429X-13-S1-O85", "10.1590/S1516-84842005000400011", "10.1155/2012/376381", "10.1186/1754-0429-1-11", "10.5824/1309\u20101581.2013.1.002.x", "10.1051/shsconf/20120200028", "10.5232/ricyde2014.03501", "10.1186/1477-7819-11-27", "10.1155/2010/562356", "10.1590/S0100-41582005000100019", "10.5747/ce.2009.v01.n1.e003", "10.4236/msa.2013.45037", "10.4000/confins.7485", "10.7763/IJIET.2013.V3.305", "10.1590/S1676-26492008000300003", "10.1590/S0102-37722011000100008", "10.1590/S1679-62252008000100005", "10.1590/S1516-14982011000100007", "10.1186/1471-2164-13-9", "10.1590/S0006-87051952000300009", "10.5380/dma.v30i0.33029", "10.1186/1751-0147-49-13", "10.3390/s101210778", "10.1186/1758-2652-14-S1-S7", "10.5539/ells.v2n4p39", "10.1590/S0100-15742000000300010", "10.4236/am.2013.411A2005", "10.2298/SOC0504289D", "10.3144/expresspolymlett.2013.71", "10.4061/2010/525862", "10.7714/cnps/8.1.203", "10.1155/2013/279593", "10.5194/acp-12-1527-2012", "10.1186/1471-230X-11-119", "10.1590/S0104-42302003000400003", "10.3732/apps.1200499", "10.1186/1741-7007-6-47", "10.3390/md12052877", "10.5430/jst.v1n3p155", "10.1186/1471-2334-12-S1-P48", "10.5539/jpl.v4n1p109", "10.1186/1743-7075-4-2", "10.1186/1471-2407-10-442", "10.1186/1471-2164-10-396", "10.1590/S0101-81752003000200018", "10.1051/kmae:1974013", "10.3390/s140203342", "10.1186/1475-2875-11-8", "10.1051/epjconf/20147506001", "10.2298/VSP1307675S", "10.1186/1743-422X-6-214", "10.5194/nhess-12-3241-2012", "10.1590/S0103-84782003000300029", "10.1186/1471-2296-10-68", "10.1590/S0004-282X2001000100004", "10.1186/1746-4358-1-22", "10.4236/jbpc.2011.22011", "10.1107/S1600536808005655", "10.1186/1752-0509-5-S3-S4", "10.1590/S1516-35982009001000030", "10.1186/1471-2474-11-158", "10.1590/S0103-20702005000100012", "10.1186/1532-429X-13-S1-P302", "10.1186/1743-422X-10-27", "10.1186/1471-244X-10-69", 
"10.1107/S1600536810019914", "10.1186/1748-5908-2-24", "10.1186/1471-2148-5-44", "10.5922/2079-8555-2012-3-4", "10.1590/S0103-17592003000200001", "10.3389/fnsys.2010.00006", "10.5194/nhess-12-587-2012", "10.1155/2012/531982", "10.4236/ss.2013.41012", "10.3390/molecules19043898", "10.1186/1465-9921-13-73", "10.1186/1471-244X-7-7", "10.5194/acp-12-7015-2012", "10.3846/1822-430X.2009.17.3.92-102", "10.3390/cancers2020397", "10.1590/S0103-84782011001200006", "10.1186/1471-2156-4-S1-S43", "10.1590/S1414-81452006000200015", "10.1186/1477-5956-10-4", "10.1051/proc/201343017", "10.1155/2013/203719", "10.3989/arbor.2014.765n1003", "10.4401/ag-5283", "10.1186/1477-7819-9-128", "10.1186/2040-2392-1-14", "10.4236/jbnb.2011.225058", "10.1155/2014/649756", "10.3390/i8050433", "10.1155/2009/372548", "10.1590/S1415-43662007000200011", "10.2423/i22394303v2n1p121"]'
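
To narrow a batch like this down, a small sketch along the following lines lists which identifiers contain non-ASCII characters. The helper name `non_ascii_positions` and the shortened two-item `body` are illustrative only and are not part of OAG. `body` is written as a raw string so that, under Python 3, `\u2010` reaches the JSON parser as an escape sequence, as it does in the batch above.

```python
import json

# Shortened batch for illustration; the raw string keeps \u2010 as a JSON escape.
body = r'["10.1186/1754-0429-1-11", "10.5824/1309\u20101581.2013.1.002.x"]'


def non_ascii_positions(identifiers):
    """Return (index, [code points]) for each identifier with non-ASCII characters."""
    hits = []
    for index, identifier in enumerate(identifiers):
        code_points = ["U+%04X" % ord(ch) for ch in identifier if ord(ch) > 127]
        if code_points:
            hits.append((index, code_points))
    return hits


identifiers = json.loads(body)
print(non_ascii_positions(identifiers))
```

Run against the full batch, the same helper should point at the single identifier containing U+2010.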

For info, I just checked the DOI spec, and it does support Unicode, so OAG needs to support it too.
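
The failing code path in OAG is not identified in this report. As an assumption, one common cause of this kind of 500 is an ASCII encode of the identifier somewhere between parsing the request and building an outgoing URL. The sketch below (Python 3, standard library only, not OAG code) shows that failure mode for the offending DOI and one way to avoid it: encode explicitly as UTF-8 and percent-encode before the identifier is placed in a URL.

```python
from urllib.parse import quote

# The offending identifier, with U+2010 (HYPHEN) as an actual character.
doi = "10.5824/1309\u20101581.2013.1.002.x"

# An ASCII encode fails on U+2010; this is the assumed failure mode.
try:
    doi.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly as UTF-8 succeeds and round-trips.
utf8_bytes = doi.encode("utf-8")

# Percent-encoding (UTF-8 by default) gives a form that is safe inside a URL path.
url_safe = quote(doi, safe="/")

print(ascii_ok)
print(url_safe)
```

Whether percent-encoding is the right treatment depends on where OAG sends the identifier next; the point of the sketch is only that the conversion should be explicit and never rely on an ASCII default.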