inukshuk/anystyle

Tokenizer doesn't parse Volume/issue typeset with no space.

Opened this issue · 3 comments

Germane to #23: given a reference such as:

1.	Felson DT. Epidemiology of hip and knee osteoarthritis. Epidemiol Rev. 1988;10:1‑28. 

the current parser treats 1988;10:1‑28 as a single token and assigns it to Volume/Issue. It should be approximately:

Token Value
Year 1988
Volume 10
Pages 1-28
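
As a rough sketch of the split being asked for here: the helper and pattern below are hypothetical, not AnyStyle's actual tokenizer, and only cover the plain `year;volume:pages` shape. The page separator is assumed to be an ASCII hyphen, a non-breaking hyphen (U+2011) or an en dash (U+2013).

```ruby
# Hypothetical helper, not part of AnyStyle: splits "year;volume:pages".
# Accepts "-", U+2011 (non-breaking hyphen) or U+2013 (en dash) between pages.
YEAR_VOLUME_PAGES = /\A(?<year>\d{4});(?<volume>\d+):(?<pages>\d+[-\u2011\u2013]\d+)\.?\z/

def split_year_volume_pages(text)
  m = YEAR_VOLUME_PAGES.match(text.strip)
  return nil unless m

  {
    year: m[:year],
    volume: m[:volume],
    # Normalize the typographic hyphens to a plain ASCII hyphen.
    pages: m[:pages].tr("\u2011\u2013", "-")
  }
end

p split_year_volume_pages("1988;10:1‑28")
```

Anything that does not match this exact shape (for instance a string with an issue in parentheses) returns `nil` in this sketch.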

A worse case:

2.	Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. Biomechanical considerations in the pathogenesis of osteoarthritis of the knee. Knee Surg Sports Traumatol Arthrosc. mars 2012;20(3):423‑35. 

is parsed as:

Token Value
Citation number 2
Author Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al.
Title Biomechanical considerations in the pathogenesis of osteoarthritis of the knee
Journal Knee Surg Sports Traumatol Arthrosc mars
Date 2012
Volume/Issue 20(3):423‑35

Again, the whole Volume/Issue token isn't split on punctuation. I would expect:

Token Value
Citation number 2
Author Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al.
Title Biomechanical considerations in the pathogenesis of osteoarthritis of the knee
Journal Knee Surg Sports Traumatol Arthrosc
Date mars 2012
Volume/Issue 20(3)
Pages 423‑35

Recognizing mars 2012 is probably harder...

HTH,

Thanks. We should add the examples to the volume normalizer test cases.

Are you interested in a larger set of dubious test cases? I have some of them on hand ;-]...

Definitely. Especially if you could provide them in the test case format.

Basically you'd want something like:

'2012;20(3):423‑35.' => { volume: ['20'], issue: ['3'], date: ['2012'], pages: ['423-35'] }
'1988;10:1‑28.' => { volume: ['10'], date: ['1988'], pages: ['1-28'] }

And similarly:

'mars 2012;20(3):423‑35.' => { volume: ['20'], issue: ['3'], date: ['2012-03'], pages: ['423-35'] }

Though this one has another aspect to it. Here we should also add some samples using this style to the core training data.

For example:

<citation-number>2.</citation-number>
<author>Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al.</author>
<title>Biomechanical considerations in the pathogenesis of osteoarthritis of the knee.</title>
<journal>Knee Surg Sports Traumatol Arthrosc.</journal>
<volume>mars 2012;20(3):423‑35.</volume>
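
To make the test-case format from earlier in the thread concrete, here is a minimal sketch of a splitter that returns hashes in that shape. Everything in it is an assumption for illustration: `split_volume`, `VOLUME_PATTERN` and the `MONTHS` table are hypothetical names, this is not AnyStyle's actual volume normalizer, and the French month names are guessed from the "mars 2012" sample.

```ruby
# Hypothetical sketch, not AnyStyle's actual volume normalizer.
# French month names are an assumption based on the "mars 2012" example.
MONTHS = {
  'janvier' => 1, 'février' => 2, 'mars' => 3, 'avril' => 4,
  'mai' => 5, 'juin' => 6, 'juillet' => 7, 'août' => 8,
  'septembre' => 9, 'octobre' => 10, 'novembre' => 11, 'décembre' => 12
}.freeze

VOLUME_PATTERN = /
  \A
  (?:(?<month>\p{L}+)\.?\s+)?            # optional month name
  (?<year>\d{4});                        # year, closed by a semicolon
  (?<volume>\d+)                         # volume
  (?:\((?<issue>[^)]+)\))?               # optional issue in parentheses
  :(?<pages>\d+(?:[-\u2011\u2013]\d+)?)  # single page or page range
  \.?\z
/x

def split_volume(text)
  m = VOLUME_PATTERN.match(text.strip)
  return {} unless m

  # Unknown or missing month names fall back to a year-only date.
  month = MONTHS[m[:month].to_s.downcase]
  date = month ? format('%s-%02d', m[:year], month) : m[:year]

  result = {
    volume: [m[:volume]],
    date: [date],
    # Normalize typographic hyphens to a plain ASCII hyphen.
    pages: [m[:pages].tr("\u2011\u2013", '-')]
  }
  result[:issue] = [m[:issue]] if m[:issue]
  result
end

p split_volume('mars 2012;20(3):423‑35.')
```

With this sketch, '1988;10:1‑28.' yields `{ volume: ['10'], date: ['1988'], pages: ['1-28'] }` with no `issue` key, and strings that do not match the pattern yield an empty hash. A regular expression like this only covers the one citation style shown in this thread; the training data samples would still be needed for the parser to label such segments in the first place.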