Tokenizer doesn't parse Volume/issue typeset with no space.
Related to #23: when given a reference such as:
1. Felson DT. Epidemiology of hip and knee osteoarthritis. Epidemiol Rev. 1988;10:1‑28.
the current parser tokenizes 1988;10:1‑28 as a whole and assigns it to Volume/Issue. It should be approximately:
Token | Value |
---|---|
Year | 1988 |
Volume | 10 |
Pages | 1-28 |
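A minimal Ruby sketch of that split, for illustration only. The names `COMPACT_VOLUME` and `split_compact_volume` are hypothetical and not taken from the parser; the sketch assumes the token always has the shape year;volume(issue):pages, with the issue optional, and that the page range may use a Unicode hyphen or dash rather than an ASCII one.

```ruby
# Hypothetical helper for illustration; not taken from the parser's code base.
# Matches a compact "YEAR;VOLUME(ISSUE):PAGES" token.
COMPACT_VOLUME = /
  \A\s*
  (?<year>\d{4})\s*;\s*                    # four-digit year, then a semicolon
  (?<volume>\d+)\s*                        # volume number
  (?:\((?<issue>[^)]+)\)\s*)?              # optional issue in parentheses
  :\s*
  (?<pages>\d+(?:[\-\u2010-\u2015]\d+)?)   # page or page range, ASCII or Unicode dash
  \.?\s*\z
/x

# Returns a hash with :year, :volume, :pages and, when present, :issue.
# Returns nil when the token does not have the assumed shape.
def split_compact_volume(token)
  m = COMPACT_VOLUME.match(token)
  return nil unless m

  parts = { year: m[:year], volume: m[:volume], issue: m[:issue], pages: m[:pages] }
  # Normalize Unicode hyphens and dashes in the page range to an ASCII hyphen.
  parts[:pages] = parts[:pages].gsub(/[\u2010-\u2015]/, '-')
  parts.compact # drops :issue when the token has none
end

p split_compact_volume("1988;10:1\u201128")
p split_compact_volume("2012;20(3):423\u201135.")
```

For the first token this yields year 1988, volume 10 and pages 1-28; for the second it also yields issue 3. A fixed pattern like this only covers the one layout shown here, so it is a starting point for test cases rather than a general solution.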
Worse case :
2. Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. Biomechanical considerations in the pathogenesis of osteoarthritis of the knee. Knee Surg Sports Traumatol Arthrosc. mars 2012;20(3):423‑35.
Is parsed as :
Token | Value |
---|---|
Citation number | 2 |
Author | Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. |
Title | Biomechanical considerations in the pathogenesis of osteoarthritis of the knee |
Journal | Knee Surg Sports Traumatol Arthrosc mars |
Date | 2012 |
Volume/Issue | 20(3):423‑35 |
Again, the whole Volume/Issue token isn't split at its punctuation. I would expect:
Token | Value |
---|---|
Citation number | 2 |
Author | Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. |
Title | Biomechanical considerations in the pathogenesis of osteoarthritis of the knee |
Journal | Knee Surg Sports Traumatol Arthrosc |
Date | mars 2012 |
Volume/Issue | 20(3) |
Pages | 423‑35 |
Recognizing mars 2012 (French for March 2012) is probably harder...
HTH,
Thanks. We should add the examples to the volume normalizer test cases.
Are you interested in larger dubious test cases? I have some of them on hand ;-]...
Definitely. Especially if you could provide them in the test case format.
Basically you'd want something like:
'2012;20(3):423‑35.' => { volume: ['20'], issue: ['3'], date: ['2012'], pages: ['423-35'] }
'1988;10:1‑28.' => { volume: ['10'], date: ['1988'], pages: ['1-28'] }
And similarly:
'mars 2012;20(3):423‑35.' => { volume: ['20'], issue: ['3'], date: ['2012-03'], pages: ['423-35'] }
This one has another aspect to it, though: the month currently ends up in the journal token, so we should also add some samples using this style to the core training data.
For example:
<citation-number>2.</citation-number>
<author>Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al.</author>
<title>Biomechanical considerations in the pathogenesis of osteoarthritis of the knee.</title>
<journal>Knee Surg Sports Traumatol Arthrosc.</journal>
<volume>mars 2012;20(3):423‑35.</volume>
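On the date side of the `mars 2012` case, here is a minimal Ruby sketch of how a month word could be folded into a `2012-03` style value. Everything named here (`MONTHS`, `normalize_date`) is hypothetical and not part of the existing normalizers. The lookup table covers only full English and French month names, which is an assumption made for this example.

```ruby
# Hypothetical lookup table for illustration: full English and French month
# names only. A real table would need abbreviations and more languages.
ENGLISH_MONTHS = %w[january february march april may june
                    july august september october november december].freeze
FRENCH_MONTHS  = %w[janvier février mars avril mai juin
                    juillet août septembre octobre novembre décembre].freeze

# Maps each lower-case month name to its number (1..12).
MONTHS = [ENGLISH_MONTHS, FRENCH_MONTHS].flat_map do |names|
  names.each_with_index.map { |name, index| [name, index + 1] }
end.to_h.freeze

# Turns "mars 2012" into "2012-03" and "2012" into "2012".
# An unknown month word is ignored and only the year is returned.
# Returns nil when the text is not an optional word followed by a year.
def normalize_date(text)
  m = /\A\s*(?:(?<month>\p{L}+)\.?\s+)?(?<year>\d{4})\s*\z/.match(text)
  return nil unless m

  month = m[:month] && MONTHS[m[:month].downcase]
  month ? format('%s-%02d', m[:year], month) : m[:year]
end
```

With this table, `normalize_date('mars 2012')` gives `'2012-03'`, matching the expected `date` value in the test case above. Whether the month should be recognized during normalization or already at the labelling stage (via training data like the sample above) is a separate design question.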