Tagging mixed number as #Value
track0x1 opened this issue · 5 comments
Mixed numbers are a common way to express a value like ‘1-1/2 cups’, sometimes written without the hyphen separator as ‘1 1/2 cups’. When I used compromise v11 I was able to write a plugin with a regex to tag these as #Value, but it doesn’t seem to work in the latest release. Because this is so common, should it be tagged out of the box?
My purpose here is to match all types of values (including mixed number values) for capturing.
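For illustration, here is a minimal sketch of the kind of regex the question describes, written as plain JavaScript with no dependency on compromise. The names MIXED_NUMBER and findValues are hypothetical and not part of any library. It also treats every 'N-N/N' as a mixed number, so it cannot tell '1-1/2' apart from a subtraction.

```javascript
// Hypothetical helper, independent of compromise.
// Alternatives, tried in order: mixed number ('1-1/2' or '1 1/2'),
// bare fraction ('3/4'), then a whole or decimal number ('2', '1.5').
const MIXED_NUMBER = /\b(\d+)[- ](\d+)\/(\d+)\b|\b(\d+)\/(\d+)\b|\b\d+(?:\.\d+)?\b/g

function findValues(text) {
  const found = []
  for (const m of text.matchAll(MIXED_NUMBER)) {
    let value
    if (m[1] !== undefined) {
      // mixed number: whole part plus numerator / denominator
      value = Number(m[1]) + Number(m[2]) / Number(m[3])
    } else if (m[4] !== undefined) {
      // bare fraction
      value = Number(m[4]) / Number(m[5])
    } else {
      // whole or decimal number
      value = Number(m[0])
    }
    found.push({ text: m[0], value })
  }
  return found
}

console.log(findValues('1 1/2 cups')) // [ { text: '1 1/2', value: 1.5 } ]
```

The matched text could then be handed to a tagger; how to wire that into a compromise plugin depends on the release and is not shown here.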
hey Tom, yep - if I remember we still do some of this number-range stuff out of the box, but shied away from some of it that resembled algebra or subtraction. This is a real doozie, and I agree it's a cool thing to opt-in to, and we should support any unambiguous 'and a half' stuff as much as we can.
You can see some of the fractions tests we pass, and the ones we avoid, here. PRs welcome if you can improve on it in any way.
ps i enjoyed your blog.
cheers
@spencermountain Thank you Spencer! I just realized something that looks like a bug. When 15-ounce is wrapped in parentheses it's tagged as a single term and resultantly has the wrong tags.
> nlp('15-ounce (15-ounce)').debug()
┌─────────
│ '15' - Value, Cardinal, NumericValue, Hyphenated
│ 'ounce' - Noun, Unit, Singular, Hyphenated
│ '15-ounce' - Infinitive, Verb, PresentTense
sidebar: is there a way we can convert verbose number ranges (2 to 3) to hyphenated number ranges (2-3)? that would enable me to tap into the same #NumberRange tag for a match.
> nlp('2 to 3 people').debug()
┌─────────
│ '2' - Value, Cardinal, NumericValue
│ 'to' - Conjunction
│ '3' - Value, Cardinal, NumericValue
│ 'people' - Noun, Plural, Actor
> nlp('2-3 people').debug()
┌─────────
│ '[2]' - Value, Cardinal, NumericValue, NumberRange
│ '[to]' - Conjunction, NumberRange
│ '[3]' - Value, Cardinal, NumericValue, NumberRange
│ 'people' - Noun, Plural, Actor
edit: also happy to split these concerns into separate issues/discussions if you prefer
hey Tom, apologies for the delay.
yeah, there's an ugly way:
let doc = nlp('2 to 3 people')
let { before, prep } = doc.match('[<before>#Value] [<prep>to] #Value').groups()
before.post('') //remove '2' whitespace
doc.match(prep).replaceWith('-').post('') //remove '-' whitespace
console.log(doc.text()) //2-3 people
in short, some of this is weird. You may benefit from using replace() with some term methods like @hasDash or @hasHyphen
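As a possible alternative to the match/replaceWith snippet above, the range could be normalized on the raw string before it reaches nlp(). This is only a sketch: hyphenateRanges is a hypothetical helper, not a compromise method, and it only handles digits, so spelled-out numbers ('two to three') are left alone.

```javascript
// Hypothetical pre-processing helper, independent of compromise:
// rewrites 'N to M' (digits only) as 'N-M' so the text can be parsed as a hyphenated range.
function hyphenateRanges(text) {
  return text.replace(/\b(\d+)\s+to\s+(\d+)\b/g, '$1-$2')
}

console.log(hyphenateRanges('2 to 3 people')) // 2-3 people
```

Note that it rewrites any digit-to-digit pair, so a phrase like 'from 9 to 5' also becomes 'from 9-5'; whether that is acceptable depends on the input.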
This nlp('15-ounce (15-ounce)').debug() one is a doozie. Haven't got it yet, but will.
hey @track0x1, this is fixed in 14.12.0:
let doc = nlp('10-ounce (12-ounce)')
doc.terms().length // 4
cheers