GrammaticalFramework/GF

Parsing arbitrary numbers


This seems silly, but the hardest thing I've come across so far is how to parse arbitrary numbers in GF!
For example:
"cats have 4 legs".
"there are twenty people in the room"
I've found Numerals.gf in the lib, but for the life of me I can't figure out how to use it. It compiles fine but then fails to parse any numbers.

hello @robclouth

which module are you importing?

> i present/LangEng.gfo
Languages: LangEng
Lang> p "cats have 4 dogs"
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (PredVP (DetCN (DetQuant IndefArt NumPl) (UseN cat_N)) (ComplSlash (SlashV2a have_V2) (DetCN (DetQuant IndefArt (NumCard (NumDigits (IDig D_4)))) (UseN dog_N)))))) NoVoc
Lang> p "there are twenty cats in the roof"
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (ExistNP (AdvNP (DetCN (DetQuant IndefArt (NumCard (NumNumeral (num (pot2as3 (pot1as2 (pot1 n2))))))) (UseN cat_N)) (PrepNP in_Prep (DetCN (DetQuant DefArt NumSg) (UseN roof_N))))))) NoVoc
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (ExistNP (DetCN (DetQuant IndefArt (NumCard (NumNumeral (num (pot2as3 (pot1as2 (pot1 n2))))))) (AdvCN (UseN cat_N) (PrepNP in_Prep (DetCN (DetQuant DefArt NumSg) (UseN roof_N)))))))) NoVoc
PhrUtt NoPConj (UttS (UseCl (TTAnt TPres ASimul) PPos (ExistNPAdv (DetCN (DetQuant IndefArt (NumCard (NumNumeral (num (pot2as3 (pot1as2 (pot1 n2))))))) (UseN cat_N)) (PrepNP in_Prep (DetCN (DetQuant DefArt NumSg) (UseN roof_N)))))) NoVoc

(I've changed your sentences slightly just so they fit the GF test lexicon).

Came here to say exactly what @odanoburu said :) It works out of the box in the Lang module.

Just a small addition: if you parse multi-digit numerals in the normal GF shell (i.e. no C runtime, not through an external program that uses e.g. Python/Java/... bindings), then you need to insert the bind token &+ between the digits. Here's an example:

Lang> p -cat=NP "4 dogs"
DetCN (DetQuant IndefArt (NumCard (NumDigits (IDig D_4)))) (UseN dog_N)

Lang> p -cat=NP "40 dogs"
The parser failed at token 1: "40"

Lang> p -cat=NP "4 &+ 0 dogs"
DetCN (DetQuant IndefArt (NumCard (NumDigits (IIDig D_4 (IDig D_0))))) (UseN dog_N)

Lang> p -cat=NP "four hundred and thirty &+ - &+ three dogs"
DetCN (DetQuant IndefArt (NumCard (NumNumeral (num (pot2as3 (pot2plus (pot0 n4) (pot1plus n3 (pot0 n3)))))))) (UseN dog_N)
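If you feed sentences to the plain GF shell from a script, the bind tokens can be inserted before parsing. Here is a minimal Python sketch of that preprocessing step; `bind_digits` is a hypothetical helper name, not part of GF, and it only handles runs of ASCII digits (not hyphenated number words such as "thirty-three").

```python
import re

# The GF bind token, as used in the shell examples above.
BIND = "&+"

def bind_digits(sentence: str) -> str:
    """Insert the bind token between the digits of every multi-digit number.

    Hypothetical helper (an assumption, not a GF API): it rewrites
    "40 dogs" into "4 &+ 0 dogs" so the plain GF shell can parse it.
    """
    def split_number(match: re.Match) -> str:
        # Joining over a string iterates over its characters (the digits).
        return f" {BIND} ".join(match.group(0))

    # Only runs of two or more digits need bind tokens.
    return re.sub(r"\d{2,}", split_number, sentence)

print(bind_digits("40 dogs"))
```

The result can then be passed to `p -cat=NP` as in the examples above. Single-digit numbers are left untouched.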

If you linearise such a tree, you can use the -bind flag as an argument for l:

Lang> l DetCN (DetQuant IndefArt (NumCard (NumDigits (IIDig D_4 (IIDig D_0 (IDig D_0)))))) (UseN dog_N)
4 &+ 0 &+ 0 dogs

Lang> l -bind  DetCN (DetQuant IndefArt (NumCard (NumDigits (IIDig D_4 (IIDig D_0 (IDig D_0)))))) (UseN dog_N)
400 dogs
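If you get unbound linearisation output from somewhere and cannot pass `-bind`, the same gluing can be approximated afterwards. This is a rough sketch of the effect on the outputs shown above, not GF's actual implementation; `glue_bind_tokens` is a hypothetical name, and it assumes bind tokens are always surrounded by single spaces.

```python
def glue_bind_tokens(linearisation: str) -> str:
    """Join the tokens on either side of each "&+" bind token.

    Hypothetical helper (an assumption, not a GF API): it approximates
    what `l -bind` does for simple outputs like "4 &+ 0 &+ 0 dogs".
    """
    return linearisation.replace(" &+ ", "")

print(glue_bind_tokens("4 &+ 0 &+ 0 dogs"))
```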

Ahhh! Thanks so much. I've been using AllEng.gf. I hadn't encountered LangEng.gf.