sharplispers/parse-number

incorrect results when parsing a real with trailing spaces


The following constitutes a bug, I believe:

CL-USER> (parse-number:parse-number "2.56 ")
2.056

Note that there is a trailing space in the string. Either one should expect an error ('trailing junk'), if parse-number really requires exact :start and :end bounds, or the trailing space should be ignored.

This is with parse-number-1.3 (from a fairly recent Quicklisp) on SBCL 1.2.0.

On further inspection, this can be traced to

CL-USER> (parse-number:parse-number "2.56 ")
0: (ORG.MAPCAR.PARSE-NUMBER:PARSE-POSITIVE-REAL-NUMBER "2.56 " :START 0 :END 5 :RADIX 10)
1: (ORG.MAPCAR.PARSE-NUMBER::PARSE-INTEGERS "2.56 " 0 5 (1) :RADIX 10)
2: (PARSE-INTEGER "2.56 " :START 0 :END 1 :RADIX 10)
2: PARSE-INTEGER returned 2 1
2: (PARSE-INTEGER "2.56 " :START 2 :END 5 :RADIX 10)
2: PARSE-INTEGER returned 56 5
1: ORG.MAPCAR.PARSE-NUMBER::PARSE-INTEGERS returned (2 . 1) (56 . 3)
0: ORG.MAPCAR.PARSE-NUMBER:PARSE-POSITIVE-REAL-NUMBER returned 2.056
2.056

i.e. parse-integer also consumes the trailing whitespace and reports position 5 as the end of the parse, so the fractional part is recorded as 3 characters long instead of 2.
It seems impossible to portably query the CL readtable for its whitespace characters. It does seem easy enough (and safe), though, to compute the length of the number in the appropriate radix and use that when computing the fractional part.
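
A minimal sketch of that idea, assuming it is enough to count the digit characters actually present in the string. Counting digits in the string rather than deriving the length from the parsed integer keeps leading zeros such as the "05" in "2.05". The helper name count-leading-digits is hypothetical and not part of parse-number:

```lisp
;; Hypothetical helper, not part of parse-number.
(defun count-leading-digits (string &key (start 0) (end (length string)) (radix 10))
  "Return the number of consecutive RADIX digits in STRING, beginning at START."
  (let ((stop (position-if-not (lambda (ch) (digit-char-p ch radix))
                               string :start start :end end)))
    ;; STOP is the index of the first non-digit, or NIL if only digits remain.
    (- (or stop end) start)))

;; The fractional part of "2.56 " starts at index 2; the trailing space is not counted.
(count-leading-digits "2.56 " :start 2)   ; => 2
;; Leading zeros are preserved in the count.
(count-leading-digits "2.05 " :start 2)   ; => 2
```

Because digit-char-p takes the radix as an argument, the same count should work for other radixes without consulting the readtable.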

Ignoring the whitespace seems consistent with cl:parse-integer, which also ignores both leading and trailing whitespace, don't you agree?

I'm confused about your readtable remark, though. What role does the readtable play in this context? (I thought it didn't play any.)

I agree that whitespace should be ignored, like cl:parse-integer does. Currently, the trailing whitespace contributes to the length of the digit string for the fractional part, which is wrong. This is due to
CL-USER> (parse-integer "56")
56
2
CL-USER> (parse-integer "56 ")
56
3
i.e., cl:parse-integer returns the parsed integer together with the index at which parsing stopped, which lies past any trailing whitespace. parse-number then uses the resulting character count to compute the denominator for the fractional part.
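
To make the arithmetic concrete, here is an illustration of how the digit count feeds the denominator. It mirrors the computation as described in this thread and is not taken from parse-number's source; the function name combine-parts is hypothetical:

```lisp
;; Illustration only: whole part plus fraction scaled by radix^digit-count.
(defun combine-parts (whole fraction digit-count &optional (radix 10))
  "Return WHOLE + FRACTION / RADIX^DIGIT-COUNT as an exact rational."
  (+ whole (/ fraction (expt radix digit-count))))

;; Count of 3 (two digits plus the trailing space): 2 + 56/1000
(combine-parts 2 56 3)   ; => 257/125, i.e. 2.056
;; Count of 2 (digits only): 2 + 56/100
(combine-parts 2 56 2)   ; => 64/25, i.e. 2.56
```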

WRT readtable: I assumed that since parse-number cites CLHS number parsing as its reference, the decision of what constitutes a whitespace character should be governed (to a first approximation) by the current readtable. This is obviously debatable, and maybe a dynamic variable like *read-number-whitespace-characters* is a simple, conventional solution.
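
A rough sketch of what such a variable could look like. The variable name comes from the suggestion above, but its default value and the predicate number-whitespace-p are assumptions made for illustration; nothing like this exists in parse-number 1.3 as far as this thread shows:

```lisp
;; Assumed default set; callers could rebind it to suit their input.
(defvar *read-number-whitespace-characters*
  '(#\Space #\Tab #\Newline #\Return)
  "Characters treated as whitespace around a number (hypothetical).")

;; Hypothetical predicate that consults the dynamic variable.
(defun number-whitespace-p (char)
  "Return T if CHAR counts as whitespace under the current binding."
  (and (member char *read-number-whitespace-characters* :test #'char=) t))

(number-whitespace-p #\Space)            ; => T
;; Rebinding changes the decision for the dynamic extent of the LET.
(let ((*read-number-whitespace-characters* '(#\_)))
  (number-whitespace-p #\Space))         ; => NIL
```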

I see. Do you have a patch to go with this bug report? :-)

Not yet. I'll fork and do a pull-request if I find time to do so. Currently I just work around it on the calling end by proper trimming.
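
For anyone hitting the same problem, a workaround on the calling end might look roughly like this. The set of characters to trim is an assumption, and trim-number-string is a hypothetical helper name:

```lisp
;; Hypothetical caller-side helper: strip whitespace before parsing.
(defun trim-number-string (string)
  "Return STRING without leading or trailing whitespace characters."
  (string-trim '(#\Space #\Tab #\Newline #\Return) string))

(trim-number-string "2.56 ")   ; => "2.56"

;; With parse-number loaded, the trimmed string is then passed on:
;; (parse-number:parse-number (trim-number-string "2.56 "))
```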