Issue with regexes in Python 2.7.5 causes tests to fail
Closed this issue · 0 comments
While working on #14, I found another Python 2.7.5 bug. 40 tests fail while a Unit's template is trying to compile a regular expression, with the error:
error: nothing to compile
Here's an example of a regex that leads to this error:
(?P<parts>(?:[A-Za-z]+|[^A-Za-z0-9]*)+)
From what I understand, the problem is that the expression in the innermost parentheses can match the empty string (because of the [^A-Za-z0-9]* alternative), yet it's followed by a +, which needs something to repeat. I think Python versions 2.7.6 and later allow this, while 2.7.5 doesn't. (These tests only fail on 2.7.5.)
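For anyone who wants to check their own interpreter, here is a minimal sketch that tries to compile the regex above and also confirms that the inner group can match an empty string. The helper name try_compile is made up for this illustration and is not part of the library.

```python
import re

# The regex quoted above, copied verbatim.
PATTERN = r'(?P<parts>(?:[A-Za-z]+|[^A-Za-z0-9]*)+)'

# The inner group on its own, without the trailing + repeat.
INNER = r'(?:[A-Za-z]+|[^A-Za-z0-9]*)'


def try_compile(pattern):
    """Hypothetical helper: return (compiled, None) or (None, error text)."""
    try:
        return re.compile(pattern), None
    except re.error as exc:
        return None, str(exc)


compiled, err = try_compile(PATTERN)

# True when the inner group accepts the empty string, which is the
# condition that the repeat operator appears to trip over on 2.7.5.
inner_matches_empty = re.match(INNER + r'\Z', '') is not None

if compiled is None:
    print('compile failed: %s' % err)
else:
    print('compiled OK; inner group matches empty: %s' % inner_matches_empty)
```

On an interpreter that rejects the pattern, the helper returns the error text instead of raising, so the same script can be run under each Python version being compared.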
All failing tests are for the Units AlphaSymbol, NumericSymbol, and AlphaNumericSymbol, all CompoundUnit types that use the simple Formatting unit as a component. I've traced the underlying problem to the fact that the Formatting unit allows 0 or more matches. In most call numbers formatting is treated as optional, so originally this seemed desirable. But, in retrospect, having it match nothing by default seems counterintuitive and unnecessary, since you can still set formatting components as optional on an individual basis. Changing the default min_length from 0 to 1 does solve the nothing to compile error, but it breaks some of the tests that cover the current behavior. Fixing it will be a matter of untangling that web and making sure none of the more complex callnumber types rely on that behavior to function.
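To illustrate why the min_length change helps, here is a rough sketch of how a minimum length of 0 versus 1 changes the generated quantifier. The functions formatting_pattern, compound_pattern, and can_match_empty are hypothetical stand-ins written for this comment; they are not the library's actual template code, and the real Formatting unit may build its regex differently.

```python
import re


def formatting_pattern(min_length=0):
    """Hypothetical stand-in for the regex a Formatting unit might produce."""
    # min_length=0 gives a "zero or more" quantifier; anything higher
    # requires at least that many formatting characters.
    quantifier = '*' if min_length == 0 else '{%d,}' % min_length
    return r'[^A-Za-z0-9]' + quantifier


def compound_pattern(min_length):
    """Hypothetical compound regex shaped like the failing example."""
    return r'(?P<parts>(?:[A-Za-z]+|%s)+)' % formatting_pattern(min_length)


def can_match_empty(pattern):
    """Return True if the pattern accepts the empty string."""
    return re.match(r'(?:%s)\Z' % pattern, '') is not None


# With min_length=0 the formatting alternative can match nothing, so the
# repeated group can too. With min_length=1 it always consumes a character.
print(can_match_empty(formatting_pattern(0)))
print(can_match_empty(formatting_pattern(1)))

# Formatting can still be made optional for one component by wrapping it.
optional_formatting = r'[A-Za-z]+(?:%s)?[0-9]+' % formatting_pattern(1)
print(re.match(optional_formatting + r'\Z', 'AB12') is not None)
print(re.match(optional_formatting + r'\Z', 'AB.12') is not None)
```

The last pattern shows the "optional on an individual basis" idea: the formatting piece requires at least one character by default, and the surrounding (?:...)? makes it optional only where that is wanted.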