attardi/wikiextractor

Cannot turn off --html-safe command line option (true by default)

adno opened this issue · 0 comments

Due to a bug, the only way to turn off the --html-safe command line option is to pass an empty string as its value (an empty string is falsy in Python), like this:

wikiextractor --html-safe ""

The following do not work:

wikiextractor --no-html-safe
wikiextractor --html-safe false

The argument is currently defined like this:

groupP.add_argument("--html-safe", default=True,
help="use to produce HTML safe output within <doc>...</doc>")
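
A minimal reproduction of that behaviour, assuming a standalone parser rather than the project's real one (the `parser` object and the variable names here are only for illustration):

```python
import argparse

# Hypothetical standalone parser that mirrors the definition quoted above.
parser = argparse.ArgumentParser(prog="wikiextractor")
parser.add_argument("--html-safe", default=True,
                    help="use to produce HTML safe output within <doc>...</doc>")

# No option given: the default (the boolean True) is used.
default_value = parser.parse_args([]).html_safe

# A value is given: it is stored as a plain string, not parsed as a boolean.
false_value = parser.parse_args(["--html-safe", "false"]).html_safe
empty_value = parser.parse_args(["--html-safe", ""]).html_safe

# --no-html-safe is not a known option; parse_known_args collects it as leftover.
_, unknown = parser.parse_known_args(["--no-html-safe"])

for label, value in [("default", default_value),
                     ("'false'", false_value),
                     ("empty", empty_value)]:
    print(label, repr(value), "-> truthy" if value else "-> falsy")
print("unrecognized:", unknown)
```

Only the empty string ends up falsy, which matches the workaround described above.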

Because neither type nor action is given, argparse stores whatever value follows the option as a string, and in Python any non-empty string (including "false") evaluates as true. One simple way to correctly define a boolean argument that defaults to true would be:

groupP.add_argument("--html-safe", default=True, action=argparse.BooleanOptionalAction,
                    help="use to produce HTML safe output within <doc>...</doc>")

This way the parser would accept both --html-safe and --no-html-safe and would also generate appropriate help text. Note that argparse.BooleanOptionalAction requires Python 3.9 or later.
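
A quick sketch to check the proposed definition, again on a standalone parser that is only assumed for illustration (not the project's actual setup) and on Python 3.9 or later:

```python
import argparse

# Hypothetical standalone parser using the proposed definition.
parser = argparse.ArgumentParser(prog="wikiextractor")
parser.add_argument("--html-safe", default=True,
                    action=argparse.BooleanOptionalAction,
                    help="use to produce HTML safe output within <doc>...</doc>")

# The option now takes no value; its presence or negated form sets the boolean.
default_value = parser.parse_args([]).html_safe
enabled_value = parser.parse_args(["--html-safe"]).html_safe
disabled_value = parser.parse_args(["--no-html-safe"]).html_safe

# The generated help lists both spellings of the flag.
help_text = parser.format_help()

print(default_value, enabled_value, disabled_value)
print("--no-html-safe" in help_text)
```

With this action the option no longer accepts a value, so the empty-string workaround would be replaced by --no-html-safe.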