embench/embench-iot

Design rationale for handling of --text (etc) options in benchmark_size.py

Roger-Shepherd opened this issue · 1 comments

I'm looking at how to modify benchmark_size.py to handle macho files as well as elf files. In addition to the processing of the files, works needs to be done on the parsing of the parameters.

In the current implementation the handling and logging is inconsistent between parameters. For example, if the script is run with no parameters given, the log says:

Supplied arguments
==================
	--builddir            : bd
	--logdir              : logs
	--baselinedir         : baseline-data
	--absolute            : False
	--output-format       : None
	--json-comma          : False
	--text                : []
	--data                : []
	--rodata              : []
	--bss                 : []
	--metric              : []

So, for example, --builddir shows the default for the parameter, but --text shows what was actually supplied.

The reason for this is that the handling of parameters is split between build_parser and validate_args with the logging occurring between the two:

# Parse arguments using standard technology
parser = build_parser()
args = parser.parse_args()

# Establish logging
setup_logging(args.logdir, 'size')
log_args(args)

# Check args are OK (have to have logging and build directory set up first)
validate_args(args)

The handling of many of the parameters is done by the python paring library but for some (e.g. --text) is done in validate_args. Specifically for --text the relevant part of build_parser is

# List arguments are empty by default, a user specified value then takes
# precedence. If the list is empty after parsing, then we can install a
# default value.
parser.add_argument(
    '--text',
    type=str,
    default=[],
    nargs='+',
    help='Section name(s) containing code'
)

and of validate_args is:

# Sort out the list of section names to use
gp['secnames'] = dict()

for argname in ['text', 'rodata', 'data', 'bss']:
    secnames = getattr(args, argname)
    if secnames:
        gp['secnames'][argname] = secnames
    else:
        gp['secnames'][argname] = ['.' + argname]

My understanding of how add_argument works is that

parser.add_argument(
    '--text',
    type=str,
    default=[.text'],
    nargs='+',
    help='Section name(s) containing code'
)

and

# Sort out the list of section names to use
gp['secnames'] = dict()

for argname in ['text', 'rodata', 'data', 'bss']:
    secnames = getattr(args, argname)
    gp['secnames'][argname] = secnames

would cause the same processing of the --text parameter to occur but the logging would show the defaults rather than what was on the commend line.

Is there a reason for this? My guess is that the command line parser is so complicated that it seemed easier to do it afterwards, but perhaps there is another reason. I'd like to know before I start making modifications to handle macho.

On reflection: Having looked more closely at the implications of modifying either structure, the original seems better in that, with the proposal I am following for supporting macho, the only change in build_parser is the addition of a --format parameter. All the knowledge of standard section names can be deferred to validate_args without the need for build_parser to have to understand the chosen file format.

Dealt with by #132