CLI arguments are inconsistent
Almost all command line arguments now have both a short and a long name. Most short names are undecipherable or misleading without reading through the help. Also, a few argument names leak implementation details. We shouldn't expose short switches for more obscure functionality at all, and the ones we do expose should be as intuitive as possible.
I think we should think twice about the names of these switches before 1.0:
-n: It means dry-run in most command line programs, and it may make sense to have a dry-run switch in the future, but right now it configures entropy calculation.
-e: In many programs the output location is configured either via a positional argument or via -o. As we are supporting arbitrary input files, we shouldn't use a positional argument though.
--process-num / -p: There is no generally established name here. Sometimes -j is used to denote "jobs", or -p to indicate parallelism. I think we shouldn't indicate in our public API that we are using processes for parallelism. (This is also true for the actual process_file function.)
--show-external-dependencies: This is an odd one. We agreed in the past that it doesn't really seem right and that a subcommand may make more sense, but we didn't want to introduce subcommands just for this functionality. If we are to use subcommands elsewhere, we should reconsider this as well.
Suggested help output (for existing functionality)
$ unblob --help
Usage: unblob [OPTIONS] FILES...

  A tool for getting information out of any kind of binary blob.

  See '--show-external-dependencies' for details on required 3rd party tools.

Common Options:
  --help                          Show this message and exit.
  -v, --verbose                   Verbosity level, counting, maximum level: 3
                                  (use: -v, -vv, -vvv)
  -V, --version

Extraction Options:
  -d, --depth INTEGER RANGE       Recursion depth. How deep should we extract
                                  containers. [default: 10; x>=1]
  -j, --jobs INTEGER RANGE        Number of jobs to process files
                                  parallelly. [default: 8; x>=1]
  -o, --output DIRECTORY          Extract the files to this directory. Will be
                                  created if it doesn't exist.
  -P, --plugins-path PATH         Load plugins from the provided path.

Special Options:
  --entropy-depth INTEGER RANGE   Entropy calculation depth. How deep should
                                  we calculate entropy for unknown files? 1
                                  means input files only, 0 turns it off.
                                  [default: 1; x>=0]
  --show-external-dependencies    Shows commands that need to be available for
                                  unblob to work properly
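To make the proposed layout concrete, here is a minimal sketch of a parser with the renamed switches and the three option groups. It uses the standard library's argparse purely for illustration; this is an assumption for the sake of a self-contained example, not a statement about how unblob builds its CLI. The helpers positive_int, non_negative_int and build_parser are hypothetical names, and --help, --version and --show-external-dependencies are left out for brevity.

```python
import argparse
from pathlib import Path


def positive_int(value: str) -> int:
    """Parse an integer that must be >= 1 (the 'x>=1' ranges in the help)."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"{value} is not >= 1")
    return number


def non_negative_int(value: str) -> int:
    """Parse an integer that must be >= 0 (the 'x>=0' range in the help)."""
    number = int(value)
    if number < 0:
        raise argparse.ArgumentTypeError(f"{value} is not >= 0")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the suggested help output."""
    parser = argparse.ArgumentParser(
        prog="unblob",
        description="A tool for getting information out of any kind of binary blob.",
    )
    # Input files stay positional; the output directory becomes -o/--output.
    parser.add_argument("files", nargs="+", type=Path, metavar="FILES")

    common = parser.add_argument_group("Common Options")
    # action="count" gives the -v / -vv / -vvv behaviour.
    common.add_argument("-v", "--verbose", action="count", default=0)

    extraction = parser.add_argument_group("Extraction Options")
    extraction.add_argument("-d", "--depth", type=positive_int, default=10)
    # -j/--jobs says nothing about processes vs. threads.
    extraction.add_argument("-j", "--jobs", type=positive_int, default=8)
    extraction.add_argument("-o", "--output", type=Path, metavar="DIRECTORY")
    extraction.add_argument("-P", "--plugins-path", type=Path, metavar="PATH")

    special = parser.add_argument_group("Special Options")
    # Obscure functionality gets a long name only, which leaves -n free.
    special.add_argument("--entropy-depth", type=non_negative_int, default=1)
    return parser
```

With this sketch, `build_parser().parse_args(["-vv", "-j", "4", "-o", "out", "firmware.bin"])` yields a verbosity of 2, four jobs and `out` as the output directory, while depth and entropy depth keep their defaults.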
Suggested help format containing potential future options
$ unblob --help
Usage: unblob [OPTIONS] FILES...

  A tool for getting information out of any kind of binary blob.

  See '--show-external-dependencies' for details on required 3rd party tools.

Common Options:
  --help                          Show this message and exit.
  -v, --verbose                   Verbosity level, counting, maximum level: 3
                                  (use: -v, -vv, -vvv)
  -V, --version

Extraction Options:
  -f, --force                     Overwrite already existing output
  -d, --depth INTEGER RANGE       Recursion depth. How deep should we extract
                                  containers. [default: 10; x>=1]
  -j, --jobs INTEGER RANGE        Number of jobs to process files
                                  parallelly. [default: 8; x>=1]
  -o, --output DIRECTORY          Extract the files to this directory. Will be
                                  created if it doesn't exist.
  -P, --plugins-path PATH         Load plugins from the provided path.

Special Options:
  --extract-config [KEY=VALUE...] Control finer details of extraction process.
                                  Plugins can register extra options.
                                  See '--extract-config help' for details.
  --show-external-dependencies    Shows commands that need to be available for
                                  unblob to work properly
$ unblob --extract-config help
Usage: unblob --extract-config [KEY=VALUE...]

The following configuration options are available:
  cleanup_output    Remove intermediate files [...]
  entropy_depth     Entropy calculation depth. How deep should
                    we calculate entropy for unknown files? 1
                    means input files only, 0 turns it off.
                    [default: 1; x>=0]
  ignore_magic      Do not extract files matching to the given magic
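The KEY=VALUE idea could be parsed along the following lines. This is only a sketch under stated assumptions: the registry CONFIG_OPTIONS, the functions register_option, parse_bool and parse_extract_config, and the default for cleanup_output are all hypothetical and exist only to illustrate how core and plugin options might share one switch.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def parse_bool(text: str) -> bool:
    """Accept a few common spellings of true/false."""
    lowered = text.strip().lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Hypothetical registry: option name -> (converter, default value).
# The cleanup_output default is an assumption made for this sketch.
CONFIG_OPTIONS: Dict[str, Tuple[Callable[[str], Any], Any]] = {
    "cleanup_output": (parse_bool, False),
    "entropy_depth": (int, 1),
}


def register_option(name: str, convert: Callable[[str], Any], default: Any) -> None:
    """Hook a plugin could call to add its own option to the registry."""
    if name in CONFIG_OPTIONS:
        raise ValueError(f"option already registered: {name!r}")
    CONFIG_OPTIONS[name] = (convert, default)


def parse_extract_config(pairs: Iterable[str]) -> Dict[str, Any]:
    """Turn ['key=value', ...] into a dict, starting from the defaults."""
    config = {name: default for name, (_, default) in CONFIG_OPTIONS.items()}
    for pair in pairs:
        key, separator, raw_value = pair.partition("=")
        if not separator:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        if key not in CONFIG_OPTIONS:
            raise ValueError(f"unknown extract-config option: {key!r}")
        convert, _default = CONFIG_OPTIONS[key]
        config[key] = convert(raw_value)
    return config
```

Keeping the options in a registry means the core help stays short, while '--extract-config help' can be generated by iterating over whatever has been registered.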
Potential future direction with subcommands
Currently we have only the default implicit extract and the somewhat clunky show-external-dependencies functionality, which doesn't warrant the addition of subcommands. If in the future we want to add, e.g., forcing a given extractor on a file, it could make sense to add subcommands (I know, I know, unpack and extract are awful verbs to use together...)
$ unblob --help
Usage: unblob [SUBCOMMAND]

Options:
  --help          Show this message and exit.
  -v, --verbose   Verbosity level, counting, maximum level: 3
                  (use: -v, -vv, -vvv)
  -V, --version

Subcommands:
  extract   Extracts a binary blob. Default command if unspecified.
  unpack    Unpacks a file using the specified extractor
  help      Show this help, or the help of the given subcommand
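One way the "default command if unspecified" behaviour could work is to rewrite the argument list before parsing. The sketch below uses the standard library's argparse as an assumption for illustration; with_default_subcommand, build_parser and the --extractor option of unpack are hypothetical names, and the help subcommand is omitted because argparse already provides --help on every level.

```python
import argparse
from typing import List, Sequence

SUBCOMMANDS = ("extract", "unpack")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with the two proposed subcommands."""
    parser = argparse.ArgumentParser(prog="unblob")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    subparsers = parser.add_subparsers(dest="command")

    extract = subparsers.add_parser("extract", help="Extracts a binary blob.")
    extract.add_argument("files", nargs="+")

    unpack = subparsers.add_parser(
        "unpack", help="Unpacks a file using the specified extractor"
    )
    unpack.add_argument("--extractor", required=True)  # hypothetical option
    unpack.add_argument("file")
    return parser


def with_default_subcommand(argv: Sequence[str]) -> List[str]:
    """Insert 'extract' before the first positional unless a subcommand is named.

    This only works while the top-level options are plain flags (as they are
    here); a top-level option taking a value would need extra handling.
    """
    for index, argument in enumerate(argv):
        if argument.startswith("-"):
            continue  # skip top-level flags such as -v or --help
        if argument in SUBCOMMANDS:
            return list(argv)
        return [*argv[:index], "extract", *argv[index:]]
    return list(argv)  # only flags given; nothing to insert
```

A caveat of this approach is that an input file literally named extract or unpack would be read as a subcommand, so such a file would have to be passed as ./extract or after an explicit extract.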
Can you show us your ideal --help output? :)
I think the -e probably comes from our usage of binwalk in the past. I'd gladly change it to -o / --output-dir, which would fall in line with the CLI of most of our extractors (7zip, jefferson, ...). Note: if we do so, we also have to change that option in unromfs.
We can decide to reserve -n for future use (dry-run switch) and move the entropy calculation switch to something more meaningful. The symbol for entropy is S, so maybe use -s / --entropy-depth?
I don't have opinions on --show-external-dependencies at the moment. I'm not against it.
Regarding -p:
> I think we shouldn't indicate on our public API that we are using processes for parallelism. (It is also true for the actual process_file function too)
You would keep the option and hide it from the --help section? Or just hide it from the general documentation (README, wiki)? In terms of options, both -j and -p work for me.
> Can you show us your ideal --help output? :)
> I think the -e probably comes from our usage of binwalk in the past. I'd gladly change it to -o / --output-dir, which would fall in line with the CLI of most of our extractors (7zip, jefferson, ...). Note: if we do so, we also have to change that option in unromfs.
Interestingly, binwalk's use of -e is different from ours; they have -C for the output directory.
> We can decide to reserve -n for future use (dry-run switch) and move the entropy calculation switch to something more meaningful. The symbol for entropy is S, so maybe use -s / --entropy-depth?
I don't think that we need a short name for that.
> I don't have opinions on --show-external-dependencies at the moment. I'm not against it.
> Regarding -p:
> > I think we shouldn't indicate on our public API that we are using processes for parallelism. (It is also true for the actual process_file function too)
> You would keep the option and hide it from the --help section? Or just hide it from the general documentation (README, wiki)? In terms of options, both -j and -p work for me.
I'd probably rename it to --parallel(ism) to convey the meaning without making the exact mode of parallelism part of the API. E.g. if we were to decide that threads are enough, I wouldn't want to have to rename this switch.
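The same reasoning can be applied to the code behind the switch. As a rough sketch (not unblob's actual process_file; the names process_files, handle and executor_factory are hypothetical), the public function can take only a job count and keep the choice between threads and processes as an internal detail:

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_files(
    paths: Iterable[T],
    handle: Callable[[T], R],
    jobs: int = 8,
    executor_factory: Callable[..., Executor] = ThreadPoolExecutor,
) -> List[R]:
    """Run handle() on every path with at most `jobs` parallel workers.

    Callers only choose how many jobs to run. Whether the workers are
    threads or processes is decided by executor_factory, which could be
    swapped for concurrent.futures.ProcessPoolExecutor without touching
    the CLI switch or the callers.
    """
    with executor_factory(max_workers=jobs) as executor:
        # executor.map preserves the input order of the results.
        return list(executor.map(handle, paths))
```

For example, `process_files(["a", "bb"], len, jobs=2)` returns the lengths in input order regardless of which executor runs the work.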
If there is a general consensus that we should rework the switches, I am willing to design it :) At this stage it is more of a question of whether you agree with my assessment.
Maybe we should also add a --version argument
Updated OP with suggested changes in help, also containing a potential future direction to add extensible extract configuration options without cluttering the core UX
> Maybe we should also add a --version argument
This flag is available with the latest version (23.8.11).