onekey-sec/unblob

CLI arguments are inconsistent

Opened this issue · 6 comments

vlaci commented

Currently almost all command line arguments have a short and a long name. Most short names are undecipherable or misleading without reading through the help. Also, a few argument names leak implementation details. We shouldn't expose short switches for more obscure functionality at all, and the ones we do expose should be as intuitive as possible.

I think we should think twice about the names of these switches before 1.0:

  • -n: It means dry-run in most command line programs, and it may make sense to have a dry-run switch in the future, but right now it configures entropy calculation.
  • -e: In most programs the output location is configured either via a positional argument or via -o. As we support arbitrary input files, we shouldn't use a positional argument though.
  • --process-num/-p: There is no generally established name here. Sometimes -j is used to denote "jobs", or -p to indicate parallelism. I think we shouldn't indicate in our public API that we are using processes for parallelism. (The same is true for the actual process_file function.)
  • --show-external-dependencies: This is an odd one. We agreed in the past that it doesn't really seem right and that a subcommand might make more sense, but we didn't want to introduce subcommands just for this functionality. If we are to use subcommands elsewhere, we should reconsider this as well.

Suggested help output (for existing functionality)

$ unblob --help
Usage: unblob [OPTIONS] FILES...

  A tool for getting information out of any kind of binary blob.

  See '--show-external-dependencies' for details on required 3rd party tools.

Common Options:
  --help                          Show this message and exit.
  -v, --verbose                   Verbosity level, counting, maximum level: 3
                                  (use: -v, -vv, -vvv)
  -V, --version

Extraction Options:
  -d, --depth INTEGER RANGE       Recursion depth. How deep should we extract
                                  containers.  [default: 10; x>=1]
  -j, --jobs INTEGER RANGE
                                  Number of jobs to process files
                                  parallelly.  [default: 8; x>=1]
  -o, --output DIRECTORY          Extract the files to this directory. Will be
                                  created if doesn't exist.
  -P, --plugins-path PATH         Load plugins from the provided path.


Special Options:
  --entropy-depth INTEGER RANGE
                                  Entropy calculation depth. How deep should
                                  we calculate entropy for unknown files? 1
                                  means input files only, 0 turns it off.
                                  [default: 1; x>=0]
  --show-external-dependencies    Shows commands need to be available for
                                  unblob to work properly
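
The grouping above could be prototyped roughly as follows. This is only a sketch using Python's standard-library argparse, not unblob's actual CLI code; `build_parser` and `at_least` are hypothetical names, and the defaults and ranges are copied from the suggested help text.

```python
import argparse
from pathlib import Path


def at_least(minimum: int):
    """Build a converter that rejects integers below `minimum` (hypothetical helper)."""
    def convert(text: str) -> int:
        value = int(text)
        if value < minimum:
            raise argparse.ArgumentTypeError(f"{value} is smaller than {minimum}")
        return value
    return convert


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the suggested option groups."""
    parser = argparse.ArgumentParser(
        prog="unblob",
        description="A tool for getting information out of any kind of binary blob.",
    )
    parser.add_argument("files", nargs="+", type=Path, metavar="FILES")

    common = parser.add_argument_group("Common Options")
    common.add_argument("-v", "--verbose", action="count", default=0,
                        help="Verbosity level (use: -v, -vv, -vvv)")

    extraction = parser.add_argument_group("Extraction Options")
    extraction.add_argument("-d", "--depth", type=at_least(1), default=10,
                            help="Recursion depth.")
    # "jobs" says nothing about processes vs. threads.
    extraction.add_argument("-j", "--jobs", type=at_least(1), default=8,
                            help="Number of jobs to process files in parallel.")
    extraction.add_argument("-o", "--output", type=Path,
                            help="Extract the files to this directory.")
    extraction.add_argument("-P", "--plugins-path", type=Path,
                            help="Load plugins from the provided path.")

    special = parser.add_argument_group("Special Options")
    # Long name only: obscure functionality gets no short switch,
    # which also leaves -n free for a future dry-run option.
    special.add_argument("--entropy-depth", type=at_least(0), default=1,
                         help="Entropy calculation depth, 0 turns it off.")
    return parser
```

With this sketch, `build_parser().parse_args(["-j", "4", "-o", "out", "fw.bin"])` yields `jobs=4` and `output=Path("out")`, while `-n` is rejected as an unknown option.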

Suggested help format containing potential future options

$ unblob --help
Usage: unblob [OPTIONS] FILES...

  A tool for getting information out of any kind of binary blob.

  See '--show-external-dependencies' for details on required 3rd party tools.

Common Options:
  --help                          Show this message and exit.
  -v, --verbose                   Verbosity level, counting, maximum level: 3
                                  (use: -v, -vv, -vvv)
  -V, --version

Extraction Options:
  -f, --force                     Overwrite already existing output
  -d, --depth INTEGER RANGE       Recursion depth. How deep should we extract
                                  containers.  [default: 10; x>=1]
  -j, --jobs INTEGER RANGE
                                  Number of jobs to process files
                                  parallelly.  [default: 8; x>=1]
  -o, --output DIRECTORY          Extract the files to this directory. Will be
                                  created if doesn't exist.
  -P, --plugins-path PATH         Load plugins from the provided path.

Special Options:
  --extract-config [KEY=VALUE...] Control finer details of extraction process.
                                  Plugins can register extra options.
                                  See '--extract-config help' for details.
  --show-external-dependencies    Shows commands need to be available for
                                  unblob to work properly

$ unblob --extract-config help
Usage: unblob --extract-config [KEY=VALUE...]

The following configuration options are available:

  cleanup_output                  Remove intermediate files [...]
  entropy_depth                   Entropy calculation depth. How deep should
                                  we calculate entropy for unknown files? 1
                                  means input files only, 0 turns it off.
                                  [default: 1; x>=0]
  ignore_magic                    Do not extract files matching to the given magic
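
Parsing the KEY=VALUE pairs could look roughly like the sketch below. Everything here is an assumption for illustration: the `CONFIG_OPTIONS` registry, the `register_option` hook for plugins, the `parse_extract_config` function, and the value types chosen for each key are hypothetical and not part of unblob.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry: option name -> converter for its string value.
# The value types are assumptions, the proposal does not specify them.
CONFIG_OPTIONS: Dict[str, Callable[[str], object]] = {
    "cleanup_output": lambda text: text.lower() in ("1", "true", "yes"),
    "entropy_depth": int,
    "ignore_magic": str,
}


def register_option(name: str, convert: Callable[[str], object]) -> None:
    """Let a plugin register an extra KEY (hypothetical hook)."""
    CONFIG_OPTIONS[name] = convert


def parse_extract_config(pairs: Iterable[str]) -> Dict[str, object]:
    """Turn ["entropy_depth=2", ...] into a dict of converted values."""
    config: Dict[str, object] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        if key not in CONFIG_OPTIONS:
            raise ValueError(f"unknown configuration option: {key!r}")
        config[key] = CONFIG_OPTIONS[key](value)
    return config
```

Keeping the converters in a registry means the core CLI gains a single switch, while plugins can still contribute their own keys.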

Potential future direction with subcommands

Currently we have only the default implicit extract and the somewhat clunky show-external-dependencies functionality, which doesn't warrant the addition of subcommands. If we want to add something in the future, e.g. forcing a given extractor on a file, it could make sense to add subcommands (I know, I know, unpack and extract are awful verbs to use together...).

$ unblob --help
Usage: unblob [SUBCOMMAND]

Options:
  --help                          Show this message and exit.
  -v, --verbose                   Verbosity level, counting, maximum level: 3
                                  (use: -v, -vv, -vvv)
  -V, --version

Subcommands:
  extract                         Extracts a binary blob. Default command if unspecified.
  unpack                          Unpacks a file using the specified extractor
  help                            Show this help, or the help of the given subcommand
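
One way to get an implicit default subcommand is to rewrite the argument list before parsing. The sketch below uses the standard-library argparse; `with_default_subcommand`, `build_parser`, and the `--extractor` option of `unpack` are hypothetical, and the sketch does not handle global options placed before an implicit subcommand.

```python
import argparse
from typing import List, Sequence

SUBCOMMANDS = ("extract", "unpack")


def with_default_subcommand(argv: Sequence[str]) -> List[str]:
    """Prepend 'extract' when the first argument is neither a subcommand nor an option."""
    args = list(argv)
    if args and args[0] not in SUBCOMMANDS and not args[0].startswith("-"):
        args.insert(0, "extract")
    return args


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="unblob")
    subparsers = parser.add_subparsers(dest="command", required=True,
                                       metavar="SUBCOMMAND")

    extract = subparsers.add_parser("extract", help="Extracts a binary blob.")
    extract.add_argument("files", nargs="+")

    unpack = subparsers.add_parser(
        "unpack", help="Unpacks a file using the specified extractor")
    unpack.add_argument("--extractor", required=True)  # hypothetical option
    unpack.add_argument("file")
    return parser
```

A caveat of this approach: an input file literally named `extract` or `unpack` would be mistaken for a subcommand unless the subcommand is spelled out.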

Can you show us your ideal --help output ? :)

I think the -e probably comes from our usage of binwalk in the past, I'd gladly change it to -o / --output-dir which would fall in line with the CLI of most of our extractors (7zip, jefferson, ...). Note: if we do so, we also have to change that option in unromfs.

We can decide to reserve -n for future use (dry-run switch) and move the entropy calculation switch to something more meaningful. The symbol for entropy is S, so maybe use -s / --entropy-depth ?

I don't have opinions on --show-external-dependencies at the moment. I'm not against it.

Regarding -p:

I think we shouldn't indicate on our public API that we are using processes for parallelism. (It is also true for the actual process_file function too)

You would keep the option and hide it from the --help section ? Or just hide it from the general documentation (README, wiki) ? In terms of options both -j and -p work for me.

vlaci commented

Can you show us your ideal --help output ? :)

I think the -e probably comes from our usage of binwalk in the past, I'd gladly change it to -o / --output-dir which would fall in line with the CLI of most of our extractors (7zip, jefferson, ...). Note: if we do so, we also have to change that option in unromfs.

Interestingly, binwalk's use of -e is different from ours; they have -C for the output directory.

We can decide to reserve -n for future use (dry-run switch) and move the entropy calculation switch to something more meaningful. The symbol for entropy is S, so maybe use -s / --entropy-depth ?

I don't think that we need a short name for that

I don't have opinions on --show-external-dependencies at the moment. I'm not against it.

Regarding -p:

I think we shouldn't indicate on our public API that we are using processes for parallelism. (It is also true for the actual process_file function too)

You would keep the option and hide it from the --help section ? Or just hide it from the general documentation (README, wiki) ? In terms of options both -j and -p work for me.

I'd probably rename it to --parallel(ism) to convey the meaning while keeping the exact mode of parallelism out of the API. E.g. if we were to decide that threads are enough, I don't want to rename this switch.
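
To illustrate the point, here is a minimal sketch in which the public parameter only names a degree of parallelism and the executor type stays an internal detail. `run_jobs` and its parameters are hypothetical names, not unblob's API.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_jobs(
    handler: Callable[[T], R],
    items: Iterable[T],
    jobs: int,
    make_executor: Callable[..., Executor] = ThreadPoolExecutor,
) -> List[R]:
    """Run `handler` over `items` with at most `jobs` workers.

    Callers only pass `jobs`; whether workers are threads or processes
    is decided by `make_executor` and can change without renaming anything.
    """
    with make_executor(max_workers=jobs) as executor:
        return list(executor.map(handler, items))
```

Passing `concurrent.futures.ProcessPoolExecutor` as `make_executor` would switch to processes, provided the handler is picklable, without touching the switch name or the function signature callers see.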

vlaci commented

If there is a general consensus that we should rework the switches, I am willing to design it :) At this stage it is more of a question of whether you agree with my assessment.

Maybe we should also add a --version argument

vlaci commented

Updated the OP with suggested changes to the help output, also containing a potential future direction for adding extensible extract configuration options without cluttering the core UX.

Maybe we should also add a --version argument

This flag is available with the latest version (23.8.11).