mefistotelis/pylabview

decouple library (modules/classes/etc.) from command-line interface (argparse)

jimkring opened this issue · 8 comments

Great tool!

I noticed that the tests are mocking up CLI calls and then invoking the main function/entrypoint (readRSRC_main()) of the pylabview library.

I had a thought... would it make sense to decouple the the various classes like VI from the argparse object po (and the command-line usage, in general)?

Currently, it's a bit tricky to the the library. I want to be able to read a VI's saved-in labview version. The call would look like:

from pylabview.LVrsrcontainer import VI

vi_path = "/path/to/my.vi"

with open(vi_path, "rb") as rsrc:
    vi = VI(po, rsrc_fh=rsrc)
    lv_version = VI(po, rsrc_fh=vi_file_object).getFileVersion()

print("labview version: ", lv_version)

However, we're missing the bit of magic in the po argument. What is this?

We can see by looking at the test example code:

# Extract the VI file
command = [os.path.join("pylabview", "readRSRC.py"), "-vv", "-x", "-i", rsrc_inp_fn, "-m",
           os.sep.join([single_vi_path_extr1, xml_fn])]
with patch.object(sys, 'argv', command):
    readRSRC_main()
    
# Re-create the VI file
command = [os.path.join("pylabview", "readRSRC.py"), "-vv", "-c", "-m", os.sep.join([single_vi_path_extr1, xml_fn]),
           "-i", rsrc_out_fn]
with patch.object(sys, 'argv', command):
    readRSRC_main()
    
# Extract the LLB file
command = [os.path.join("pylabview", "readRSRC.py"), "-vv", "-x", "--keep-names", "-i", rsrc_inp_fn, "-m", os.sep.join([single_vi_path_extr1, xml_fn])]
with patch.object(sys, 'argv', command):
    readRSRC_main()
# Re-create the LLB file
command = [os.path.join("pylabview", "readRSRC.py"), "-vv", "-c", "-m", os.sep.join([single_vi_path_extr1, xml_fn]), "-i", rsrc_out_fn]
with patch.object(sys, 'argv', command):
    readRSRC_main()

Then, in readRSRC_main() it works by parsing those arguments using something like this:

import argparse

# Import the functions to be tested
from pylabview.readRSRC import main as readRSRC_main

# create an argument parser object
parser = argparse.ArgumentParser()
# define various arguments
parser.add_argument('-i', '--rsrc', '--vi', default="", type=str, help="name of the LabView RSRC file, VI or other")
# parse arguments and generate an object (`po`) with the argument values as attributes
po = parser.parse_args()

then, it looks for these arguments as attributes of the of the po object.

if len(po.rsrc) == 0:
    raise FileNotFoundError("Only RSRC file listing is currently supported.")

if (po.verbose > 0):
    print("{}: Starting file parse for RSRC listing".format(po.rsrc))
with open(po.rsrc, "rb") as rsrc_fh:
    vi = VI(po, rsrc_fh=rsrc_fh, text_encoding=po.textcp)

inside of the VI class, the po object is passed along into the constructors of other classes (shown in the various uses of self.po below):

    def readRSRCList(self, fh):
        """ Read all RSRC headers from input file and check their sanity.
            After this function, `self.rsrc_headers` is filled with a list of RSRC Headers.
        """
        rsrc_headers = []
        curr_rsrc_pos = -1
        next_rsrc_pos = 0
        while curr_rsrc_pos != next_rsrc_pos:
            curr_rsrc_pos = next_rsrc_pos
            fh.seek(curr_rsrc_pos)
            rsrchead = RSRCHeader(self.po)
            if fh.readinto(rsrchead) != sizeof(rsrchead):
                raise EOFError("Could not read RSRC {:d} Header.".format(len(rsrc_headers)))
            if self.po.print_map == "RSRC":
                self.rsrc_map.append( (fh.tell(), sizeof(rsrchead), \
                  "{}[{}]".format(type(rsrchead).__name__,len(rsrc_headers)),) )
            if (self.po.verbose > 2):
                print(rsrchead)
            if not rsrchead.checkSanity():
                raise IOError("RSRC {:d} Header sanity check failed.".format(len(rsrc_headers)))
            # The last header has offset equal to its start
            if rsrchead.rsrc_info_offset >= curr_rsrc_pos:
                next_rsrc_pos = rsrchead.rsrc_info_offset
            else:
                raise IOError("Invalid position of next item after parsing RSRC {:d} Header: {:d}".format(len(rsrc_headers),rsrchead.rsrc_info_offset))
            rsrc_headers.append(rsrchead)
        self.rsrc_headers = rsrc_headers
        return (len(rsrc_headers) > 0)

It seems that it there is a lot of coupling the the various classes like VI with the argparse object po.

For my use case of checking a VI's saved-in labview version, I can create my own po object and set it's attribute like this:

from pylabview.LVrsrcontainer import VI

vi_path = "/path/to/my.vi"

# create an object to hold the various parser attributes that normally come from use of argparse
class PO: pass
po = PO()

po.typedesc_list_limit = 4095
po.array_data_limit = (2 ** 28) - 1
po.store_as_data_above = 4095

# don't print output / note can't find way to suppress warnings
po.verbose = 0
# don't print the RSRC map / need to set this to empty string or will get a missing attribute error.
po.print_map = ""

with open(vi_path, "rb") as rsrc:
    vi = VI(po, rsrc_fh=rsrc)
    lv_version = VI(po, rsrc_fh=vi_file_object).getFileVersion()

print("labview version: ", lv_version)

I'm not 100% certain which attributes should be set on po for this call to simply get a VI's version such as -- I have to read through the code and do some trial and error.

It seems that passing the various po parameters as individual arguments to the constructors/methods would make the library very useful (and easy to use), independent from the command-line interface mode.

It seems that passing the various po parameters as individual arguments to the constructors/methods would make the library very useful (and easy to use), independent from the command-line interface mode.

The idea behind the po is to hide these parameters - it is on purpose. the po object hides parameters:

  • which are needed for debug only, ie. file names or verbosity level
  • which may potentially influence how some deep part of the parser works, ie. change how to interpret some names, or which code page to use

Supplying each function with separate params would mean:

  • interface of each function isn't clear - there are many parameters which are kind of add-ons/flags, and not the core parameters which one would expect
  • each time any new switch is needed, all the functions from top to bottom need to have the declarations changed to transfer that specific parameter

This means, the po object is kind of replacement for global variables - though it isn't changed during program operation, so it's not really a state, but an initial config object - program options.

It just so happens that the program options for command line tool are set - well, from the command line. It doesn't mean that having program options in one class is somehow linked to command line tools design. Actually, it is quite common to have Options or Settings class in any Object oriented language.

Which options to set - it might be a good idea to just set all of them. Some probably are not needed for your use case, but setting them doesn't really cost much.

Also, you may want to look at dataclass for clearer implementation.

The tests are mocking command line args - yes, these are supposed to be "black box" tests - verifying the scripts as the user would execute them. I just don't use subrocess as pytest isn't really compatible with that (and the tests would run much slower, especially on Windows where creating a process comes at high cost).

Thanks for the answers. That makes sense.

Here is what I think you're describing

from dataclasses import dataclass, field
from enum import Enum
from typing import IO, Any, Dict
from pylabview.LVrsrcontainer import VI

vi_path = "/path/to/my.vi"

@dataclass
class PrintMap(str, Enum):
    """Enum with string values for the `print_map` program option."""
    RSRC = "rsrc"
    NONE = ""

@dataclass
class ProgramOptions:
    """global configuration settings for the pylabview tool."""
    print_map: PrintMap = field(default=PrintMap.NONE, metadata={"help": "Print the RSRC map to stdout."})
    typedesc_list_limit: int = field(default=4095, metadata={"help": "Limit for the length of type descriptors."})
    array_data_limit: int = field(default=(2 ** 28) - 1, metadata={"help": "Limit for the length of array data."})
    store_as_data_above: int = field(default=4095, metadata={"help": "Store as data above this length."})
    verbose: int = field(default=0, metadata={"help": "Increase verbosity of output. 0 = quiet, 1 = normal, 2 = verbose."})
    rsrc: IO[Any] | None = field(default=None, metadata={"help": "file (IO) reference to the RSRC file."})

def get_vi_labview_version(vi_path: str) -> Dict:
    """return the labview version a VI is saved in"""
    
    with open(vi_path, "rb") as f:
        po = ProgramOptions(rsrc = f)
        vi = VI(po, rsrc_fh=po.rsrc)

    return vi.getFileVersion()

lv_version = get_vi_labview_version(vi_path)

print(lv_version)

Some ideas:

  1. Add ProgramOptions and related types (enums, etc.) to the pylabview module and use them in the code/examples.

  2. For the command-line construct a ProgramOptions instance from an argparse parser instance with a classmethod constructor like this:

@dataclass
class ProgramOptions:

   ...

    @classmethod
    def from_argparse(cls, parser: ArgumentParser):
        po = parser.parse_args()
        return cls(**po)
  1. When creating a VI instance vi = VI(po, rsrc_fh=po.rsrc) note that the po object MUST also have it's po.rsrc attribute set the the file handle and the rsrc_fh argument needs to be set to the file handle too, otherwise the constructor throws an error. This feels redundant to need to set po.rsrc and also pass in the rsrc_fh argument.

Side question:

How can I suppress (or fix) these warnings, below, that are printing to the console? I get them when I run the above code.

Y:\path\to\my.vi: Warning: Block b'VICD' section 0 patches parse exception: No parsing made for MCLVRTPatches.
Y:\path\to\my.vi: Warning: Block b'VICD' section 0 size is 4884 and does not match parsed size 4816

Text printed to stderr is not conditional - to remove these you either need to replace sys.stderr, or modify the eprint() in pylabview.

The other ideas - it's more code to do the same thing, but it is a better library interface. If properly implemented, it should be acceptable.

The po.rsrc is a file name, not stream.

I don't think a class is required for print-map, better to keep it as string. If not set, the default is not empty string - see argparse docs:
https://docs.python.org/3/library/argparse.html#quick-links-for-add-argument

OK, it's starting to simplify...

from dataclasses import dataclass, field
from typing import Dict

from pylabview.LVrsrcontainer import VI


@dataclass
class ProgramOptions:
    """global configuration settings for the pylabview tool."""
    print_map: str = field(default="RSRC", metadata={"help": "Print the RSRC map to stdout."})
    typedesc_list_limit: int = field(default=4095, metadata={"help": "Limit for the length of type descriptors."})
    array_data_limit: int = field(default=(2 ** 28) - 1, metadata={"help": "Limit for the length of array data."})
    store_as_data_above: int = field(default=4095, metadata={"help": "Store as data above this length."})
    verbose: int = field(default=0,
                         metadata={"help": "Increase verbosity of output. 0 = quiet, 1 = normal, 2 = verbose."})
    rsrc: str | None = field(default=None, metadata={"help": "file path to RSRC file."})


def get_file_labview_version(vi_path: str) -> Dict:
    """return the labview version a VI is saved in"""

    with open(vi_path, "rb") as f:
        vi = VI(po=ProgramOptions(), rsrc_fh=f)

    return vi.getFileVersion()

my thoughts about helpful tweaks then become:

  • add ProgramOptions class to the library - that way users don't have to guess or go exploring about how it's used.
  • make passing po optional -- for VI.__init__ make the po argument optional and set self.po = None if po is None else ProgramOptions()
  • add an option (to ProgramOptions) to suppress printing to stderr -- (for a library at least) if there's an error, my thinking is that it should be propagated up to the caller, rather than printed to stdout. Printing python exceptions to stdout is a command-line application feature and not so much a feature of a VI/RSRC class of the library.
  • create some very high level examples, tests, or even a module of the library for most common use cases -- for example, it would be very helpful to see how to get a VI's labview version. This discussion is a log of my journey to accomplish that, as a new user.

I'm happy to help with the above suggestions. Which of the above would you be friendly to, and I'll create some PRs and work to make sure they meet your expectations/requirements.

Thanks!

add ProgramOptions class to the library

That sounds ok.

make passing po optional -- for VI.init make the po argument optional

I don't think it's a good idea. The options should always be provided.

add an option (to ProgramOptions) to suppress printing to stderr

If you see any invalid propagation of errors, it can be fixed. But stdout/stderr are not something to be afraid of when programming. The standard streams are there to be used, and all well designed operating systems handle that correctly, GUI or not.
If you want to capture stderr, you should replace the stderr with a memory stream or another mechanism. Not using stdio in not any kind of good practice.

That said, what could be placed instead of just printing, is the logging module, I guess. This would also replace the po.verbose checking with configuration of the logger level. Personally I rarely use the logging, stdio is simpler. And I'm not sure how using it would influence tests - I do use the logging there.

create some very high level examples, tests, or even a module of the library for most common use cases

Sure, tests are ok. I only created black box tests, and only for extraction - but I'd be happy to have unit tests as well.

Not using stdio in not any kind of good practice.

I agree stdio is a best practice for console applications as a way to communicate with the calling process, terminal UI, and OS -- it's the console application's external interface to the world.

However, I would argue that use of stdio in libraries designed for data parsing/generating limits their use in other types of applications and adds complexity for code that calls them and may want to handle stdio differently or not at all (in the case of GUI applications).

Most open-source apps print things on the console. You just don't see it, unless you have an actual console attached to that process. It is not normal for one kind of apps, but universal.

But in case of Python and pylabview, you are right with your arguments about complexity, and switching the standard streams having potentially wider consequences.