jwlodek/py_cui

ValueError: embedded null character

PabloLec opened this issue · 2 comments

Describe the bug

I've hit a bug that I hadn't noticed before but now encounter frequently with recoverpy, my py_cui-based tool.
To display the content of a partition block as a string, recoverpy first tries to .decode("utf-8") it and otherwise falls back to str().
The error occurs when the bytes were decoded as UTF-8. I haven't tried the plain str() path, since it is rarely taken.
For some results, displaying the UTF-8 decoded string in a text block throws this exception:

Full error output
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pablo/projets/recoverpy/recoverpy/__main__.py", line 3, in <module>
    recoverpy.main()
  File "/home/pablo/projets/recoverpy/recoverpy/__init__.py", line 55, in main
    VIEWS_HANDLER.open_view_parameters()
  File "/home/pablo/projets/recoverpy/recoverpy/views_handler.py", line 21, in open_view_parameters
    _PARAMETERS_MENU.start()
  File "/usr/local/lib/python3.8/dist-packages/py_cui/__init__.py", line 317, in start
    curses.wrapper(self._draw)
  File "/usr/lib/python3.8/curses/__init__.py", line 105, in wrapper
    return func(stdscr, *args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/py_cui/__init__.py", line 1567, in _draw
    self._handle_key_presses(key_pressed)
  File "/usr/local/lib/python3.8/dist-packages/py_cui/__init__.py", line 1470, in _handle_key_presses
    self._popup._handle_key_press(key_pressed)
  File "/usr/local/lib/python3.8/dist-packages/py_cui/popups.py", line 192, in _handle_key_press
    self._command(ret_val)
  File "/home/pablo/projets/recoverpy/recoverpy/view_parameters.py", line 154, in start_search
    VIEWS_HANDLER.open_view_search(
  File "/home/pablo/projets/recoverpy/recoverpy/views_handler.py", line 39, in open_view_search
    _SEARCH_MENU.start()
  File "/usr/local/lib/python3.8/dist-packages/py_cui/__init__.py", line 317, in start
    curses.wrapper(self._draw)
  File "/usr/lib/python3.8/curses/__init__.py", line 105, in wrapper
    return func(stdscr, *args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/py_cui/__init__.py", line 1572, in _draw
    self._draw_widgets()
  File "/usr/local/lib/python3.8/dist-packages/py_cui/__init__.py", line 1366, in _draw_widgets
    self.get_widgets()[widget_key]._draw()
  File "/usr/local/lib/python3.8/dist-packages/py_cui/widgets.py", line 849, in _draw
    self._renderer.draw_text(self, render_text, counter, start_pos=self._viewport_x_start, selected=self._selected)
  File "/usr/local/lib/python3.8/dist-packages/py_cui/renderer.py", line 395, in draw_text
    self._stdscr.addstr(y, current_start_x, text_elem[0])
ValueError: embedded null character

So the cause seems to be the ASCII null character, \x00.
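
A minimal check of that hypothesis, using only the standard library. It assumes nothing about py_cui: it just shows that decoding bytes containing \x00 as UTF-8 succeeds and keeps the null characters in the resulting str, which would explain why printing or logging works while the later curses call fails. The bytes are the first part of the failing block shown further down; the variable names are illustrative.

```python
# First part of the failing raw block from this issue.
raw = b"imeVisibleTypeAnnotations\x01\x00\x06equals\x01\x00\x15(Ljava/lang/Object;)Z"

# NUL (U+0000) is a valid code point in UTF-8, so decoding raises no error.
decoded = raw.decode("utf-8")

# The null characters survive decoding and end up in the str handed to curses.
has_null = "\x00" in decoded
null_count = decoded.count("\x00")
print(has_null, null_count)
```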

This works:

Raw:

b'    }\n}\n\nproc tixBalloon:ClearStatus {w} {\n    upvar #0 $w data\n\n    if {![winfo exists $data(-statusbar)]} {\n\treturn\n    }\n\n    # Clear the StatusBar widget\n    #\n    set vv [$data(-statusbar) cget -textvariable]\n    if {$vv == ""} {\n\t$data(-statusbar) config -text ""\n    } else {\n\tuplevel #0 set $vv [list ""]\n    }\n}\n\n#----------------------------------------------------------------------\n# PublicMethods:\n#----------------------------------------------------------------------\n\n# %% if balloon is already p'

raw.decode("utf-8")

    }
}

proc tixBalloon:ClearStatus {w} {
    upvar #0 $w data

    if {![winfo exists $data(-statusbar)]} {
        return
    }

    # Clear the StatusBar widget
    #
    set vv [$data(-statusbar) cget -textvariable]
    if {$vv == ""} {
        $data(-statusbar) config -text ""
    } else {
        uplevel #0 set $vv [list ""]
    }
}

#----------------------------------------------------------------------
# PublicMethods:
#----------------------------------------------------------------------

# %% if balloon is already p

This throws a ValueError when displayed:

Raw:

b'imeVisibleTypeAnnotations\x01\x00\x06equals\x01\x00\x15(Ljava/lang/Object;)Z\n\x00\x03\x00$\x0c\x00%\x00&\x01\x00\x08getClass\x01\x00\x13()Ljava/lang/Class;\n\x00\x03\x00(\x0c\x00!\x00"\x01\x00\x03obj\x01\x00\x12Ljava/lang/Object;\x01\x00\x0cobjectChange\x01\x00\x10objectChangeSet2\x01\x00\rStackMapTable\x07\x00/\x01\x00\x18io/realm/ObjectChangeSet\x01\x00\x0cgetChangeset\x01\x00\x1c()Lio/realm/ObjectChangeSet;\x01\x00\x19RuntimeVisibleAnnotations\x01\x00\tgetObject\x01\x00\x17()Lio/realm/RealmModel;\x01\x00\x05()TE;\x01\x00\x08hashCode\x01\x00\x03()I\n\x00\x03\x009\x0c\x006\x007\x01\x00\x01I\x01\x00\x08toString\x01\x00\x14()Ljava/lang/String;\x07\x00>\x01\x00\x17java/lang/StringBuilder\x08\x00@\x01\x00\x14ObjectChange{object=\n\x00=\x00B\x0c\x00\x0b\x00C\x01\x00\x15(Ljava/lang/String;)V\n\x00=\x00E\x0c\x00F\x00G\x01\x00\x06append\x01'

raw.decode("utf-8")

$
 %getClass()Ljava/lang/Class;
(
 !"objLjava/lang/Object;
StackMapTable/io/realm/ObjectChangeSetjectChangeSet2
                                      getChangeset()Lio/realm/ObjectChangeSet;RuntimeVisibleAnnotations getObject()Lio/realm/RealmModel;()TEhashCode()I
9
 67toString()Ljava/lang/String;>java/lang/StringBuilde@ObjectChange{object=
=B

  C(Ljava/lang/String;)V
=E
  FGappend

I can print the .decode("utf-8") result to the console or write it to the log file without problems. The error is only thrown when displaying it in a text block (and, I guess, in other widgets).
This probably also occurs with null characters in other encodings.
I'm not sure where the bug is located, maybe in curses, and I'm also not sure how to fix it. Simply catching the error would not be ideal, as the string would then not be displayed at all.

Of course, I could handle it myself by replacing or removing null characters, but maybe we can handle it upstream.

To Reproduce

Sadly, I do not have an easy procedure in mind to reproduce it. I can put together a simple app/env if needed. Otherwise, just try recoverpy with a dummy query like test or change; you should encounter the exception in the block results on the first page.


I'm not an expert in character encoding, nor in curses, so I do not know what would be a clever way to handle this. Any thoughts?

I'd say the way to fix it is essentially to add a helper function that strips null characters out of decoded strings. We can do this either for every string that gets passed in or only when a null character is found. I think the issue is that curses drawing expects the only null character to be at the end of the string. My guess is that the underlying C API will only draw characters until it hits a null terminator, so passing one inside the string causes the Python layer to raise an error to let the user know that not all of the string would have been drawn.
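
One way such a helper might look. This is only a sketch: the name strip_null_chars and its replacement parameter are hypothetical and not part of py_cui. Since str.replace returns an equal string when nothing matches, "always strip" and "strip only when found" give the same result; the membership check below merely skips the replace call in the common case.

```python
def strip_null_chars(text: str, replacement: str = "") -> str:
    """Return text with every NUL character replaced by `replacement`.

    Hypothetical helper for illustration; not an existing py_cui function.
    """
    if "\x00" not in text:
        # Common case: nothing to strip, hand back the original string.
        return text
    return text.replace("\x00", replacement)


# A fragment of the failing raw block from this issue.
decoded = b"\x01\x00\x08getClass\x01\x00\x13()Ljava/lang/Class;".decode("utf-8")
cleaned = strip_null_chars(decoded)
```

Other control characters such as \x01 are left untouched here; only the null character that triggers the ValueError is handled.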

The question then becomes: what should we replace the null character with? A space? A newline? Nothing?

So I added a PR handling this issue.
I replaced it with nothing, .replace(chr(0), ""), which, I guess, is the expected representation in a Python string. At least I didn't find any counterexample.
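
For illustration, here is roughly how that replacement could sit in front of the addstr call seen in the traceback. This is not the code from the PR: safe_addstr is a hypothetical wrapper, and FakeScreen is a stand-in for a curses window that mimics only the behaviour reported in this issue, so the sketch can run without a terminal.

```python
class FakeScreen:
    """Stand-in for a curses window (assumption, for illustration only).

    It mimics just the reported behaviour: addstr rejects strings that
    contain a null character.
    """

    def __init__(self):
        self.drawn = []

    def addstr(self, y, x, text):
        if "\x00" in text:
            raise ValueError("embedded null character")
        self.drawn.append((y, x, text))


def safe_addstr(screen, y, x, text):
    """Hypothetical wrapper: drop null characters, then draw."""
    screen.addstr(y, x, text.replace(chr(0), ""))


screen = FakeScreen()
safe_addstr(screen, 0, 0, "get\x00Class")
```

With the replacement in place the text is drawn with the null character simply omitted, at the cost of the displayed string no longer matching the raw block byte for byte.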