Windows emits Console Virtual Terminal Sequences which causes output parsing to fail
Closed this issue · 5 comments
Any Windows pros around?
I am not entirely sure why but running dbt commands can add characters that look like Console Virtual Terminal Sequences (https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences) which will cause the parsing of the command output to fail.
stdout will then look something like
{"resource_type": "model", "depends_on": {"macros": ... }}
�[0m
Happened on a Windows 10.
Recreate:
This script caused the error:
from dbt_invoke.properties import update
from invoke.context import Context
ctx = Context()
update(ctx,
resource_type='model',
project_dir="PATH",
select="users",
log_level="DEBUG")
Solution
We can use a simple regex to parse out any unwanted characters. Something like proposed here: https://stackoverflow.com/a/38662876
PR Should be done soon.
Thank you, @ciklista, for submitting this issue and also for PR #17. I am going to try to reproduce this output on a Windows machine (tried on Mac, but as expected I didn't get an error). The PR looks straightforward and I'm definitely not a Windows expert, but I would like to spend a little bit of time to see I can get a better grasp for the root cause before merging the fix.
Are you able to post the error message here also?
The error I get is (sorry for the german error message)
Traceback (most recent call last):
File "C:\Users\\code\dbt-invoke\dbt_invoke\testomatiko.py", line 5, in <module>
update(ctx,
File "C:\Users\PycharmProjects\dbt-invoke\venv\lib\site-packages\invoke\tasks.py", line 127, in __call__
result = self.body(*args, **kwargs)
File "C:\Users\\code\dbt-invoke\dbt_invoke\properties.py", line 101, in update
common_dbt_kwargs, transformed_ls_results = _initiate_alterations(
File "C:\Users\code\dbt-invoke\dbt_invoke\properties.py", line 236, in _initiate_alterations
transformed_ls_results = _transform_ls_results(
File "C:\Users\code\dbt-invoke\dbt_invoke\properties.py", line 292, in _transform_ls_results
if Path(ctx.config['project_path'], potential_result_path).exists():
File "C:\Program Files\Python39\lib\pathlib.py", line 1407, in exists
self.stat()
File "C:\Program Files\Python39\lib\pathlib.py", line 1221, in stat
return self._accessor.stat(self)
OSError: [WinError 123] Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch: 'C:\\[...]\\code\\dbt2\\\x1b[0m'
Process finished with exit code 1
The wrong line is being added to the dbt ls
output here and thus treated as a resource with a corresponding yml file.
dbt-invoke/dbt_invoke/internal/_utils.py
Lines 173 to 195 in 949c0eb
@ciklista, thanks for adding the error message (no need to apologize for knowing multiple languages!). It helped me understand better.
I have an alternate suggestion that I would like to get your opinion on.
I notice the error message is referencing line 290 from this part of properties.py:
dbt-invoke/dbt_invoke/properties.py
Lines 270 to 291 in 949c0eb
There is an error during Path(ctx.config['project_path'], potential_result_path).exists()
because the Path not only doesn't exist, but is invalid (because of the ANSI escape sequence issue you identified) as seen near the end of your error message: ... 'C:\\[...]\\code\\dbt2\\\x1b[0m'
What if instead of filtering escape sequences with a regex, we handled all invalid paths, regardless of platform, by checking whether the path is valid before checking whether it exists? If the path is invalid we can trigger a debug log and continue to the next iteration of the loop. Something like:
for i, potential_result in enumerate(potential_results):
...
project_with_potential_result_path = Path(ctx.config['project_path'], potential_result_path)
try:
project_with_potential_result_path.resolve()
except (OSError, RuntimeError) as e:
_LOGGER.debug(e)
continue
if project_with_potential_result_path.exists():
results[potential_result_path] = potential_result
Thoughts?
That would definitely solve the issue but I think I prefer making sure that the command output is valid because it
(a) targets the issue more specifically (what if something causes dbt ls
to emit a valid output that is not a path, I guess we'd like the function to fail) and
(b) it's closer to the actual problem. If the output gets used in a different context, we'd have to handle the ANSI escape sequence there as well.
You have convinced me with your point (b) that regardless of the behavior of the _transform_ls_results
function we should handle the ANSI escape sequence closer to the source, as you originally suggested.