Windows emits Console Virtual Terminal Sequences which causes output parsing to fail

Any Windows pros around?
I am not entirely sure why but running dbt commands can add characters that look like Console Virtual Terminal Sequences (https://docs.microsoft.com/en-us/windows/console/console-virtual-terminal-sequences) which will cause the parsing of the command output to fail.
stdout will then look something like

{"resource_type": "model", "depends_on": {"macros": ... }}
�[0m

Happened on a Windows 10.

Recreate:
This script caused the error:

from dbt_invoke.properties import update
from invoke.context import Context

ctx = Context()
update(ctx,
       resource_type='model',
       project_dir="PATH",
       select="users",
       log_level="DEBUG")

Solution
We can use a simple regex to parse out any unwanted characters. Something like proposed here: https://stackoverflow.com/a/38662876

PR Should be done soon.

Thank you, @ciklista, for submitting this issue and also for PR #17. I am going to try to reproduce this output on a Windows machine (tried on Mac, but as expected I didn't get an error). The PR looks straightforward and I'm definitely not a Windows expert, but I would like to spend a little bit of time to see I can get a better grasp for the root cause before merging the fix.

Are you able to post the error message here also?

The error I get is (sorry for the german error message)

Traceback (most recent call last):
  File "C:\Users\\code\dbt-invoke\dbt_invoke\testomatiko.py", line 5, in <module>
    update(ctx,
  File "C:\Users\PycharmProjects\dbt-invoke\venv\lib\site-packages\invoke\tasks.py", line 127, in __call__
    result = self.body(*args, **kwargs)
  File "C:\Users\\code\dbt-invoke\dbt_invoke\properties.py", line 101, in update
    common_dbt_kwargs, transformed_ls_results = _initiate_alterations(
  File "C:\Users\code\dbt-invoke\dbt_invoke\properties.py", line 236, in _initiate_alterations
    transformed_ls_results = _transform_ls_results(
  File "C:\Users\code\dbt-invoke\dbt_invoke\properties.py", line 292, in _transform_ls_results
    if Path(ctx.config['project_path'], potential_result_path).exists():
  File "C:\Program Files\Python39\lib\pathlib.py", line 1407, in exists
    self.stat()
  File "C:\Program Files\Python39\lib\pathlib.py", line 1221, in stat
    return self._accessor.stat(self)
OSError: [WinError 123] Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbezeichnung ist falsch: 'C:\\[...]\\code\\dbt2\\\x1b[0m'

Process finished with exit code 1

The wrong line is being added to the dbt ls output here and thus treated as a resource with a corresponding yml file.

dbt-invoke/dbt_invoke/internal/_utils.py

Lines 173 to 195 in 949c0eb

    
           for line in result_lines: 
        
               # Because we set the dbt global arg "--log-format json", if 
        
               # line is valid json then it may be an actual result or it 
        
               # may be some other output from dbt, like a warning. 
        
               try: 
        
                   line_dict = json.loads(line) 
        
               # If line is not valid json, then it should be an actual 
        
               # result. This is because even when the "dbt ls" command 
        
               # arg "--output" is not set to json, non-result logs will 
        
               # still be in json format (due to the dbt global arg 
        
               # "--log-format json"). 
        
               except ValueError: 
        
                   result_lines_filtered.append(line) 
        
                   continue 
        
               # If 'resource_type' is in line_dict, then this is likely 
        
               # an actual result and not something else like a warning. 
        
               if 'resource_type' in line_dict: 
        
                   result_lines_filtered.append(line_dict) 
        
               # Else, if 'resource_type' is not in line_dict, this may be 
        
               # a warning from dbt, so log it. 
        
               else: 
        
                   logger.warning(f'Extra output from "dbt ls" command: {line}') 
        
           return result_lines_filtered

@ciklista, thanks for adding the error message (no need to apologize for knowing multiple languages!). It helped me understand better.

I have an alternate suggestion that I would like to get your opinion on.

I notice the error message is referencing line 290 from this part of properties.py:

dbt-invoke/dbt_invoke/properties.py

Lines 270 to 291 in 949c0eb

    
           for i, potential_result in enumerate(potential_results): 
        
               if 'original_file_path' in potential_result: 
        
                   potential_result_path = potential_result['original_file_path'] 
        
               # Before dbt version 0.20.0, original_file_path was not 
        
               # included in the json response of "dbt ls". For older 
        
               # versions of dbt, we need to run "dbt ls" with the 
        
               # "--output path" argument in order to retrieve paths 
        
               else: 
        
                   if potential_result_paths is None: 
        
                       potential_result_paths = _utils.dbt_ls( 
        
                           ctx, 
        
                           supported_resource_types=_SUPPORTED_RESOURCE_TYPES, 
        
                           logger=_LOGGER, 
        
                           output='path', 
        
                           **kwargs, 
        
                       ) 
        
                       assert len(potential_result_paths) == len( 
        
                           potential_results 
        
                       ), 'Length of results differs from length of result details' 
        
                   potential_result_path = potential_result_paths[i] 
        
               if Path(ctx.config['project_path'], potential_result_path).exists(): 
        
                   results[potential_result_path] = potential_result

There is an error during Path(ctx.config['project_path'], potential_result_path).exists() because the Path not only doesn't exist, but is invalid (because of the ANSI escape sequence issue you identified) as seen near the end of your error message: ... 'C:\\[...]\\code\\dbt2\\\x1b[0m'

What if instead of filtering escape sequences with a regex, we handled all invalid paths, regardless of platform, by checking whether the path is valid before checking whether it exists? If the path is invalid we can trigger a debug log and continue to the next iteration of the loop. Something like:

for i, potential_result in enumerate(potential_results): 
    ...
    project_with_potential_result_path = Path(ctx.config['project_path'], potential_result_path)
    try:
        project_with_potential_result_path.resolve()
    except (OSError, RuntimeError) as e:
        _LOGGER.debug(e)
        continue
    if project_with_potential_result_path.exists(): 
        results[potential_result_path] = potential_result

Thoughts?

That would definitely solve the issue but I think I prefer making sure that the command output is valid because it
(a) targets the issue more specifically (what if something causes dbt ls to emit a valid output that is not a path, I guess we'd like the function to fail) and
(b) it's closer to the actual problem. If the output gets used in a different context, we'd have to handle the ANSI escape sequence there as well.

You have convinced me with your point (b) that regardless of the behavior of the _transform_ls_results function we should handle the ANSI escape sequence closer to the source, as you originally suggested.

	for line in result_lines:
	# Because we set the dbt global arg "--log-format json", if
	# line is valid json then it may be an actual result or it
	# may be some other output from dbt, like a warning.
	try:
	line_dict = json.loads(line)
	# If line is not valid json, then it should be an actual
	# result. This is because even when the "dbt ls" command
	# arg "--output" is not set to json, non-result logs will
	# still be in json format (due to the dbt global arg
	# "--log-format json").
	except ValueError:
	result_lines_filtered.append(line)
	continue
	# If 'resource_type' is in line_dict, then this is likely
	# an actual result and not something else like a warning.
	if 'resource_type' in line_dict:
	result_lines_filtered.append(line_dict)
	# Else, if 'resource_type' is not in line_dict, this may be
	# a warning from dbt, so log it.
	else:
	logger.warning(f'Extra output from "dbt ls" command: {line}')
	return result_lines_filtered

	for i, potential_result in enumerate(potential_results):
	if 'original_file_path' in potential_result:
	potential_result_path = potential_result['original_file_path']
	# Before dbt version 0.20.0, original_file_path was not
	# included in the json response of "dbt ls". For older
	# versions of dbt, we need to run "dbt ls" with the
	# "--output path" argument in order to retrieve paths
	else:
	if potential_result_paths is None:
	potential_result_paths = _utils.dbt_ls(
	ctx,
	supported_resource_types=_SUPPORTED_RESOURCE_TYPES,
	logger=_LOGGER,
	output='path',
	**kwargs,
	)
	assert len(potential_result_paths) == len(
	potential_results
	), 'Length of results differs from length of result details'
	potential_result_path = potential_result_paths[i]
	if Path(ctx.config['project_path'], potential_result_path).exists():
	results[potential_result_path] = potential_result