catcombo/jira2markdown

Converting Jira lists with CRLF line-breaks adds erroneous whitespace to subsequent text


When jira2markdown's convert() function is run on Jira lists that use Carriage Return + Line Feed (CRLF, \r\n) line breaks, the resulting Markdown gains erroneous whitespace in the text that follows the last list item.

See below for a visual example of the conversion issue.

from jira2markdown import convert
jira_text = 'Line Before List: Sample text words words words:\r\n * Bulleted Item 1: Sample text words words words\r\n * Bulleted Item 2: Sample text words words words\r\n\r\nLine After List: Sample text words words words\r\nLine After List: Sample text words words words'
print(jira_text)

Input (jira_text printed):

Line Before List: Sample text words words words:
 * Bulleted Item 1: Sample text words words words
 * Bulleted Item 2: Sample text words words words

Line After List: Sample text words words words
Line After List: Sample text words words words

Input (jira_text with line-breaks visualized):

Line Before List: Sample text words words words:\r\n
 * Bulleted Item 1: Sample text words words words\r\n
 * Bulleted Item 2: Sample text words words words\r\n
\r\n
Line After List: Sample text words words words\r\n
Line After List: Sample text words words words
md_text = convert(jira_text)

Expected Output (md_text printed):

Line Before List: Sample text words words words:
- Bulleted Item 1: Sample text words words words
- Bulleted Item 2: Sample text words words words

Line After List: Sample text words words words
Line After List: Sample text words words words

Expected Output (md_text with line-breaks visualized):

Line Before List: Sample text words words words:\r\n
- Bulleted Item 1: Sample text words words words\r\n
- Bulleted Item 2: Sample text words words words\r\n
\r\n
Line After List: Sample text words words words\r\n
Line After List: Sample text words words words
print(md_text)

Actual Output (md_text printed):

Line Before List: Sample text words words words:
- Bulleted Item 1: Sample text words words words
- Bulleted Item 2: Sample text words words words
  
  Line After List: Sample text words words words
  Line After List: Sample text words words words

Actual Output (md_text with line-breaks visualized):

Line Before List: Sample text words words words:\r\n
- Bulleted Item 1: Sample text words words words\n
- Bulleted Item 2: Sample text words words words\n
  \n
  Line After List: Sample text words words words\n
  Line After List: Sample text words words words

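As shown, the conversion ends up:

  • replacing \r\n inside the list with \n
  • replacing \r\n\r\n at the end of the list with \n  \n (the blank line now contains two spaces)
  • replacing \r\n after the list with \n, and indenting every line after the list by two spaces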
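To check a string for this mix of line endings without reading repr() output by eye, a small stdlib-only helper can count each line-ending style. This is a hypothetical diagnostic, not part of jira2markdown; `line_ending_counts` is an illustrative name.

```python
import re
from collections import Counter


def line_ending_counts(text: str) -> Counter:
    """Count CRLF, bare LF and bare CR line endings in text (hypothetical helper)."""
    # The alternation tries "\r\n" first, so a CRLF pair is never
    # counted as a bare CR followed by a bare LF.
    return Counter(m.group() for m in re.finditer(r"\r\n|\n|\r", text))


# Same input as in the report: every line break is CRLF.
jira_text = (
    "Line Before List: Sample text words words words:\r\n"
    " * Bulleted Item 1: Sample text words words words\r\n"
    " * Bulleted Item 2: Sample text words words words\r\n"
    "\r\n"
    "Line After List: Sample text words words words\r\n"
    "Line After List: Sample text words words words"
)

print(line_ending_counts(jira_text))  # Counter({'\r\n': 5})
```

Applied to the actual output as visualized above, the same helper reports one \r\n and four bare \n endings, which is the mixed-ending symptom described in the list.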

Copy-and-paste snippet to reproduce the issue:

from jira2markdown import convert

# Input with CRLF line-breaks
jira_text = 'Line Before List: Sample text words words words:\r\n * Bulleted Item 1: Sample text words words words\r\n * Bulleted Item 2: Sample text words words words\r\n\r\nLine After List: Sample text words words words\r\nLine After List: Sample text words words words'

# Print input with line-breaks rendered
print("\njira_text:\n" + jira_text)

# Print input with line-breaks represented, not rendered
print("\nrepr(jira_text):\n" + repr(jira_text))

md_text = convert(jira_text)

# Print output with line-breaks rendered
print("\nmd_text:\n" + md_text)

# Print output with line-breaks represented, not rendered
print("\nrepr(md_text):\n" + repr(md_text))

Happy to try to find and fix the issue if it is not a trivial fix on your end.

Hi @arctus-io!

Thank you very much for the detailed issue with the reproducer, and sorry for the late response. I prepared a fix in #28. Could you please test it?

@catcombo: Thanks for the fix! I can confirm #28 fixes this issue.

However, while testing this fix I noticed that \r\n line endings cause the same kind of problem in table conversions.

I can submit another issue with more details if needed, but I believe an across-the-board change that treats \r\n as equivalent to \n would resolve these problems.

My current workaround is to convert all \r\n line endings to \n before running the text through jira2markdown.
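The workaround can be sketched as follows, assuming jira2markdown is installed. `normalize_line_endings` and `convert_with_lf` are hypothetical helper names, and the optional `restore_crlf` flag is an addition for callers who want CRLF line endings back in the result, as in the expected output in the original report.

```python
def normalize_line_endings(text: str) -> str:
    """Convert Windows-style CRLF line endings to LF (hypothetical helper)."""
    return text.replace("\r\n", "\n")


def convert_with_lf(jira_text: str, restore_crlf: bool = False) -> str:
    """Normalize line endings, then run jira2markdown's convert() (hypothetical wrapper)."""
    # Imported lazily so normalize_line_endings works without jira2markdown installed.
    from jira2markdown import convert

    md_text = convert(normalize_line_endings(jira_text))
    # Optionally turn LF back into CRLF; this assumes the converted text
    # contains only LF endings, which holds if convert() adds no CR of its own.
    return md_text.replace("\n", "\r\n") if restore_crlf else md_text
```

Normalizing before conversion means the parser only ever sees \n, so list and table rules written for LF input apply unchanged.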

@arctus-io Thanks for the feedback! The easiest way to solve this problem is to replace \r\n with \n in the convert function before applying markup conversion. However, I would like to find a solution at the pyparsing level, which may take some time. Could you give me an example of the table conversion so I have more test cases?

@catcombo:

from jira2markdown import convert

# Example for CRLF `\r\n` broken table conversion

table_CRLF_test_input = 'Table Test:\r\n\r\n||heading 1||heading 2||heading 3||\r\n|col A1|col A2|col A3|\r\n|col B1|col B2|col B3|\r\n\r\nLine after table'

table_CRLF_test_output = convert(table_CRLF_test_input)

print(f"\n\nJira Input with CRLF (printed):\n{'-' * 40}\n{table_CRLF_test_input}\n{'-' * 40}\n")

print(f"\n\nJira Input with CRLF (string):\n{'-' * 40}\n{repr(table_CRLF_test_input)}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from CRLF (printed):\n{'-' * 40}\n{table_CRLF_test_output}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from CRLF (string):\n{'-' * 40}\n{repr(table_CRLF_test_output)}\n{'-' * 40}\n")

# Example for LF `\n` working table conversion

table_LF_test_input = 'Table Test:\n\n||heading 1||heading 2||heading 3||\n|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table'

table_LF_test_output = convert(table_LF_test_input)

print(f"\n\nJira Input with LF (printed):\n{'-' * 40}\n{table_LF_test_input}\n{'-' * 40}\n")

print(f"\n\nJira Input with LF (string):\n{'-' * 40}\n{repr(table_LF_test_input)}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from LF (printed):\n{'-' * 40}\n{table_LF_test_output}\n{'-' * 40}\n")

print(f"\n\nMarkdown Output from LF (string):\n{'-' * 40}\n{repr(table_LF_test_output)}\n{'-' * 40}\n")

Output from above:

Jira Input with CRLF (printed):
----------------------------------------
Table Test:

||heading 1||heading 2||heading 3||
|col A1|col A2|col A3|
|col B1|col B2|col B3|

Line after table
----------------------------------------



Jira Input with CRLF (string):
----------------------------------------
'Table Test:\r\n\r\n||heading 1||heading 2||heading 3||\r\n|col A1|col A2|col A3|\r\n|col B1|col B2|col B3|\r\n\r\nLine after table'
----------------------------------------



Markdown Output from CRLF (printed):
----------------------------------------
Table Test:


|heading 1|heading 2|heading 3|
|-|-|-|-|
|col A1|col A2|col A3|
<br>Line after table||

----------------------------------------



Markdown Output from CRLF (string):
----------------------------------------
'Table Test:\r\n\r\n\n|heading 1|heading 2|heading 3|\r|\n|-|-|-|-|\n|col A1|col A2|col A3|\r|\n|col B1|col B2|col B3|\r<br>\r<br>Line after table|\n'
----------------------------------------



Jira Input with LF (printed):
----------------------------------------
Table Test:

||heading 1||heading 2||heading 3||
|col A1|col A2|col A3|
|col B1|col B2|col B3|

Line after table
----------------------------------------



Jira Input with LF (string):
----------------------------------------
'Table Test:\n\n||heading 1||heading 2||heading 3||\n|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table'
----------------------------------------



Markdown Output from LF (printed):
----------------------------------------
Table Test:

|heading 1|heading 2|heading 3|
|-|-|-|
|col A1|col A2|col A3|
|col B1|col B2|col B3|

Line after table
----------------------------------------



Markdown Output from LF (string):
----------------------------------------
'Table Test:\n\n|heading 1|heading 2|heading 3|\n|-|-|-|\n|col A1|col A2|col A3|\n|col B1|col B2|col B3|\n\nLine after table'
----------------------------------------
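In the CRLF output above, the separator row has four columns (|-|-|-|-|) for three headings, and \r characters are left inside the rows. jira2markdown is built on pyparsing, as the maintainer's comment indicates, so the snippet below does not reproduce its code. It is only an illustration, using a hypothetical `header_cells` helper, of how a row that keeps the \r of its \r\n ending can appear to contain one extra cell.

```python
def header_cells(row: str) -> list[str]:
    """Naively split a Jira header row on '||' and keep non-empty cells (illustration only)."""
    return [cell for cell in row.split("||") if cell]


header_lf = "||heading 1||heading 2||heading 3||"
# If only "\n" is treated as the line break, the "\r" of "\r\n" stays attached to the row.
header_crlf = header_lf + "\r"

print(len(header_cells(header_lf)))    # 3
print(len(header_cells(header_crlf)))  # 4: the stray "\r" looks like a fourth cell
```

This is consistent with, but does not prove, the extra separator column in the CRLF output.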

Thanks for the table reproducer! I think the easiest and most reliable way to fix this issue is to force conversion of \r\n to \n. I updated the PR. Could you please test it? It should now work for any combination of markup elements.
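One way to guard against this regression is to check that every input converts identically whether its line breaks are LF or CRLF. This is a sketch, not part of jira2markdown's test suite: `crlf_variant`, `check_crlf_equivalence`, and `SAMPLES` are hypothetical names, and the converter is passed in as a parameter so the check can be pointed at jira2markdown's convert().

```python
from typing import Callable, Iterable


def crlf_variant(lf_text: str) -> str:
    """Return the same text with every LF line break written as CRLF."""
    return lf_text.replace("\n", "\r\n")


def check_crlf_equivalence(convert: Callable[[str], str], samples: Iterable[str]) -> None:
    """Assert that CRLF input converts to exactly the same Markdown as LF input."""
    for lf_text in samples:
        expected = convert(lf_text)
        actual = convert(crlf_variant(lf_text))
        assert actual == expected, f"CRLF mismatch for {lf_text!r}: {actual!r} != {expected!r}"


SAMPLES = [
    # List followed by more text, as in the original report.
    "Line Before List:\n * Bulleted Item 1\n * Bulleted Item 2\n\nLine After List",
    # Table followed by more text, as in the table reproducer.
    "Table Test:\n\n||heading 1||heading 2||heading 3||\n|col A1|col A2|col A3|\n\nLine after table",
]
```

Run it as check_crlf_equivalence(convert, SAMPLES) after `from jira2markdown import convert`; if the updated PR behaves as described, no AssertionError should be raised.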