simplistix/sybil

IndentationError results from trailing carriage return

jborbely opened this issue · 4 comments

This issue occurs if the documentation file being parsed for code blocks contains \r\n line separators. It results from the combination of how the built-in textwrap.dedent function is implemented and the text that Sybil passes to it.

The regex pattern used for end_pattern in the CodeBlockParser matches the \n but not the \r that precedes it, which leaves a trailing \r at the end of the captured text.
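To make the trailing \r concrete, here is a minimal sketch using two simplified, hypothetical patterns (START and END below are my own stand-ins, not Sybil's actual regexes). It assumes the end of a block is detected as a \n followed by a non-whitespace character; since \r counts as whitespace, the match lands one character too late for CRLF text.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# these are NOT the regexes that Sybil actually uses.
START = re.compile(r'\.\. code-block:: python(?:\s*\n)+')
END = re.compile(r'\n(?=\S)')  # a newline followed by non-whitespace

text = ('This is my example:\r\n\r\n.. code-block:: python\r\n\r\n'
        '    from math import cos\r\n    x = cos(0.1)\r\n\r\n'
        'That was my example.\r\n')

start = START.search(text)
end = END.search(text, start.end())

# Everything between the two matches is treated as the code block.
source = text[start.end():end.start()]
print(repr(source))
```

The captured source ends with a lone \r, the same shape as the text passed to textwrap.dedent in the session that follows.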

The following illustrates how textwrap.dedent behaves. With the trailing \r present (the first definition of text), dedent leaves the text indented, which results in an IndentationError being raised by sybil.parsers.codeblock.compile_codeblock. Without the trailing \r (the second definition), the text is dedented as expected.

>>> import textwrap
>>> text = '    from math import cos\r\n    x=cos(0.1)\r\n\r'
>>> textwrap.dedent(text)
'    from math import cos\r\n    x=cos(0.1)\r\n\r'
>>> text = '    from math import cos\r\n    x=cos(0.1)\r\n'
>>> textwrap.dedent(text)
'from math import cos\r\nx=cos(0.1)\r\n'
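One possible workaround at this level, shown only as a sketch, is to normalise line endings before dedenting. The helper name dedent_normalised is hypothetical and is not part of Sybil or textwrap.

```python
import re
import textwrap


def dedent_normalised(text):
    """Hypothetical helper: convert \\r\\n and lone \\r to \\n, then dedent."""
    return textwrap.dedent(re.sub(r'\r\n?', '\n', text))


text = '    from math import cos\r\n    x=cos(0.1)\r\n\r'
result = dedent_normalised(text)

# The dedented source now compiles without an IndentationError.
code = compile(result, '<example>', 'exec')
```

Because the stray \r becomes an ordinary blank line, it no longer counts as an unindented line when dedent computes the common margin.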

Here's an example of how to reproduce the IndentationError with Sybil:

from sybil.document import Document
from sybil.example import Example
from sybil.parsers.codeblock import CodeBlockParser

text = 'This is my example:\r\n\r\n.. code-block:: python\r\n\r\n    from math import cos\r\n    x = cos(0.1)\r\n\r\nThat was my example.\r\n'

document = Document(text=text, path='whatever.rst')

region = list(CodeBlockParser()(document))[0]
region.evaluator(
    Example(
        document=document,
        line=0,
        column=0,
        region=region,
        namespace={}
    )
)

A similar issue could exist when using dedent for a capture, but I have not tested it.

I don't suppose there's any chance you could not use Windows line endings?

It would be a challenge for me to enforce that all of my colleagues modify their git configuration to only push LF and not CRLF line separators to the repository.

I have discovered that the root of this issue is not in Sybil but in the open function, and that it depends on the platform and Python version. The following is taken from the docs:

In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n.

So Sybil should never encounter a \r. My example above that showed how to raise an IndentationError was not fair because that is not how Sybil is supposed to be used -- I would never manually create a Document in my code, but, through this example, I was able to get to the real root of this issue.
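For reference, one way to get the same newline translation regardless of platform is io.open with newline=None, which is available on both Python 2.7 and Python 3. This is a sketch of the general technique, not a description of how Sybil opens files; the temporary file and its contents are made up for the example.

```python
import io
import os
import tempfile

# Write a small file with Windows (CRLF) line endings. Binary mode is used
# so that nothing is translated on the way out.
fd, path = tempfile.mkstemp(suffix='.rst')
with os.fdopen(fd, 'wb') as f:
    f.write(b'line one\r\nline two\r\n')

# newline=None explicitly requests universal-newline translation on read.
with io.open(path, encoding='utf-8', newline=None) as source:
    text = source.read()

os.remove(path)
print(repr(text))
```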

My proposed solution is to modify how text is defined, replacing

text = source.read()

with

text = '\n'.join(line.rstrip() for line in source)

I think that there is very little overhead in this change, and it helps to protect Sybil against future changes to the default behaviour of the open function. I also can't think of an example where right-stripping whitespace from each line would cause a code-block or capture to pass when it should have failed, or fail when it should have passed. It also solves the IndentationError that I experienced.
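A quick check of what the proposed expression produces, using a list of lines as a stand-in for a file object whose line endings were not translated (the sample text is made up for the example):

```python
raw = ('    from math import cos\r\n'
       '    x = cos(0.1)   \r\n'
       '\r\n'
       'That was my example.\r\n')

# Stand-in for iterating over a file that kept its \r\n line endings.
source = raw.splitlines(True)

text = '\n'.join(line.rstrip() for line in source)
print(repr(text))
```

Two side effects are worth keeping in mind: trailing spaces on each line are removed along with the \r, and the joined text no longer ends with a newline.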

The question is whether you want to fix this issue, as it appears to be unique to Python 2.7 on Linux and macOS.

Sorry, not following: sybil opens the file in text mode, so what's happening to result in \r getting through to where it's causing you problems?

I think 141b3a3, which is in the 1.2.2 release, will fix your issue.