IndentationError results from trailing carriage return
jborbely opened this issue · 4 comments
This issue occurs if the documentation file that is being parsed for code blocks contains \r\n line separators. It results from the implementation of the builtin textwrap.dedent function and the text that Sybil passes to this function.
The regex pattern used for end_pattern in the CodeBlockParser finds the \n but not the corresponding \r, which results in a trailing \r.
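The effect can be sketched with a stand-in pattern. The END regex below is hypothetical, chosen only to behave like the description above (it matches a \n that precedes unindented text); it is not Sybil's actual end_pattern, and block_body is an illustrative helper.

```python
import re

# Hypothetical stand-in for an end-of-block pattern: a "\n" followed by
# a non-whitespace character. This is NOT Sybil's real end_pattern.
END = re.compile(r"\n(?=\S)")


def block_body(text):
    """Return everything before the first match of END."""
    match = END.search(text)
    return text if match is None else text[:match.start()]


# Same block body, once with LF and once with CRLF line separators.
lf_body = block_body("    x = 1\n\nThat was my example.\n")
crlf_body = block_body("    x = 1\r\n\r\nThat was my example.\r\n")

print(repr(lf_body))    # ends with "\n"
print(repr(crlf_body))  # ends with a lone "\r"
```

With CRLF input, the \r of the blank line before the closing prose stays attached to the block body, because the pattern only anchors on the \n.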
The following illustrates how textwrap.dedent behaves. With the trailing \r that is present when I first define text, the text is not dedented, which would result in an IndentationError being raised by sybil.parsers.codeblock.compile_codeblock:
>>> import textwrap
>>> text = ' from math import cos\r\n x=cos(0.1)\r\n\r'
>>> textwrap.dedent(text)
' from math import cos\r\n x=cos(0.1)\r\n\r'
>>> text = ' from math import cos\r\n x=cos(0.1)\r\n'
>>> textwrap.dedent(text)
'from math import cos\r\nx=cos(0.1)\r\n'
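Why text that is not dedented ends in an IndentationError can be shown with the builtin compile alone. This is a minimal sketch that does not call Sybil; the helper name compiles is hypothetical, and the example uses plain \n separators so that it only demonstrates the effect of leftover indentation.

```python
import textwrap


def compiles(source):
    """Return True if source compiles, False on IndentationError."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


indented = "    from math import cos\n    x = cos(0.1)\n"

# Still indented: the compiler rejects the unexpected indent.
still_indented_ok = compiles(indented)

# Dedented: the common leading whitespace is gone, so it compiles.
dedented_ok = compiles(textwrap.dedent(indented))

print(still_indented_ok, dedented_ok)
```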
Here's an example of how to reproduce the IndentationError with Sybil:
from sybil.document import Document
from sybil.example import Example
from sybil.parsers.codeblock import CodeBlockParser
text = 'This is my example:\r\n\r\n.. code-block:: python\r\n\r\n from math import cos\r\n x = cos(0.1)\r\n\r\nThat was my example.\r\n'
document = Document(text=text, path='whatever.rst')
region = list(CodeBlockParser()(document))[0]
region.evaluator(
    Example(
        document=document,
        line=0,
        column=0,
        region=region,
        namespace={}
    )
)
A similar issue could exist when using dedent for a capture, but I have not tested it.
I don't suppose there's any chance you could not use Windows line endings?
It would be a challenge for me to ensure that all of my colleagues modify their git configuration to push only LF, and not CRLF, line separators to the repository.
I have discovered that the root of this issue is not in Sybil but in the open function, and that it depends on the platform and Python version. The following is taken from the docs:
In text mode, the default when reading is to convert platform-specific line endings (\n on Unix, \r\n on Windows) to just \n.
So Sybil should never encounter a \r. My example above that showed how to raise an IndentationError was not fair, because that is not how Sybil is supposed to be used. I would never manually create a Document in my code but, through this example, I was able to get to the real root of this issue.
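The difference between a reader that translates line endings and one that does not can be sketched on Python 3, where open accepts a newline argument. In the snippet below, newline="" stands in for a reader that performs no translation; the helper read_both_ways and the file contents are illustrative assumptions, not taken from Sybil.

```python
import os
import tempfile


def read_both_ways(raw_bytes):
    """Write raw_bytes to a temp file and read it back twice in text mode."""
    fd, path = tempfile.mkstemp(suffix=".rst")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        # Default text mode on Python 3: "\r\n" is translated to "\n".
        with open(path, "r") as f:
            translated = f.read()
        # newline="" disables translation, so "\r\n" reaches the caller.
        with open(path, "r", newline="") as f:
            untranslated = f.read()
    finally:
        os.remove(path)
    return translated, untranslated


translated, untranslated = read_both_ways(b"first\r\nsecond\r\n")
print(repr(translated))
print(repr(untranslated))
```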
My proposed solution is to change how text is defined, replacing
text = source.read()
with
text = '\n'.join(line.rstrip() for line in source)
I think that there is very little overhead in this change, and it helps to protect Sybil against future changes to the default behaviour of the open function. I also can't think of an example where right-stripping whitespace from each line would cause a code-block or capture to pass when it should have failed, or fail when it should have passed. It also solves the IndentationError that I experienced.
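The proposed normalization can be tried in isolation. This is a sketch under stated assumptions: normalize_lines is a hypothetical helper wrapping the join expression above, and the raw text is split on \n to imitate iterating over a source whose \r characters were not translated.

```python
import textwrap


def normalize_lines(lines):
    """Right-strip every line and rejoin with "\n" (the proposed change)."""
    return "\n".join(line.rstrip() for line in lines)


# A block body with CRLF separators and a stray trailing "\r".
raw = "    from math import cos\r\n    x = cos(0.1)\r\n\r"

cleaned = normalize_lines(raw.split("\n"))
dedented = textwrap.dedent(cleaned)

# The cleaned, dedented block compiles and runs without IndentationError.
namespace = {}
exec(compile(dedented, "<example>", "exec"), namespace)

print(repr(cleaned))
print(repr(dedented))
```

Because rstrip removes the \r along with any other trailing whitespace, no carriage return is left by the time the text reaches textwrap.dedent.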
The question is whether you want to fix this issue, as it appears to be unique to Python 2.7 on Linux and macOS.
Sorry, not following: Sybil opens the file in text mode, so what's happening to result in \r getting through to where it's causing you problems?