virtuald/pyhcl

heredoc termination problems

28mm opened this issue · 4 comments

28mm commented

Hi,

first of all: I've been using pyHCL and love it. Thank you.

There seems to be a difference in the way Terraform and pyHCL handle heredoc termination, and it has come up for me a couple of times.

Terraform seems happy to accept the closing identifier (EOF here) at either the beginning or the end of a line, while pyHCL only accepts it at the beginning (or after tabs, in the case of tabbed heredocs).

See below for a simple demonstration of the issue. I pulled the definition of cert_options from tectonic, where I've most recently encountered it.

Anyway, I will have a look at the parser code, but any guidance is much appreciated :)

provider "aws" {
	region = "us-east-1"
}

variable cert_options {

   default=<<EOF
--cert-file=/etc/ssl/etcd/server.crt \
  --key-file=/etc/ssl/etcd/server.key \
  --peer-cert-file=/etc/ssl/etcd/peer.crt \
  --peer-key-file=/etc/ssl/etcd/peer.key \
  --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
  --peer-client-cert-auth=trueEOF

}

In [1]: import hcl

In [2]: config = """
   ...: provider "aws" {
   ...: region = "us-east-1"
   ...: }
   ...: 
   ...: variable cert_options {
   ...: 
   ...:    default=<<EOF
   ...:   --cert-file=/etc/ssl/etcd/server.crt \
   ...:   --key-file=/etc/ssl/etcd/server.key \
   ...:   --peer-cert-file=/etc/ssl/etcd/peer.crt \
   ...:   --peer-key-file=/etc/ssl/etcd/peer.key \
   ...:   --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
   ...:   --peer-client-cert-auth=trueEOF
   ...: 
   ...: }"""

In [3]: hcl.loads(config)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-18dfafa781ac> in <module>()
----> 1 hcl.loads(config)

/usr/local/lib/python3.6/site-packages/hcl/api.py in loads(s)
     60     s = u(s)
     61     if isHcl(s):
---> 62         return HclParser().parse(s)
     63     else:
     64         return json.loads(s)

/usr/local/lib/python3.6/site-packages/hcl/parser.py in parse(self, s)
    305 
    306     def parse(self, s):
--> 307         return self.yacc.parse(s, lexer=Lexer())
    308 
    309 

/usr/local/lib/python3.6/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
    329             return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    330         else:
--> 331             return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    332 
    333 

/usr/local/lib/python3.6/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1059                 if not lookahead:
   1060                     if not lookaheadstack:
-> 1061                         lookahead = get_token()     # Get the next token
   1062                     else:
   1063                         lookahead = lookaheadstack.pop()

/usr/local/lib/python3.6/site-packages/hcl/lexer.py in token(self)
    273 
    274     def token(self):
--> 275         return self.lex.token()

/usr/local/lib/python3.6/site-packages/ply/lex.py in token(self)
    404             tok.lexer = self
    405             self.lexpos = lexpos
--> 406             newtok = self.lexeoff(tok)
    407             return newtok
    408 

/usr/local/lib/python3.6/site-packages/hcl/lexer.py in t_heredoc_eof(self, t)
    220     def t_heredoc_eof(self, t):
    221         t.lexer.lineno += t.lexer.lexdata[t.lexer.here_start:t.lexer.lexpos].count('\n')
--> 222         _raise_error(t, 'EOF before closing heredoc')
    223 
    224     t_tabbedheredoc_ignoring = t_heredoc_ignoring

/usr/local/lib/python3.6/site-packages/hcl/lexer.py in _raise_error(t, message)
     15     if message is None:
     16         message = "Illegal character '%s'" % lexdata[lexpos]
---> 17     raise ValueError("Line %d, column %d, index %d: %s" % (lineno, column, lexpos, message))
     18 
     19 def _find_column(input, token):

ValueError: Line 11, column 1, index 325: EOF before closing heredoc

And here is Terraform:

[...]$ terraform init
[...]$ terraform graph
digraph {
	compound = "true"
	newrank = "true"
	subgraph "root" {
		"[root] meta.count-boundary (count boundary fixup)" -> "[root] var.cert_options"
	}
}

It's interesting that they allow that. I don't know if there is an official spec for heredocs, but the Wikipedia article says: "The most common syntax for here documents, originating in Unix shells, is << followed by a delimiting identifier (often EOF or END), followed, starting on the next line, by the text to be quoted, and then closed by the same delimiting identifier on its own line."

If you wanted to get down and dirty yourself, the place you'll want to look is in the lexer, not the parser. Specifically here: https://github.com/virtuald/pyhcl/blob/master/src/hcl/lexer.py#L182-L214

t_heredoc_STRING looks for a line that consists only of non-whitespace characters followed by a newline (t_tabbedheredoc_STRING is similar, but it allows tabs at the beginning). When it finds a line like that, it calls _end_heredoc to check whether the value is the here_identifier. If so, it fixes up the tabs, the pos counter, and the lineno counter, and then returns. If not, it goes back to trying to match a line for t_heredoc_STRING.
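To make that matching rule concrete, here is a rough standalone sketch of it. This is not the pyhcl lexer code: the names CANDIDATE, TABBED_CANDIDATE and strict_heredoc_end are made up for illustration, and the sketch skips the tab, pos and lineno fix-ups entirely.

```python
import re

# Hypothetical patterns, not copied from pyhcl: a candidate terminator is a
# line made up only of non-whitespace characters, followed by a newline.
CANDIDATE = re.compile(r"^(\S+)\n", re.MULTILINE)
# Tabbed variant: leading tabs are allowed before the candidate.
TABBED_CANDIDATE = re.compile(r"^\t*(\S+)\n", re.MULTILINE)


def strict_heredoc_end(text, identifier, tabbed=False):
    """Return the index where the terminator line starts, or -1 if none."""
    pattern = TABBED_CANDIDATE if tabbed else CANDIDATE
    for match in pattern.finditer(text):
        # Only a line that is exactly the identifier closes the heredoc.
        if match.group(1) == identifier:
            return match.start()
    return -1


# The identifier on its own line is found ...
print(strict_heredoc_end("foo bar\nEOF\n", "EOF"))
# ... but an identifier glued to the end of a content line is not.
print(strict_heredoc_end("--flag=trueEOF\n\n}", "EOF"))
```

Under this rule the last line of the example in the report is seen as the single word `--peer-client-cert-auth=trueEOF`, which is not equal to the identifier, so the lexer runs to the end of the input.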

You'd probably need to change the t_*heredoc_STRING functions to check every line of the heredoc for one that ends with the identifier, and then do all the necessary fix-ups.
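As a very rough illustration of that idea, outside of the lexer: the function below is hypothetical (split_heredoc and its lenient flag are names invented here), it is not the change that ended up in pyhcl, and it only assumes that Terraform behaves the way the report describes.

```python
def split_heredoc(body, identifier, lenient=False):
    """Return the heredoc content before the terminator, or None if unterminated.

    strict:  the identifier must be alone on its line.
    lenient: a line that merely ends with the identifier also terminates.
    """
    collected = []
    for line in body.split("\n"):
        if line == identifier:
            return "\n".join(collected)
        if lenient and line.endswith(identifier):
            # Keep the text in front of the identifier as the last content line.
            collected.append(line[:-len(identifier)])
            return "\n".join(collected)
        collected.append(line)
    return None


body = "  --a=1 \\\n  --b=trueEOF\n\n}"
print(split_heredoc(body, "EOF"))                      # strict rule
print(repr(split_heredoc(body, "EOF", lenient=True)))  # lenient rule
```

One trade-off to keep in mind: with the lenient rule, any content line that happens to end in the identifier would close the heredoc early.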

Feel free to take a stab. I can assist or try to implement it at some point.

@28mm I believe #37 should resolve the issue. I added some tests, but perhaps you can give it a go with your example?

Actually, I saw that the example was pretty simple, so I went ahead and did it.

In [1]: import hcl

In [2]: config = """
   ...: provider "aws" {
   ...: region = "us-east-1"
   ...: }
   ...: 
   ...: variable cert_options {
   ...: 
   ...:    default=<<EOF
   ...:   --cert-file=/etc/ssl/etcd/server.crt \
   ...:   --key-file=/etc/ssl/etcd/server.key \
   ...:   --peer-cert-file=/etc/ssl/etcd/peer.crt \
   ...:   --peer-key-file=/etc/ssl/etcd/peer.key \
   ...:   --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
   ...:   --peer-client-cert-auth=trueEOF
   ...: 
   ...: }"""

In [3]: hcl.loads(config)
Out[3]: 
{'provider': {'aws': {'region': 'us-east-1'}},
 'variable': {'cert_options': {'default': '  --cert-file=/etc/ssl/etcd/server.crt   --key-file=/etc/ssl/etcd/server.key   --peer-cert-file=/etc/ssl/etcd/peer.crt   --peer-key-file=/etc/ssl/etcd/peer.key   --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt   --peer-client-cert-auth=true'}}}
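
A side note on the output above: the heredoc value comes back as a single line because the config was typed as a regular (non-raw) triple-quoted Python string, where a backslash immediately followed by a newline is a line continuation. Those line breaks are gone before the parser ever sees the text. A small sketch of the difference, using made-up strings:

```python
# Non-raw string: the backslash-newline pair is a line continuation and vanishes.
plain = """a \
b"""

# Raw string: both the backslash and the newline are kept.
raw = r"""a \
b"""

print(repr(plain))
print(repr(raw))
```

Loading the same config from a file, or writing it as a raw string, would keep the backslashes and newlines in the value.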

28mm commented

@scottbelden works like a charm.