heredoc termination problems
28mm opened this issue · 4 comments
Hi,
first of all: I've been using pyHCL and love it. Thank you.
There seems to be a difference in the way Terraform and pyHCL handle heredoc termination, and it has come up for me a couple of times. Terraform seems happy to accept EOF at either the beginning or the end of a line, while pyHCL only accepts it at the beginning (or after tabs, in the case of tabbed heredocs).
See below for a simple demonstration of the issue. I pulled the definition of cert_options from tectonic, where I've most recently encountered it.
Anyway, I will have a look at the parser code, but any guidance is much appreciated :)
provider "aws" {
region = "us-east-1"
}
variable cert_options {
default=<<EOF
--cert-file=/etc/ssl/etcd/server.crt \
--key-file=/etc/ssl/etcd/server.key \
--peer-cert-file=/etc/ssl/etcd/peer.crt \
--peer-key-file=/etc/ssl/etcd/peer.key \
--peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
--peer-client-cert-auth=trueEOF
}
In [1]: import hcl
In [2]: config = """
...: provider "aws" {
...: region = "us-east-1"
...: }
...:
...: variable cert_options {
...:
...: default=<<EOF
...: --cert-file=/etc/ssl/etcd/server.crt \
...: --key-file=/etc/ssl/etcd/server.key \
...: --peer-cert-file=/etc/ssl/etcd/peer.crt \
...: --peer-key-file=/etc/ssl/etcd/peer.key \
...: --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
...: --peer-client-cert-auth=trueEOF
...:
...: }"""
In [3]: hcl.loads(config)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-18dfafa781ac> in <module>()
----> 1 hcl.loads(config)
/usr/local/lib/python3.6/site-packages/hcl/api.py in loads(s)
60 s = u(s)
61 if isHcl(s):
---> 62 return HclParser().parse(s)
63 else:
64 return json.loads(s)
/usr/local/lib/python3.6/site-packages/hcl/parser.py in parse(self, s)
305
306 def parse(self, s):
--> 307 return self.yacc.parse(s, lexer=Lexer())
308
309
/usr/local/lib/python3.6/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
329 return self.parseopt(input, lexer, debug, tracking, tokenfunc)
330 else:
--> 331 return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
332
333
/usr/local/lib/python3.6/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
1059 if not lookahead:
1060 if not lookaheadstack:
-> 1061 lookahead = get_token() # Get the next token
1062 else:
1063 lookahead = lookaheadstack.pop()
/usr/local/lib/python3.6/site-packages/hcl/lexer.py in token(self)
273
274 def token(self):
--> 275 return self.lex.token()
/usr/local/lib/python3.6/site-packages/ply/lex.py in token(self)
404 tok.lexer = self
405 self.lexpos = lexpos
--> 406 newtok = self.lexeoff(tok)
407 return newtok
408
/usr/local/lib/python3.6/site-packages/hcl/lexer.py in t_heredoc_eof(self, t)
220 def t_heredoc_eof(self, t):
221 t.lexer.lineno += t.lexer.lexdata[t.lexer.here_start:t.lexer.lexpos].count('\n')
--> 222 _raise_error(t, 'EOF before closing heredoc')
223
224 t_tabbedheredoc_ignoring = t_heredoc_ignoring
/usr/local/lib/python3.6/site-packages/hcl/lexer.py in _raise_error(t, message)
15 if message is None:
16 message = "Illegal character '%s'" % lexdata[lexpos]
---> 17 raise ValueError("Line %d, column %d, index %d: %s" % (lineno, column, lexpos, message))
18
19 def _find_column(input, token):
ValueError: Line 11, column 1, index 325: EOF before closing heredoc
And here is Terraform:
[...]$ terraform init
[...]$ terraform graph
digraph {
compound = "true"
newrank = "true"
subgraph "root" {
"[root] meta.count-boundary (count boundary fixup)" -> "[root] var.cert_options"
}
}
It's interesting that they allow that. I don't know if there is an official spec for heredocs, but the Wikipedia article says: "The most common syntax for here documents, originating in Unix shells, is << followed by a delimiting identifier (often EOF or END), followed, starting on the next line, by the text to be quoted, and then closed by the same delimiting identifier on its own line."
If you wanted to get down and dirty yourself, the place you'll want to look is in the lexer, not the parser. Specifically here: https://github.com/virtuald/pyhcl/blob/master/src/hcl/lexer.py#L182-L214
The t_heredoc_STRING is what looks for the pattern of a line that has only non-whitespace characters followed by a newline (t_tabbedheredoc_STRING is similar but it allows tabs at the beginning). If it finds a line like that, it calls _end_heredoc to see if the value is the here_identifier. If so, it does some work to fix up the tabs, fix up the pos counter and the lineno counter, and then returns. If it doesn't find that, it will go back to trying to match a line for t_heredoc_STRING.
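To make the strict rule concrete, here is a rough sketch in plain Python. It is not the pyhcl lexer code; find_strict_terminator is a made-up helper that only mimics the "identifier alone on its line" check described above, with an optional allowance for leading tabs.

```python
def find_strict_terminator(lines, identifier, tabbed=False):
    """Return the index of the first line that consists solely of the identifier.

    Hypothetical helper for illustration only. With tabbed=True, leading tabs
    on a line are ignored before comparing. Returns None when no line
    qualifies, which corresponds to the "EOF before closing heredoc" error.
    """
    for index, line in enumerate(lines):
        candidate = line.lstrip("\t") if tabbed else line
        if candidate == identifier:
            return index
    return None


body = [
    "--cert-file=/etc/ssl/etcd/server.crt",
    "--peer-client-cert-auth=trueEOF",
    "}",
]
print(find_strict_terminator(body, "EOF"))                    # None
print(find_strict_terminator(body + ["EOF"], "EOF"))          # 3
print(find_strict_terminator(["\tEOF"], "EOF", tabbed=True))  # 0
```

The first call returns None because "trueEOF" is not equal to the identifier, which is the situation in the reported config.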
You'd probably need to change the t_*heredoc_STRING functions to check all lines of the heredoc for a line that ends with the identifier and then do all the necessary fixing up to clean up.
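As a rough idea of what "a line that ends with the identifier" could mean, here is a minimal sketch in plain Python. split_relaxed_heredoc is a hypothetical name, it works on a list of lines rather than on lexer state, and it skips the pos/lineno fix-up entirely, so treat it as an illustration rather than the actual change.

```python
def split_relaxed_heredoc(lines, identifier):
    """Return (body_lines, index) for the first line ending with the identifier.

    Hypothetical helper for illustration only. The identifier is stripped from
    the end of the matching line; whatever precedes it is kept as the last body
    line. Returns (None, None) if no line ends with the identifier.
    """
    for index, line in enumerate(lines):
        if line.endswith(identifier):
            remainder = line[: -len(identifier)]
            body = lines[:index] + ([remainder] if remainder else [])
            return body, index
    return None, None


lines = [
    "--peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \\",
    "--peer-client-cert-auth=trueEOF",
    "}",
]
body, index = split_relaxed_heredoc(lines, "EOF")
print(body[-1])  # --peer-client-cert-auth=true
print(index)     # 1
```

One trade-off of this approach: any body line that happens to end with the identifier closes the heredoc early, so a relaxed rule like this is more permissive than the "identifier on its own line" convention.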
Feel free to take a stab. I can assist or try to implement it at some point.
Actually, I saw that the example was pretty simple, so I went ahead and did it.
In [1]: import hcl
In [2]: config = """
...: provider "aws" {
...: region = "us-east-1"
...: }
...:
...: variable cert_options {
...:
...: default=<<EOF
...: --cert-file=/etc/ssl/etcd/server.crt \
...: --key-file=/etc/ssl/etcd/server.key \
...: --peer-cert-file=/etc/ssl/etcd/peer.crt \
...: --peer-key-file=/etc/ssl/etcd/peer.key \
...: --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt \
...: --peer-client-cert-auth=trueEOF
...:
...: }"""
In [3]: hcl.loads(config)
Out[3]:
{'provider': {'aws': {'region': 'us-east-1'}},
'variable': {'cert_options': {'default': ' --cert-file=/etc/ssl/etcd/server.crt --key-file=/etc/ssl/etcd/server.key --peer-cert-file=/etc/ssl/etcd/peer.crt --peer-key-file=/etc/ssl/etcd/peer.key --peer-trusted-ca-file=/etc/ssl/etcd/ca.crt --peer-client-cert-auth=true'}}}
@scottbelden works like a charm.