regression with plan diffs due to base64 detection
Closed this issue · 2 comments
With the following raw input
~ module.rds.aws_db_parameter_group.default
parameter.1918306725.apply_method: "immediate" => "immediate"
parameter.1918306725.name: "log_output" => "log_output"
parameter.1918306725.value: "file" => "FILE"
I expect, which is the case with 16acb97
~ module.rds.aws_db_parameter_group.default
parameter.1918306725.value: "file" => "FILE"
but the output becomes with latest master ccfd94c (broken in 6dcc97e but untestable until be3f575)
~ module.rds.aws_db_parameter_group.default
parameter.1918306725.value: "~)^" => "FILE"
This seems to be a weird and probably rare case where the input text when base64 decoded is still valid ASCII text. If you threw a regex base64 detection in front, specifically the regex from https://www.sans.org/reading-room/whitepapers/detection/base64-pwned-33759 (?:[A-Za-z0-9+/]{4}){2,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[AZa-z0-9+/][AQgw]==)
, this would correctly detect this as not base64 and provide the correct diff.
The problem with this approach and specifically with this regex from the whitepaper, they say
More targetted regular expressions such as [...] will have fewer false positive results but will miss approximately one
third of the base64 they see as they are looking for trailing equal signs
I wanted to bring this issue up to get some feedback. I can accept this as a rare occurrence where the tradeoffs of using something that increases false negatives (meaning less valid base64 encoded text from Terraform plans are properly diffed) is probably not ideal in this situation as also stated in the whitepaper.
Modifying the regular expression to not require any equal signs creates a large
number of false positive results and is thus virtually useless. As a result, we are left with an undesirable option: we either generate false positives or we generate false negatives. The best approach depends on the business problem you are trying to solve.
@jburnham thanks for brining up this issue! It's definitely a bug but as you have stated it's hard to detect. I'm not sure what the best course of action is. I think for now I'd rather keep functionality as-is since otherwise we won't be able to decode 1/3 of valid base64 encoded strings, namely those without a trailing =
(according to RFC 4648 the padding is optional)
Thoughts? I'll leave the issue open to track behavior and see if other people are running into the same issue. If anyone has a good solution I'd love to hear it.
Scenery is not actively maintained and the repo will be archived momentarily. I no longer have the time to maintain this tool nor do I think it should be kept being used as Terraform 0.11 has been deprecated for over a year now (scenery can only parse Terraform 0.11 plan outputs). Terraform 0.14 has some plan output changes as well as introduced the concept of concise diff plan outputs which does most of what scenery currently does.
If you'd like to add new functionality as you cannot upgrade your terraform version feel free you fork the repo.