cavejay/Strippy

Match aliases are being 'found' during sanitation stage

Opened this issue · 4 comments

Currently, if you have a rule that replaces a string with 'abcdefg' and another that replaces 'cde' with 'memes', you could end up with abmemes123fg4, which is obviously an undesirable outcome.

To prevent this, the 'cde' -> 'memes' sanitisation should occur before the 'xx' -> 'abcdefg' sanitisation.
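A minimal sketch of why ordering matters, in Python rather than the script's own code. It assumes rules are applied one after another with a plain regex substitution; the `sanitise` helper and the sample line are hypothetical, and the numeric suffixes seen in the real output are left out.

```python
import re

def sanitise(text, rules):
    """Apply each (pattern, alias) rule in order, one full pass per rule."""
    for pattern, alias in rules:
        text = re.sub(pattern, alias, text)
    return text

line = "user xx sent cde"

# 'xx' rule first: the later 'cde' rule also rewrites the alias just inserted.
bad_order = [(r"xx", "abcdefg"), (r"cde", "memes")]
# 'cde' rule first: 'abcdefg' is only inserted after the 'cde' rule has run.
good_order = [(r"cde", "memes"), (r"xx", "abcdefg")]

print(sanitise(line, bad_order))   # user abmemesfg sent memes
print(sanitise(line, good_order))  # user abcdefg sent memes
```

Reordering only helps when the overlap runs in one direction. If each alias could match the other rule, no ordering is safe.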

To resolve this bug please either:

  • add a warning to the config file about this behaviour, with steps to prevent it (order the rules in the config file to avoid this behaviour)
  • automatically resolve the ordering of sanitisation at run time using magic (unwritten code)
  • add a config entry for sanitisation ordering

On second thoughts, I don't think this is a straightforward fix. Matches are dynamically found and replaced with keys based on length. To do anything other than this, the sanitation stage would need to be more intelligent and 'lock' keys once they'd been switched in for a match.
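One way the 'locking' could look, sketched in Python under the assumption that the matched strings have already been collected into a match -> key mapping. The `sanitise_locked` name and the sample data are hypothetical, not the script's current behaviour. All matches are replaced in a single pass, so text that has already been swapped in is never scanned again.

```python
import re

def sanitise_locked(text, replacements):
    """Replace every key of `replacements` in one pass over `text`.

    re.sub only scans the original text, so an alias written to the
    output is never rescanned: it is effectively 'locked'.
    """
    if not replacements:
        return text
    # Longest matches first, so a short match cannot pre-empt a longer one.
    keys = sorted(replacements, key=len, reverse=True)
    combined = re.compile("|".join(re.escape(k) for k in keys))
    return combined.sub(lambda m: replacements[m.group(0)], text)

replacements = {"xx": "abcdefg", "cde": "memes"}
print(sanitise_locked("user xx sent cde", replacements))  # user abcdefg sent memes
```

The trade-off is that every match has to be known before the pass starts, which may not fit how keys are currently assigned.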

In the end this bug doesn't stop the files from being cleaned, it just creates an ugly output.

This bug is also the cause of the script recursively replacing some rules. When this occurs, sanitising may never complete and the resulting files are much larger than the originals.
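A small Python sketch of the growth, assuming a rule is simply re-applied until the text stops changing (the script's real loop may differ). The `repeated_passes` helper, the `host` pattern and the `myhostname` alias are all hypothetical. Because the alias contains its own match, every pass makes the text longer, so the loop is capped here.

```python
import re

def repeated_passes(text, pattern, alias, max_passes):
    """Re-apply one rule up to max_passes times, recording the length after each pass."""
    lengths = [len(text)]
    for _ in range(max_passes):
        new_text = re.sub(pattern, alias, text)
        if new_text == text:
            break  # nothing left to replace
        text = new_text
        lengths.append(len(text))
    return text, lengths

# The alias 'myhostname' contains 'host', so the rule keeps matching its own output.
text, lengths = repeated_passes("host", r"host", "myhostname", max_passes=3)
print(text)
print(lengths)
```

Without the `max_passes` cap this loop would never reach a fixed point.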

Example of rule:

"<some regex>"=CleverWellThoughtOutName
"<some regex that matches 'WellThought'>"=AnotherCleverWellThoughtOutName

This is quite hard to predict, as it requires us to check whether the second rule might ever match a replacement string. We could run this check after collecting all the keys, and then warn the user and/or exit early to avoid wasted time.
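A rough Python sketch of such a check, assuming rules are available as (regex, alias) pairs. The `find_alias_collisions` name and the first rule's regex are made up for illustration. It only tests the base aliases, so any suffix appended to an alias at run time would need checking as well.

```python
import re

def find_alias_collisions(rules):
    """Return (pattern, alias) pairs where `pattern` matches inside some rule's alias."""
    collisions = []
    for pattern, _ in rules:
        for _, alias in rules:
            if re.search(pattern, alias):
                collisions.append((pattern, alias))
    return collisions

rules = [
    (r"\d{3}-\d{4}", "CleverWellThoughtOutName"),         # hypothetical regex
    (r"WellThought", "AnotherCleverWellThoughtOutName"),  # matches both aliases, including its own
]

for pattern, alias in find_alias_collisions(rules):
    print(f"warning: rule {pattern!r} matches alias {alias!r}")
```

A rule that matches its own alias is the recursive case and is probably worth a hard exit. A rule that matches another rule's alias might only need a warning.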

Closing #43 should prevent this happening for number-based overlaps.