facelessuser/wcmatch

Odd behavior with wcmatch.glob.glob()?

imackintouch opened this issue · 16 comments

Hello there. I was very pleased to see a Python module that attempted to codify the extended glob pattern matching that one gets with the Bash command line. Great stuff!

However, in making use of it, I found a weird scenario.

Consider this piece of code run under Python 3.6:

glob.globfilter(["goo.cfg", "foo.bar", "foo.bar.cfg", 'foo.cfg.bar'], '!*.bar', flags=glob.EXTGLOB|glob.NEGATE)

It unexpectedly returns:

['goo.cfg']

That doesn't make sense to me. I would have expected:

['goo.cfg', 'foo.bar.cfg']

I mean, the following code behaves the way I would expect when I remove the negation:

glob.globfilter(["goo.cfg", "foo.bar", "foo.bar.cfg", 'foo.cfg.bar'], '*.bar', flags=glob.EXTGLOB|glob.NEGATE)
['foo.bar', 'foo.cfg.bar']

foo.bar.cfg doesn't appear in the positive case, and rightly so. So why is it disappearing in the negative one?

Could there be a syntactic misunderstanding that I am not picking up from the docs?

Regards,
IMc

Thanks for the report. This does look like a bug. I think I know why it is behaving as it is as well. I need to look into this further to be sure.

Thanks for the update Isaac! I hope that a resolution does not prove to be elusive for you.

Regards.

It looks like for our negate pattern

r'^(?!(pattern)).*$'

We should have been using something like:

r'^(?!(pattern)$).*$'

Basically, we are looking to make sure the match we don't want also reaches the end of the buffer. It appears this only breaks two tests (which was unexpected). Most likely this means the tests were wrong to begin with and just didn't get close enough scrutiny (I wrote so many tests near the end I probably just got tired and overlooked some erroneous logic). It shouldn't be a difficult fix 🤞 .
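To make the difference between the two negate templates concrete, here is a minimal sketch using only Python's `re` module. The inner regex `.*\.bar` is a hand-written stand-in for whatever the library generates for `*.bar`; it is an assumption for illustration, not wcmatch's actual translation.

```python
import re

names = ["goo.cfg", "foo.bar", "foo.bar.cfg", "foo.cfg.bar"]

# Hand-written stand-in for the regex form of the glob '*.bar' (assumption).
inner = r".*\.bar"

# Old template: the lookahead succeeds as soon as any prefix matches the pattern.
old_negate = re.compile(rf"^(?!({inner})).*$")

# Proposed template: the unwanted match must also reach the end of the string.
new_negate = re.compile(rf"^(?!({inner})$).*$")

old_result = [n for n in names if old_negate.match(n)]
new_result = [n for n in names if new_negate.match(n)]

print(old_result)  # ['goo.cfg']
print(new_result)  # ['goo.cfg', 'foo.bar.cfg']
```

With the old template, `foo.bar.cfg` is rejected because its prefix `foo.bar` satisfies the lookahead. Anchoring the lookahead with `$` only rejects names that match `*.bar` in full.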

I have a potential fix up in #25. Once I get around to evaluating the failing tests and resolving them, I should be able to get this fix out.

Nicely done Isaac. It turns out that in the app on my side of the fence, I will need some additional tests for negation as well, as I only spotted this problem once I had rolled out a change from git to my dev environment!

Some other quirks have surfaced as I've looked at this closer. It will take me a bit longer to work them out.

Okay, I think I got negate returning sane results in glob now. I will need to add some more tests, such as the ones in this issue. This was good as it also brought to my attention a globstar issue which I have also fixed. I probably won't get around to finishing this tonight, but hopefully soon. Recent changes are in the linked branch.

We apparently have a Windows issue that's cropped up with recent changes. I'll get to that tomorrow. I think it's real close though.

Everything looks to be fixed now, I'll try and get a release out later today.

Thanks Isaac. That was super fast!!

The more the issue bothers me, the quicker I usually get to it :). It bothered me that I missed this case. Though the secondary globstar issue bothered me even more.

Release has been made: https://github.com/facelessuser/wcmatch/releases/tag/2.1.0.

Feel free to let me know if you run into any other unexpected behavior. It's nice seeing it getting some use outside of my use cases as it helps to shake out overlooked things.

Once again, Isaac, thanks for the fix. I will be playing around with version 2.1.0 in my environment later today and will keep you posted!

Hi Isaac. Things look good for the simple negation case, e.g. '!*.cfg'. However, I have another question related to how to use multiple patterns in a negation expression.

If my pattern looks like this '!(.cfg|.bar)' with the flags glob.NEGATE and glob.EXTGLOB set I don't get results that I'd expect. I basically end up with the full contents of my current working directory...almost as if the pattern is completely unrecognizable.

Is there a special way that one needs to specify multiple negation patterns?

Can I ask what you are attempting to do? Without knowing specifics, I can only respond generally.

I'll start by saying that if you are specifying multiple patterns and expect to delimit them with |, you need to run the pattern through globsplit first. It should be noted, though, that | characters within an EXTGLOB pattern are ignored by the split, as stated in the documentation. globsplit (or fnsplit when using fnmatch) is meant to provide an interface for accepting patterns chained with pipes: the chain is split, and the resulting list is fed into the match, filter, or whatever function you need.

Negation with ! is a convention that I've seen used in multiple places, which is why I've allowed it here, but you'll notice that there is a conflict when you use ! negation and !(...) EXTGLOB patterns. If I want to use both, which I use in my Rummage tool if enabled, I also enable MINUSNEGATE (along with NEGATE) to use - instead of !. This provides a conflict free way to have negation and EXTGLOB patterns.

Remember that the way negation works in Wcmatch's glob is that if a negation pattern is given, it is applied to the non-negation pattern(s) that are given as well. It essentially filters out what you don't want to match from whatever you do want to match. If no positive pattern is given, it assumes a positive pattern of ** (or * if GLOBSTAR is not enabled). So !somepattern is like saying **|!somepattern, where ** is treated like * if GLOBSTAR is not enabled.

So if I wanted to limit things further I could say more narrow pattern|!but not this. Or I could say narrow pattern|!but not this|!and not this either. When passed through split, it will return ['narrow pattern', '!but not this', '!and not this either']. You then feed the pattern list into globmatch or whatever it is you are doing.
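The filtering semantics described above can be sketched with the standard library alone. This is only an illustration of the idea (positive patterns select, negated patterns remove, and a lone negation implies a catch-all positive pattern); `filter_with_negation` is a hypothetical helper built on `fnmatch`, not wcmatch's implementation, and it takes an already split list of patterns.

```python
from fnmatch import fnmatchcase


def filter_with_negation(names, patterns, negate_prefix="!"):
    """Keep names matching any positive pattern and no negated pattern."""
    positive = [p for p in patterns if not p.startswith(negate_prefix)]
    negative = [p[len(negate_prefix):] for p in patterns
                if p.startswith(negate_prefix)]

    # With no positive pattern, assume a catch-all, as the text describes.
    if not positive:
        positive = ["*"]

    kept = []
    for name in names:
        wanted = any(fnmatchcase(name, p) for p in positive)
        excluded = any(fnmatchcase(name, p) for p in negative)
        if wanted and not excluded:
            kept.append(name)
    return kept


names = ["goo.cfg", "foo.bar", "foo.bar.cfg", "foo.cfg.bar"]

# A lone negation behaves like '*' minus the negated pattern.
only_negative = filter_with_negation(names, ["!*.bar"])

# A narrower positive pattern, further trimmed by a negation.
narrowed = filter_with_negation(names, ["foo.*", "!*.cfg"])

print(only_negative)  # ['goo.cfg', 'foo.bar.cfg']
print(narrowed)       # ['foo.bar', 'foo.cfg.bar']
```

In real use the pattern list would come from the library's own split function rather than a hand-built list, since a naive split on | would also cut through EXTGLOB groups.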

The BRACES flag allows you to do something similar to Bash's brace expansion and requires no additional calls. It will expand a pattern internally into multiple patterns: !.{this,that} --> ['!.this', '!.that'].
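As a rough picture of what that expansion does, here is a simplified sketch. `expand_braces` is a hypothetical helper that handles only flat, non-nested comma groups; it is not the expansion code the library uses, and it ignores escaping and numeric ranges.

```python
import re

# Matches one brace group that contains a comma and no nested braces.
_GROUP = re.compile(r"\{([^{}]*,[^{}]*)\}")


def expand_braces(pattern):
    """Expand flat {a,b,...} groups into a list of patterns."""
    match = _GROUP.search(pattern)
    if match is None:
        return [pattern]

    head = pattern[:match.start()]
    tail = pattern[match.end():]

    results = []
    for option in match.group(1).split(","):
        # Recurse so that later groups in the same pattern are expanded too.
        results.extend(expand_braces(head + option + tail))
    return results


expanded = expand_braces("!.{this,that}")
print(expanded)  # ['!.this', '!.that']
```

Each resulting pattern keeps its leading !, so every expanded entry is still treated as a negation pattern.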

I don't see an example of how you are attempting to use any of these features, so it is difficult for me to tell exactly what you are doing wrong or what you are attempting to achieve. If you have a specific question, I would open a new issue with an example, and we can discuss it there. That way we can keep each issue/question on topic.

I may have overwhelmed you with info above. Reading over your question again, I think I understand better what you are asking. Here is an example using MINUSNEGATE.

Assuming files:

foo.bar
foo.bar.cfg
foo.cfg.bar
goo.cfg

Code:

>>> glob.glob('!(foo.bar|goo.cfg)', flags=glob.N|glob.M|glob.E)
['foo.bar.cfg', 'foo.cfg.bar']
>>> glob.glob('-!(foo.bar|goo.cfg)', flags=glob.N|glob.M|glob.E)
['foo.bar', 'goo.cfg']
>>> glob.glob('!(*.cfg|*.cfg.bar)', flags=glob.N|glob.M|glob.E)
['foo.bar']