gsutil regex - not the same behaviour since last version (v5.16)
4sushi opened this issue · 7 comments
Hello, since yesterday (2022-10-27), with the upgrade of gsutil to version 5.16, we have had issues with the gsutil command. This impacts most of our CI/CD jobs on GitHub Actions.
This command, which uses a regular expression, no longer behaves the same way:
gsutil -m rsync -c -d -r -x "\..*|.*/\.[^/]*$|.*/\..*/.*$|_.*" . ${{ secrets.BUCKET }}/project
Result with gsutil 5.16 (latest):
Building synchronization state...
Skipping excluded path ./packages.yml...
Skipping excluded path ./e0859b964bdbc629ad7f2f98...
Skipping excluded path ./makefile...
Result with gsutil 5.15:
Building synchronization state...
Starting synchronization...
Copying file://./8c6353775efb45e0657f2a11 [Content-Type=application/octet-stream]...
/ [0 files][ 0.0 B/ 3.8 KiB]
/ [1 files][ 3.8 KiB/ 3.8 KiB]
Do you have any idea what caused this change?
Thanks.
@4sushi I have the same issue, except that in my case it works on version 5.14 and fails on version 5.15.
I just ran into this as well using 5.16. I think my particular problem is due to #1602, which I got to by blaming gsutil/gslib/commands/rsync.py (lines 734 to 736 in 2fd9759).
The merge commit of this PR is in v5.14...v5.15, so I think I will get the same problem with 5.15 (unverified).
I suspect that the format string should be something like {}/(?:{}). Otherwise, if alternation (|) is used as in @4sushi's example, precedence dictates that the first component (base_url_str) binds only with the LHS of |, which usually doesn't make sense (a subtle point is that if your RHS expression is unanchored, like .*/node_modules/.*, it does work). Also, the use of ^ doesn't make sense anymore either, which was another source of breakage for me (although at this point I'm not sure if my use of ^ was ever legitimate; in any case, it was working as I intended before).
So the current workaround seems to be:
- Enclose your original -x parameter in (?:...)
- Remove/replace the use of ^ accordingly
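To make the precedence problem concrete, here is a minimal Python sketch. It is not gsutil's code: the helper `is_excluded`, the example base paths, and the shortened pattern are all hypothetical, and it assumes the exclude regex is built with a format string like {}/{} and applied with `re.match`, which the comment above implies but does not quote.

```python
import re

# Shortened version of the pattern from the original report.
USER_PATTERN = r"\..*|.*/\..*/.*$|_.*"


def is_excluded(base, path, grouped):
    """Hypothetical helper: join a base path and the user's -x pattern.

    grouped=False mimics a plain "{}/{}" join (assumed, not quoted from gsutil).
    grouped=True wraps the user's pattern in a non-capturing group.
    """
    fmt = "{}/(?:{})" if grouped else "{}/{}"
    regex = re.compile(fmt.format(re.escape(base), USER_PATTERN))
    return regex.match(path) is not None


# Case 1: without the group, only the first alternative is prefixed by the
# base path, so "_.*" is tested against the start of the full path and misses.
missed = is_excluded("/work/project", "/work/project/_cache", grouped=False)
caught = is_excluded("/work/project", "/work/project/_cache", grouped=True)

# Case 2: an unprefixed alternative can match a dot-directory inside the base
# path itself, so an ordinary file gets excluded by accident.
base = "/home/u/.ci/project"
over_excluded = is_excluded(base, base + "/makefile", grouped=False)
kept = not is_excluded(base, base + "/makefile", grouped=True)

print(missed, caught, over_excluded, kept)
```

With the non-capturing group, every alternative is evaluated relative to the base path, which is what the workaround of wrapping the -x value in (?:...) achieves by hand.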
@tomonacci Great catch! You are right. The format string should include a non-capturing group. Will work on the fix ASAP, however, because of the release freeze, this might take some time to go out.
For me at least, with gsutil version 5.17 the 'Skipping excluded path' messages no longer appear. However, messages like the following do appear:
Skipping excluded directory /home/user-name/Projects/project-name/build/libs...
Skipping excluded directory /home/user-name/Projects/project-name/build/classes...
Skipping excluded directory /home/user-name/Projects/project-name/build/generated...
Skipping excluded directory /home/user-name/Projects/project-name/build/tmp...
And that is a change in behavior from gsutil versions prior to version 5.15.
Same issue: version 5.17 skips directories.
This is working as intended, although this has been a breaking change for people who were relying on negative lookahead to "include" paths from directories that would otherwise logically be excluded. I believe that may have been why #1637 popped up. For that reason I'm working on #1642, which will bring back the old behavior for -x and put the new directory-excluding behavior in a new flag -y.
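The interaction between directory exclusion and negative lookahead can be sketched with a toy model. This is only an illustration under assumptions: the tree, the `walk` function, the `prune_dirs` switch, and the example pattern are hypothetical, and how gsutil actually matches directory paths is not shown in this thread.

```python
import re

# Toy tree: directory -> (subdirectories, files), relative to the sync root.
TREE = {
    "": (["build", "src"], ["makefile"]),
    "build": (["libs", "tmp"], []),
    "build/libs": ([], ["app.jar"]),
    "build/tmp": ([], ["scratch.o"]),
    "src": ([], ["main.py"]),
}

# Exclude everything under build/ except build/libs/ (negative lookahead).
PATTERN = r"build/(?!libs/).*"


def walk(pattern, prune_dirs):
    """Return the files that survive exclusion.

    prune_dirs=False: the pattern is tested against file paths only.
    prune_dirs=True: a matching directory is skipped without being entered.
    """
    regex = re.compile(pattern)
    kept = []
    stack = [""]
    while stack:
        current = stack.pop()
        subdirs, files = TREE[current]
        for name in files:
            path = f"{current}/{name}" if current else name
            if not regex.match(path):
                kept.append(path)
        for name in subdirs:
            path = f"{current}/{name}" if current else name
            if prune_dirs and regex.match(path):
                continue  # the whole subtree is never visited
            stack.append(path)
    return sorted(kept)


files_only = walk(PATTERN, prune_dirs=False)
pruning = walk(PATTERN, prune_dirs=True)
print(files_only)
print(pruning)
```

In this model the directory path "build/libs" has no trailing slash, so the lookahead does not protect it and the directory is pruned before build/libs/app.jar is ever tested. That is one plausible way a pattern that relied on negative lookahead could stop "including" files once directories themselves are excluded.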
As you intended, for me gsutil version 5.18 has restored the old behavior for -x. Thanks for working through this.