EliziumNet/RexFs

How can I remove all leading and trailing spaces from the names of a folder and all its subfolders?

How can I remove all leading and trailing spaces from the names of a folder and all its subfolders?

Hi @Emasoft, sorry I haven't responded to this request sooner; I only just noticed this issue.
As far as I know, leading and trailing spaces can't appear in directory or file names, so the scenario you describe can't exist. I suspect I am not fully understanding your query, so could you provide more information?

Hi @plastikfan, unfortunately leading and trailing spaces can and do exist in file and folder names. I have a bunch of folders with trailing spaces that I created programmatically on a Mac (by simply truncating longer names at a predefined number of characters, which cut many of them off just after a space between words) and then transferred to a Win/NTFS system, where the damn spaces were preserved.
I happened to discover this because I can't sync the backup of those folders to OneDrive, since OneDrive really does not allow any leading or trailing spaces.
Please read this:
https://www.ghisler.ch/board/viewtopic.php?t=38812

Fact is that file and folder names with trailing spaces can and do exist on NTFS partitions. They can be created by hackers, viruses, buggy programs, the user, other systems like a Mac client. The user could also be accessing a NTFS drive from whatever device not using Windows. Why shouldn't there be trailing spaces. I expect a filemanager to perform file management operations like copying, deleting, renaming and moving on NTFS drives as good as it possibly can.

Also:
https://docs.microsoft.com/en-us/troubleshoot/windows-server/backup-and-storage/disk-space-problems-on-ntfs-volumes#invalid-file-names

https://support.microsoft.com/en-us/office/onedrive-can-rename-files-with-invalid-characters-99333564-c2ed-4e78-8936-7c773e958881?ui=en-us&rs=en-us&ad=us

NTFS supports trailing spaces in path names because it is POSIX compliant:

NTFS removes the 8.3 file name limit that MS-DOS introduced. NTFS supports case-sensitive, long file names, as well as the Unicode standard, and provides POSIX file name compatibility by supporting trailing dots and trailing spaces.
source: https://flylib.com/books/en/4.84.1.65/1/

Folders or files that contain leading or trailing spaces are acceptable in NTFS; however, these files are not acceptable in the Win32 subsystem. Therefore, neither Windows Explorer nor a command prompt can reliably handle files that have leading or trailing spaces.
source: https://www.betaarchive.com/wiki/index.php?title=Microsoft_KB_Archive/315688#Invalid_File_Names

So NTFS supports trailing spaces, but Windows will not save them. However, there are a bunch of other kinds of “space” characters that Windows does still save:
https://docs.microsoft.com/en-us/troubleshoot/windows-client/shell-experience/file-folder-name-whitespace-characters
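To see which of these other whitespace characters sit at the edges of a name, something like the following Python sketch could help. It is only an illustration and not part of any tool mentioned in this thread; the function name `edge_whitespace` is made up, and it relies on Python's own definition of whitespace (what `str.strip()` removes), which may not match the list in the Microsoft article exactly.

```python
def edge_whitespace(name: str) -> tuple[list[str], list[str]]:
    """Return the code points of the leading and trailing whitespace runs.

    Whitespace here means whatever str.lstrip()/str.rstrip() remove,
    i.e. Python's Unicode whitespace set (an assumption of this sketch).
    """
    lead = name[: len(name) - len(name.lstrip())]
    trail = name[len(name.rstrip()):]

    def label(run: str) -> list[str]:
        # Format each character as U+XXXX so invisible characters show up.
        return [f"U+{ord(ch):04X}" for ch in run]

    return label(lead), label(trail)


if __name__ == "__main__":
    # A name starting with a no-break space and ending with a plain
    # space followed by an ideographic space.
    print(edge_whitespace("\u00a0report \u3000"))
```

Printing the code points rather than the characters makes it possible to tell an ordinary space (U+0020) from look-alikes such as the no-break space (U+00A0).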

You can also find many scripts that fix folder and file names before syncing to OneDrive:

https://github.com/UoE-macOS/jss/blob/master/utilities-fix-file-names.sh

https://github.com/soundsnw/mac-sysadmin-resources/blob/a44aed1564cb564b7833dc1361e9ccbf57c8c751/scripts/fix-onedrive-filenames-apfs.sh

Unfortunately there are none for Windows. So I came to you and your cool script, but it seems you were not aware of the issue. Can you add this feature?

Even better would be a predefined option that sanitizes folder and file names for all characters that are invalid in OneDrive, so backups can be made with no worries:

https://support.microsoft.com/en-us/office/restrictions-and-limitations-in-onedrive-and-sharepoint-64883a5d-228e-48f5-b3d2-eb39e07630fa#invalidcharacters

Oh ok fair enough. I've not tested the remy command on this scenario, but it should be able to handle this. Have you tried:

gci <path> -rec -dir | remy -cut ' '

By design, the remy command always trims the result of any operation to prevent leading and trailing spaces. The example above will remove the first space it finds. Then the post-processing kicks in, causing the trim to be activated. I would be interested to know what the result of this is, so please let me know.

Actually, if you have files that also contain rogue spaces, then you can leave off the -dir switch in the gci command.

Oops I just realised, this is not the correct command. You should also filter the input to ensure that remy only sees the violating entries. This should be done with 2 commands:

Leading spaces:

gci <path> -filter ' *' -rec | remy -cut ' '

Trailing spaces:

gci <path> -filter '* ' -rec | remy -cut ' ',L

Note that in the second command you must use ,L, which targets the last space.

You should always use the whatif flag before running the command for real, to check that your command works as expected. Also before running the command in full, just check that the gci portion of the pipeline returns the items you expect.
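For comparison, the same "preview first, then rename" workflow can be sketched outside of remy. The Python below is only an illustration under stated assumptions: the function names (`trimmed_name`, `plan_renames`, `apply_renames`) are hypothetical, it trims plain spaces only, and it has no connection to how remy works internally. On Windows, paths with trailing spaces may additionally need the extended-length `\\?\` prefix to be reachable at all, which this sketch does not handle.

```python
import os


def trimmed_name(name: str) -> str:
    """Strip leading/trailing spaces; keep the name if nothing would remain."""
    stripped = name.strip(" ")
    return stripped if stripped else name


def plan_renames(root: str) -> list[tuple[str, str]]:
    """Collect (old_path, new_path) pairs, deepest entries first."""
    plan = []
    # topdown=False visits children before their parent directory, so a
    # parent is only renamed after everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = trimmed_name(name)
            if new_name != name:
                plan.append((os.path.join(dirpath, name),
                             os.path.join(dirpath, new_name)))
    return plan


def apply_renames(plan: list[tuple[str, str]], dry_run: bool = True) -> None:
    """Print every planned rename; perform it only when dry_run is False."""
    for old, new in plan:
        if os.path.exists(new):
            # Never overwrite an entry that already has the trimmed name.
            print(f"skip (target exists): {old!r}")
            continue
        verb = "would rename" if dry_run else "rename"
        print(f"{verb}: {old!r} -> {new!r}")
        if not dry_run:
            os.rename(old, new)
```

Calling `apply_renames(plan_renames(path))` with the default `dry_run=True` plays the same role as the whatif check: it only prints what would change. The root folder itself is never renamed, only the entries below it.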

Your comment about sanitising bad characters before syncing to OneDrive is a valid request. You should also know that the remy command validates against invalid file system characters and will not proceed with a rename if the new name contains any. It treats any character that is invalid on either POSIX or Windows as invalid, regardless of which OS is running, in recognition of the fact that directories can be copied across different file systems.
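The "invalid on either system means invalid everywhere" rule described above can be sketched roughly as follows. This is a Python illustration only, not remy's actual code: the names `WINDOWS_INVALID`, `POSIX_INVALID`, `invalid_chars` and `is_safe_target` are hypothetical, and the character sets are the commonly documented ones rather than whatever list remy really uses.

```python
# Assumption: illustrative character sets, not remy's real lists.
# Windows reserves these punctuation characters plus control codes 0-31.
WINDOWS_INVALID = set('<>:"/\\|?*') | {chr(code) for code in range(32)}
# POSIX file names may contain anything except the slash and NUL.
POSIX_INVALID = {"/", "\0"}

# A character rejected by either system is rejected everywhere.
ANY_OS_INVALID = WINDOWS_INVALID | POSIX_INVALID


def invalid_chars(name: str) -> list[str]:
    """Return the distinct offending characters in name, sorted."""
    return sorted(set(name) & ANY_OS_INVALID)


def is_safe_target(name: str) -> bool:
    """A rename target is acceptable only if it is non-empty and clean."""
    return bool(name) and not invalid_chars(name)
```

A check like `is_safe_target(new_name)` would run on the proposed new name before the rename is attempted, so a bad result never reaches the file system.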

What I hadn't accounted for was the possibility that the original path names may contain bad characters. However, I suspect that PowerShell itself may have trouble seeing and handling file system entities with bad characters. If PowerShell can access these entries, then please create a new issue for this and I will add a repair facility to the command for you. Cheers

Ok, when I go home later I will try those.
And yes, please, add a SANITIZE-FOR-ONEDRIVE option. Or a SANITIZE method where you can pass profile files with forbidden characters customized for all different file systems.

On further analysis, and after reading some of the references you provided, I am a bit sceptical that PowerShell will be capable of reading entries from the file system with leading/trailing spaces or invalid characters. If this is the case, then unfortunately remy will not be able to repair them. So we are depending on whether the gci command can access these entries; I'll await your testing of the commands I specified previously and will be happy to help if I can.

Ok. I will test it. But it would be strange if PowerShell were so limited. Low-level file system access should be possible in C#, so why not in PowerShell?
I was already imagining an .spf (sanitize profile file) for every OS.
So you could just type:

gci <path> -sanitize windows10.spf
gci <path> -sanitize onedrive.spf
gci <path> -sanitize gdrive.spf
gci <path> -sanitize dropbox.spf
gci <path> -sanitize iclouddrive.spf
gci <path> -sanitize exfat.spf
gci <path> -sanitize ntfs.spf
gci <path> -sanitize apfs.spf
gci <path> -sanitize linux-ext4.spf
gci <path> -sanitize linux-f2fs.spf
gci <path> -sanitize linux-erofs.spf
gci <path> -sanitize linux-btrfs.spf
gci <path> -sanitize linux-gfs2.spf
gci <path> -sanitize ufs2.spf
gci <path> -sanitize xfs.spf
gci <path> -sanitize zfs.spf
gci <path> -sanitize playstation-pfs.spf
gci <path> -sanitize amiga-ffs.spf
gci <path> -sanitize cd-iso9660.spf
etc…

It would be AWESOME.

Well, fingers crossed, let's hope that PowerShell can see these entries. The sanitize option would be on the remy command, as we can't modify the parameters on gci, it being a Microsoft command (well, unless we use the proxying technique, but let's not dive into that can of worms).

Considering that a file could be copied from one file system to another, don't you think it would be best to have just one profile, i.e. a character considered bad on one file system should be considered bad on all file systems? I would rather have a single file that contains the sum total of all the characters in all the profiles you listed above, but I stand to be corrected if you think there is a good reason why this approach should not be pursued.

Actually, your proposal about implementing different profiles is a good one. I just read the OneDrive link you referenced and it's clear to me now that we can't just have a single universal profile. Perhaps we could chat on gitter later this evening? You can find me as @plastikfan on gitter, or perhaps discord?
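To make the profile idea a little more concrete, here is a minimal Python sketch of per-target profiles next to a combined "universal" one. Everything in it is an assumption for discussion: the `Profile` class, the `PROFILES` table, `sanitize` and `union_profile` are hypothetical names, no .spf file format exists yet, and the character lists are illustrative rather than complete (the OneDrive entry follows the characters and the no-leading/trailing-space rule from the Microsoft page linked earlier).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical sanitize profile: forbidden characters plus a trim rule."""
    forbidden: frozenset
    trim_spaces: bool = False


# Illustrative profiles only; a real profile file would carry more rules.
PROFILES = {
    "posix": Profile(frozenset("/\0")),
    "onedrive": Profile(frozenset('"*:<>?/\\|'), trim_spaces=True),
}


def sanitize(name: str, profile: Profile, replacement: str = "_") -> str:
    """Replace forbidden characters, then trim spaces if the profile says so."""
    cleaned = "".join(replacement if ch in profile.forbidden else ch
                      for ch in name)
    if profile.trim_spaces:
        cleaned = cleaned.strip(" ")
    return cleaned


def union_profile(profiles) -> Profile:
    """Combine profiles into one that applies every restriction at once."""
    profiles = list(profiles)
    forbidden = frozenset().union(*(p.forbidden for p in profiles))
    return Profile(forbidden, any(p.trim_spaces for p in profiles))
```

The trade-off is visible in the sketch: `union_profile(PROFILES.values())` would rewrite a name like ` a:b ` even on a target where it is perfectly legal, whereas the per-target `posix` profile leaves it untouched.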