Reduce filename length
skymoo opened this issue · 28 comments
I am using gocryptfs to reverse encrypt some files and then sync the resulting encrypted files to OneDrive for remote backup. Everything seems to be working fine, but when I sync to OneDrive using rclone I get a few errors of the form:
2020-09-01 21:54:13 ERROR : 2Ddc9ywAOFEbIdlgPXI19Q/ZUyJffdGZg_dBIZ-0zuFVg/42PW1SMQDlCuh68QjKLUeEBsTRCsnCSWSMO7606Kaq4/fOOjbi8sU9D8N2VQz1ZLtHFIz9iBL1BpZmVgRHEzL50oxuTvstwTezFRvSVRFs2JuF4UsKicdQMXzaUcBqgQ6Q/lP_OT__s6GQ_CkoghPVuKtQfbWeaP1iGunypwq-BpSWBn4D_hQPM9fdNhhXjzd2Lv7oG__zNDr8N4O9Kt4y4TGlx0u-aawIDoG2E-LzMwbp_mSz5Mw1jEmGTwvSXeKc2SFXuhcoaNHFIEidTbLBUZV7UzfVdyUshSF1xYhK3XmiDB4lqxV16RzLVYLWditAo: Failed to copy: pathIsTooLong: Path is too long
I'm currently using the default feature flags as applied with the -reverse option, i.e.:
"FeatureFlags": [
"GCMIV128",
"HKDF",
"DirIV",
"EMENames",
"LongNames",
"Raw64",
"AESSIV"
]
Is there a way to make the encrypted filenames shorter?
I'm using gocryptfs-1.8.0 as packaged in Fedora 32.
Hmm, looking at this in detail, I don't understand why you get this error. The mentioned path
2Ddc9ywAOFEbIdlgPXI19Q/ZUyJffdGZg_dBIZ-0zuFVg/42PW1SMQDlCuh68QjKLUeEBsTRCsnCSWSMO7606Kaq4/fOOjbi8sU9D8N2VQz1ZLtHFIz9iBL1BpZmVgRHEzL50oxuTvstwTezFRvSVRFs2JuF4UsKicdQMXzaUcBqgQ6Q/lP_OT__s6GQ_CkoghPVuKtQfbWeaP1iGunypwq-BpSWBn4D_hQPM9fdNhhXjzd2Lv7oG__zNDr8N4O9Kt4y4TGlx0u-aawIDoG2E-LzMwbp_mSz5Mw1jEmGTwvSXeKc2SFXuhcoaNHFIEidTbLBUZV7UzfVdyUshSF1xYhK3XmiDB4lqxV16RzLVYLWditAo
is 369 characters long. According to https://support.microsoft.com/en-us/office/restrictions-and-limitations-in-onedrive-and-sharepoint-64883a5d-228e-48f5-b3d2-eb39e07630fa#filenamepathlengths :
The entire decoded file path, including the file name, can't contain more than 400 characters
This contradicts the Microsoft article.
Please note how the documentation defines path length. For SharePoint paths, for example, the longer the company URL domain name and site name, the shorter the path the user can actually upload.
Take the example in the documentation:
The prefix has taken up 38 characters and is counted in the total path length.
personal/meganb_contoso_com/Documents/
This means that there are only 362 characters left to use.
In my actual tests, I can only upload path lengths up to 342 characters under my account.
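To make the arithmetic above concrete, here is a small Python sketch. The helper name `remaining_path_budget` is made up for illustration, and the 400-character limit is simply the figure quoted from the Microsoft article, not something verified independently.

```python
# Figure quoted from the Microsoft article above (an assumption, not measured).
ONEDRIVE_PATH_LIMIT = 400


def remaining_path_budget(prefix: str, limit: int = ONEDRIVE_PATH_LIMIT) -> int:
    """Hypothetical helper: characters left once the prefix is counted."""
    return limit - len(prefix)


prefix = "personal/meganb_contoso_com/Documents/"
budget = remaining_path_budget(prefix)

print(len(prefix))    # 38
print(budget)         # 362

# The failing path from the original report is 369 characters long,
# which fits under 400 but not under the remaining budget.
print(369 <= ONEDRIVE_PATH_LIMIT)  # True
print(369 <= budget)               # False
```

Under this reading, the 369-character path fails because the prefix is counted too. The 342-character ceiling seen in the actual tests would correspond to a longer prefix on that account.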
Although gocryptfs shortens long filenames and keeps track of the actual filenames in a separate file, the names are still too long.
Cryptomator also shortens filenames, but it starts converting as soon as the encrypted file name reaches 220 characters (1 ASCII character = 1 byte), whereas gocryptfs only converts when the filename exceeds 255 bytes.
https://docs.cryptomator.org/en/1.5/security/architecture/#name-shortening
However there is still the problem of the whole path being too long.
Thank you for referring to this solution.
The above mentioned approach uses Unicode characters with printable ranges for the file names.
In my testing, using Unicode characters for encoding effectively addresses the file name and path length limitations of macOS APFS, Windows NTFS file systems, and some cloud vendors (specifically, Microsoft OneDrive).
- APFS (macOS / iOS)
  - name: 255 UTF-8 characters
  - path: no limit
  - notes: macOS has a PATH_MAX limit of 1024 bytes
- NTFS (Windows)
  - name: 255 UTF-16 characters
  - path:
    - default: 260 UTF-16 characters
    - opt-in Long Path: 32767 UTF-16 characters
For OneDrive, the path length is calculated as the number of decoded Unicode characters, so each character can easily carry more information.
In my tests, the problem of long file names can be solved in this way and works with local file systems and cloud storage providers that support Unicode characters.
I too have encountered issues with file names being too long and not working in gocryptfs on macOS. This happens a lot when the file names are in foreign languages. If the solution above solves it, that is great news.
Yep, the boxcryptor's approach of using 4000 carefully chosen unicode chars seems viable.
The boxcryptor thing is windows-only, won't use that one :)
There is a smiley, so this is probably some jokey nudge. Let's get more to the point then - is there any technical concern why the "higher density encoding" (e.g. using those 4000 carefully chosen characters) couldn't be used by gocryptfs?
Essentially, the boxcryptor trick works because on Windows, the length limit is counted in characters. That means you can have as many Chinese characters as you could have ASCII characters.
That is not the case on Linux, where the limit is counted in bytes. And a Chinese character takes more bytes than an ASCII one, which eats up the gains of having a larger alphabet.
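A quick Python sketch of the point about characters versus bytes. The helper `lengths` is made up for illustration. The density figures assume that every character of a 4000-symbol alphabet takes 3 bytes in UTF-8, which holds for typical CJK characters but is only an assumption about the actual boxcryptor alphabet.

```python
import math


def lengths(name: str) -> dict:
    """Measure one name in the three units that the different limits use."""
    return {
        "code_points": len(name),
        "utf16_units": len(name.encode("utf-16-le")) // 2,
        "utf8_bytes": len(name.encode("utf-8")),
    }


ascii_name = "a" * 10
cjk_name = "\u4e2d" * 10  # ten copies of U+4E2D, a common Chinese character

print(lengths(ascii_name))  # 10 code points, 10 UTF-16 units, 10 UTF-8 bytes
print(lengths(cjk_name))    # 10 code points, 10 UTF-16 units, 30 UTF-8 bytes

# Information density per unit of the limit.
# base64 names: 6 bits per character, 1 byte per character.
base64_bits_per_char = math.log2(64)
base64_bits_per_byte = base64_bits_per_char / 1

# 4000-symbol alphabet: about 12 bits per character,
# but (by assumption) 3 UTF-8 bytes per character.
big_bits_per_char = math.log2(4000)
big_bits_per_byte = big_bits_per_char / 3

print(big_bits_per_char > base64_bits_per_char)  # True: wins when counting characters
print(big_bits_per_byte > base64_bits_per_byte)  # False: loses when counting bytes
```

So where the limit is counted in characters, the larger alphabet roughly doubles what fits in a name. Where it is counted in bytes, under the 3-byte assumption it carries less per byte than plain base64.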
Yes, this is all clear. Though my intuition says it wouldn't eat up the gains on Linux (simply because Linux filesystem APIs generally have much higher limits than Windows filesystem APIs), and second, it would solve the OP's problem.
My plan of action is to add a command-line parameter to -init called something like -longnamemax.
gocryptfs already stores names that are longer than 255 bytes in a .name file (see https://nuetzlich.net/gocryptfs/forward_mode_crypto/#long-file-name-handling ). The -longnamemax parameter will allow changing the value from 255 to something lower (100 for example). This will guarantee that each path component is at most 100 bytes long.
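A rough Python sketch of what such a threshold would do. The function name `shorten` is made up, and the hashing details (SHA-256 of the encrypted name, unpadded base64url, the `gocryptfs.longname.` prefix) are my reading of the linked documentation, so treat them as assumptions rather than the actual implementation.

```python
import base64
import hashlib

LONGNAME_PREFIX = "gocryptfs.longname."  # 19 bytes


def shorten(encrypted_name: str, longnamemax: int = 255) -> str:
    """Return the name unchanged if it fits, else a fixed-length hashed name.

    Sketch only: the full encrypted name would additionally be stored
    in a separate .name file next to the hashed entry.
    """
    raw = encrypted_name.encode("utf-8")
    if len(raw) <= longnamemax:
        return encrypted_name
    digest = hashlib.sha256(raw).digest()                       # 32 bytes
    b64 = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()  # 43 chars
    return LONGNAME_PREFIX + b64


long_name = "x" * 101

print(len(shorten(long_name)))                   # 101, fits under the default 255
print(len(shorten(long_name, longnamemax=100)))  # 62, replaced by the hashed name
```

With a lower threshold, more names take the hashed form, so each path component is bounded by the threshold instead of by 255.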
@rfjakob does this mean many more people would also need to back up .name in each directory along with gocryptfs.conf to avoid accidental loss of access? Isn't the trick with Chinese chars a much better heuristic, without such risks and with a high probability of solving most practical cases (sure, it just "delays" the problem, but I'd guess it surpasses the "critical" threshold of practical cases, so it's much more worth it)?
If you do end up adding the -longnamemax parameter to solve the problem, could you make it configurable, for example by taking a number after the parameter that defines the limit? This may prevent the path length from exceeding the limit in some special cases, such as:
-longnamemax 20
-longnamemax 55
@dumblob
(1) One .name file for each file with a long name.
(2) I won't consider the Chinese char trick, as it only works on Windows.
@BrsyRockSs Yes exactly, -longnamemax NUMBER. Note that values below 67 do not make sense, as a .name file is 67 bytes long by itself. Looks like this:
gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10.name
@rfjakob I started working on adding longnamemax to cppcryptfs.
I found that I still have this bug in cppcryptfs #143 that you fixed in gocryptfs years ago. I just wanted to point out that the gocryptfs man page still says 176 bytes where I think it should say 175.
I don't understand why the gocryptfs man page says the minimum value for LongNameMax is 62, but the .name files are 67 chars (with the .name extension). Shouldn't the minimum LongNameMax value be 67 to reflect that files with names that long will be created?
(1) 175 vs 176: Yes, this should read 175, thanks! Fixed in d530fbd .
(2) 62 vs 67: Yes, as you have observed, the .name file is 67 bytes, example:
gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10 = 62 bytes
gocryptfs.longname.nONaEDDZOrwtQdXPH1SxSFkPtOc8srIyB82ZuduqG10.name = 67 bytes
But, if you have nested directories, only the first file becomes part of the path. So with -longnamemax=62 you can get a shorter complete path. But the basename will still need 67 bytes.
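To illustrate the 62 vs 67 point, here is a small Python sketch of the longest path in a nested tree. The function `worst_case_path_len` is made up, and the model (directory components bounded by the threshold, only the last component possibly being a 67-byte .name file, one byte per separator) is my reading of the explanation above, not a statement about the implementation.

```python
HASHED_NAME_LEN = 62  # gocryptfs.longname.<hash>
NAME_FILE_LEN = 67    # gocryptfs.longname.<hash>.name


def worst_case_path_len(depth: int, longnamemax: int) -> int:
    """Longest relative path with `depth` components under this model.

    Every directory component is at most `longnamemax` bytes. The last
    component is at most `longnamemax` bytes too, unless it is a .name
    file, which always needs 67 bytes.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    dirs = (depth - 1) * longnamemax
    separators = depth - 1
    basename = max(longnamemax, NAME_FILE_LEN)
    return dirs + separators + basename


print(worst_case_path_len(5, 255))  # 1279
print(worst_case_path_len(5, 67))   # 339
print(worst_case_path_len(5, 62))   # 319
```

Going from 67 to 62 saves 5 bytes per directory level in this model, while the basename stays at 67 bytes.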
hi, thx for the function,
yet if you (or others who can contribute) could suggest -longnamemax values for laymen like me, at least for the big 3 (OneDrive, Box, Dropbox), that would be nice. Google supports very long file names/paths, so it should never be an issue there.
or if possible mention the values in the manual (even listed as unknown) etc.
thanks
@ccchan234
Shorter file names give shorter paths, but at the same time more .name files need to be created, which may put an additional performance burden on frequently accessed file systems. I think you should weigh this against the files you want to store: do you want better path compatibility or better access performance?
Why don't you use Long Path Tool? It is super easy and solves all these problems. Also, good for path compatibility.
my next PC is gonna be faster than most supercomputers 5 years ago......
Longpath files are too irritating - to make my life easier, I use LongPath Tool.
can you tell more?
even in win10 i enabled very very long path in registry, it's not enough for me. thanks
Personal license (non-commercial usage): $59.97
with gpt4 i could duplicate many small programs in a day.
You can get all the information here: https://longpathtool.com/
Long Path Tool is effective in solving these kinds of problems. I have been using it since a month and my life has become easier.
This "Long Path Tool" stuff is useless spam. Please don't feed the spammers.