ejgallego/coq-serapi

Windows PATH length (again)

MSoegtropIMC opened this issue · 21 comments

The max file name length of a serapi build with Coq Platform standards path is beyond the 259 char Windows limit again and is now:

C:\bin\cygwin64_coq_platform\home\Michael\.opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\serlib_number_string_notation_plugin__Ser_number.impl.all-deps

The breakdown in parts comming from opam (or otherwise inevitable), from Coq Platform and from serapi+dune are:

C:\\.opam\.opam-switch\build\
bin\cygwin64_coq_platform\home\Michael\CP.2023.11.0~8.18~2023.11
coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\serlib_number_string_notation_plugin__Ser_number.impl.all-deps

I already shortened the coq platform controlled part of the path to support serapi (the CP used to be coq_platform), but I find it difficult to change the Coq Platform path naming in each version of Coq Platform to accomodate for increasing path length in serapi (which is the only Coq package having such issues).

I will find a way to shorten the path once more - either I put the opam folder in the cygwin root (which has the advantage that it does not depend on the user name) or I remove the _platform in the cygwin root path, but that's it then.

I woule appreciate if you put something in CI to check that the maximum build path length + package name with version numer, that is the coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\serlib_number_string_notation_plugin__Ser_number.impl.all-deps piece of the path does not exceed the 168 characters it has now in the future.

Btw.: with some help I could probably fix OCaml to support longer path length - in Windows the restriction was removed 10 years ago (one has to use a registry setting to enable long paths, since some apps might crash on long file names), but it is still somehow hard coded in OCaml or C libraries it uses. But I don't have the bandwidth to do thsi alone.

Sorry to hear you are having trouble again; I'm afraid there is little we can do from SerAPI as we cannot control paths which are set by OCaml tools or libraries.

Have you open an issue in ocaml/dune about this?

Have you open an issue in ocaml/dune about this?

You mean about the fact that the dune produced path is redundant (with 2x serlib_number_string_notation_plugin in it)?

I thought I created an issue about OCaml depending on MAX_PATH somehow, but can't find it right now. I will repeat my tests and create an issue then.

You mean about the fact that the dune produced path is redundant (with 2x serlib_number_string_notation_plugin in it)?

I mean more that Dune should be aware of the path limitation, and workaround it transparently.

@ejgallego : i now moved the opam folder to the cygwin root which gives some extra 12 or 13 chars. Now it chokes on:

C:\bin\cygwin64_coq_platform_2\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1/_build/default/serlib/plugins/syntax/.serlib_number_string_notation_plugin.objs/byte\serlib_number_string_notation_plugin__Ser_g_number_syntax.cmt0fcbf1.tmp

which has 263 chars (4 more than allowed). Any ideas what I could do? I don't wan to shorten the Coq Platform part any further, but I could do a test with a really short cygwin root path to see what is down the road?

Could one patch serapi without user visible effects and shorten the paths?

If not, I will probably give up on this.

@ejgallego : if you think this is the way to go, I think you are in a better position to write a dune issue - I don't know enough about the project structure and dune to argue that this would be the way to go.

@MSoegtropIMC I'm gonna discuss with Dune / OCaml to see what their opinion is, but IMHO indeed the build system should hide these details from us.

The other option would be to fix OCaml to support arbitrary path length and ask users to set the corresponding registry setting in Windows. It is a slight security risk, though, because it can lead to stack overwrite exploits otherwise not possible - not in OCaml but elsewhere because this is a system global setting. Many Windows apps still just allocate MAX_PATH for paths and will crash when Windows returns larger paths. This is the reason why long paths are disabled by default. Afaik it is technically supported since Windows 7.

So indeed having some sort of path shortening for build artefacts in dune might be the better option.

Moving ~/.opam to /opam did not fix it btw. It might be that changing C:\bin\cygwin64_coq_platform to C:\bin\cygwin_coq helps - I did a test build with a one letter root path and this did run through, but I didn't analyse that max path length as yet.

@ejgallego : I managed to reduce the path a bit more and now coq-serapi builds, but it is at the edge.

It is hard to find out what the max path length is, because the longest names have temporary files, which are not in the build folder after the build is finished. So one has to record file system calls during build. Anyway, I would appreciate if the path length of coq-serapi does not increase further.

For the records: currently the 10 longest paths with length (limit is 259) are:

258 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\byte\serlib_number_string_notation_plugin__Ser_g_number_syntax.cmtf38cd3.tmp
258 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\byte\serlib_number_string_notation_plugin__Ser_g_number_syntax.cmicfa3a1.tmp
256 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\native\serlib_number_string_notation_plugin__Ser_g_number_syntax.o.exe.lnk
253 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\serlib_number_string_notation_plugin__Ser_g_number_syntax.impl.all-deps
252 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\native\serlib_number_string_notation_plugin__Ser_g_number_syntax.o.lnk
252 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\native\serlib_number_string_notation_plugin__Ser_g_number_syntax.o.exe
250 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\native\serlib_number_string_notation_plugin__Ser_g_number_syntax.cmx
249 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\byte\serlib_number_string_notation_plugin__Ser_number.cmtf698c6.tmp
249 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\byte\serlib_number_string_notation_plugin__Ser_number.cmib59cd5.tmp
248 C:\bin\cygwin_coq_platform\opam\CP.2023.11.0~8.18~2023.11\.opam-switch\build\coq-serapi.8.18.0+0.18.1\_build\default\serlib\plugins\syntax\.serlib_number_string_notation_plugin.objs\native\serlib_number_string_notation_plugin__Ser_g_number_syntax.o

This is recoded using procmon and recording all CreateFile commands, save this as SerApiBuild_CreateFile.CSV and run

awk -F'","' '{print length($5) " " $5}' SerApiBuild_CreateFile.CSV | sort -n -r | uniq | head -n 10 > SerApiBuild_CreateFile.sorted

over the result.

nojb commented

Hello, one way to work around the max length limitation in Win32 paths is to use "verbatim paths"; these are paths prefixed with \\?\. The only issue to keep in mind is that these paths are interpreted verbatim, so in particular no slash normalization takes place, so the path must be using backslashes and cannot use forward slashes. See https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation.

nojb commented

Hello, one way to work around the max length limitation in Win32 paths is to use "verbatim paths"; these are paths prefixed with \\?\. The only issue to keep in mind is that these paths are interpreted verbatim, so in particular no slash normalization takes place, so the path must be using backslashes and cannot use forward slashes. See https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation.

Sorry, I had misunderstood your problem, I see that you are generating the paths yourselves. In that case, there is no easy fix, I'm afraid. Either OCaml or Dune would have to do something to address it, but in both cases it is a non-trivial amount of work.

Sorry, I had misunderstood your problem, I see that you are generating the paths yourselves.

Well I generate the prefix of the path, the (lengthy) postfix is generated by dune.

Prefixing \\?\. won't help because not Windows, but something in dune or the OCaml library restricts the path length. Since a while Windows has a reg option to allow arbitrary length paths, which is by default off, but I have it set. Setting this doesn't help with dune, but it does work in tests using plain C code.

I don't think it would be that hard to fix - one just has to scrutinize all OCam, dune and coq code for MAX_PATH.

nojb commented

something in dune or the OCaml library restricts the path length.

Thanks for the explanation. I will look into this and get back here.

nojb commented

(Sorry for the disjointed answers, I had forgotten many of the details about this subject.)

Prefixing \\?\. won't help because not Windows, but something in dune or the OCaml library restricts the path length. Since a while Windows has a reg option to allow arbitrary length paths, which is by default off, but I have it set. Setting this doesn't help with dune, but it does work in tests using plain C code.

The \\?\ mechanism works independently of the registry setting. The registry setting is used to be able to use long paths without having to add the \\?\ prefix, but in this case, the program (here, Dune) also needs to set a setting in its "manifest", as explained in https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation.

Regarding the \\?\ prefix, I tried the following:

let fn = "\\\\?\\C:\\Users\\nojebar\\" ^ String.make 200 'a' ^ "\\" ^ String.make 200 'b'

let () =
  Sys.mkdir (Filename.dirname fn) 0o644;
  try close_out (open_out fn) with Sys_error s -> print_endline s

and it works fine in my machine. Based on that, I would say that if you manage to prefix your paths with \\?\ then everything should work. Of course, it may be complicated to get Dune to always insert or preserve the prefix.

nojb commented

The \\?\ mechanism works independently of the registry setting. The registry setting is used to be able to use long paths without having to add the \\?\ prefix, but in this case, the program (here, Dune) also needs to set a setting in its "manifest", as explained in https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation.

If you want to test the registry/manifest way, you could do (assuming dune.exe lives in the current directory):

$ mt.exe /nologo -manifest manifest.xml -outputresource:'dune.exe;#1'

where manifest.xml contains:

<application xmlns="urn:schemas-microsoft-com:asm.v3">
    <windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
        <ws2:longPathAware>true</ws2:longPathAware>
    </windowsSettings>
</application>

@nojb : what I expect is that somewhere, I guess in the OCaml libraries, there is a restriction to 259 characters (MAX_PATH). If so, it won't help to prefix \\?\. since it would make the path even longer. The prefix only helps if the Windows API is the point where the path length restriction happens, and this is apparently not the case, because enabling long paths in the registry does not help.

The manifest and registry method should give the same result. But indeed the manifest method is preferable in case we get this fixed in the code, so that people don't have to change the registry setting.

To summarise what I tested is:

  • the Windows registry setting has no effect on the coq-serapi build
  • the Windows registry setting does have an effect on MinGW C code (fopen, fprintf, fclose afair)

I am not at my Windows machine right now, so I don't have the details at hand.

nojb commented

The manifest and registry method should give the same result. But indeed the manifest method is preferable in case we get this fixed in the code, so that people don't have to change the registry setting.

To summarise what I tested is:

  • the Windows registry setting has no effect on the coq-serapi build
  • the Windows registry setting does have an effect on MinGW C code (fopen, fprintf, fclose afair)

You actually need both the registry setting and the manifest setting in the program (this is not clearly stated in the MS documentation).

You actually need both the registry setting and the manifest setting in the program (this is not clearly stated in the MS documentation).

Ah OK interesting - I wonder why the C code did work then - possibly this is in the default manifest for MinGW - need to check.

I will check what happens if I set the manifest to all involved executables.

Closing this in favor of coq/platform#331 , which is bug is a duplicate.

Duplicate of coq/platform#331

@ejgallego : it is currently solved in Coq Platform (mostly by moving the .opam folder from /home//.opam to /.opam) but please, please, please make sure that the paths don't get any longer. There is 1 char left and that's it. I have no ideas how I could shorten the root path any further.

@MSoegtropIMC I'm aware of this, I can try to make the paths shorter once we find the problem, but unfortunately the length of paths used in build depends on choices by OCaml, Dune, and Coq itself so I'm afraid I have little control here.

[For example, I could add some CI to check this, but it would be brittle]