[Bug?]: Cannot spawn yarn if there're non-ASCII characters in path
ilharp opened this issue · 1 comments
Self-service
- I'd be willing to implement a fix
Describe the bug
Cannot spawn yarn (e.g. using node:child_process or any other spawn library) if there are non-ASCII characters (e.g. å or あ) in the project path.
To reproduce
cd C:
mkdir å
cd å
yarn init -2
echo 'require("node:child_process").execSync("yarn --version", { stdio: "inherit" })' > index.js
yarn
yarn node index
Produces the following output:
Environment
System:
OS: Windows 10 10.0.22621
CPU: (12) x64 Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz
Binaries:
Node: 18.12.1 - C:\Users\***\AppData\Local\Temp\xfs-1d926b3a\node.CMD
Yarn: 3.4.1 - C:\Users\***\AppData\Local\Temp\xfs-1d926b3a\yarn.CMD
npm: 8.19.2 - C:\Program Files\nodejs\npm.CMD
Additional context
This bug is caused by the following code:
berry/packages/yarnpkg-core/sources/scriptUtils.ts
Lines 41 to 45 in 873e9d8
Line 44 of that file saves the generated script as UTF-8. On Windows, Command Prompt decodes and interprets batch files using the current system code page, which is usually not UTF-8.
This leads to the following situations:
- The generated script contains the full paths of node.exe and yarn.cjs, so if either the node or yarn directory contains non-ASCII characters, this bug will occur.
  - For node, the error will be The System Cannot Find The Path Specified.
  - For yarn, the error will be Error: Cannot find module '<wrongly encoded path>\yarn.cjs'.
- ASCII characters are encoded the same way in all code pages, so paths containing only ASCII characters are not affected, whichever code page is active.
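To illustrate the mismatch, here is a minimal sketch that encodes a path as UTF-8 and then reads each byte back as one character, which is roughly what happens when a single-byte code page decodes the script. The function name is hypothetical, and Latin-1 is only a stand-in for the real code page, so the exact garbled characters on a given machine will differ.

```typescript
// Hypothetical helper: encode `text` as UTF-8, then interpret every byte as a
// Latin-1 character. Latin-1 is an assumption standing in for whatever
// single-byte code page cmd.exe is actually using.
function decodeUtf8BytesAsLatin1(text: string): string {
  // encodeURIComponent percent-encodes the UTF-8 bytes of reserved and
  // non-ASCII characters; each %XX escape is then mapped to the character
  // with that byte value.
  return encodeURIComponent(text).replace(
    /%([0-9A-F]{2})/g,
    (_match: string, hex: string) => String.fromCharCode(parseInt(hex, 16)),
  );
}

// ASCII-only paths survive the round trip unchanged.
console.log(decodeUtf8BytesAsLatin1("C:\\project"));
// "å" is two bytes in UTF-8 (0xC3 0xA5), so it comes back as two characters.
console.log(decodeUtf8BytesAsLatin1("C:\\å"));
```

In Node the same effect can be had with Buffer.from(text, "utf8").toString("latin1"); the version above only avoids depending on Node's type definitions.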
In addition, the following points are worth noting:
- This bug is Windows specific. Only Command Prompt on Windows decodes and interprets scripts using encodings other than UTF-8.
- This issue is basically the same as #2397. While there is #2499 for fixing #2397, I believe #2499 doesn't actually fix this.
- This bug cannot be fixed from the yarn side.
  To fix this bug, Yarn must detect the current code page of the system and then save the script in the corresponding encoding. Unfortunately, node does not provide an API for this, so implementing code page detection would require depending on a C++ binding package, which is not suitable for Yarn.
  This issue is just to clarify the current situation and to remind latecomers, so feel free to close this issue.
- Temporary workaround: currently, the best solution is to move both node.exe and yarn.cjs to paths containing only ASCII characters.
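As a rough way to check whether a given install is affected, the following sketch tests whether a path consists only of ASCII characters. The function name and the sample paths are hypothetical; in practice the inputs would be the actual locations of node.exe and yarn.cjs.

```typescript
// Hypothetical helper: true when every character in `path` is in the ASCII
// range (0x00 to 0x7F), i.e. the path is safe under any code page.
function isAsciiOnly(path: string): boolean {
  return /^[\x00-\x7F]*$/.test(path);
}

// Example paths, made up for illustration.
const candidates = [
  "C:\\Program Files\\nodejs\\node.exe",
  "C:\\å\\.yarn\\releases\\yarn.cjs",
];

for (const candidate of candidates) {
  const verdict = isAsciiOnly(candidate) ? "ok" : "contains non-ASCII characters";
  console.log(`${candidate}: ${verdict}`);
}
```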
Update: as this comment in #2397 says, running chcp 65001 before executing can solve this, so a fix could be
if (process.platform === `win32`) {
// https://github.com/microsoft/terminal/issues/217#issuecomment-737594785
+ const cmdScript = `@chcp 65001\r\n@goto #_undefined_# 2>NUL || @title %COMSPEC% & @setlocal & @"${argv0}" ${args.map(arg => `"${arg.replace(`"`, `""`)}"`).join(` `)} %*`;
await xfs.writeFilePromise(ppath.format({dir: location, name, ext: `.cmd`}), cmdScript);
}
Note that chcp 65001 must run before the command is executed and cannot be placed on the same line as it.
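For experimenting outside of Yarn, the proposed script text can be sketched as a standalone function. This is not Yarn's actual implementation: the name buildCmdShim is hypothetical, and two details differ from the snippet above. The output of chcp is redirected to NUL because chcp otherwise prints the active code page, and every embedded double quote is doubled, whereas a string-pattern replace only touches the first one.

```typescript
// Hypothetical helper: returns the text of a .cmd shim that switches the
// console to code page 65001 (UTF-8) on its own line before the command runs.
function buildCmdShim(argv0: string, args: string[]): string {
  // Assumption: doubling every embedded quote, not just the first one.
  const quotedArgs = args
    .map(arg => `"${arg.replace(/"/g, `""`)}"`)
    .join(` `);

  const lines = [
    // Assumption: ">NUL" hides the "Active code page" message chcp prints.
    `@chcp 65001 >NUL`,
    // Same command line as in the proposed fix above.
    `@goto #_undefined_# 2>NUL || @title %COMSPEC% & @setlocal & @"${argv0}" ${quotedArgs} %*`,
  ];

  // Batch files conventionally use CRLF line endings.
  return lines.join(`\r\n`);
}

// Made-up paths, for illustration only.
console.log(buildCmdShim("C:\\å\\node.exe", ["C:\\å\\yarn.cjs"]));
```

The script would still have to be written to disk as UTF-8 for the code page switch to match the file's encoding, which is what the existing writeFilePromise call already does according to the analysis above.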