yarnpkg/berry

[Bug?]: Cannot spawn yarn if there are non-ASCII characters in the path

ilharp opened this issue

Self-service

  • I'd be willing to implement a fix

Describe the bug

Yarn cannot be spawned (e.g. with node:child_process or any other spawn library) if there are non-ASCII characters (e.g. å or ) in the project path.

To reproduce

cd C:
mkdir å
cd å
yarn init -2
echo 'require("node:child_process").execSync("yarn --version", { stdio: "inherit" })' > index.js
yarn
yarn node index

Produces the following output:

[Screenshot of the error output]

Environment

  System:
    OS: Windows 10 10.0.22621
    CPU: (12) x64 Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz
  Binaries:
    Node: 18.12.1 - C:\Users\***\AppData\Local\Temp\xfs-1d926b3a\node.CMD
    Yarn: 3.4.1 - C:\Users\***\AppData\Local\Temp\xfs-1d926b3a\yarn.CMD
    npm: 8.19.2 - C:\Program Files\nodejs\npm.CMD

Additional context

This bug is caused by the following code:

if (process.platform === `win32`) {
  // https://github.com/microsoft/terminal/issues/217#issuecomment-737594785
  const cmdScript = `@goto #_undefined_# 2>NUL || @title %COMSPEC% & @setlocal & @"${argv0}" ${args.map(arg => `"${arg.replace(`"`, `""`)}"`).join(` `)} %*`;
  await xfs.writeFilePromise(ppath.format({dir: location, name, ext: `.cmd`}), cmdScript);
}

The xfs.writeFilePromise call saves this script as UTF-8. On Windows, however, Command Prompt decodes and interprets batch files using the console's current code page, which is usually not UTF-8 (for example, 437 on US-English systems or 936 on Simplified Chinese ones).

This leads to the following situations:

  1. The generated script contains the full paths of node.exe and yarn.cjs, so if either the Node.js or the Yarn directory contains non-ASCII characters, this bug occurs:
     • For node, the error is: The system cannot find the path specified.
     • For yarn, the error is: Error: Cannot find module '<wrongly encoded path>\yarn.cjs'.
  2. ASCII characters are encoded the same way in UTF-8 and in all common Windows code pages, so paths containing only ASCII characters are unaffected, whichever code page is active.
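Both situations come down to the same thing: the UTF-8 bytes of the script are read back with a different code page. The TypeScript sketch below makes that concrete by encoding a path as UTF-8 and decoding the bytes with a single-byte code page. It uses ISO-8859-1 (latin1) as a stand-in because it can be decoded without extra libraries; that choice is an assumption for illustration, not the code page cmd.exe uses on any particular machine, and the example paths are hypothetical.

```typescript
// Sketch: the .cmd wrapper is written as UTF-8, but cmd.exe decodes it with
// the console's active code page.
// Assumption: ISO-8859-1 (latin1) stands in for that code page. With the OEM
// code page 437 the same bytes map to other glyphs, but the path is garbled
// either way.

// Encode a string to its UTF-8 bytes.
function utf8Bytes(text: string): number[] {
  return Array.from(new TextEncoder().encode(text));
}

// ISO-8859-1 maps each byte to the code point with the same value.
function decodeLatin1(bytes: number[]): string {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

// Hypothetical paths: one ASCII-only, one containing "å".
for (const original of ["C:\\a\\node.exe", "C:\\å\\node.exe"]) {
  const misread = decodeLatin1(utf8Bytes(original));
  const status = misread === original ? "unchanged" : "garbled";
  console.log(`${original} -> ${misread} (${status})`);
}
```

The ASCII-only path round-trips unchanged, while å (UTF-8 bytes 0xC3 0xA5) comes back as the two characters Ã¥, so the decoded path no longer points at a real file.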

In addition, the following points are worth noting:

  1. This bug is Windows-specific. Only Command Prompt on Windows decodes and interprets scripts using an encoding other than UTF-8.

  2. This issue is essentially the same as #2397. Although #2499 was opened to fix #2397, I believe it does not actually fix this bug.

  3. This bug cannot easily be fixed from the Yarn side.

To fix this bug properly, Yarn would have to detect the system's current code page and then save the script in the matching encoding. Unfortunately, Node.js provides no API for this, so implementing code page detection would require depending on a native (C++) binding package, which is not suitable for Yarn.

This issue is mainly meant to document the current situation for anyone who runs into it later, so feel free to close it.

Temporary workaround: for now, the best solution is to move both node.exe and yarn.cjs to paths that contain only ASCII characters.
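To check whether a given setup is affected before applying the workaround, it is enough to test whether the relevant paths are pure ASCII. The sketch below is a hypothetical helper, not part of Yarn; the example paths are assumptions. In a real check you would pass process.execPath (the running node.exe) and the path of the yarn.cjs release in use.

```typescript
// Hypothetical helper (not part of Yarn): true if the path contains only
// 7-bit ASCII characters, i.e. it reads the same under UTF-8 and under any
// common Windows code page.
function isAsciiOnly(p: string): boolean {
  return /^[\x00-\x7F]*$/.test(p);
}

// Example paths, assumed for illustration.
const candidatePaths = [
  "C:\\Program Files\\nodejs\\node.exe",
  "C:\\å\\.yarn\\releases\\yarn.cjs",
];

for (const p of candidatePaths) {
  console.log(`${isAsciiOnly(p) ? "ok      " : "affected"} ${p}`);
}
```

Only paths reported as affected need to be moved; ASCII-only paths are decoded correctly whatever the active code page.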

Update: as a comment in #2397 points out, running chcp 65001 before executing the command solves this, so a possible fix is:

 if (process.platform === `win32`) {
   // https://github.com/microsoft/terminal/issues/217#issuecomment-737594785
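-  const cmdScript = `@goto #_undefined_# 2>NUL || @title %COMSPEC% & @setlocal & @"${argv0}" ${args.map(arg => `"${arg.replace(`"`, `""`)}"`).join(` `)} %*`;
+  const cmdScript = `@chcp 65001 >NUL\r\n@goto #_undefined_# 2>NUL || @title %COMSPEC% & @setlocal & @"${argv0}" ${args.map(arg => `"${arg.replace(`"`, `""`)}"`).join(` `)} %*`;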
   await xfs.writeFilePromise(ppath.format({dir: location, name, ext: `.cmd`}), cmdScript);
 }

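Note that chcp 65001 must run on its own line before the command it affects. cmd.exe reads and decodes each line of a batch file before executing it, so a chcp placed on the same line takes effect too late.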
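Putting the proposed fix and this note together, a standalone TypeScript sketch of the wrapper generation could look like the following. buildCmdScript and quoteCmdArg are hypothetical names, not Yarn's API, and the paths in the usage line are assumptions. The sketch redirects the output of chcp to NUL so that the code page message (e.g. "Active code page: 65001" on English systems) does not end up in the output of commands such as yarn --version, and it doubles every embedded quote with a global regex rather than only the first one.

```typescript
// Quote one argument for cmd.exe, doubling every embedded double quote
// (the /g flag makes replace() handle all occurrences, not just the first).
function quoteCmdArg(arg: string): string {
  return `"${arg.replace(/"/g, `""`)}"`;
}

// Hypothetical standalone version of the .cmd wrapper generation.
function buildCmdScript(argv0: string, args: string[]): string {
  // Line 1 is pure ASCII, so it is read correctly under any code page.
  // It switches the console to UTF-8 (65001) and hides chcp's message.
  const setCodePage = `@chcp 65001 >NUL`;
  // Line 2 is the original wrapper. cmd.exe decodes it only after line 1 has
  // run, so non-ASCII characters in argv0 and args are read as UTF-8.
  const invoke = `@goto #_undefined_# 2>NUL || @title %COMSPEC% & @setlocal & @"${argv0}" ${args.map(quoteCmdArg).join(` `)} %*`;
  // Batch files conventionally use CRLF line endings.
  return `${setCodePage}\r\n${invoke}`;
}

const wrapper = buildCmdScript("C:\\å\\node.exe", ["C:\\å\\.yarn\\releases\\yarn.cjs"]);
console.log(JSON.stringify(wrapper));
```

The resulting string would still be written as UTF-8; the only change is that the console is switched to the matching code page before the line containing the paths is decoded. One side effect worth keeping in mind is that chcp changes the code page of the whole console session, and setlocal does not restore it.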