caoccao/Javet

Inconsistency between V8/Node and Javet runtime with unicode properties in regular expressions

Opened this issue · 7 comments

While trying to use an NPM package, I came across this weird bug.

Consider the following minimal example, which creates a regular expression that matches any Unicode character with the Letter (L) property:

test.js

const regex = /\p{L}/u;

Running the code with Node (node test.js) works without any error.

However, when trying to run it using Javet and the following Kotlin code:

import com.caoccao.javet.interop.NodeRuntime
import com.caoccao.javet.interop.V8Host
import java.io.File

fun main() {
    val runtime = V8Host.getNodeInstance().createV8Runtime<NodeRuntime>()
    val file = File("./test.js")

    runtime.getExecutor(file).executeVoid()
}

I get the following error:

Exception in thread "main" com.caoccao.javet.exceptions.JavetCompilationException: SyntaxError: Invalid regular expression: /\p{L}/: Invalid property name
	at com.caoccao.javet.interop.V8Native.scriptExecute(Native Method)
	at com.caoccao.javet.interop.V8Runtime.execute(V8Runtime.java:922)
	at com.caoccao.javet.interop.executors.V8StringExecutor.execute(V8StringExecutor.java:107)
	at com.caoccao.javet.interop.IV8Executable.executeVoid(IV8Executable.java:170)
	at MainKt.main(Main.kt:9)
	at MainKt.main(Main.kt)

If I instead use the RegExp("\\p{L}", "u") syntax in the JS file, I get a JavetExecutionException with the exact same message and stack trace.

The problem is also the same if I use a V8Runtime instead of a NodeRuntime.

Javet is up to date (2.2.3).

That feature belongs to i18n, which is disabled in both Node.js mode and V8 mode.

It's possible to enable i18n in V8 mode with a private build. However, it's impossible to enable i18n in Node.js mode because that would result in a JVM crash. As Javet will support Node.js v20 soon, this could be revisited if the Node.js architecture has changed. Please let me know if this makes sense to you.

Thank you for the explanation!

The package I'm trying to use would be very beneficial to me, so if there is any way you could make this work I would greatly appreciate it. Not to mention that it would remove the possibility of other packages being incompatible for the same reason.

However, if it is too much of a hassle to get it working, don't spend days on this issue. I may have an alternative solution that would not involve this particular package.

i18n is not a flag that can be toggled easily. It's actually a set of C++ preprocessor definitions. Turning it on implies:

  • The whole Intl is loaded.
  • Binary size increases.
  • Performance degrades.
  • There's no way of turning it off.
  • The work on dedicated packaging + maintenance is not trivial (proper sponsorship is a must-have).

The majority of Javet users don't ask for i18n and avoid libraries that depend on i18n to work. I think, as you pointed out, it's better to look for an alternative solution without i18n.
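
For readers weighing that alternative, here is a minimal sketch of how a script could detect the missing feature at run time and fall back to an explicit character class. It relies on the observation above that the RegExp constructor form throws when the pattern is rejected, so the error can be caught in JavaScript. The names supportsUnicodePropertyEscapes, FALLBACK_LETTER and letterRegex are hypothetical, and the fallback covers only a small hand-picked subset of letters (ASCII, Latin-1, Latin Extended-A, basic Greek and Cyrillic). It is not equivalent to \p{L}, and it does not help when the pattern lives inside a third-party package.

```javascript
// Returns true when the engine accepts Unicode property escapes such as \p{L}.
// The RegExp constructor is used (rather than a regex literal) so that an
// unsupported pattern throws at run time, where it can be caught, instead of
// failing the whole script at parse time.
function supportsUnicodePropertyEscapes() {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return false; // e.g. "Invalid regular expression: ... Invalid property name"
    }
    throw e; // anything else is unexpected, so do not hide it
  }
}

// Hypothetical fallback: an approximation that covers only a few scripts.
// ASCII, Latin-1 letters (skipping U+00D7 and U+00F7), Latin Extended-A,
// basic Greek capitals and smalls, and basic Cyrillic.
const FALLBACK_LETTER =
  /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0100-\u017F\u0391-\u03A1\u03A3-\u03A9\u03B1-\u03C9\u0410-\u044F]/;

// Picks the real Unicode-aware pattern when available, else the approximation.
function letterRegex() {
  return supportsUnicodePropertyEscapes()
    ? new RegExp("\\p{L}", "u")
    : FALLBACK_LETTER;
}

const isLetter = (ch) => letterRegex().test(ch);
```

With this in place, isLetter("é") and isLetter("Ж") are accepted on either path, while characters outside the listed ranges (for example CJK ideographs) are only recognized when the engine supports \p{L}.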

A few Javet users maintain their own private builds with i18n enabled. That would work as well.

That would indeed not be ideal. However, a private build that enables this feature would be a possibility for me, since forking and keeping the fork up to date with a single flag added shouldn't be much work.

However, I have a few questions:

  • Do you have any insight about the performance impact? Is it minimal?
  • Is there any other side-effect that I should be aware of?
  • Is it a guarantee that it could be enabled in Node.js mode with Node 20?
    • Specifically about that, if that is something that must be worked on before being possible, is there any way I could help you on this issue, maybe contribute? I do not know the structure of the project, maybe that wouldn't be very efficient, but I should be able to

Could you join the Discord to discuss this with me? It's quite complicated and changes from time to time.

I would also be interested in this. I wanted to use some dependencies which use regular expressions like /\p{Lu}/gu which currently crash Javet.

This was discussed in #222. You will need to create private builds with i18n enabled. If you want me to do that for you, please contact me on Discord.