ytiurin/hyphen

Unexpected output when the input contains the “constructor” word

arseni-mourzenko opened this issue · 7 comments

There seems to be a problem with hyphen 1.6.2 when the input contains the word “constructor.” Here's a short example which reproduces the issue:

const hyphen = require('hyphen');
const patterns = require('hyphen/patterns/en-us');
const hyphenate = hyphen(patterns);
const result = hyphenate('Example of a constructor.');
console.log(result);

The expected output would be similar to:

Ex-am-ple of a cons-truc-tor.

but the actual output is:

Ex-am-ple of a function Object() { [native code] }.

Unit test

Here's the unit test which would reproduce the issue:

const createHyphenator = require("../hyphen.js");
const patterns = require("../patterns/en-us.js");

let hyphenate;

beforeAll(() => {
  hyphenate = createHyphenator(patterns, { hyphenChar: "-" });
});

describe("Constructor word hyphenation", () => {
  test("Should hyphenate constructor word correctly", () => {
    expect(hyphenate("constructor")).toBe("con-struc-tor");
  });
});

I haven't found the cause of the issue yet, but noticed a weird thing. In hyphen.js:420, the result of hyphenateWord is assigned to cache[nextTextChunk]. If I log cache[nextTextChunk] just after the statement, it shows indeed the incorrect value “function Object() { [native code] }” and the value of nextTextChunk is “constructor.” Now, if I log the value nextTextChunk before the line 420, then the unit test from my original post passes.

Nice find. I guess the reason is that cache is an object and using constructor as a key returns the function that was used to construct the object. I think we should use a Map instead of an object for the cache.

Exact. Another possibility is to replace the condition on line 419 if (!cache[nextTextChunk]) by if (!cache.hasOwnProperty(nextTextChunk)), although I don't know if this can cause side effects. I've seen, among others, an empty string in the cache when debugging, and the new condition will evaluate to true instead of false for an empty string.

Wouldn't we need to change every part of the code that reads/writes the cache using cache[nextTextChunk] too?

There are four occurrences of cache[nextTextChunk] in src/start.js:

  • The first one is the condition itself (line 58).

  • The second one, on the next line 59, is assigning a value to cache[nextTextChunk]. Since this is an assignment, there are no changes required here, naturally.

  • Line 66, if (nextTextChunk !== cache[nextTextChunk]) hyphenatedN++;, is comparing a string to the value from the cache. There is no need to change here anything, because of what happens at the lines 58 and 59. In fact, if (!Object.hasOwnProperty.call(cache, nextTextChunk)) on line 58 would take care of the case where nextTextChunk refers to reserved keyword such as “constructor.” Once a property is assigned to an object, the property accessor would always return the property itself, and not the special members such as the constructor, the isPrototypeOf method, etc. Here's an example of this mechanism in action:

    > var a = {}
    undefined
    > a['constructor'] // A constructor is returned.
    [Function: Object]
    > a['constructor'] = 123
    123
    > a['constructor'] // Now, the property itself is returned instead.
    123
    > a['toString'] // Same here, the method `toString` is returned.
    [Function: toString]
    > a['toString'] = 'xyz'
    'xyz'
    > a['toString'] // Now, `toString` is a value of the property.
    'xyz'
  • Finally, the fourth mention, line 68, doesn't need to be changed either, for the exact same reason as in the previous point.

The src/create-hyphenator.js file mentions the caches[...] in a few other functions, but this is a different object. The cache one is actually a single element of the caches object.

Note that when doing the change, I tried the version I suggested originally, that is, cache.hasOwnProperty(nextTextChunk). Luckily, eslint highlighted an issue with this approach, by saying:

Do not access Object.prototype method 'hasOwnProperty' from target object

Indeed, my approach would have caused an error as soon as the text contains the hasOwnProperty word. In fact, the code would happily hyphenate it, and assign the result to the cache['hasOwnProperty']. The next time cache.hasOwnProperty(nextTextChunk) is called, it would fail with Uncaught TypeError: a.hasOwnProperty is not a function exception. The correct approach is to do Object.hasOwnProperty.call(cache, nextTextChunk) instead.

Thanks for the detailed explanation. You seem to be an accurate guy. I always appreciate accurate people. They are so rare. 😞

Thank you all for the great library. It saved me a lot of time in one of the projects I work on, and was very easy to set up.