niieani/gpt-tokenizer

Import issue on codebases with both ES module and CommonJS files

Closed this issue · 4 comments

Hello! Let me start by saying this is a great fork; I love the isWithinTokenLimit utility!

We are running into an issue with this library in my organization: our codebase is split between CommonJS and ES module JavaScript files, and for reasons outside the scope of this issue, we cannot declare the project as "type": "module" in package.json.

The issue is as follows: when we import the library from an .mjs file, Node tries to load the files under /esm/ as CommonJS because of their .js extension, so it throws a syntax error on the ESM export syntax.
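For concreteness, here is a minimal sketch of the failing import (the file name is hypothetical, and the exact error text varies by Node version):

// repro.mjs (hypothetical file name); assumes gpt-tokenizer is installed.
// Because the files under /esm/ end in .js and carry no "type": "module"
// marker, Node parses them as CommonJS and rejects their export syntax
// with an error like "SyntaxError: Unexpected token 'export'".
import { isWithinTokenLimit } from 'gpt-tokenizer'

console.log(isWithinTokenLimit('hello world', 100))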


This doesn't happen when importing from a *.cjs file.

We managed to work around it by "proxying" the library through a .cjs file, which works but is quite inelegant.
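For reference, a sketch of that proxy (file name hypothetical; it just re-exports the pieces we use through a CommonJS entry point):

// tokenizer-proxy.cjs (hypothetical name). The .cjs extension forces
// Node to treat this file as CommonJS, so require() resolves through
// the package's CJS build; the rest of the codebase imports this proxy.
const { encode, isWithinTokenLimit } = require('gpt-tokenizer')

module.exports = { encode, isWithinTokenLimit }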

Possible solution: the generated files under /esm/ could use the *.mjs extension, so Node can infer the module type from the extension alone, or something like this: microsoft/TypeScript#18442 (comment)
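For illustration, the linked comment describes the generic pattern of dropping a nested package.json into the ESM output directory so Node treats the .js files there as modules. A sketch of that marker file, placed at esm/package.json inside the published package (not necessarily what this package ends up shipping):

{
  "type": "module"
}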

How to reproduce: https://github.com/gbrlmtrz/minimum-reproducable-tokenzier/tree/main on Node 18 and 16.

For what it's worth, I'm experiencing a similar problem, but the other way around: we use ES modules in a TypeScript environment, and it can't find the exports. :\

Thanks for reporting; this should be fixed in the next version.

🎉 This issue has been resolved in version 1.0.4 🎉

The release is available on:

Your semantic-release bot 📦🚀

For anyone else hitting this in TypeScript but using CommonJS, try importing like this:

import { encode } from 'gpt-tokenizer/cjs/main'
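And a quick usage sketch building on that import (assuming encode returns an array of numeric token ids; the file name and tsconfig setting are illustrative):

// tokenize.ts, compiled with "module": "commonjs"; the deep path
// 'gpt-tokenizer/cjs/main' comes from the comment above.
import { encode } from 'gpt-tokenizer/cjs/main'

const tokens = encode('hello world')
console.log(tokens.length)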