React Native Crash: `SyntaxError: Invalid RegExp: Invalid escape`

Question

julian-hecker opened this issue a year ago · 4 comments

Hey LangChain team, lovely project here!

I am working on integrating LangChain and OpenAI into a React Native application. I've faced many issues so far, but I'm almost there.

One issue I'm facing is that once a response has been streamed from OpenAI, LangChain tries to count the tokens using TikToken. However, when encoding the tokens, TikToken tries to create a RegExp using a feature that does not appear to be supported in Hermes, the Expo React Native runtime I'm using. I have already raised an issue on the TikToken repo.

While we wait for that issue to be fixed, it may be prudent to wrap the call in a try/catch block, as is being done in some other places TikToken is used. The offending line in this case is here: langchain-core/src/language_models/base.
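A minimal sketch of that suggestion, under stated assumptions: the `Encoder` interface and `countTokensSafely` are hypothetical names used for illustration, not LangChain or TikToken APIs, and the four-characters-per-token fallback is only a rough estimate.

```typescript
// Hypothetical minimal shape of a tokenizer; not the real TikToken type.
interface Encoder {
  encode(text: string): number[];
}

// Count tokens, falling back to a rough estimate if encoding throws
// (for example, because the runtime rejects the tokenizer's RegExp).
function countTokensSafely(text: string, encoder?: Encoder): number {
  // Assumption: roughly four characters per token as a crude fallback.
  const approximate = Math.ceil(text.length / 4);
  if (!encoder) return approximate;
  try {
    return encoder.encode(text).length;
  } catch (error) {
    console.warn("Token counting failed; using an approximate count.", error);
    return approximate;
  }
}
```

With this shape, a failure inside `encode` degrades to an estimate instead of crashing the app after the response has streamed.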

Hope I can be of some help to get this resolved :)

Answer 1 · 2023-12-01T03:08:40.000Z

🤖

Hi @julian-hecker,

I'm Dosu, a bot that's here to help you with troubleshooting, answering questions, and getting you contributing to LangChain. While we're waiting for a human maintainer, I'm here to assist. Let's dive in!

Thank you for your detailed report and suggestion, julian-hecker.

Your suggestion to wrap the TikToken call in a try/catch block is indeed a good approach to handle the `SyntaxError: Invalid RegExp: Invalid escape` issue. This error is thrown when a regular expression contains an escape sequence that the JavaScript engine does not accept. In JavaScript, regular expressions are often used in string manipulation methods like `match()`, `replace()`, and `search()`.
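Whether a given engine accepts a pattern can be probed at runtime. The sketch below is illustrative only: `canCompile` is a hypothetical helper, not part of LangChain or TikToken, and the sample patterns are placeholders rather than the pattern TikToken actually uses.

```typescript
// Returns true if the current JavaScript engine can compile the pattern.
function canCompile(source: string, flags: string = ""): boolean {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    // An unsupported or malformed pattern surfaces as a SyntaxError.
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// A Unicode property escape: support varies by engine, so the result
// depends on where this runs.
console.log(canCompile("\\p{L}+", "u"));

// An unterminated group is rejected by every engine.
console.log(canCompile("(")); // false
```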

In the context of the LangChain code, the TikToken call could be wrapped in a try/catch block as follows:

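```ts
// `cache`, `caller`, and `Tiktoken` are defined or imported elsewhere in
// langchain-core/src/utils/tiktoken.ts.
export async function getEncoding(
  encoding: TiktokenEncoding,
  options?: {
    signal?: AbortSignal;
    extendedSpecialTokens?: Record<string, number>;
  }
) {
  try {
    // Fetch the encoding data once and cache the pending promise.
    if (!(encoding in cache)) {
      cache[encoding] = caller
        .fetch(`https://tiktoken.pages.dev/js/${encoding}.json`, {
          signal: options?.signal,
        })
        .then((res) => res.json())
        .catch((e) => {
          // Evict the failed request so a later call can retry.
          delete cache[encoding];
          throw e;
        });
    }

    return new Tiktoken(await cache[encoding], options?.extendedSpecialTokens);
  } catch (error) {
    // The error is only logged, so the function resolves to undefined here.
    console.error('An error occurred:', error);
  }
}
```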

In this modification, if an error occurs during the execution of the TikToken call, the error will be caught and logged to the console. This will prevent the application from crashing and allow you to diagnose the issue.

As for the function of the TikToken call in the LangChain codebase, it is used to create a new instance of Tiktoken with the specified encoding and optional extended special tokens. This function first checks if the requested encoding is already in the cache. If it's not, it fetches the encoding from a remote source and stores it in the cache. Then, it creates a new Tiktoken instance with the fetched encoding and any extended special tokens that were provided in the options.
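The caching behavior described above can be sketched in isolation. This is a simplified illustration, not the LangChain implementation: `cachedLoad`, `promiseCache`, and the `loader` parameter are hypothetical names standing in for the real cache and fetch call.

```typescript
// Cache of in-flight or completed loads, keyed by encoding name.
const promiseCache = new Map<string, Promise<unknown>>();

// Load a value once per key. Concurrent callers share the same promise,
// and a failed load is evicted so that a later call can retry.
function cachedLoad<T>(
  key: string,
  loader: (key: string) => Promise<T>
): Promise<T> {
  const hit = promiseCache.get(key);
  if (hit) return hit as Promise<T>;

  const pending = loader(key).catch((error) => {
    promiseCache.delete(key);
    throw error;
  });
  promiseCache.set(key, pending);
  return pending;
}
```

Caching the promise rather than the resolved value means two callers that ask for the same encoding at the same time trigger only one request.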

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Best,
Dosu

Sources

langchain-core/src/utils/tiktoken.ts

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Answer 2 · 2023-12-01T03:21:33.000Z

Initializing an instance of TikToken isn't the issue; the error is thrown when you call the `encode` method. Wrapping `new Tiktoken()` in a try/catch won't do much.
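To illustrate the distinction, here is a toy tokenizer that stores its pattern as a string and only compiles it inside `encode`. This is an assumption about the general shape of the problem, not TikToken's actual code, and `LazyTokenizer` is a hypothetical class.

```typescript
// Toy tokenizer: the constructor only stores the pattern source, so an
// unsupported pattern is not detected until encode() compiles it.
class LazyTokenizer {
  private readonly patternSource: string;

  constructor(patternSource: string) {
    this.patternSource = patternSource;
  }

  encode(text: string): string[] {
    // Compilation happens here, so this is where a SyntaxError surfaces.
    const pattern = new RegExp(this.patternSource, "gu");
    return text.match(pattern) ?? [];
  }
}

// Construction succeeds even with a pattern the engine rejects.
const tokenizer = new LazyTokenizer("(");

try {
  tokenizer.encode("hello world");
} catch (error) {
  // Only a try/catch around encode() sees the failure.
  console.warn("encode failed:", error);
}
```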

Answer 3 · 2023-12-02T00:54:45.000Z

Yes, good call. Will open a PR shortly!

Answer 4 · 2023-12-02T02:26:11.000Z

Would love to hear more about what you're building!