facebook/hermes

JS thread is frozen in `[native] errorCaptureStackTrace`

sesm opened this issue · 20 comments

sesm commented

Bug Description

I'm working on upgrading a React Native application. After upgrading to 0.71.3 and switching to Hermes, the application's JS thread completely freezes a few seconds after startup and the initial UI render. The Hermes profiler suggests that the JS thread is frozen in [native] errorCaptureStackTrace.

(Screenshot attached: 2023-03-02, 13:10:43)

  • I have run gradle clean and confirmed this bug does not occur with JSC

Hermes version: Hermes for RN 0.71.3
React Native version (if any): 0.71.3
OS version (if any): Android
Platform (most likely one of arm64-v8a, armeabi-v7a, x86, x86_64):

Steps To Reproduce

Can't provide the application code

The Expected Behavior

JS thread shouldn't freeze completely, or if it does there should be a way to debug this.

Disclaimer

I'm not expecting anyone to telepathically debug my problem. My question is: what could possibly cause this and how do I go about localising the problem? So far the only approach I came up with is to delete code and libraries from the project until the issue disappears, but maybe there is a better way?

Are you able to attach a native debugger and see exactly where in errorCaptureStackTrace the thread is frozen?

Also, how do you know it's stuck in that function, vs. say the function being called in a tight loop? Do you have call counts from your profile? Is errorCaptureStackTrace only called once-ish, or is it called repeatedly?

sesm commented

@newobj I've collected a CPU profile with the Hermes debugger; it looks like errorCaptureStackTrace is called once, because everything else shows 0 self time. Please see the attachment:
CPU-20230302T165224.cpuprofile.zip

I've also collected a CPU callstack sample with the Android Studio profiler, but without symbols for libhermes.so I can't make sense of it:
cpu-simpleperf-20230302T170042.trace.zip

sesm commented

I've localised the problem. It's caused by calling an API from the getstream library inside a mixture of redux-toolkit thunks and async/await invoked from useEffect. It's really complicated, but the high-level logic looks like this:

import { useEffect } from 'react';
import { useDispatch } from 'react-redux';
import { createAsyncThunk } from '@reduxjs/toolkit';
import { connect } from 'getstream';

let client;

export const initializeStreamClient = createAsyncThunk('init', async () => {
  await someApiCall(); // placeholder for an unrelated API call
  client = connect(keys); // keys defined elsewhere
});

export const fetchActivities = createAsyncThunk('fetch', async () => {
  try {
    console.log('This code is reached, the promise from getActivities should reject');
    await client.getActivities([]);
    console.log('This code is never reached');
  } catch (err) {
    console.log('This code is never reached');
  }
});

const MyScreen = () => {
  const dispatch = useDispatch();
  useEffect(() => {
    dispatch(initializeStreamClient()).then(() => {
      dispatch(fetchActivities());
    });
  }, [dispatch]); // run once; dispatch identity is stable
  return null; // render omitted for brevity
};

The issue is not 100% reproducible, and sometimes the promise from getActivities is successfully rejected. But in most cases neither the catch block nor the subsequent line is reached, and the JS thread freezes. The freeze was never reproduced in JSC, but maybe that comes down to timing and there is some deadlock hidden in this async flow.

I can see from the traces that this is happening while StreamApiError is being constructed and populated with a stack, so I suspect that's how it ties into your localization of the problem, but it's almost impossible to diagnose further without a symbolicated stack trace. If you can repro this in a build with symbols, that would go a long way to allowing us to help further.

sesm commented

I would be happy to help! Where can I get a build with symbols?

That really depends on your build system, and I don't think I know the answer. If you're using Gradle, I would look at the doNotStrip setting under packagingOptions and try to avoid stripping the .so files.
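For reference, a sketch of what that Gradle setting can look like in an Android app module's build.gradle (the exact block name and glob are assumptions about this project's setup; check the Android Gradle plugin docs for your AGP version):

```groovy
android {
    packagingOptions {
        // Keep debug symbols in the packaged native libraries so a native
        // profiler/debugger can symbolicate frames inside libhermes.so.
        doNotStrip "**/libhermes.so"
    }
}
```

A broader pattern like "**/*.so" keeps symbols for every native library, at the cost of a larger APK.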

sesm commented

Yes, I'm using Gradle. Will try this now, thank you!

sesm commented

Sorry, I couldn't get a build with symbols to work.
But I have an update: I can confirm that the freeze happens in the Error.captureStackTrace function, because commenting it out of the getstream library's JS code fixes the issue.

Hi @sesm,

Would you be able to provide a complete repro for this issue? Otherwise we won't be able to investigate.

John Paul

sesm commented

@jpporto I will try to reproduce it on a fresh project with the same version of getstream library. Will get back to you with the results.

sesm commented

@jpporto reproducing the issue turned out to be very easy: any error in the getstream library causes this, even connect('', '');

Please see example app here: https://github.com/sesm/rn-gestream-freeze

sesm commented

Here is the line of code that seems to cause it: https://github.com/GetStream/stream-js/blob/main/src/errors.ts#L21 ; it looks like a recursive construction.
Also, I just checked: the issue is not reproducible in JSC because JSC doesn't have Error.captureStackTrace, so that line is skipped.

Thanks for all your help. Two things:

  1. there is indeed a bug in Hermes that causes Error.captureStackTrace to loop infinitely.
  2. there is a bug in GetStream's AbstractError constructor. The call should be Error.captureStackTrace(this, AbstractError), and not Error.captureStackTrace(this, AbstractError.constructor).

You should expect a fix for 1 soon, and once it lands you can ask the React Native team to include it in a dot release.

John Paul

sesm commented

Thanks a lot!

Hey, I just bumped into this on RN 0.71.4, I couldn't find any PR related to this. Is it already fixed? I would love to help but I'm probably not qualified enough. 😅

This was fixed by this commit. I am not sure it is part of RN 0.71.4. @cortinico can you confirm either way?

@jpporto Oh, I see, the tag has RN0.72.0 in the name, so I'm assuming it will be released only in that version. According to the RN changelog, the last time Hermes was updated was in 0.71.4. Thank you!

I am not sure it is part of RN 0.71.4. @cortinico can you confirm either way?

It's not. That will land in 0.72

I just tested on 0.72.0, it's working fine now :) Thank you!