apache/celeborn

[FEATURE] support configurable checksum in Lz4Decompressor

shuai-xu opened this issue · 4 comments

Is your feature request related to a problem? Please describe.

Currently the checksum in Lz4Decompressor defaults to StreamingXXHash32JNI, and there is no config to change it. This checksum calls C++ code, which may cause Spark executors to hang in Gluten.

Describe the solution you'd like

Make it configurable, so that Gluten users can choose StreamingXXHash32JavaSafe instead.
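A minimal sketch of how such a switch might be resolved, not the actual Celeborn implementation. The config key `example.lz4.checksum.impl`, its accepted values, and the class name `ChecksumSelection` are hypothetical and chosen only for illustration. The sketch is kept free of the lz4-java dependency so it runs on its own; it only reports which `XXHashFactory` static method each choice would correspond to (`safeInstance()` is the one that yields the pure-Java "safe" hashes).

```java
import java.util.Locale;
import java.util.Properties;

final class ChecksumSelection {
  // Hypothetical config key, used here for illustration only.
  static final String KEY = "example.lz4.checksum.impl";

  // The three choices discussed in this issue.
  enum XxHashImpl { FASTEST, NATIVE, JAVA_SAFE }

  // Parse the configured value; fall back to the current behaviour (fastest).
  static XxHashImpl resolve(Properties conf) {
    String raw = conf.getProperty(KEY, "fastest").trim().toLowerCase(Locale.ROOT);
    switch (raw) {
      case "fastest":
        return XxHashImpl.FASTEST;
      case "native":
        return XxHashImpl.NATIVE;
      case "java-safe":
        return XxHashImpl.JAVA_SAFE;
      default:
        throw new IllegalArgumentException("Unknown checksum impl: " + raw);
    }
  }

  // Name of the lz4-java factory method each choice would map to.
  static String factoryMethod(XxHashImpl impl) {
    switch (impl) {
      case NATIVE:
        return "XXHashFactory.nativeInstance()";   // JNI-backed hashes
      case JAVA_SAFE:
        return "XXHashFactory.safeInstance()";     // pure-Java hashes, no JNI
      default:
        return "XXHashFactory.fastestInstance()";  // library picks at runtime
    }
  }
}
```

With lz4-java on the classpath, the chosen factory would then be used in place of the hard-coded `fastestInstance()` call when the decompressor builds its checksum, so a Gluten deployment could set the value to `java-safe` and avoid the JNI path.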

@shuai-xu, why does Lz4Decompressor need a configurable checksum? IMO, XXHashFactory#fastestInstance already decides which instance to use based on the environment. cc @waitinfuture

  /**
   * Returns the fastest available {@link XXHashFactory} instance. If the class
   * loader is the system class loader and if the
   * {@link #nativeInstance() native instance} loads successfully, then the
   * {@link #nativeInstance() native instance} is returned, otherwise the
   * {@link #fastestJavaInstance() fastest Java instance} is returned.
   * <p>
   * Please read {@link #nativeInstance() javadocs of nativeInstance()} before
   * using this method.
   *
   * @return the fastest available {@link XXHashFactory} instance.
   */
  public static XXHashFactory fastestInstance() {
    if (Native.isLoaded()
        || Native.class.getClassLoader() == ClassLoader.getSystemClassLoader()) {
      try {
        return nativeInstance();
      } catch (Throwable t) {
        return fastestJavaInstance();
      }
    } else {
      return fastestJavaInstance();
    }
  }

By default, fastestInstance uses nativeInstance, and there is no config to switch it to a Java instance.

This checksum calls C++ code, which may cause Spark executors to hang in Gluten.

Could you please elaborate? How does it happen?

@shuai-xu, I have added support for the configuration mentioned above in #2050. PTAL.