[FEATURE] support configurable checksum in Lz4Decompressor
shuai-xu opened this issue · 4 comments
shuai-xu commented
Is your feature request related to a problem? Please describe.
Currently the checksum in Lz4Decompressor is set to StreamingXXHash32JNI by default, and there is no config to change it. This checksum calls into C++ code, which may cause Spark executors to hang when running with Gluten.
Describe the solution you'd like
Make it configurable, so that Gluten users can choose StreamingXXHash32JavaSafe instead.
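A minimal sketch of what such a switch could look like, assuming a hypothetical string setting with the values FASTEST and JAVA_SAFE (neither the setting nor the class names below exist in the project). To keep the sketch runnable with only the JDK, both branches return CRC32-based stand-ins; in Lz4Decompressor the branches would presumably call XXHashFactory.fastestInstance() or XXHashFactory.safeInstance() from lz4-java, followed by newStreamingHash32(seed).asChecksum().

```java
import java.util.Locale;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper: picks a checksum implementation from a config value.
class ChecksumSelector {
    // Hypothetical config values; FASTEST mirrors today's default behavior.
    enum Impl { FASTEST, JAVA_SAFE }

    // Stand-ins so the sketch runs without lz4-java on the classpath.
    // Both compute the same value, just as the JNI and pure-Java XXHash32
    // implementations are expected to agree on the hash.
    static final class NativeStandIn extends CRC32 {}
    static final class PureJavaStandIn extends CRC32 {}

    // A missing or blank value falls back to the current default (FASTEST).
    // An unknown value throws IllegalArgumentException via Enum.valueOf.
    static Impl parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Impl.FASTEST;
        }
        return Impl.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    static Checksum create(String value) {
        switch (parse(value)) {
            case JAVA_SAFE:
                // Assumed real call:
                // XXHashFactory.safeInstance().newStreamingHash32(seed).asChecksum()
                return new PureJavaStandIn();
            case FASTEST:
            default:
                // Assumed real call:
                // XXHashFactory.fastestInstance().newStreamingHash32(seed).asChecksum()
                return new NativeStandIn();
        }
    }

    public static void main(String[] args) {
        Checksum checksum = create("java_safe");
        byte[] data = {1, 2, 3};
        checksum.update(data, 0, data.length);
        System.out.println(checksum.getClass().getSimpleName() + " -> " + checksum.getValue());
    }
}
```

Keeping FASTEST as the fallback means existing deployments would see no behavior change unless they opt in.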
SteNicholas commented
@shuai-xu, why should Lz4Decompressor support a configurable checksum? IMO, XXHashFactory#fastestInstance already determines which instance to use according to the environment. cc @waitinfuture
  /**
   * Returns the fastest available {@link XXHashFactory} instance. If the class
   * loader is the system class loader and if the
   * {@link #nativeInstance() native instance} loads successfully, then the
   * {@link #nativeInstance() native instance} is returned, otherwise the
   * {@link #fastestJavaInstance() fastest Java instance} is returned.
   * <p>
   * Please read {@link #nativeInstance() javadocs of nativeInstance()} before
   * using this method.
   *
   * @return the fastest available {@link XXHashFactory} instance.
   */
  public static XXHashFactory fastestInstance() {
    if (Native.isLoaded()
        || Native.class.getClassLoader() == ClassLoader.getSystemClassLoader()) {
      try {
        return nativeInstance();
      } catch (Throwable t) {
        return fastestJavaInstance();
      }
    } else {
      return fastestJavaInstance();
    }
  }
shuai-xu commented
By default, fastestInstance uses nativeInstance, and there is no config to switch it to the Java instance.
pan3793 commented
This checksum calls C++ code, which may cause spark executor hang in Gluten.
Could you please elaborate? How does it happen?
SteNicholas commented