airlift/aircompressor

Presto 0.157.1 + Lzop: NullPointerException

idanh opened this issue · 1 comment

idanh commented

Hey guys,

So I've been successfully using your library with EMR (emr-5.3.1) & Hive (2.1.1) with LZOP_X1 (no constraints), and now that I'm moving to Presto (0.157.1), I get the following stack trace:

com.facebook.presto.spi.PrestoException: java.lang.reflect.InvocationTargetException
	at com.facebook.presto.hive.HiveSplitSource.propagatePrestoException(HiveSplitSource.java:137)
	at com.facebook.presto.hive.HiveSplitSource.isFinished(HiveSplitSource.java:115)
	at com.facebook.presto.split.ConnectorAwareSplitSource.isFinished(ConnectorAwareSplitSource.java:63)
	at com.facebook.presto.split.BufferingSplitSource.fetchSplits(BufferingSplitSource.java:59)
	at com.facebook.presto.split.BufferingSplitSource.lambda$fetchSplits$1(BufferingSplitSource.java:65)
	at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
	at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160)
	at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:276)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:246)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:78)
	at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:179)
	at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:45)
	at com.facebook.presto.hive.util.ResumableTasks.lambda$submit$1(ResumableTasks.java:33)
	... 4 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.facebook.presto.hive.HiveUtil.isSplittable(HiveUtil.java:273)
	... 9 more
Caused by: java.lang.NullPointerException
	at com.hadoop.mapred.DeprecatedLzoTextInputFormat.isSplitable(DeprecatedLzoTextInputFormat.java:101)
	... 14 more

Now, the query that throws this exception works fine in Hive. It's basically:
select * from table limit 10;

I've added a .lzo.index file next to my lzop file in S3, but to no avail.

As far as I can tell, the DeprecatedLzoTextInputFormat class has a member called indexes which, if it isn't populated properly, causes the NPE here: https://github.com/twitter/hadoop-lzo/blob/master/src/main/java/com/hadoop/mapred/DeprecatedLzoTextInputFormat.java#L101
because no null check is made on the LzoIndex index.
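To make the failure mode concrete, here is a minimal, hypothetical Java sketch of the pattern described above; it is not the hadoop-lzo source. It assumes that the per-file index map is filled only by a separate listing step, so a caller that invokes the split check directly (the trace shows HiveUtil.isSplittable calling it through Method.invoke) can hit a null entry even when a .lzo.index file exists. The class name LzoSplitCheck, its methods, and the S3 path are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the null-index pattern; NOT the real
// DeprecatedLzoTextInputFormat. Assumption: the index map is only populated
// by a separate listing step, so skipping that step leaves it empty.
public class LzoSplitCheck {
    // File path -> offsets of compressed block starts (hypothetical layout).
    private final Map<String, List<Long>> indexes = new HashMap<>();

    // Stands in for the listing step that would read "<file>.index".
    public void registerIndex(String path, List<Long> blockOffsets) {
        indexes.put(path, blockOffsets);
    }

    // Mirrors an unguarded check: throws NullPointerException when the
    // listing step never ran for this path.
    public boolean isSplitableUnguarded(String path) {
        List<Long> index = indexes.get(path);
        return !index.isEmpty();
    }

    // Null-safe variant: a missing index means "not splittable", so the
    // file would be read as one split instead of failing the query.
    public boolean isSplitableGuarded(String path) {
        List<Long> index = indexes.get(path);
        return index != null && !index.isEmpty();
    }

    public static void main(String[] args) {
        LzoSplitCheck check = new LzoSplitCheck();
        String file = "s3://bucket/table/part-0000.lzo"; // hypothetical path
        System.out.println(check.isSplitableGuarded(file)); // false
        try {
            check.isSplitableUnguarded(file);
        } catch (NullPointerException e) {
            System.out.println("NPE: index never loaded");
        }
        check.registerIndex(file, Arrays.asList(0L, 262144L));
        System.out.println(check.isSplitableGuarded(file)); // true
    }
}
```

The guarded version trades parallelism for robustness: without an index, the file simply isn't split, which is the safe fallback for a block-compressed stream.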

Now, I assumed that with your library I could get past that check, but it seems it's not working.
I'm using aircompressor-0.9.jar. I've copied it to /usr/lib/presto/plugin/hive-hadoop2 and removed any older version that was in there.

I am confident that your code is actually being called (based on the stack trace and many, many tests I've done with and without the aircompressor jar).

So, my question: did you guys ever manage to resolve this?

Relevant EMR cluster configuration:

{
    "classification": "core-site",
    "properties": {
      "io.compression.codec.lzo.class": "io.airlift.compress.lzo.LzopCodec",
      "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,io.airlift.compress.lzo.LzoCodec,io.airlift.compress.lzo.LzopCodec"
    },
    "configurations": []
}

Thank you very much!

  • Idan
dain commented

From reading DeprecatedLzoTextInputFormat, it doesn't seem to have anything to do with the compression implementation; instead, it appears to be about creating splits. If this is still an issue you are having with Presto, I suggest filing an issue there (ideally with instructions to reproduce).
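To illustrate why the index is a split-planning concern rather than a codec concern, here is a hypothetical Java sketch. It assumes the index is simply a sorted list of byte offsets where compressed blocks start, and that splits may only begin at those offsets; the class LzoSplitPlanner and its parameters are invented for illustration and are not part of aircompressor, hadoop-lzo, or Presto. Note that no codec appears anywhere in the logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: turning a block-offset index into byte-range splits.
// Assumption: blockOffsets is sorted ascending, every offset is < fileLength,
// and targetSize > 0. The compression codec is never consulted.
public class LzoSplitPlanner {
    // Each split is a half-open byte range {start, end} of the compressed file.
    public static List<long[]> planSplits(long fileLength, long targetSize, List<Long> blockOffsets) {
        List<long[]> splits = new ArrayList<>();
        if (blockOffsets == null || blockOffsets.isEmpty()) {
            // No usable index: the file can only be read start to finish.
            splits.add(new long[] {0, fileLength});
            return splits;
        }
        long start = 0;
        for (long offset : blockOffsets) {
            // Cut only at a block boundary, once the current split is big enough.
            if (offset - start >= targetSize) {
                splits.add(new long[] {start, offset});
                start = offset;
            }
        }
        // The remainder of the file becomes the last split.
        splits.add(new long[] {start, fileLength});
        return splits;
    }

    public static void main(String[] args) {
        List<Long> offsets = Arrays.asList(0L, 200L, 400L, 600L, 800L);
        for (long[] s : planSplits(1000, 300, offsets)) {
            System.out.println(s[0] + "-" + s[1]); // 0-400, 400-800, 800-1000
        }
        System.out.println(planSplits(1000, 300, null).size()); // 1
    }
}
```

With an index, the 1000-byte file becomes three splits aligned to block starts; without one, it stays a single split, regardless of which LZO implementation decompresses the blocks.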