pig fails to use lzo as compression for temp files
The following setup fails with Hadoop 2.7.2 and Pig 0.15.0 (Google Cloud Dataproc).
The same job completes fine without LZO compression for temp files and fails when it is enabled (pig.tmpfilecompression=true, pig.tmpfilecompression.codec=lzo).
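For anyone trying to reproduce this, here is a rough sketch of one way the two properties could be passed on the Pig command line. The helper name pig_tmp_lzo_flags and the script name job.pig are placeholders, not part of the original job.

```shell
# Hypothetical helper: prints the two -D flags that enable LZO
# compression for Pig temp files, one flag per line.
pig_tmp_lzo_flags() {
  printf '%s\n' \
    "-Dpig.tmpfilecompression=true" \
    "-Dpig.tmpfilecompression.codec=lzo"
}

# Example invocation (job.pig is a placeholder script name). It is left
# commented out so that sourcing this snippet does not launch a job.
# pig $(pig_tmp_lzo_flags) -f job.pig
```

The -D flags are placed before -f because Java properties have to precede Pig's own options.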
setup on all nodes during startup:
sudo apt-get install liblzo2-dev
sudo ln -s /lib/x86_64-linux-gnu/liblzo2.so.2 /usr/lib/hadoop/lib/native/
copied hadoop-lzo-0.4.20-SNAPSHOT.jar to /usr/lib/hadoop-mapreduce/
edited core-site.xml and added
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
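To rule out a node where the setup steps did not take effect, a check along these lines could be run on each node. This is only a sketch: check_lzo_setup is a hypothetical helper, the default directories mirror the paths listed above, and the default core-site.xml location (/etc/hadoop/conf/core-site.xml) is an assumption that may differ on your cluster.

```shell
# Hypothetical per-node sanity check for the setup steps in this report.
# Arguments (all optional): native lib dir, jar dir, path to core-site.xml.
check_lzo_setup() {
  native_dir="${1:-/usr/lib/hadoop/lib/native}"
  jar_dir="${2:-/usr/lib/hadoop-mapreduce}"
  core_site="${3:-/etc/hadoop/conf/core-site.xml}"  # assumed location
  status=0

  # -e follows symlinks, so a dangling link is reported as missing.
  if [ ! -e "$native_dir/liblzo2.so.2" ]; then
    echo "missing or dangling: $native_dir/liblzo2.so.2"
    status=1
  fi

  # Accept any hadoop-lzo jar version in the jar directory.
  if ! ls "$jar_dir"/hadoop-lzo-*.jar >/dev/null 2>&1; then
    echo "no hadoop-lzo jar found in $jar_dir"
    status=1
  fi

  # Both LZO codec classes should appear in core-site.xml.
  # -F matches the class names literally instead of as regexes.
  for codec in com.hadoop.compression.lzo.LzoCodec \
               com.hadoop.compression.lzo.LzopCodec; do
    if ! grep -qF "$codec" "$core_site" 2>/dev/null; then
      echo "codec not registered in $core_site: $codec"
      status=1
    fi
  done

  return "$status"
}
```

Calling check_lzo_setup with no arguments uses the defaults. A zero exit status only means the files and config entries are present; it does not show that the codec loads at runtime.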
Error: java.lang.RuntimeException: java.io.IOException: Not a valid BCFile.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.init(WeightedRangePartitioner.java:155)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:75)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.getPartition(WeightedRangePartitioner.java:58)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:135)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Not a valid BCFile.
    at org.apache.hadoop.io.file.tfile.BCFile$Magic.readAndVerify(BCFile.java:927)
    at org.apache.hadoop.io.file.tfile.BCFile$Reader.<init>(BCFile.java:628)
    at org.apache.hadoop.io.file.tfile.TFile$Reader.<init>(TFile.java:804)
    at org.apache.pig.impl.io.TFileRecordReader.initialize(TFileRecordReader.java:64)
    at org.apache.pig.impl.io.ReadToEndLoader.initializeReader(ReadToEndLoader.java:212)
    at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:250)
    at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.init(WeightedRangePartitioner.java:129)
    ... 17 more
Is that a Pig issue, or a problem with hadoop-lzo?
On May 30, 2016, at 10:28 AM, Jefim Matskin notifications@github.com wrote:
I really don't know. The problem is that enabling LZO compression for Pig temp files does not work.