Netflix/aegisthus

java.io.EOFException while running Aegisthus

staxxknight opened this issue · 2 comments

Hi, I am very new to Aegisthus.
Any help in resolving this exception is highly appreciated.

I tried to process my SSTables from Cassandra 2.0.9 with the following command.
(Note: without the lz4 jar I encountered a ClassNotFoundException. Is passing it via -libjars the correct way?)
hadoop jar aegisthus-hadoop-0.2.4.jar com.netflix.Aegisthus -Daegisthus.columntype=UTF8Type -Daegisthus.keytype=UTF8Type -libjars lz4-1.2.0.jar -inputDir myinputdir -output json
My input directory "myinputdir" contains the Data.db and CompressionInfo.db files.
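From what I can tell, -libjars is one of Hadoop's generic options, so it is only honoured when the driver is launched through ToolRunner/GenericOptionsParser. Here is a minimal sketch of that pattern (my own illustration, not Aegisthus's actual driver), which is why I assumed passing the lz4 jar this way should put it on the task classpath:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Minimal sketch of a ToolRunner-based driver (NOT Aegisthus's actual main class).
// ToolRunner feeds the arguments through GenericOptionsParser, which is what makes
// generic options such as -D<key>=<value> and -libjars work, so a jar passed with
// -libjars gets shipped to the task classpath.
public class ExampleDriver extends Configured implements Tool {
    @Override
    public int run(String[] remainingArgs) throws Exception {
        // remainingArgs only contains the job-specific arguments (e.g. -inputDir,
        // -output); the generic options have already been consumed by the parser.
        // ... set up and submit the MapReduce job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ExampleDriver(), args));
    }
}
```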

```
Inspiron-N7110 ~/git/aegisthus2/aegisthus-hadoop/build/libs $ hadoop jar aegisthus-hadoop-0.2.4.jar com.netflix.Aegisthus -Daegisthus.columntype=UTF8Type -Daegisthus.keytype=UTF8Type -libjars lz4-1.2.0.jar -inputDir myinputdir -output json 
14/12/07 23:24:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/12/07 23:24:54 INFO tools.DirectoryWalker: hdfs://localhost:9000/user/stax/csst/node1 :    2 file(s)
14/12/07 23:24:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/12/07 23:24:55 INFO input.FileInputFormat: Total input paths to process : 1
14/12/07 23:24:55 INFO input.AegisthusInputFormat: aegisthus.keytype: UTF8Type
14/12/07 23:24:55 INFO input.AegisthusInputFormat: aegisthus.columntype: UTF8Type
14/12/07 23:24:55 INFO input.AegisthusInputFormat: end path: csst-users-jb-1-Data.db:0:79
14/12/07 23:24:55 INFO input.AegSplit: start: 0, end: 79
14/12/07 23:24:55 INFO input.AegisthusCombinedInputFormat: sstable AegSplits: 1
14/12/07 23:24:55 INFO input.AegisthusCombinedInputFormat: sstables Added AegSplits: 1
14/12/07 23:24:55 INFO input.AegisthusCombinedInputFormat: other AegSplits: 0
14/12/07 23:24:55 INFO input.AegisthusCombinedInputFormat: AegCombinedSplits: 1
14/12/07 23:24:55 INFO mapreduce.JobSubmitter: number of splits:1
14/12/07 23:24:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1417989836400_0003
14/12/07 23:24:56 INFO impl.YarnClientImpl: Submitted application application_1417989836400_0003
14/12/07 23:24:56 INFO mapreduce.Job: The url to track the job: http://stax-Dell-System-Inspiron-N7110:8088/proxy/application_1417989836400_0003/
14/12/07 23:24:56 INFO mapreduce.Job: Running job: job_1417989836400_0003
14/12/07 23:25:02 INFO mapreduce.Job: Job job_1417989836400_0003 running in uber mode : false
14/12/07 23:25:02 INFO mapreduce.Job:  map 0% reduce 0%
14/12/07 23:25:06 INFO mapreduce.Job: Task Id : attempt_1417989836400_0003_m_000000_0, Status : FAILED
Error: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:395)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:74)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:69)
    at com.netflix.aegisthus.io.sstable.SSTableScanner.serializeColumns(SSTableScanner.java:231)
    at com.netflix.aegisthus.io.sstable.SSTableScanner.next(SSTableScanner.java:205)
    at com.netflix.aegisthus.input.readers.SSTableRecordReader.nextKeyValue(SSTableRecordReader.java:94)
    at com.netflix.aegisthus.input.readers.CombineSSTableReader.nextKeyValue(CombineSSTableReader.java:50)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
```

Hi,
I'm having the same issue. Any help here would be highly appreciated.

I'm sorry we missed this issue and we will try to do a better job curating our issues in the future.

At the time this issue was created, Aegisthus could not read Cassandra 2.x data. In March of 2015 we added support for reading Cassandra 2.0 data. Today I patched Aegisthus so it should be able to read Cassandra 2.1 and 2.2 data. We plan to add better support for Cassandra 2.1 and 2.2 data in the future.
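For context, the EOFException in the trace is a symptom of that format mismatch rather than of a corrupt file. The failing frames come down to a length-prefixed read, roughly like the sketch below (an illustration of the pattern, not the actual Cassandra or Aegisthus code): when the reader assumes the wrong SSTable layout for the file it is scanning, the two-byte length it picks up is wrong and the following readFully() runs past the end of the data.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustration only: a "read a 2-byte length, then read that many bytes" helper.
// If the reader has misinterpreted the surrounding on-disk layout, the length
// prefix it reads is garbage and readFully() hits the end of the stream,
// producing an EOFException like the one in the stack trace above.
final class ShortLengthRead {
    static ByteBuffer readWithShortLength(DataInput in) throws IOException {
        int length = in.readUnsignedShort(); // 2-byte length prefix
        byte[] bytes = new byte[length];
        in.readFully(bytes);                 // throws EOFException if fewer bytes remain
        return ByteBuffer.wrap(bytes);
    }
}
```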