pinterest/secor

Blocking Issue when running on K8s on GCP

richiesgr opened this issue · 1 comment

Hi,
I run Secor on Google Cloud Kubernetes.
The node pool uses preemptible machines, meaning each machine restarts every 24 hours.
I've deployed 7 pods.
The data in Kafka is Avro and I write Avro files; the files are uploaded to a bucket in GCS.
I read data from 5 topics together; the topics are filled using MirrorMaker.
On some pods, after a few hours, I get an exception that causes the pod to crash loop.

I CANNOT RECOVER FROM HERE. All the failing pods crash the same way when restarted.

java.lang.RuntimeException: Failed to write message Message <message binary>
	at com.pinterest.secor.consumer.Consumer.handleWriteError(Consumer.java:272)
	at com.pinterest.secor.consumer.Consumer.consumeNextMessage(Consumer.java:232)
	at com.pinterest.secor.consumer.Consumer.run(Consumer.java:164)
Caused by: java.io.FileNotFoundException: /mnt/secor_csv/message_logs/partition/1_25/prod-og-monitoring_agg_impressions_sg/dt=2020-12-02/hr=02/1_229_00000000000189724222.gz (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
	at org.apache.avro.file.SyncableFileOutputStream.<init>(SyncableFileOutputStream.java:58)
	at org.apache.avro.file.DataFileWriter.create(DataFileWriter.java:134)
	at com.pinterest.secor.io.impl.AvroFileReaderWriterFactory$AvroFileWriter.<init>(AvroFileReaderWriterFactory.java:131)
	at com.pinterest.secor.io.impl.AvroFileReaderWriterFactory.BuildFileWriter(AvroFileReaderWriterFactory.java:73)
	at com.pinterest.secor.util.ReflectionUtil.createFileWriter(ReflectionUtil.java:156)
	at com.pinterest.secor.common.FileRegistry.getOrCreateWriter(FileRegistry.java:138)
	at com.pinterest.secor.writer.MessageWriter.write(MessageWriter.java:104)
	at com.pinterest.secor.consumer.Consumer.writeMessage(Consumer.java:256)
	at com.pinterest.secor.consumer.Consumer.consumeNextMessage(Consumer.java:229)

This only happens when using the AvroFileReaderWriterFactory.
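
For reference, the "(No such file or directory)" message from FileOutputStream.open0 in the trace above is what Java reports when the parent directory of the target file is missing. Whether that is really what happens inside Secor is only an assumption on my side. The sketch below is not Secor code: the class and helper names are made up, and it only reproduces that one failure mode in isolation and shows that creating the parent directories first avoids it.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Secor.
public class MissingParentDirDemo {

    // Returns true if opening the file for writing fails with
    // FileNotFoundException, as in the stack trace above.
    static boolean openFails(Path file) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    // Hypothetical helper: create any missing parent directories,
    // then open the file for writing.
    static FileOutputStream openCreatingParents(Path file) throws IOException {
        Path parent = file.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return new FileOutputStream(file.toFile());
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for the local Secor path (assumption).
        Path root = Files.createTempDirectory("secor_demo");
        Path file = root.resolve("dt=2020-12-02")
                        .resolve("hr=02")
                        .resolve("1_229_00000000000189724222.gz");

        // The dt=/hr= directories do not exist yet, so a plain open fails.
        System.out.println("open without parent dirs fails: " + openFails(file));

        try (FileOutputStream out = openCreatingParents(file)) {
            out.write(new byte[] {1, 2, 3});
        }
        System.out.println("file exists after creating parents: " + Files.exists(file));
    }
}
```

If the cause is the same in my case, it would mean the dt=/hr= directory under /mnt/secor_csv/message_logs/partition is missing at the moment the Avro writer opens the file, but I have not confirmed this.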