AbsaOSS/cobrix

Logger implementation not serializable

joaquin021 opened this issue · 2 comments

I am trying to use the Cobrix library with a Tinylog logger, because this is the logger implementation we use in our architecture.
We are getting the exception shown at the end of this issue.

We have been investigating and we have found that the logger is not annotated as transient in the Cobrix library.

private val logger = LoggerFactory.getLogger(this.getClass) 

However, in the Spark implementation and in other connectors, the logger is declared as transient or static:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala#L42
https://github.com/mongodb/mongo-spark/blob/master/src/main/scala/com/mongodb/spark/LoggingTrait.scala#L30
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/MultipartUtils.java#L51

Can we change the logger declarations and annotate them as transient? Is there any drawback to annotating the logger as transient?
We have tested it with this modification and it works as we need.
If the issue is accepted, we can make a pull request to change it.
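
For illustration, here is a minimal sketch of the difference, not the actual Cobrix code. It uses only the Java standard library. `PlainLogger`, `EagerLoggerHolder`, `TransientLoggerHolder`, and `SerializationCheck` are hypothetical names invented for this example. `PlainLogger` stands in for any logger implementation that is not `Serializable`, such as `TinylogLogger`. The sketch uses `@transient lazy val` rather than a plain `@transient val`, since a plain transient field would be `null` after deserialization.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a logger implementation that is NOT Serializable.
class PlainLogger(val name: String) {
  def info(msg: String): Unit = println(s"[$name] $msg")
}

// Current pattern: the logger is an ordinary field, so Java serialization
// tries to write it together with the enclosing object and fails.
class EagerLoggerHolder extends Serializable {
  private val logger = new PlainLogger(this.getClass.getName)
  def work(): Unit = logger.info("working")
}

// Proposed pattern: @transient excludes the field from serialization, and
// lazy val re-creates the logger on first use after deserialization.
class TransientLoggerHolder extends Serializable {
  @transient private lazy val logger = new PlainLogger(this.getClass.getName)
  def work(): Unit = logger.info("working")
}

object SerializationCheck {
  // Serializes the value with plain Java serialization and returns the bytes.
  def serialize(value: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    bytes.toByteArray
  }

  // True if the object graph can be written without a NotSerializableException.
  def isSerializable(value: AnyRef): Boolean =
    try { serialize(value); true }
    catch { case _: NotSerializableException => false }
}
```

With these definitions, `SerializationCheck.isSerializable(new EagerLoggerHolder)` is false, while the same check on `new TransientLoggerHolder` is true, even if `work()` was called first and the logger was already initialized.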

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
.
.
.
Caused by: java.io.NotSerializableException: org.tinylog.slf4j.TinylogLogger
Serialization stack:
	- object not serializable (class: org.tinylog.slf4j.TinylogLogger, value: org.tinylog.slf4j.TinylogLogger@103e9972)
	- field (class: za.co.absa.cobrix.cobol.parser.Copybook, name: logger, type: interface org.slf4j.Logger)
	- object (class za.co.absa.cobrix.cobol.parser.Copybook, za.co.absa.cobrix.cobol.parser.Copybook@352ce817)
	- field (class: za.co.absa.cobrix.cobol.reader.schema.CobolSchema, name: copybook, type: class za.co.absa.cobrix.cobol.parser.Copybook)
	- object (class za.co.absa.cobrix.cobol.reader.schema.CobolSchema, za.co.absa.cobrix.cobol.reader.schema.CobolSchema@87f6ab5)
	- field (class: za.co.absa.cobrix.cobol.reader.FixedLenNestedReader, name: cobolSchema, type: class za.co.absa.cobrix.cobol.reader.schema.CobolSchema)
	- object (class za.co.absa.cobrix.spark.cobol.reader.FixedLenNestedReader, za.co.absa.cobrix.spark.cobol.reader.FixedLenNestedReader@2f4545c6)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 1)
	- field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, type: class [Ljava.lang.Object;)
	- object (class java.lang.invoke.SerializedLambda, SerializedLambda[capturingClass=class za.co.absa.cobrix.spark.cobol.source.CobolRelation, functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;, implementation=invokeStatic za/co/absa/cobrix/spark/cobol/source/CobolRelation.$anonfun$parseRecords$1:(Lza/co/absa/cobrix/spark/cobol/reader/FixedLenReader;[B)Lscala/collection/Iterator;, instantiatedMethodType=([B)Lscala/collection/Iterator;, numCaptured=1])
	- writeReplace data (class: java.lang.invoke.SerializedLambda)
	- object (class za.co.absa.cobrix.spark.cobol.source.CobolRelation$$Lambda$1761/0x0000000840d13040, za.co.absa.cobrix.spark.cobol.source.CobolRelation$$Lambda$1761/0x0000000840d13040@440461ef)
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:413)
	... 83 more

Thanks for the report! Do I understand correctly that this can be fixed by making all loggers @transient?

Since you can test the implementation, it might be easier if you create a PR that works for you. We can then merge it and release a new version. What do you think?

That is right, this can be fixed by making all loggers @transient.
Sure, I will create the PR.
Thank you.