coherence-community/oracle-bedrock

Lambda Serialization may fail when using the JavaVirtualMachine platform

Closed this issue · 9 comments

As reported by @kosmaty.

There is one more lambda serialization issue that I've found. I tried to use a lambda expression as an EntryProcessor:

ValueExtractor<InvocableMap.Entry<String, String>, String> ex = e -> e.isPresent() ? e.getValue() : null;
namedCache.invokeAll(entry -> ex.apply(entry));

It works fine when the cluster is started using the LocalPlatform. But when the cluster is started using the JavaVirtualMachine platform:

CoherenceCluster cluster = builder.build(JavaVirtualMachine.get(), Console.system())

a LinkageError is thrown:

java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
	at com.oracle.bedrock.runtime.coherence.CoherenceNamedCache.remotelyInvoke(CoherenceNamedCache.java:262)
	at com.oracle.bedrock.runtime.coherence.CoherenceNamedCache.invokeAll(CoherenceNamedCache.java:360)
	at com.tangosol.util.InvocableMap.invokeAll(InvocableMap.java:85)
	at com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest.shouldSupportLambdaAsEntryProcessor(ContainerBasedCoherenceClusterBuilderTest.java:100)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:262)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at com.oracle.bedrock.runtime.coherence.CoherenceNamedCache.remotelyInvoke(CoherenceNamedCache.java:228)
	... 34 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.oracle.bedrock.runtime.concurrent.callable.RemoteMethodInvocation.call(RemoteMethodInvocation.java:133)
	at com.oracle.bedrock.runtime.concurrent.AbstractRemoteChannel$CallableOperation.execute(AbstractRemoteChannel.java:578)
	at com.oracle.bedrock.runtime.concurrent.AbstractRemoteChannel$Executor.run(AbstractRemoteChannel.java:849)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: (Wrapped: Failed request execution for PartitionedCache service on Member(Id=2, Timestamp=2016-12-21 15:44:05.641, Address=10.61.6.12:57380, MachineId=30275, Location=site:gft.com,machine:PLLL0106,process:792, Role=IntellijRtExecutionApplicationAppMain) (Wrapped) no such constructor: com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654.<init>(ValueExtractor)void/newInvokeSpecial) java.lang.IllegalAccessException: no such constructor: com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654.<init>(ValueExtractor)void/newInvokeSpecial
	at com.tangosol.util.Base.ensureRuntimeException(Base.java:296)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.tagException(Grid.CDB:61)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onInvokeFilterRequest(PartitionedCache.CDB:123)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvokeFilterRequest.run(PartitionedCache.CDB:1)
	at com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:1)
	at com.tangosol.coherence.component.util.DaemonPool$WrapperTask.run(DaemonPool.CDB:32)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService$DaemonPool$WrapperTask.run(PartitionedService.CDB:1)
	at com.tangosol.coherence.component.util.DaemonPool$Daemon.onNotify(DaemonPool.CDB:66)
	at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:54)
	... 1 more
Caused by: java.lang.IllegalAccessException: no such constructor: com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654.<init>(ValueExtractor)void/newInvokeSpecial
	at java.lang.invoke.MemberName.makeAccessException(MemberName.java:867)
	at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:1003)
	at java.lang.invoke.MethodHandles$Lookup.resolveOrFail(MethodHandles.java:1381)
	at java.lang.invoke.MethodHandles$Lookup.findConstructor(MethodHandles.java:919)
	at com.tangosol.internal.util.invoke.ClassDefinition.setRemotableClass(ClassDefinition.java:125)
	at com.tangosol.internal.util.invoke.RemotableSupport.realize(RemotableSupport.java:123)
	at com.tangosol.internal.util.invoke.RemoteConstructor.newInstance(RemoteConstructor.java:120)
	at com.tangosol.internal.util.invoke.RemoteConstructor.readResolve(RemoteConstructor.java:231)
	at com.tangosol.util.ExternalizableHelper.realize(ExternalizableHelper.java:4837)
	at com.tangosol.util.ExternalizableHelper.deserializeInternal(ExternalizableHelper.java:3101)
	at com.tangosol.util.ExternalizableHelper.fromBinary(ExternalizableHelper.java:334)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$InvokeFilterRequest.deserializeProcessor(PartitionedCache.CDB:7)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onInvokeFilterRequest(PartitionedCache.CDB:62)
	... 7 more
Caused by: java.lang.LinkageError: loader constraint violation: when resolving method "com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654.<init>(Lcom/tangosol/util/ValueExtractor;)V" the class loader (instance of <bootloader>) of the current class, java/lang/Object, and the class loader (instance of com/tangosol/internal/util/invoke/RemotableSupport) for the method's defining class, com/oracle/bedrock/runtime/coherence/ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654, have different Class objects for the type com/tangosol/util/ValueExtractor used in the signature
	at java.lang.invoke.MethodHandleNatives.resolve(Native Method)
	at java.lang.invoke.MemberName$Factory.resolve(MemberName.java:975)
	at java.lang.invoke.MemberName$Factory.resolveOrFail(MemberName.java:1000)
	... 18 more

But I suspect it might be some problem with Oracle Coherence itself (not Bedrock), since I was facing the same issue when using LittleGrid (in fact, it was the primary reason why I started evaluating Bedrock).

Please, feel free to move this to another issue if you think it is not strictly connected to the original case of this issue.

From my understanding, I believe the issue is due to an attempt to serialize and deserialize a nested anonymous non-static lambda expression, which (with or without Coherence) is generally a bad idea. ie: It's actually a Java issue more than anything else.

That said, things are made slightly more confusing by Application Containers and especially Child-First ClassLoader strategies used by Bedrock and LittleGrid. Let's look at some example work-arounds and discuss why they work.

If you replace:

ValueExtractor<InvocableMap.Entry<String, String>, String> ex = e -> e.isPresent() ? e.getValue() : null;
namedCache.invokeAll(entry -> ex.apply(entry));

with a single Lambda expression like this: (ie: fold the inner lambda into the outer one)

namedCache.invokeAll(entry -> entry.isPresent() ? entry.getValue() : null);

or define the ex lambda as a static final like this: (ie: make it a constant)

public static final ValueExtractor<InvocableMap.Entry<String, String>, String> ex = e -> e.isPresent() ? e.getValue() : null;

...
namedCache.invokeAll(entry -> ex.apply(entry));

it works. Why?

When Coherence (and Java) attempts to serialize the lambda expression entry -> ex.apply(entry) passed to the invokeAll method, it essentially has to create a serializable closure, which includes resolving the class and name of the effectively constant (but not necessarily so) ex lambda variable/expression defined on the line above the invokeAll. It needs to know the class name of this inner expression so that the outer expression (and it) can be serialized and deserialized.

As lambda classes in Java have no identity (ie: no formal externalizable class name), the best Coherence (and Java) can do is create a synthetic identity (ie: class name) based on the current calling context, which is why you see something like com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654 as a class name.

Notice it actually contains the fully-qualified class name of the class in which the lambda is defined, together with the method name, in addition to some other runtime-specific information (like a byte-code / memory location and UUID). The Java language does the best it can at creating an identity, but sadly this identity is execution-time specific.
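As an aside, you can see the JVM's own synthetic naming by printing a lambda's runtime class. (The exact format varies by JVM version, and it is distinct from the $lambda$...$<hash> names that appear in the stack trace above, which Coherence generates itself; this is just an illustrative sketch.)

```java
public class LambdaNameDemo {
    public static String lambdaClassName() {
        Runnable r = () -> { };
        // On HotSpot this is something like "LambdaNameDemo$$Lambda$1/1234567890" -
        // a synthetic, runtime-specific name with no stable identity across JVMs.
        return r.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(lambdaClassName());
    }
}
```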

While this appears to be a valid but special "class-name" string for the ex lambda, one that can easily be serialized, it obviously won't exist or be available in the same location (ie: memory location and/or with the same UUID) in a different class loader in the same Java Virtual Machine.

This is why a LinkageError is raised when the outer lambda is deserialized: it's not possible to resolve the class of the non-static anonymous inner lambda expression.

Why doesn't this happen for the outer lambda expression as well? Simple: its location is known and fixed at compile time. It's not a variable like ex, whose value can change. This is why folding the ex lambda into the outer lambda and/or using a constant for ex works (as would using a Method Reference).

Why doesn't this happen when running the test in a separate process? I can only reason that the compiled byte-code and memory layout are identical. However, I don't think this should ever be assumed or is guaranteed to work forever, hence why I state above that it's probably a bad idea to use this type of pattern.

Basically if you're going to use Lambda expressions across a distributed system, you should always avoid nested non-static anonymous use-cases. While they may work in some circumstances, there's no guarantee they will forever.

This is an interesting case, thanks for sharing it!

I don't know how Bedrock / Coherence (de-)serialize lambdas; however, when standard Java serialization is used, the synthetic class name is not part of the serialized state.

It's certainly possible to create a capturing (non-static?) lambda, serialize its state (including captured arguments), send it over a network and deserialize it. Or just deserialize it via another classloader in the same JVM (provided the classloader has access to the bytecode).
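To illustrate that point, the sketch below round-trips a capturing lambda through standard Java serialization (via its SerializedLambda replacement form) and deserializes it in the same class loader. The SerFunction interface is just a local helper to give the lambda a serializable target type:

```java
import java.io.*;
import java.util.function.Function;

public class LambdaRoundTripDemo {
    // Helper target type: a Function that is also Serializable.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        }
    }

    public static String roundTrip() throws Exception {
        String prefix = "hello, ";                        // captured argument
        SerFunction<String, String> f = s -> prefix + s;  // capturing lambda
        // Works because the deserializing side (here, the same class loader)
        // can see this class's $deserializeLambda$ method and bytecode.
        SerFunction<String, String> g = deserialize(serialize(f));
        return g.apply("world");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // hello, world
    }
}
```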

I think the last part is the biggest issue. A Child-First ClassLoader loads the classes again, whereas Parent-First ClassLoaders won't suffer from the same issue.
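A minimal sketch of that "loaded again" effect (not Bedrock's actual ContainerClassLoader, just a toy loader that defines one class itself instead of delegating to its parent):

```java
import java.io.*;

public class ChildFirstDemo {
    public static class Payload { }  // the class we will load a second time

    // Toy child-first loader: defines Payload from its own copy of the bytes
    // rather than delegating, so a second, distinct Class object is created.
    static class ChildFirstLoader extends ClassLoader {
        ChildFirstLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Payload.class.getName())) {
                return super.loadClass(name, resolve);  // delegate everything else
            }
            try (InputStream in = getParent().getResourceAsStream(name.replace('.', '/') + ".class");
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                int b;
                while ((b = in.read()) != -1) out.write(b);
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static Class<?> loadAgain() throws ClassNotFoundException {
        return new ChildFirstLoader(ChildFirstDemo.class.getClassLoader())
                .loadClass(Payload.class.getName());
    }

    public static void main(String[] args) throws Exception {
        Class<?> again = loadAgain();
        // Same name, but a different Class object - a value of one cannot be used
        // where the other is expected, which is what the loader constraint enforces.
        System.out.println(again.getName().equals(Payload.class.getName())); // true
        System.out.println(again == Payload.class);                          // false
    }
}
```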

Bedrock actually uses standard Java serialization, whereas Coherence itself uses patented serialization extensions on top of the JVM's own. Coherence does a lot of magic to identify and dynamically ship the bytecode, so the lambda classes don't actually need to be on the class-path (on the servers). This allows developers and applications to run different versions of their applications (and thus have different lambda expressions in different locations) without corrupting server-side data.

eg: Imagine two almost identical copies of the same class, whereby they only differ in the location of a method / lambda in the class file. Standard Java serialization won't resolve the location of the serialized lambdas, whereas Coherence will. This allows refactoring of application code without impacting clients or servers, regardless of the version of Java they are running.

Without the serialization magic in Coherence, customers would have to restart the entire cluster when a single lambda expression is changed.

Thanks for the in-depth explanation. But there are some points where I can't fully agree with you.

Basically if you're going to use Lambda expressions across a distributed system, you should always avoid nested non-static anonymous use-cases. While they may work in some circumstances, there's no guarantee they will forever

According to this presentation by Aleksandar Seovic, nested lambda expressions are supported by Coherence and my example is almost identical to Aleksandar's.

But let's get back to this issue. I've tried a few more things, and it seems it has nothing to do with nested lambdas. I've slightly modified the example I gave earlier so that it doesn't use any API of ValueExtractor inside the lambda expression:

ValueExtractor ex1 = e -> e;
namedCache.invokeAll(entry -> String.valueOf(ex1));

There is no surprise that the same LinkageError is thrown. But this error is also thrown in the following cases:

// non-lambda value extractor        
ValueExtractor ex2 = ValueExtractor.identity();
namedCache.invokeAll(entry -> String.valueOf(ex2));

// null value extractor        
ValueExtractor ex3 = null;
namedCache.invokeAll(entry -> String.valueOf(ex3));

But the following cases work fine and do not cause errors:

// null Object reference
Object ex4 = null;
namedCache.invokeAll(entry -> String.valueOf(ex4));

// Object reference pointing to value extractor       
Object ex5 = ValueExtractor.identity();
namedCache.invokeAll(entry -> String.valueOf(ex5));

// Object reference pointing to a lambda
Object ex6 = (ValueExtractor) e -> e;
namedCache.invokeAll(entry -> String.valueOf(ex6));

From the cases above (and some other tests that I've done) I came to the following conclusions:

  1. The problem is in no way connected to nested lambda expressions
  2. The error occurs when the lambda expression uses some object that is defined outside the expression
  3. It occurs only when this "external" object's type is not a core JDK type (so it occurs for e.g. ValueExtractor or any project-specific class, but doesn't happen for e.g. String or Object)
  4. The actual object that is used in the lambda expression doesn't matter. It is the reference type that matters (e.g. when a ValueExtractor object is assigned to an Object reference, it works fine)
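Observation 4 can be checked directly with reflection: the synthetic class of a capturing lambda gets a constructor whose parameter type is the *declared* type of the captured variable, not the runtime type of the value. (SerSupplier is just a local helper interface, and this relies on the current HotSpot lambda translation strategy, so treat it as an illustrative sketch.)

```java
import java.io.Serializable;
import java.util.function.Supplier;

public class CapturedTypeDemo {
    interface SerSupplier<T> extends Supplier<T>, Serializable {}

    static SerSupplier<String> captureAsCharSequence() {
        CharSequence cs = "x";          // declared type: CharSequence
        return () -> String.valueOf(cs);
    }

    static SerSupplier<String> captureAsObject() {
        Object o = "x";                 // same runtime value, declared type: Object
        return () -> String.valueOf(o);
    }

    // A capturing lambda's synthetic class has a single constructor taking the
    // captured values, typed with the declared types of the captured variables.
    public static Class<?> capturedParamType(Object lambda) {
        return lambda.getClass().getDeclaredConstructors()[0].getParameterTypes()[0];
    }

    public static void main(String[] args) {
        System.out.println(capturedParamType(captureAsCharSequence()));
        System.out.println(capturedParamType(captureAsObject()));
    }
}
```

So whether the LinkageError can occur at all is decided by the static types the lambda captures, which matches the ValueExtractor-vs-Object behaviour above.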

And now my guess at what is happening (based upon your explanations and my observations):

  1. When serializing a lambda expression, Coherence creates a class definition (bytecode) for the lambda expression. This class has a constructor that takes as an argument the object defined outside the lambda expression (in my example it was the ValueExtractor). The lambda serialized in this way is sent to the other Coherence nodes
  2. When the lambda is deserialized on another node, its class definition (Class object) is loaded by that node, perhaps using the ThreadContextClassLoader (in our case it will be com.oracle.bedrock.runtime.java.container.ContainerClassLoader)
  3. Then a method lookup is performed on this class to find the constructor with a specific signature (in our case com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654.<init>(Lcom/tangosol/util/ValueExtractor;)V) using java.lang.invoke.MethodHandles$Lookup.findConstructor(MethodHandles.java:919).

This is the point where the failure happens. Let's analyze the error message:

[1] java.lang.LinkageError: loader constraint violation: when resolving method
[2] "com.oracle.bedrock.runtime.coherence.ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654.<init>(Lcom/tangosol/util/ValueExtractor;)V" 
[3] the class loader (instance of <bootloader>) of the current class, java/lang/Object, 
[4] and the class loader (instance of com/tangosol/internal/util/invoke/RemotableSupport) for the method's defining class, 
[5] com/oracle/bedrock/runtime/coherence/ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654, 
[6] have different Class objects for the type com/tangosol/util/ValueExtractor used in the signature

From this message I can guess that:

  1. The lookup is performed to find this constructor for the class com/oracle/bedrock/runtime/coherence/ContainerBasedCoherenceClusterBuilderTest$lambda$shouldSupportLambdaAsEntryProcessor$bee7b61e$1$B3CBA80B23DC783FB40D51EF42FE8654 (a serialized lambda class) [5]
  2. This class has been loaded by the RemotableSupport class loader (whose parent class loader is the ContainerClassLoader) [4]. It has a constructor that takes a ValueExtractor [2] (and the ValueExtractor class used in the method signature is also loaded by the ContainerClassLoader)
  3. The Lookup object has a lookupClass field, and in this case it is of type Object. When performing the lookup, Object's class loader (the <bootloader> [3]) is used. Unfortunately, this lookup context also resolves to the ValueExtractor class that was loaded outside the container (the class used in the test)
  4. As a result, Java finds two ValueExtractor class definitions (loaded by two separate class loaders) used in one context. This causes the LinkageError to be thrown.

I guess that if Coherence used a different Lookup object (with a different lookupClass, e.g. the deserialized lambda class), the constructor lookup would not fail, no matter which class loader was used to load the lambda class definition.
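For reference, this is the kind of lookup that fails in the stack trace above, sketched here in the trivial single-class-loader case where it succeeds (Box is a hypothetical stand-in class). The types named in the MethodType are resolved relative to the lookup class, which is why a Lookup rooted at Object (and hence the bootloader's view of types) can disagree with a lambda class whose signature mentions a ValueExtractor from another loader:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class LookupDemo {
    public static class Box {
        final String value;
        public Box(String value) { this.value = value; }
    }

    public static String construct(String s) throws Throwable {
        // The lookupClass is LookupDemo here, so Box and String are both resolved
        // via the same class loader and the loader constraints are trivially satisfied.
        MethodHandle ctor = MethodHandles.lookup()
                .findConstructor(Box.class, MethodType.methodType(void.class, String.class));
        return ((Box) ctor.invoke(s)).value;
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(construct("hi")); // hi
    }
}
```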

And why does it work when a separate JVM is used for each node? Because the ValueExtractor class is then only ever loaded once, by one class loader.

But - I say it once more - my guesses are only guesses, and maybe someone with access to the Coherence source code might be able to confirm them.

Having said all this, I am still not sure if anything could be done to fix it in Bedrock without changing Coherence.

According to this presentation by Aleksandar Seovic, nested lambda expressions are supported by
Coherence and my example is almost identical to Aleksandar's.

Ah... fortunately / unfortunately, Bedrock uses a different serialization approach than that provided by Coherence! ie: Bedrock uses out-of-the-box Java serialization, whereas Coherence has "fixed" and "worked around" challenges with Java serialization.

Consequently, I'm looking at how we can make Bedrock use Coherence's serialization approach for lambdas instead of default Java serialization.

I guess that if Coherence used different Lookup object (with different lookupClass, e.g. the
deserialized lambda class), the constructor lookup would not fail, no matter what class loader
would be used to load the lambda class definition.

I think you're right about this. I'm looking at how the class-loader lookups can be re-organized so that this will work.

PS: From the Java Documentation about Serializing Lambdas.

http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html#serialization

Serialization

You can serialize a lambda expression if its target type and its captured arguments are serializable.
However, like inner classes, the serialization of lambda expressions is strongly discouraged.

My guess is that this guidance is based on the feeling that serializing lambdas is non-trivial, and that it's highly likely there are challenges that haven't yet been discovered and thus solved.

I think it's perfectly ok that there are two different ValueExtractor classes. This is what the Child-First ClassLoader does, and it works perfectly with any regular (non-lambda) instance of a ValueExtractor. They are serialized and deserialized correctly.

The problem is that the lambda expression serialized as part of the ValueExtractor can't be deserialized.

Yes, of course, it is ok that there are two ValueExtractor classes - until both are used in one context (i.e. the program expects one of them but is given the other). And that happens when a lookup is performed to find a constructor that takes a ValueExtractor (or any other non-core-JDK class) as an argument.

When a regular (non-lambda) class is deserialized, either no lookup is performed at all, or a lookup is performed to find the no-arg constructor (e.g. POF serialization uses the no-arg constructor to instantiate the object). The latter also happens during deserialization of a lambda expression that is totally self-contained (one that does not use any externally defined object).