Strange error when doing a `StdArrays.copyFrom` with a matrix using JDK 17
hmf opened this issue · 14 comments
Expected Behavior
Note: title changed - "Strange error interfacing with Scala when doing a StdArrays.copyFrom with a matrix using JDK 17"
While exploring and documenting the Index API, I get errors when using Scala: a java.nio.BufferOverflowException is thrown. However, what seems like an equivalent Java test has no problem. I expect the Scala code to work just as it does in Java. I am trying to replicate the error in Java.
I understand that Scala is not a supported platform, but I would appreciate any help. I have been at this for a week.
Actual Behavior
I have created a test that succeeds here. The relevant code is:
String[][] indexData = new String[5][4];
for (int i = 0; i < 5; i++)
  for (int j = 0; j < 4; j++)
    indexData[i][j] = "(" + j + ", " + i + ")";
NdArray<String> matrix2d = StdArrays.ndCopyOf(indexData);
assertEquals(2, matrix2d.rank());
/*
|(0, 0), (1, 0), (2, 0), (3, 0)|
|(0, 1), (1, 1), (2, 1), (3, 1)|
|(0, 2), (1, 2), (2, 2), (3, 2)|
|(0, 3), (1, 3), (2, 3), (3, 3)|
|(0, 4), (1, 4), (2, 4), (3, 4)|
*/
// all rows, columns 1 to 2
NdArray<String> same7 = matrix2d.slice(Indices.all(), Indices.slice(1,3));
assertEquals(2, same7.rank());
assertEquals(Shape.of(5,2), same7.shape());
assertEquals(10, same7.size());
String[][] expectedr7 = new String[][]
{
{"(1, 0)", "(2, 0)"},
{"(1, 1)", "(2, 1)"},
{"(1, 2)", "(2, 2)"},
{"(1, 3)", "(2, 3)"},
{"(1, 4)", "(2, 4)"}
};
String[][] lArray = new String[5][2];
StdArrays.copyFrom(same7, lArray);
The equivalent Scala code is:
val indexes =
  for
    i <- (0 until 5)
  yield
    val row = for { j <- (0 until 4) } yield s"($j, $i)"
    row.toArray
val indexData = indexes.toArray
val matrix2d = StdArrays.ndCopyOf(indexData)
// all rows, columns 1 to 2
val same7 = matrix2d.slice(Indices.all(), Indices.slice(1,3))
assert(same7.rank() == 2)
assert(same7.shape() == Shape.of(5,2))
assert(same7.size() == 10)
val expected_r7 = Array(
  Array("(1, 0)", "(2, 0)"),
  Array("(1, 1)", "(2, 1)"),
  Array("(1, 2)", "(2, 2)"),
  Array("(1, 3)", "(2, 3)"),
  Array("(1, 4)", "(2, 4)")
)
val lArray = Array.ofDim[String](5,2)
assert(lArray.size == 5)
assert(lArray(0).size == 2)
StdArrays.copyFrom(same7, lArray)
Steps to Reproduce the Problem
- I forked the NdArray repository, created and ran the test above with success
- I am using the latest version of the NdArray library for the Scala code
- I created an equivalent "test" as shown above
- When I execute the Scala code I get the following error:
Exception in thread "main" java.nio.BufferOverflowException
at org.tensorflow.ndarray.impl.buffer.Validator.copyToArgs(Validator.java:61)
at org.tensorflow.ndarray.impl.buffer.misc.ArrayDataBuffer.copyTo(ArrayDataBuffer.java:52)
at org.tensorflow.ndarray.impl.dense.DataTransfer.execute(DataTransfer.java:114)
at org.tensorflow.ndarray.impl.dense.AbstractDenseNdArray.read(AbstractDenseNdArray.java:94)
at org.tensorflow.ndarray.StdArrays.copyFrom(StdArrays.java:2735)
at org.tensorflow.ndarray.StdArrays.lambda$copyFrom$75(StdArrays.java:2752)
at org.tensorflow.ndarray.impl.sequence.SlicingElementSequence.lambda$forEachIndexed$0(SlicingElementSequence.java:65)
at org.tensorflow.ndarray.impl.sequence.NdPositionIterator.forEachIndexed(NdPositionIterator.java:43)
at org.tensorflow.ndarray.impl.sequence.SlicingElementSequence.forEachIndexed(SlicingElementSequence.java:64)
at org.tensorflow.ndarray.StdArrays.copyFrom(StdArrays.java:2751)
at core.TensorExamples$.ndArraySlicing(TensorExamples.scala:751)
at core.TensorExamples$.main(TensorExamples.scala:1176)
at core.TensorExamples.main(TensorExamples.scala)
It is difficult for me to get a failing example in Scala that anyone here could easily run and check, so my goal is to try to replicate this in Java. In particular, I have found that the Java version does not follow the same path as the Scala code. When debugging the Java version I manually recorded this trace:
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/StdArrays.java [2747]
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/StdArrays.java [2751]
/ndarray/src/main/java/org/tensorflow/ndarray/impl/dense/AbstractDenseNdArray.java [50]
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/impl/sequence/SlicingElementSequence.java []
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/StdArrays.java [2752]
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/StdArrays.java [2735]
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/impl/dense/AbstractDenseNdArray.java [94]
dst: ArrayDataBuffer length = 2, offset = 0, readOnly = false
/workspaces/java-ndarray/ndarray/src/main/java/org/tensorflow/ndarray/impl/dense/DataTransfer.java [104] <-----> [114]
Note that with the Java version the DataTransfer class executes the copyByElement method [104], while the Scala code does a srcBuffer.copyTo [114]. The result of the srcDimensions.isSegmented() test seems to differ between the two. The relevant code is shown below:
static <T, B extends DataBuffer<T>> void execute(B srcBuffer, DimensionalSpace srcDimensions, B dstBuffer, OfValue<B> valueTransfer) {
  if (srcDimensions.isSegmented()) {
    long elementSize = srcDimensions.get(srcDimensions.segmentationIdx()).elementSize();
    copyByElement(
        srcBuffer,
        PositionIterator.create(srcDimensions, srcDimensions.segmentationIdx()),
        dstBuffer,
        PositionIterator.sequence(elementSize, dstBuffer.size()),
        elementSize,
        valueTransfer
    );
  } else {
    srcBuffer.copyTo(dstBuffer, srcDimensions.physicalSize());
  }
}
So my question is: how can one change the Java test code so that srcDimensions.isSegmented() is false and the srcBuffer.copyTo method is used?
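If I understand the check correctly, one case where that branch should be taken is when the source is not sliced at all, since the backing buffer is then one contiguous block. Using the matrix2d from the Scala snippet above, a rough sketch of that case (this is only my assumption, not something I have verified against the library internals):

// Sketch only: my assumption is that copying from the unsliced matrix2d keeps the
// source contiguous, so srcDimensions.isSegmented() is false and srcBuffer.copyTo runs.
val fullOut = Array.ofDim[String](5, 4)
StdArrays.copyFrom(matrix2d, fullOut)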
On another note, would it be interesting to provide additional tests as a PR, including the one above?
Specifications
- Version: 0.3.3
- Platform: Java JDK 17 on Linux
This is not a problem with Scala. It is an issue with the JDK version. If I copy the Java code to my Scala project and run it there, it also fails. It does not help if I set the Java compiler options to use JDK 11, as per this project's POM (source and target).
I am using GitHub Codespaces, so that VM uses:
openjdk 11.0.14.1 2022-02-08 LTS
On my local machine I am using:
openjdk 17.0.5 2022-10-18
I am assuming version 17 should be backwards compatible, but it does not seem like it. Can this be considered a bug?
Thanks for reporting this @hmf ,
I've tried the Java snippet with OpenJDK 11, 17 and 19 distributed by Zulu and it always passes, so I don't think it is a JDK version problem (which distribution are you using?)
Now I'll have to dig a bit deeper to understand how isSegmented should behave. In my case, I'm entering the branch where it is true and it is working, though I would have expected it to be false, since we end up copying by vectors to the destination 2D array (so one contiguous 1D array at a time). Need to double-check.
@karllessard Thanks for the feedback. You are correct. I set up the Codespace to use JDK 17. It uses the following:
openjdk 17.0.4.1 2022-08-12 LTS
OpenJDK Runtime Environment Microsoft-40354 (build 17.0.4.1+1-LTS)
OpenJDK 64-Bit Server VM Microsoft-40354 (build 17.0.4.1+1-LTS, mixed mode, sharing)
and all tests passed. On my machine I have:
openjdk 17.0.5 2022-10-18
OpenJDK Runtime Environment (build 17.0.5+8-Ubuntu-2ubuntu122.04)
OpenJDK 64-Bit Server VM (build 17.0.5+8-Ubuntu-2ubuntu122.04, mixed mode, sharing)
which fails. Wondering how I can change the JDK version just to make sure.
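In the meantime, one way to be sure which JDK actually executes the failing code (as opposed to the one the build compiles against) is to print the runtime properties from the program itself; a trivial sketch:

// Prints which JVM is actually executing the code, independent of the
// compiler's source/target settings in the build.
@main def printJvmInfo(): Unit =
  println(s"java.version         = ${System.getProperty("java.version")}")
  println(s"java.runtime.version = ${System.getProperty("java.runtime.version")}")
  println(s"java.vendor          = ${System.getProperty("java.vendor")}")
  println(s"java.home            = ${System.getProperty("java.home")}")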
@Craigacp I am using Scala 3.2.1. It's compatible with JDK 11 and 17. I have not encountered such compatibility issues before. Going to see if I can check on this.
Can you pull down OpenJDK 17.0.4.1 from your preferred JDK provider and check that locally? I wouldn't expect there to be that kind of bug in a point release, but such things happen very occasionally.
I was first trying to see if I can use JDK 17.0.5 on the dev container. Failing that, I will see what I can do on my system. I don't want to mess with it if I can avoid it (I am using deb packages).
You can always download a tar.gz version, unpack that locally and then set JAVA_HOME, which won't interfere with your system, but if you can get it running in Codespaces that's good too.
It seems the JDK might not be my issue. On the dev container I have:
openjdk 17.0.5 2022-10-18 LTS
OpenJDK Runtime Environment Microsoft-6841604 (build 17.0.5+8-LTS)
OpenJDK 64-Bit Server VM Microsoft-6841604 (build 17.0.5+8-LTS, mixed mode, sharing)
Not the Ubuntu build, but the same JDK version. The dev container does not offer that Ubuntu build, so I cannot test it there. I am starting to suspect the build script on my side. Going to clone and run locally to see what happens.
EDIT: local execution also runs correctly.
I have created a bare-bones Java project. It has only 1 dependency - NdArray (0.3.3). I use Mill for my build script. I then executed the test (no asserts) and got the same error. Checking the dependencies I get:
./mill -i show ndarray.resolvedRunIvyDeps
[1/1] show
[1/1] show > [5/5] ndarray.resolvedRunIvyDeps
[
"qref:232ea057:/home/hmf/.cache/coursier/v1/https/repo1.maven.org/maven2/org/tensorflow/ndarray/0.3.3/ndarray-0.3.3.jar"
]
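For reference, the module definition in my build.sc is essentially the following (a sketch; the module name and layout are just what I happen to use):

// build.sc - sketch of the bare-bones module; the only declared dependency is ndarray 0.3.3.
import mill._
import mill.scalalib._

object ndarray extends JavaModule {
  def ivyDeps = Agg(ivy"org.tensorflow:ndarray:0.3.3")
}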
I am assuming this single dependency is all I need. I then created a fat jar and ran the Java application directly:
:~/VSCodeProjects/stensorflow$ ./mill -i ndarray.assembly
[31/31] ndarray.assembly
~/VSCodeProjects/stensorflow$ java -cp out/ndarray/assembly.dest/out.jar core.JJArrays
(0, 0)
Exception in thread "main" java.nio.BufferOverflowException
at org.tensorflow.ndarray.impl.buffer.Validator.copyToArgs(Validator.java:61)
at org.tensorflow.ndarray.impl.buffer.misc.ArrayDataBuffer.copyTo(ArrayDataBuffer.java:52)
at org.tensorflow.ndarray.impl.dense.DataTransfer.execute(DataTransfer.java:114)
at org.tensorflow.ndarray.impl.dense.AbstractDenseNdArray.read(AbstractDenseNdArray.java:94)
at org.tensorflow.ndarray.StdArrays.copyFrom(StdArrays.java:2735)
at org.tensorflow.ndarray.StdArrays.lambda$copyFrom$75(StdArrays.java:2752)
at org.tensorflow.ndarray.impl.sequence.SlicingElementSequence.lambda$forEachIndexed$0(SlicingElementSequence.java:65)
at org.tensorflow.ndarray.impl.sequence.NdPositionIterator.forEachIndexed(NdPositionIterator.java:43)
at org.tensorflow.ndarray.impl.sequence.SlicingElementSequence.forEachIndexed(SlicingElementSequence.java:64)
at org.tensorflow.ndarray.StdArrays.copyFrom(StdArrays.java:2751)
at core.JJArrays.testIndices(JJArrays.java:155)
at core.JJArrays.main(JJArrays.java:168)
I have in the link below the fat jar and the source code. Could any brave and generous soul run that and confirm that the exception occurs? After decompressing it, please open the fat jar and confirm its contents, or use a sandbox.
In the meantime I will try and check the original Maven script for any differences.
TIA
I have gone through the Maven script and included all of the dependencies. I have added the following:
val junitVersion = "5.6.2"
val jmhVersion = "1.21"
val junit = ivy"org.junit.jupiter:junit-jupiter-api:${junitVersion}"
val junitEngine = ivy"org.junit.jupiter:junit-jupiter-engine:${junitVersion}"
val jmh = ivy"org.openjdk.jmh:jmh-core:${jmhVersion}"
val jmhGen = ivy"org.openjdk.jmh:jmh-generator-annprocess:${jmhVersion}"
I then reactivated all the asserts I have in the original test code. For good measure I have also added the exports and opens to the JVM fork arguments (although these seem to be used for compilation only; I tested with and without). Once again the copyFrom call fails.
Once again in the link below is the compiled version of the fat jar. I have also included the source. Anyone care to execute this also?
I don't see what else I can do to replicate the error I have, save create a Mill script in my fork for someone else to test. Suggestions welcomed.
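For anyone who would rather paste code than run a downloaded jar, here is the whole failing example condensed into one Scala file (same data and slice as in the snippets above, against ndarray 0.3.3):

// Repro.scala - self-contained version of the failing example from this issue.
import org.tensorflow.ndarray.StdArrays
import org.tensorflow.ndarray.index.Indices

@main def repro(): Unit =
  // 5x4 matrix of "(j, i)" strings, as in the snippets above.
  val indexData = Array.tabulate(5, 4)((i, j) => s"($j, $i)")
  val matrix2d  = StdArrays.ndCopyOf(indexData)

  // All rows, columns 1 to 2.
  val same7  = matrix2d.slice(Indices.all(), Indices.slice(1, 3))
  val lArray = Array.ofDim[String](5, 2)

  // Throws java.nio.BufferOverflowException on 0.3.3.
  StdArrays.copyFrom(same7, lArray)
  lArray.foreach(row => println(row.mkString(", ")))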
Hey @hmf, so I've run both of your JARs in a Docker image and I can reproduce this error... But what I've noticed even more is that I did my previous tests with 0.4.0-SNAPSHOT and not 0.3.3 :-/ sorry for that.
So on 0.3.3 it fails, on 0.4.0-SNAPSHOT it does not, I've just validated that. At this point, I cannot tell why.
Can you please try using the snapshot version? We're planning to make the release of 0.4.0 before the end of the year.
<repositories>
  <repository>
    <id>tensorflow-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.tensorflow</groupId>
    <artifactId>ndarray</artifactId>
    <version>0.4.0-SNAPSHOT</version>
  </dependency>
</dependencies>
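If you are resolving the dependency with Mill/Coursier rather than Maven, the equivalent would be roughly the following (a sketch; the exact API for adding extra repositories varies between Mill versions, so treat the method names as assumptions):

// build.sc sketch: depend on the snapshot and add the Sonatype snapshots repository.
import mill._
import mill.scalalib._
import coursier.maven.MavenRepository

object ndarray extends JavaModule {
  def ivyDeps = Agg(ivy"org.tensorflow:ndarray:0.4.0-SNAPSHOT")

  def repositoriesTask = T.task {
    super.repositoriesTask() ++
      Seq(MavenRepository("https://oss.sonatype.org/content/repositories/snapshots"))
  }
}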
@karllessard works perfectly. Both Java and Scala examples pass the tests. Should have figured this out too. Thank you very much.
I have some additional code I can add to the existing test. If you like I can make a PR with these tests. Is this of interest to the project?
If you like, I can close this issue.
Once again thanks.
If you like I can make a PR with these tests. Is this of interest to the project?
Of course, please do, that'll be very appreciated!