HPI-Information-Systems/Metanome

Handling NULLs in databases

sekruse opened this issue · 1 comments

It seems that Metanome cannot handle NULLs in databases so far. Here is an example stacktrace:

Exception in thread "main" java.lang.NullPointerException: at index 8
        at com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:240)
        at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:231)
        at com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:226)
        at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:295)
        at de.metanome.backend.input.database.ResultSetIterator.next(ResultSetIterator.java:94)
        at de.metanome.backend.input.database.ResultSetIterator.next(ResultSetIterator.java:28)
        at de.hpi.metanome.algorithms.hyucc.structures.PLIBuilder.calculateClusterMaps(PLIBuilder.java:49)
        at de.hpi.metanome.algorithms.hyucc.structures.PLIBuilder.getPLIs(PLIBuilder.java:38)
        at de.hpi.metanome.algorithms.hyucc.HyUCC.executeHyUCC(HyUCC.java:188)
        at de.hpi.metanome.algorithms.hyucc.HyUCC.execute(HyUCC.java:170)
        at de.hpi.metanome.flixrunner.mocks.MetanomeMock.executeHyUCC(MetanomeMock.java:153)
        at de.hpi.metanome.flixrunner.mocks.MetanomeMock.execute(MetanomeMock.java:56)
        at de.hpi.metanome.flixrunner.MetanomeTestRunner.run(MetanomeTestRunner.java:42)
        at de.hpi.metanome.flixrunner.Main.main(Main.java:8)

The culprit seems to be ImmutableList.copyOf(...), which rejects null elements. In this case, it is used to wrap a tuple fetched from the database. NULLs in the tuple are represented by Java null values, so ImmutableList.copyOf(...) fails with the NullPointerException shown above.

To solve this issue, one option would be to change the signature of the method

public ImmutableList<String> next() throws InputIterationException

in ResultSetIterator to

public List<String> next() throws InputIterationException

so that a different List implementation that allows nulls can be used.
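A minimal sketch of what such a null-tolerant, read-only row could look like, using only the JDK. The class and method names (`NullTolerantRow`, `toRow`) are hypothetical and not taken from the Metanome code base; the actual ResultSetIterator may build its tuples differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Metanome: wraps a fetched tuple in a
// read-only List that tolerates null elements.
class NullTolerantRow {

    // Copies the values so later changes to the array do not leak into the row,
    // then wraps the copy so callers cannot modify it.
    static List<String> toRow(String[] values) {
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(values)));
    }

    public static void main(String[] args) {
        // The second column stands in for a database NULL.
        List<String> row = toRow(new String[] {"1", null, "Berlin"});
        System.out.println(row); // prints [1, null, Berlin]
    }
}
```

Unlike ImmutableList, the returned list gives no compile-time guarantee of immutability, so callers only learn that it is read-only when a modification throws UnsupportedOperationException.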

The other solution would be to represent NULL values as empty Strings, but then we lose a bit of semantics, because NULLs and genuinely empty Strings become indistinguishable.
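To illustrate the loss of semantics, here is a small sketch, again with hypothetical names (`EmptyStringForNull`, `nullsToEmpty`) that do not come from Metanome. Two tuples that differ only in NULL versus empty String compare as equal once NULLs are replaced.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Metanome: replaces null values by "".
class EmptyStringForNull {

    static List<String> nullsToEmpty(List<String> row) {
        return row.stream()
                .map(value -> value == null ? "" : value)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> withNull = Arrays.asList("42", null); // second column is NULL
        List<String> withEmpty = Arrays.asList("42", "");  // second column is an empty String

        // Before the replacement the two tuples are distinguishable.
        System.out.println(withNull.equals(withEmpty)); // prints false

        // After the replacement they are not.
        System.out.println(nullsToEmpty(withNull).equals(nullsToEmpty(withEmpty))); // prints true
    }
}
```

For algorithms that compare column values, this could change which tuples count as duplicates, so the replacement is not purely cosmetic.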

Never mind, it seems that only the currently deployed snapshot version is subject to this issue. The bug appears to have been fixed in 4086abf.