jtablesaw/tablesaw

JTableSaw deadlocks on column initialization

howard-3 opened this issue · 1 comments

Example code to reproduce

import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
 
public class Test {
 
  public static void main(String[] args) throws Exception {
    // Uncomment the next line (and import tech.tablesaw.api.ColumnType) to prevent the initialization deadlock.
    // ColumnType.values();
    Runnable r1 = () -> {
      StringColumn.create("abc");
    };
    Runnable r2 = () -> {
      DoubleColumn.create("def");
    };
    Thread t1 = new Thread(r1);
    Thread t2 = new Thread(r2);
    System.out.println("Starting");
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println("Done");
  }
 
}

Constructor flow

  graph TD;
      StringColumn --> |in constructor refers to| StringColumnType
      StringColumnType --> |is an extension of| AbstractColumnType
      AbstractColumnType --> |is an impl of|ColumnType
      ColumnType --> |initializes the final values of|StringColumnType

The root cause seems to be that ColumnType holds static references to instances of many *ColumnType classes, so initializing ColumnType triggers initialization of each of them.

When StringColumn and another column class are used for the first time concurrently, one thread can hold the class initialization lock for StringColumn (and StringColumnType) while the other thread holds the locks for DoubleColumn and DoubleColumnType. In that scenario, StringColumn cannot finish initialization because it depends on the init lock for ColumnType, which in turn relies on DoubleColumnType. The second thread is itself waiting on ColumnType, so neither thread can make progress.
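To make the mechanism concrete without depending on tablesaw, here is a minimal sketch of the same kind of cycle. The classes `A` and `B` and the latches are hypothetical stand-ins (they are not tablesaw classes), and the latches exist only to force the unlucky interleaving every time instead of occasionally.

```java
import java.util.concurrent.CountDownLatch;

public class InitDeadlockSketch {

  // The latches live in the outer class, which is fully initialized
  // before either worker thread starts.
  static final CountDownLatch aStarted = new CountDownLatch(1);
  static final CountDownLatch bStarted = new CountDownLatch(1);

  // Hypothetical stand-in for one *ColumnType class.
  static class A {
    static final String NAME;
    static {
      aStarted.countDown();        // this thread now holds A's init lock
      await(bStarted);             // wait until the other thread holds B's
      NAME = "A sees " + B.NAME;   // blocks: B is being initialized elsewhere
    }
  }

  // Hypothetical stand-in for another *ColumnType class.
  static class B {
    static final String NAME;
    static {
      bStarted.countDown();
      await(aStarted);
      NAME = "B sees " + A.NAME;   // blocks: A is being initialized elsewhere
    }
  }

  static void await(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Returns true if both threads are still stuck after the timeout.
  static boolean deadlocks(long timeoutMillis) {
    Thread t1 = new Thread(() -> A.NAME.length());
    Thread t2 = new Thread(() -> B.NAME.length());
    // Daemon threads so the JVM can still exit while they are stuck.
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(timeoutMillis);
      t2.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return t1.isAlive() && t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("Deadlocked: " + deadlocks(1000));
  }
}
```

In the real case the cycle goes through ColumnType rather than directly between two classes, but the shape is the same: each thread holds one init lock and waits on one the other thread will never release.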

My temporary fix is simply to ensure the class for ColumnType is loaded first.

Relevant JVM tickets:
https://bugs.openjdk.org/browse/JDK-8037567

I'm happy to contribute a fix, but I'm not sure what the best approach is. Maybe move all the initialization for the different *ColumnTypes to a new class?
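As a rough illustration of that idea, here is a sketch with hypothetical names (`Kind`, `StringKind`, `DoubleKind`, `Kinds`); none of these are tablesaw classes, and this is only one possible shape for a fix. The point is that the shared constants live in a separate holder class, so the dependency runs one way: the holder depends on the implementations, and the implementations never depend on the holder.

```java
public class HolderSketch {

  // Hypothetical stand-in for ColumnType: no static fields
  // that reference its implementations.
  interface Kind {
    String name();
  }

  // Hypothetical stand-in for StringColumnType.
  static final class StringKind implements Kind {
    private static final StringKind INSTANCE = new StringKind();
    private StringKind() {}
    static StringKind instance() { return INSTANCE; }
    public String name() { return "STRING"; }
  }

  // Hypothetical stand-in for DoubleColumnType.
  static final class DoubleKind implements Kind {
    private static final DoubleKind INSTANCE = new DoubleKind();
    private DoubleKind() {}
    static DoubleKind instance() { return INSTANCE; }
    public String name() { return "DOUBLE"; }
  }

  // Hypothetical holder: the only class whose initializer
  // touches the implementations.
  static final class Kinds {
    static final Kind STRING = StringKind.instance();
    static final Kind DOUBLE = DoubleKind.instance();
    private Kinds() {}
  }

  // Starts two threads that first touch different implementations and then
  // the holder; returns true if both finish within the timeout.
  static boolean finishesConcurrently(long timeoutMillis) {
    Thread t1 = new Thread(() -> {
      StringKind.instance().name();
      Kinds.DOUBLE.name();
    });
    Thread t2 = new Thread(() -> {
      DoubleKind.instance().name();
      Kinds.STRING.name();
    });
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(timeoutMillis);
      t2.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t1.isAlive() && !t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("Finished: " + finishesConcurrently(1000));
  }
}
```

Because no initializer in this sketch waits on a class that could be waiting on it, a thread that holds one init lock can always finish. The trade-off is that callers would reference the constants through the holder instead of through the interface, which would be an API change for existing users.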