cmu-phil/tetrad

Some algorithms treat a discrete dataset as a continuous one, resulting in an error.

yasu-sh opened this issue · 13 comments

Dear @jdramsey,

The other day I noticed an error when I ran a typical pipeline to explain the workflow to my colleagues.
Some algorithms fail even when the dataset is generated as a Bayes net with 3 categories in the Tetrad simulation box, i.e. with the default settings.

OS: Windows 10 Pro 22H2, locale = Japanese
Java: JDK 21 (Oracle)
PC: CPU i7-7820HQ / 32GB HP notebook PC

Reproduction steps:

  1. Start the Tetrad GUI.
  2. A blank session, untitled1.tet, opens.
  3. From the toolbar, select [pipeline] - [simulate, search then compare].
  4. Double-click the Simulation1 box.
  5. Without changing any parameters, press the [Simulate] button.
  6. Press [OK] in the simulate dataset dialog.
  7. Press [Done] in the Simulation1 window.
  8. Double-click the Search1 box.
  9. Select the PC algorithm. Leave all other options and parameters unchanged.
  10. Press [Set Parameters].
  11. Press [Run Search & Generate Graph] without changing any of the PC parameters.
  12. An error message appears: [Stopped with error: Not a continuous data set]
  13. The user is confused: the dataset appears to be discrete, and nothing explains why it was treated as continuous.

I tried some algorithms, not all.
Success: FAS, FCI, GFCI, IMaGES, RFCI
Fail: BOSS, CPC, FGES, FOFC, GRaSP, PC

Some users may have run into this and found workarounds.
Unfortunately, I have not. Could somebody tell me a workaround?


Error message when using PC algorithm.

java.lang.IllegalArgumentException: Not a continuous data set.
        at edu.cmu.tetrad.data.CovarianceMatrix.<init>(CovarianceMatrix.java:90)
        at edu.cmu.tetrad.data.CovarianceMatrix.<init>(CovarianceMatrix.java:85)
        at edu.cmu.tetrad.data.SimpleDataLoader.getCovarianceMatrix(SimpleDataLoader.java:384)
        at edu.cmu.tetrad.search.score.SemBicScore.getCovarianceMatrix(SemBicScore.java:240)
        at edu.cmu.tetrad.search.score.SemBicScore.<init>(SemBicScore.java:133)
        at edu.cmu.tetrad.search.score.SemBicScorer.scoreDag(SemBicScorer.java:49)
        at edu.cmu.tetrad.search.score.SemBicScorer.scoreDag(SemBicScorer.java:30)
        at edu.cmu.tetrad.algcomparison.statistic.BicEst.getValue(BicEst.java:41)
        at edu.cmu.tetrad.search.utils.LogUtilsSearch.stampWithBic(LogUtilsSearch.java:188)
        at edu.cmu.tetrad.algcomparison.algorithm.oracle.cpdag.Pc.search(Pc.java:112)
        at edu.cmu.tetradapp.model.GeneralAlgorithmRunner.lambda$execute$1(GeneralAlgorithmRunner.java:391)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at edu.cmu.tetradapp.model.GeneralAlgorithmRunner.execute(GeneralAlgorithmRunner.java:366)
        at edu.cmu.tetradapp.editor.GeneralAlgorithmEditor$1MyWatchedProcess.watch(GeneralAlgorithmEditor.java:183)
        at edu.cmu.tetradapp.util.WatchedProcess.lambda$startLongRunningThread$0(WatchedProcess.java:62)
        at java.base/java.lang.Thread.run(Thread.java:1583)

Oh, I know what the problem is there. I fixed it for someone else for the Python version. Here's what happened. I thought it might be a good idea to "stamp" algorithm results with their BIC score. However, the method I used to do this assumed the data was continuous. This is where the exception is coming from.
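For readers following along, here is a minimal sketch of the kind of guard that avoids this failure: skip the BIC stamp when the data is not continuous instead of letting the continuous-only scorer throw. It is only an illustration under assumptions. `BicStampSketch`, `Dataset`, `gaussianBic` and `bicStampOrEmpty` are hypothetical names, not Tetrad classes or methods, and this is not the actual fix that went into Tetrad.

```java
import java.util.OptionalDouble;

// Hypothetical stand-ins for illustration only; none of these are Tetrad APIs.
final class BicStampSketch {

    /** Minimal model of a dataset: only whether all variables are continuous matters here. */
    static final class Dataset {
        final String name;
        final boolean continuous;

        Dataset(String name, boolean continuous) {
            this.name = name;
            this.continuous = continuous;
        }
    }

    /**
     * Stand-in for a scorer that assumes continuous data.
     * Like the constructor in the stack trace above, it rejects anything else.
     */
    static double gaussianBic(Dataset data) {
        if (!data.continuous) {
            throw new IllegalArgumentException("Not a continuous data set.");
        }
        return 0.0; // placeholder value; a real scorer would compute the BIC here
    }

    /** Guarded stamping: returns empty for non-continuous data instead of throwing. */
    static OptionalDouble bicStampOrEmpty(Dataset data) {
        if (!data.continuous) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(gaussianBic(data));
    }

    public static void main(String[] args) {
        Dataset discrete = new Dataset("simulated Bayes net data", false);
        Dataset continuous = new Dataset("simulated linear Gaussian data", true);

        // The discrete dataset gets no stamp; the continuous one does.
        System.out.println(discrete.name + ": stamped = " + bicStampOrEmpty(discrete).isPresent());
        System.out.println(continuous.name + ": stamped = " + bicStampOrEmpty(continuous).isPresent());
    }
}
```

With a guard like this, the search result for a discrete dataset is simply returned without a BIC stamp, so the search itself is not aborted by a logging step.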

I'll try to post a revision in the next couple of days.

Thanks.

cg09 commented

@jdramsey @cg09 Thanks for your quick checks. I will check other Java versions afterwards.

@cg09 Thanks for your comment.

Running 7.6.1 on my Mac. No idea what Java version, no problem.

I tried Java 11 (Adoptium Temurin JRE 11) and Oracle JDK 17.
The error occurs with both. I am also using Tetrad 7.6.1, the latest release.
So it seems this occurs only on Windows.

I tried Tetrad 7.6.1 on a Mac mini (M2, 8GB). The error was reproduced there as well.


% java -version
java version "21.0.1" 2023-10-17 LTS
Java(TM) SE Runtime Environment (build 21.0.1+12-LTS-29)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.1+12-LTS-29, mixed mode, sharing)

I will fix it. It's fixed in py-tetrad; I'll need to make a note of all the changes I've made since then and post another version. (I suppose I could post a version with just that one change.)

This will be fixed in the upcoming release, which will come out in a few days.

Good news! I'll await the release.

@jdramsey This issue appears to be fixed. Thank you.
Only the FOFC algorithm still shows an error similar to the one I reported.
However, my build might not be working properly right now. I will leave this issue open, but it may already be solved.

used commit:
76318f0

Let me check FOFC--that shouldn't be stamping a BIC score, but let me check. Maybe it's a different problem...

Well I did see one bug for FOFC--if you make a random MIM in the Graph box, it doesn't show the edges among the latents, though if you close the graph box and re-open it, the edges are shown. Hmm...

I think this issue is fixed in release 7.6.2.

@jdramsey I successfully reproduced the bug you mentioned.

  1. Place a Simulation box.
  2. Double-click the Simulation box.
  3. Select Random One Factor MIM as the Type of Graph.
  4. Press Simulate.
  5. Close the dialog by pressing Done.
  6. Place a Graph box.
  7. Double-click the Graph box.
  8. Select Graph, then press OK.
  9. The edges between the latent nodes are not shown.
  10. Close the dialog by pressing Done.
  11. Double-click the Graph box again.
  12. The edges that were missing at step 9 are shown this time.
  13. If you select 'Directed Acyclic Graph' or 'Structural Equation Model Graph' instead, this symptom does not occur.

I feel like closing this issue. Let me set up on this.