NAICNO/Jobanalyzer

Small variations in sonalyze output in runs on the same data


There are two variations I've observed:

  • records are transposed in the output, i.e. the sort order is not total
  • there are small variations in the calculated results; I see this in peak RAM use for jobs, for example

The first one is more a matter of taste than anything, but fixing it comes down to improving the predicate used by sortableSummaries in jobs/print.go so that records that currently compare equal are ordered by some further key. For testing purposes we can pass the output through sort and we'll have something more stable.
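As a rough illustration of what a total-order predicate could look like, here is a minimal sketch. The summary type and its fields are hypothetical stand-ins, not the actual types in jobs/print.go; the point is only that every tie on the primary key falls through to further keys, so the output no longer depends on the order of the input.

```go
package main

import (
	"fmt"
	"sort"
)

// summary is a hypothetical stand-in for the records sorted in
// jobs/print.go; the real type and its field names may differ.
type summary struct {
	jobID uint32
	user  string
	host  string
	start int64
}

// less orders by start time and then breaks ties on job ID, host and user.
// It is a strict total order provided no two records agree on all four keys.
func less(a, b summary) bool {
	if a.start != b.start {
		return a.start < b.start
	}
	if a.jobID != b.jobID {
		return a.jobID < b.jobID
	}
	if a.host != b.host {
		return a.host < b.host
	}
	return a.user < b.user
}

// sortSummaries sorts in place. With a total order the result is the same
// for every permutation of the input, even though sort.Slice is not stable.
func sortSummaries(xs []summary) {
	sort.Slice(xs, func(i, j int) bool { return less(xs[i], xs[j]) })
}

func main() {
	a := []summary{{2, "u", "c1-9", 100}, {1, "u", "c1-9", 100}, {3, "v", "c1-11", 50}}
	b := []summary{a[1], a[2], a[0]} // same records, different input order
	sortSummaries(a)
	sortSummaries(b)
	fmt.Println(a)
	fmt.Println(b) // prints the same sequence as the line above
}
```

Which keys to use as tie-breakers is a judgment call; anything that is unique per output row would do.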

The second one is more worrisome. Here's the diff from two adjacent runs of jobs (the outputs have been post-sorted to avoid the first problem):

2139c2139
< 2141439   ec-veronsua                 0d 3h35m   int-3    1690     2866      25       34        0        0         0           0            STAR,gzip,sh,sh <defunct>
---
> 2141439   ec-veronsua                 0d 3h35m   int-3    1690     2865      25       34        0        0         0           0            STAR,gzip,sh,sh <defunct>
7692c7692
< 610538    ec-bthj                     0d 1h50m   c1-11    840      867       20       22        0        0         0           0            features,kromosynth,kromosynth-gRPC,kromosynth-rend,quality_mood
---
> 610538    ec-bthj                     0d 1h50m   c1-11    840      866       20       22        0        0         0           0            features,kromosynth,kromosynth-gRPC,kromosynth-rend,quality_mood
57631c57631
< 685540    ec-milas                    0d 2h15m   c1-26    1591     1917      60       79        0        0         0           0            java,mutect2_v3.sh
---
> 685540    ec-milas                    0d 2h15m   c1-26    1591     1918      60       79        0        0         0           0            java,mutect2_v3.sh
57721c57721
< 685670    ec-milas                    1d11h30m   c1-9     3439     50489     144      226       0        0         0           0            java,mutect2_v3.sh,perl
---
> 685670    ec-milas                    1d11h30m   c1-9     3439     50490     144      226       0        0         0           0            java,mutect2_v3.sh,perl
60543c60543
< 691952    ec-edwardfb                 0d14h10m   gpu-5    1945     4249      84       120       350      376       94          94           python
---
> 691952    ec-edwardfb                 0d14h10m   gpu-5    1945     4250      84       120       350      376       94          94           python
70421c70421
< 708211    ec-milas                    0d15h30m   c1-22    3301     37799     116      207       0        0         0           0            java,mutect2_v3.sh,perl
---
> 708211    ec-milas                    0d15h30m   c1-22    3301     37800     116      207       0        0         0           0            java,mutect2_v3.sh,perl
72660c72660
< 711767    ec-koenvg                   0d10h 0m   c1-19    567      631       30       31        0        0         0           0            python,python3.11
---
> 711767    ec-koenvg                   0d10h 0m   c1-19    567      630       30       31        0        0         0           0            python,python3.11

(Command: sonalyze jobs -data-dir ~/sonar/data/fox.educloud.no -u - -from 2024-05-01 -to 2024-06-30)

There's usually a small difference (one ULP) in the memory readings. This is probably some kind of numeric instability, which may in turn come down to the order in which records are processed. It would be good to verify that and, if so, to fix it.
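To make the hypothesis concrete, here is a small sketch of how processing order alone can change a floating-point result. It is not taken from sonalyze, and whether the memory figures are actually accumulated this way is unverified; naiveSum and canonicalSum are hypothetical helpers. Floating-point addition is not associative, so summing the same values in a different order can land on a neighbouring float, which may then round to a different integer when printed.

```go
package main

import (
	"fmt"
	"sort"
)

// naiveSum adds the values in the order given, so its result can depend on
// that order.
func naiveSum(xs []float64) float64 {
	s := 0.0
	for _, x := range xs {
		s += x
	}
	return s
}

// canonicalSum copies the input, sorts the copy, and then sums it. The result
// is still rounded, but it is the same for every permutation of the input.
func canonicalSum(xs []float64) float64 {
	ys := append([]float64(nil), xs...)
	sort.Float64s(ys)
	return naiveSum(ys)
}

func main() {
	a := []float64{0.1, 0.2, 0.3}
	b := []float64{0.3, 0.2, 0.1} // same values, different order

	fmt.Println(naiveSum(a) == naiveSum(b))         // false
	fmt.Println(canonicalSum(a) == canonicalSum(b)) // true
}
```

If the instability does turn out to be order-dependent, processing records in a fixed order (for example sorted by timestamp) before any accumulation would be one way to make the output reproducible.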