DIVBYZERO and INVALID FPEs in CMS tests (pp_dy012j.mad in P2_uc_epemuc/G2)
valassi opened this issue · 22 comments
This is an issue reported by Jin @choij1589 (thanks!) during yesterday's meeting with CMS https://indico.cern.ch/event/1373473/ (issue not in the slides, it was reported in the later discussion)
Details are here https://github.com/choij1589/madgraph4gpu/tree/dev_cms_integration
See in particular this commit master...choij1589:madgraph4gpu:dev_cms_integration
Description and analysis
IIUC there were some DIVBYZERO (and INVALID?) FPEs during the CMS madevent tests - note in particular that this was during CUDA runs, so they cannot come from vectorized C++ code.
My initial guess is that this comes from Fortran code, probably from auto-vectorized Fortran code. We saw something similar in #855 for rotxxx, worked around in #857 with a volatile.
It would be annoying if these issues keep popping up, because decorating the full fortran code with volatile's does not look like a scalable solution.
@oliviermattelaer : is there an easy way in which @choij1589 can remove -O3 from fortran and replace it with -O1 when running madgraph in his CMS environment? (just as a test to see if this makes it disappear)
@oliviermattelaer : is there an easy way in which @choij1589 can remove -O3 from fortran and replace it with -O1 when running madgraph in his CMS environment? (just as a test to see if this makes it disappear)
@choij1589 I imagine that you probably have this in the runcard
-O3 -ffast-math -fbounds-check = global_flag ! build flags for all Fortran code (for a fair comparison to cudacpp; default is -O)
just try removing the -O3 and see if there is a difference when you have it or not have it
Let me know also about if it goes much slower without -O3... (it should not, if you only do this in the cuda or cpp runs, but we never know)
In my run_card.dat, there is no global_flag assigned, so it's using the default one
#*********************************************************************
# Compilation flag.
#*********************************************************************
-O = global_flag ! fortran optimization flag use for the all code.
--fast-math = aloha_flag ! fortran optimization flag for aloha function. Suggestions: '-ffast-math'
-O3 = matrix_flag ! fortran optimization flag for matrix.f function. Suggestions: '-O3'
16384 = vector_size ! size of fortran arrays allocated in the multi-event API for SIMD/GPU (VECSIZE_MEMMAX)
I have tried both -O3 -ffast-math -fbounds-check = global_flag
and -O -ffast-math -fbounds-check = global_flag
flags, but the FPE_* warning remains.
Thanks Jin! Can you check if you have a GLOBAL_FLAG in Source/make_opts anyway please?
Thanks Jin I am having a look.
One point Jin (also for Sapta): you are aware of ccache in the cudacpp? I saw that DY012j is now running builds since more than 1h, it might take 3h, so I do not even want to imagine DY+4jets... I imagine that normally you will run these builds only once, or at least only once per software version, but I can very much imagine that sometimes you will rerun the same test with the same build. In that case, using ccache helps enormously. You should have ccache installed, export USECCACHE=1, and have CCACHE_DIR point to a directory that contains your build caches. I never use AFS or EOS because (with or without cacche) builds are horrible there, so CCACHE_DIR is a local disk on my machine. I never managed to configure this with network CCACHE, but in principle also possible. Maybe one thing to investigate for your gen productions.
I have tried both
-O3 -ffast-math -fbounds-check = global_flag
and-O -ffast-math -fbounds-check = global_flag
flags, but the FPE_* warning remains.
Thanks Jin again. A couple more questions
-
You mention 'FPE warning'. Is this just a warning 'FPE_xx have been reported' or is this a crash?
-
Do you get this FPE warning only in cuda runs or also in fortran runs? (You said cpp does not build right? What was the problem?)
-
Would you be able to tell me in which subdirectory this happens by any chance? I mean which P0, P1 or P2 subprocess, and possibly which G* subdirectory, and what is the input_app.txt file in that G?
Thanks
Andrea
Thanks @valassi , regarding the FPE warning, there is no such warnings in fortran, only happening in CUDA. if I comment out fpeEnable() function, than it gives warnings 'FPE_xx' which are triggered from MatrixElementKernel.cc. Otherwise, it would give me crash in runtime.
Regarding the subprocess, this happens in most of the subdirectory from DY+0j to DY+4j, I have quickly checked with u u~ > e+ e- process and it also generated the same warning. Here is the input_app.txt for P1_uux_epem:
2000 8 3 !Number of events and max and min iterations
0.01 !Accuracy
2 !Grid Adjustment 0=none, 2=adjust
1 !Suppress Amplitude 1=yes
0 !Helicity Sum/event 0=exact
Thanks Jin I am having a look.
One point Jin (also for Sapta): you are aware of ccache in the cudacpp? I saw that DY012j is now running builds since more than 1h, it might take 3h, so I do not even want to imagine DY+4jets... I imagine that normally you will run these builds only once, or at least only once per software version, but I can very much imagine that sometimes you will rerun the same test with the same build. In that case, using ccache helps enormously. You should have ccache installed, export USECCACHE=1, and have CCACHE_DIR point to a directory that contains your build caches. I never use AFS or EOS because (with or without cacche) builds are horrible there, so CCACHE_DIR is a local disk on my machine. I never managed to configure this with network CCACHE, but in principle also possible. Maybe one thing to investigate for your gen productions.
Regading using cached directories, I am currently using condor environment for testing productions so I should build the processes in somewhere(might be EOS area) and xrdcp throught the condor's own disk... but first build on EOS with local node (lxplus8-gpu) would be painful due to other user's interruption:(. But still I believe it would be a good option to check. I will test it around. Thanks!
Regading using cached directories, I am currently using condor environment for testing productions so I should build the processes in somewhere(might be EOS area) and xrdcp throught the condor's own disk... but first build on EOS with local node (lxplus8-gpu) would be painful due to other user's interruption:(. But still I believe it would be a good option to check. I will test it around. Thanks!
Thanks Jin, ok I understand this is not very usable now. But lets keep this in mind. I opened #954.
Thanks @valassi , regarding the FPE warning, there is no such warnings in fortran, only happening in CUDA. if I comment out fpeEnable() function, than it gives warnings 'FPE_xx' which are triggered from MatrixElementKernel.cc. Otherwise, it would give me crash in runtime.
Thanks Jin! Ok so I understand these are warnings and not crashes, only because you modified the code and disabled the FPEs...
On my side I am redoing some tests of DY+2jets. I did see something that looks very much like your FPE issues
Error when reading /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/G2/results.dat
Command "generate_events -f" interrupted with error:
Exception : Reported error: End code 136.0
Full associated log:
(ompnumthreadsNotSetMeansOneThread) DEBUG: OMP_NUM_THREADS is not set: will use only 1 thread
(ompnumthreadsNotSetMeansOneThread) omp_get_max_threads() = 1
__GpuRuntime: calling GpuSetDevice(0)
WARNING! Instantiate device Bridge (nevt=16384, gpublocks=64, gputhreads=256, gpublocks*gputhreads=16384)
INFO: The following Floating Point Exceptions will cause SIGFPE program aborts: FE_DIVBYZERO, FE_INVALID, FE_OVERFLOW
FBRIDGE_MODE (default) = 1
VECSIZE_USED (default) = 16384
Process in group number 2
A PDF is used, so alpha_s(MZ) is going to be modified
Old value of alpha_s from param_card: 0.11799999999999999
****************************************
NNPDFDriver version 1.0.3
Grid: NNPDF23_lo_as_0130_qed_mem0.grid
****************************************
New value of alpha_s from PDF nn23lo1: 0.13000000000000000
Warning! ptj set to xqcut= 30.000000000000000 to improve integration efficiency
Note that this might affect non-radiated jets,
e.g. from decays. Use cut_decays=F in run_card.
Warning! mmjj set to xqcut= 30.000000000000000 to improve integration efficiency
Note that this might affect non-radiated jets,
e.g. from decays. Use cut_decays=F in run_card.
Define smin to 4000.0000000000000
*****************************************************
* MadGraph/MadEvent *
* -------------------------------- *
* http://madgraph.hep.uiuc.edu *
* http://madgraph.phys.ucl.ac.be *
* http://madgraph.roma2.infn.it *
* -------------------------------- *
* *
* PARAMETER AND COUPLING VALUES *
* *
*****************************************************
External Params
---------------------------------
mdl_MT = 173.00000000000000
mdl_MTA = 1.7769999999999999
mdl_MZ = 91.188000000000002
mdl_MH = 125.00000000000000
aEWM1 = 132.50700000000001
mdl_Gf = 1.1663900000000000E-005
aS = 0.11799999999999999
mdl_ymt = 173.00000000000000
mdl_ymtau = 1.7769999999999999
mdl_WT = 1.4915000000000000
mdl_WZ = 2.4414039999999999
mdl_WW = 2.0476000000000001
mdl_WH = 6.3823389999999999E-003
Internal Params
---------------------------------
mdl_conjg__CKM3x3 = 1.0000000000000000
mdl_conjg__CKM1x1 = 1.0000000000000000
mdl_CKM3x3 = 1.0000000000000000
mdl_complexi = (0.0000000000000000,1.0000000000000000)
mdl_MZ__exp__2 = 8315.2513440000002
mdl_MZ__exp__4 = 69143404.913893804
mdl_sqrt__2 = 1.4142135623730951
mdl_MH__exp__2 = 15625.000000000000
mdl_aEW = 7.5467711139788835E-003
mdl_MW = 80.419002445756163
mdl_sqrt__aEW = 8.6872153846781555E-002
mdl_ee = 0.30795376724436879
mdl_MW__exp__2 = 6467.2159543705357
mdl_sw2 = 0.22224648578577766
mdl_cw = 0.88190334743339216
mdl_sqrt__sw2 = 0.47143025548407230
mdl_sw = 0.47143025548407230
mdl_g1 = 0.34919219678733299
mdl_gw = 0.65323293034757990
mdl_vev = 246.21845810181637
mdl_vev__exp__2 = 60623.529110035903
mdl_lam = 0.12886910601690263
mdl_yt = 0.99366614581500623
mdl_ytau = 1.0206617000654717E-002
mdl_muH = 88.388347648318430
mdl_I2x33 = (0.99366614581500623,0.0000000000000000)
mdl_I3x33 = (0.99366614581500623,0.0000000000000000)
mdl_ee__exp__2 = 9.4835522759998875E-002
mdl_sw__exp__2 = 0.22224648578577769
mdl_cw__exp__2 = 0.77775351421422245
Internal Params evaluated point by point
----------------------------------------
mdl_sqrt__aS = 0.34351128074635334
mdl_G__exp__2 = 1.4828317324943823
Couplings of sm-no_b_mass
---------------------------------
GC_10 = -0.12177E+01 0.00000E+00
GC_11 = 0.00000E+00 0.12177E+01
GC_1 = -0.00000E+00 -0.10265E+00
GC_2 = 0.00000E+00 0.20530E+00
GC_3 = -0.00000E+00 -0.30795E+00
GC_50 = -0.00000E+00 -0.28804E+00
GC_58 = -0.00000E+00 -0.27437E-01
GC_59 = 0.00000E+00 0.82310E-01
Collider parameters:
--------------------
Running at P P machine @ 13000.000000000000 GeV
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7f2e29623860 in ???
#1 0x7f2e29622a05 in ???
#2 0x7f2e29254def in ???
#3 0x433257 in ???
#4 0x434487 in ???
#5 0x43507e in ???
#6 0x45047a in ???
#7 0x430937 in ???
#8 0x40371e in ???
#9 0x7f2e2923feaf in ???
#10 0x7f2e2923ff5f in ???
#11 0x403844 in ???
#12 0xffffffffffffffff in ???
ls status:
input_app.txt
run1_app.log
Please report this bug on https://bugs.launchpad.net/mg5amcnlo
More information is found in '/data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/run_01_tag_1_debug.log'.
Please attach this file to your report.
I actually have very many FPEs. I was chaining several backends, fortran, cpp's, cuda, so I am not 100% sure where it happened. But I imagine it was cuda.
Note
more /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/G2/input_app.txt
8192 1 1 !Number of events and max and min iterations
0.1 !Accuracy
2 !Grid Adjustment 0=none, 2=adjust
1 !Suppress Amplitude 1=yes
0 !Helicity Sum/event 0=exact
2
Ok here is a reproducer
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc> ./madevent_cpp < /tmp/avalassi/input_app.txt
And through gdb
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc> gdb ./madevent_cpp
...
(gdb) run < /tmp/avalassi/input_app.txt
...
Program received signal SIGFPE, Arithmetic exception.
0x0000000000433257 in dsig1_vec (all_pp=<error reading variable: value requires 3145728 bytes, which is more than max-value-size>,
all_xbk=<error reading variable: value requires 262144 bytes, which is more than max-value-size>,
all_q2fact=<error reading variable: value requires 262144 bytes, which is more than max-value-size>,
all_cm_rap=<error reading variable: value requires 131072 bytes, which is more than max-value-size>,
all_wgt=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, imode=0,
all_out=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, vecsize_used=16384)
at auto_dsig1.f:395
395 R=R-DABS(ALL_PD(IPSEL,IVEC))/ALL_PD(0,IVEC)
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.el9.x86_64 libgcc-11.3.1-4.3.el9.alma.x86_64 libgfortran-11.3.1-4.3.el9.alma.x86_64 libgomp-11.3.1-4.3.el9.alma.x86_64 libquadmath-11.3.1-4.3.el9.alma.x86_64 libstdc++-11.3.1-4.3.el9.alma.x86_64
(gdb) where
#0 0x0000000000433257 in dsig1_vec (all_pp=<error reading variable: value requires 3145728 bytes, which is more than max-value-size>,
all_xbk=<error reading variable: value requires 262144 bytes, which is more than max-value-size>,
all_q2fact=<error reading variable: value requires 262144 bytes, which is more than max-value-size>,
all_cm_rap=<error reading variable: value requires 131072 bytes, which is more than max-value-size>,
all_wgt=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, imode=0,
all_out=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, vecsize_used=16384)
at auto_dsig1.f:395
#1 0x0000000000434488 in dsigproc_vec (all_p=...,
all_xbk=<error reading variable: value requires 262144 bytes, which is more than max-value-size>,
all_q2fact=<error reading variable: value requires 262144 bytes, which is more than max-value-size>,
all_cm_rap=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, iconf=1, iproc=1, imirror=2,
symconf=..., confsub=..., all_wgt=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, imode=0,
all_out=<error reading variable: value requires 131072 bytes, which is more than max-value-size>, vecsize_used=16384)
at auto_dsig.f:1032
#2 0x000000000043507f in dsig_vec (all_p=..., all_wgt=..., all_xbk=..., all_q2fact=..., all_cm_rap=..., iconf=1, iproc=1, imirror=2,
all_out=..., vecsize_used=16384) at auto_dsig.f:327
#3 0x000000000045047b in sample_full_ ()
#4 0x0000000000430938 in driver () at driver.f:257
#5 0x000000000040371f in main (argc=<optimized out>, argv=<optimized out>) at driver.f:302
#6 0x00007ffff743feb0 in __libc_start_call_main () from /lib64/libc.so.6
#7 0x00007ffff743ff60 in __libc_start_main_impl () from /lib64/libc.so.6
#8 0x0000000000403845 in _start ()
(gdb) p ivec
$1 = 11
(gdb) p all_pd(0,ivec)
$2 = 0
It seems that the issue happens both with -O3 and with -O, -O1 or without -O... (in the fortran side)
Poor man's debugging
write(6,*) 'maxproc=', maxproc
DO IVEC=1,VECSIZE_USED
write(6,*) 'ivec, all_pd(0,ivec) =', ivec, all_pd(0,ivec)
C Do not need those three here do I?
gives
maxproc= 2
ivec, all_pd(0,ivec) = 1 71.066572418427214
ivec, all_pd(0,ivec) = 2 365.96232532682160
ivec, all_pd(0,ivec) = 3 1296.3895857186428
ivec, all_pd(0,ivec) = 4 132.50043236269377
ivec, all_pd(0,ivec) = 5 15.597965701852193
ivec, all_pd(0,ivec) = 6 3942.0292625352567
ivec, all_pd(0,ivec) = 7 238.75633877329895
ivec, all_pd(0,ivec) = 8 143.34436759532593
ivec, all_pd(0,ivec) = 9 586.74521438125407
ivec, all_pd(0,ivec) = 10 422.52108709808942
ivec, all_pd(0,ivec) = 11 0.0000000000000000
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7f5e9b023860 in ???
#1 0x7f5e9b022a05 in ???
#2 0x7f5e9ac54def in ???
#3 0x48c764 in dsig1_vec_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/auto_dsig1.f:397
#4 0x48f3d0 in dsigproc_vec_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/auto_dsig.f:1032
#5 0x490906 in dsig_vec_
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/auto_dsig.f:327
#6 0x4ae2ea in ???
#7 0x488896 in driver
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/driver.f:257
#8 0x488e62 in main
at /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc/driver.f:302
Floating point exception (core dumped)
In other words: this does not look like a SIMD issue (so it makes sense that -O3 or -O1 makes no difference)
It seems that there is a real issue, one event has 0 weight? Why only in cpp and not in fortran?...
Ops... also in fortran it is 0, but then why does it not crash?
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc> ./madevent_fortran < /tmp/avalassi/input_app.txt | more
...
maxproc= 2
ivec, all_pd(0,ivec) = 1 71.066572418427214
ivec, all_pd(0,ivec) = 2 365.96232532682160
ivec, all_pd(0,ivec) = 3 1296.3895857186428
ivec, all_pd(0,ivec) = 4 132.50043236269377
ivec, all_pd(0,ivec) = 5 15.597965701852193
ivec, all_pd(0,ivec) = 6 3942.0292625352567
ivec, all_pd(0,ivec) = 7 238.75633877329895
ivec, all_pd(0,ivec) = 8 143.34436759532593
ivec, all_pd(0,ivec) = 9 586.74521438125407
ivec, all_pd(0,ivec) = 10 422.52108709808942
ivec, all_pd(0,ivec) = 11 0.0000000000000000
ivec, all_pd(0,ivec) = 12 0.0000000000000000
ivec, all_pd(0,ivec) = 13 1563.2708732525296
ivec, all_pd(0,ivec) = 14 21.762972077135977
...
In principle All_PD(0, ivec) can not be zero ...
(without looking at the code to avoid bias) that variable should be a sum (of absolute value) for ALL_PD(IPSEL, IVEC)
For value of ipsel which should be 1 or 2 (corresponding to imirror).
Since we are doing LHC setup ALL_PD(1, IVEC) = ALL_PD(2, IVEC)
(since this correspond to the PDF contribution of the event ...)
Now clearly, given your update I'm wrong somewhere, ...
Thanks @oliviermattelaer I was going to ask you in fact ;-)
More debugging
write(6,*) 'maxproc=', maxproc
DO IVEC=1,VECSIZE_USED
write(6,*) 'ivec, all_pd(0,ivec) =', ivec, all_pd(0,ivec)
write(6,*) 'all_pd(0,ivec) == 0?', all_pd(0,ivec).eq.0
The variable IS 0 both in cpp and fortran. The former prints T and crashes, the latter prints T and does not crash
[avalassi@itscrd90 gcc11/usr] /data/avalassi/GPU2023/madgraph4gpuX/epochX/cudacpp/pp_dy012j.mad/SubProcesses/P2_uc_epemuc> ./madevent_fortran < /tmp/avalassi/input_app.txt | more
...
maxproc= 2
ivec, all_pd(0,ivec) = 1 71.066572418427214
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 2 365.96232532682160
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 3 1296.3895857186428
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 4 132.50043236269377
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 5 15.597965701852193
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 6 3942.0292625352567
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 7 238.75633877329895
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 8 143.34436759532593
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 9 586.74521438125407
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 10 422.52108709808942
all_pd(0,ivec) == 0? F
ivec, all_pd(0,ivec) = 11 0.0000000000000000
all_pd(0,ivec) == 0? T
ivec, all_pd(0,ivec) = 12 0.0000000000000000
all_pd(0,ivec) == 0? T
ivec, all_pd(0,ivec) = 13 1563.2708732525296
all_pd(0,ivec) == 0? F
I imagine that the difference in behaviour (crash or no crash) is just because in cpp there is an explicit call to fpeenable that crashes on divbyzero.
Now why this is 0 is probably for Olivier to debug...
What I will look at in the meantime is if there is a possible workaround to skip some computations if that is zero (i.e. cure the symptoms, not cure the root cause)
I would say that it is pointless to cure the symptoms
Let me understand the real issue which is anyway crucial and required.
I would say that it is pointless to cure the symptoms
Let me understand the real issue which is anyway crucial and required.
Hi yes I had a look and I agree :-)
Thanks Olivier for taking a look!
Indeed the crash is BEFORE the ME calculation. So what is 0 is not the ME, it is the PDF?!
The ALL_PD is a product of U1 and C2 which are the PDFs. So this indicates a zero pdf, which clearly looks like a bug...
DO IVEC=1,VECSIZE_USED
IF (ABS(LPP(IB(1))).GE.1) THEN
!LP=SIGN(1,LPP(IB(1)))
U1(IVEC)=PDG2PDF(LPP(IB(1)),2, IB(1),ALL_XBK(IB(1),IVEC)
$ ,DSQRT(ALL_Q2FACT(IB(1), IVEC)))
ENDIF
IF (ABS(LPP(IB(2))).GE.1) THEN
!LP=SIGN(1,LPP(IB(2)))
C2(IVEC)=PDG2PDF(LPP(IB(2)),4, IB(2),ALL_XBK(IB(2),IVEC)
$ ,DSQRT(ALL_Q2FACT(IB(2), IVEC)))
ENDIF
ENDDO
ALL_PD(0,:) = 0D0
IPROC = 0
IPROC=IPROC+1 ! u c > e+ e- u c
DO IVEC=1, VECSIZE_USED
ALL_PD(IPROC,IVEC)=U1(IVEC)*C2(IVEC)
ALL_PD(0,IVEC)=ALL_PD(0,IVEC)+DABS(ALL_PD(IPROC,IVEC))
ENDDO
IPROC=IPROC+1 ! u c > mu+ mu- u c
DO IVEC=1, VECSIZE_USED
ALL_PD(IPROC,IVEC)=U1(IVEC)*C2(IVEC)
ALL_PD(0,IVEC)=ALL_PD(0,IVEC)+DABS(ALL_PD(IPROC,IVEC))
ENDDO
Looks like my fortran code is not the same as yours (I'm in a middle of merge, which is likely the cause --but means that such merging is super problematic--) but your fortran code makes more sense than mine (and correspond to my explanation above). (Sorry but before focusing on this, will need to fix the merging, one issue at the time).
Now given that "ALL_PD" is a local variable indeed the only possible explanation is that they are ... zero due to the PDF call indeed. Now this likely means that the momenta are likely "bad", The question is obviously why?
One option would be because of the black-box vetoing events, this is possible here but I doubt that this is what is happening here. (this will likely be REWGT set to zero).
The second hyppothesis would be that this is due to a reset of the code after 10 events, Those can occur but typically this happens after the call to the matrix-element... but can you try to set in the run_card: False = mc_grouped_subproc
to see if this might be related to that?
(I'm in a middle of merge, which is likely the cause --but means that such merging is super problematic--) but your fortran code makes more sense than mine (and correspond to my explanation above). (Sorry but before focusing on this, will need to fix the merging, one issue at the time).
Thanks Olivier... maybe open another issue for the merging and we follow that up.
Note, I merged a lot in june24, I am waiting for that. (And I know you are waiting for other things from me... sorry).
The second hyppothesis would be that this is due to a reset of the code after 10 events, Those can occur but typically this happens after the call to the matrix-element... but can you try to set in the run_card: False = mc_grouped_subproc
to see if this might be related to that?
Ok I will try later this week (will be off this afternoon)
Just to follow up on the merging (on which I will open 2 PR on that). This why my mistake (or bias, since I read what "I wanted to see"). So here they are no imirror (which is actually handled in auto_dsig.f, but you still have iproc=2 due to the e+ e- and mu+mu- flavor difference in the final state (where both have identical matrix element). Sorry for the noise here.