Comments matching `//*` in header files can cause trick-ICG and data record variable errors
ddj116 opened this issue · 6 comments
Summary
First discovered in the Ramtares transition to Rocky 8 which includes an upgrade of gcc
from 4.8.5 to 8.5.0 and an upgrade of clang
from 3.4.2 to 16.0.6, we have found trick-ICG
to produce errors when a comment block matches a particular syntax in a header file. When this happens, the Trick memory manager doesn't understand structures from that point in the header file down, leading to data recording errors if those structures are configured to be recorded.
How to replicate using SIM_ball_L2
On Trick version 19.7.1, build trick per normal directions and apply the following patch to model code to introduce the comment which causes the ICG error:
diff --git a/trick_sims/Ball/models/ball/L2/include/ball_state.h b/trick_sims/Ball/models/ball/L2/include/ball_state.h
index 3a6cebb8..72c99ab9 100644
--- a/trick_sims/Ball/models/ball/L2/include/ball_state.h
+++ b/trick_sims/Ball/models/ball/L2/include/ball_state.h
@@ -58,6 +58,9 @@ typedef struct { /* BSTATE_WORK ----------------------------------------------*/
typedef struct { /* BSTATE ---------------------------------------------------*/
BSTATE_IN input ; /* -- User inputs */
+ //****************************************************************************
+ // Outputs
+ //****************************************************************************
BSTATE_OUT output ; /* -- User outputs */
BSTATE_WORK work ; /* -- EOM workspace */
After adding just these three lines of comments, the output of trick-ICG
will include this error:
In file included from /<path_to>/trick/trick_sims/Ball/SIM_ball_L2/S_source.hh:103:
/<path_to>/trick/trick_sims/Ball/models/ball/L2/include/ball_state.h:61:3: error: expected member name or ';' after declaration specifiers
//****************************************************************************
^
Even though this error is popping up, S_main
still compiles. When running ./S_main_Linux_8.5_x86_64.exe RUN_test/input.py
we now see these Data Record errors:
|T 0|0.000000| Could not find Data Record variable ball.state.output.position[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.position[1].
|T 0|0.000000| Could not find Data Record variable ball.state.output.velocity[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.velocity[1].
|T 0|0.000000| Could not find Data Record variable ball.state.output.acceleration[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.acceleration[1].
|T 0|0.000000| Could not find Data Record variable ball.state.output.external_force[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.external_force[1].
Additionally, if you attempt to query the trick memory manager for information about ball.state.<something>
, trick_mm.mm.ref_attributes()
will return None
for any structure defined below the 3 comment lines added above. For example, instrumenting the bottom of RUN_test/input.py
with import pdb; pdb.set_trace()
, we can see this behavior pretty easily:
p trick_mm.mm.ref_attributes("ball.state.output.position[0]") # None
p trick_mm.mm.ref_attributes("ball.state.output.position") # None
p trick_mm.mm.ref_attributes("ball.state.output") # None
p trick_mm.mm.ref_attributes("ball.state.input") # <sim_services.REF2; proxy of <Swig Object of type 'REF2 *' at 0x7f2abf12cc30> >
p trick_mm.mm.ref_attributes("ball.state.work") # None
Additional information
Upon more testing, I have found that only //*
above the output
structure is needed to see this behavior, the full 3 lines are not necessary to replicate.
This behavior is not seen on CentOS 7.9 with gcc
4.8.5 and clang
3.4.2 for the exact same code state. We've got several model header files using the //****
style of comment blocks which I will go change as an attempted workaround.
Thanks for reporting. Was able to reproduce based on the provided info. LLVM 14 doesn't complain about //*
while LLVM 15, 16 & 17 don't like //*
///+*
(3 /
or more with *
) works.
Worth noting here that in the process of clearing these //*
comments in our model code, the only clue we had that something was wrong in some of our small unit-level sims was that the regression data (a copy of log*.csv
produced by trick) did not match what we expected. After removing these comments and rebuilding, we find the generated log*.csv
matches the regression copy.
In other words, a non-fatal error during trick-CP
caused by the comment pattern //*
in a header file may lead to an apparent change in sim response. This is certainly surprising, I would expect the data logged variables to be missing from log*.csv
entirely, but that's not what I'm seeing for one of our small unit sims in our CML project (models/interactions/aero/verif/SIM_Test_table/
). Perhaps there's some disconnect between the memory manager and trick data logger.
I feel this is actually an LLVM bug, b/c singling out //* but not /* or ///^n * is kind of an insane thing to do?
@ddj116 Please give it a try with the branch for this issue. The fix will terminate the build if any error from user code so that it won't provide a compiled trick with missing variables potentially. When I was testing without this fix, some cases with //* built successfully and some didn't. This fix will go through all user header files and terminate the build at the end of ICG if any error found in user code.
Confirmed that this new branch detects the errors associated with //*
and returns non-zero when the build fails. Also, I can confirm that without any //*
our sim builds normally with this new branch as well. Thanks!
Can you elaborate on specifically which errors are being detected now and how this works under the hood? Is this llvm
parsing the header files? Were errors deliberately being ignored before or should this have always been killing the sim builds? I'm just trying to understand if we're getting any other error checking from this new approach or if it's purely just looking at problematic comment syntax as described in this specific issue.
Thanks for confirming!
This fix doesn't change anything on how clang detects errors. This is rather a wrapper for doing a bit extra in addition to what is done by clang. These errors are not compilation errors. They are diagnostic info through clang frontend. The error will be always from clang before or after the fix. The fix doesn't check code at all.
Before the fix, regardless what kind of error identified by clang, trick build process will continue. The error was not deliberated being ignored rather if the error didn't cause any compilation error after ICG then the build could be successful. If the error caused any compilation error after ICG, the build could be failed.
The extra work done in this fix is to terminate the trick build if the error is found from user code during ICG. If the error is from system code such as like from wchar.h, it won't cause ICG to terminate the build process. gcc decides whether to terminate the build during compilation.
Hope I didn't make it more confusing.