nasa/trick

Comments matching `//*` in header files can cause trick-ICG and data record variable errors

ddj116 opened this issue · 6 comments

Summary

First discovered in the Ramtares transition to Rocky 8 which includes an upgrade of gcc from 4.8.5 to 8.5.0 and an upgrade of clang from 3.4.2 to 16.0.6, we have found trick-ICG to produce errors when a comment block matches a particular syntax in a header file. When this happens, the Trick memory manager doesn't understand structures from that point in the header file down, leading to data recording errors if those structures are configured to be recorded.

How to replicate using SIM_ball_L2

On Trick version 19.7.1, build trick per normal directions and apply the following patch to model code to introduce the comment which causes the ICG error:

diff --git a/trick_sims/Ball/models/ball/L2/include/ball_state.h b/trick_sims/Ball/models/ball/L2/include/ball_state.h
index 3a6cebb8..72c99ab9 100644
--- a/trick_sims/Ball/models/ball/L2/include/ball_state.h
+++ b/trick_sims/Ball/models/ball/L2/include/ball_state.h
@@ -58,6 +58,9 @@ typedef struct { /* BSTATE_WORK ----------------------------------------------*/
 typedef struct { /* BSTATE ---------------------------------------------------*/
 
   BSTATE_IN   input ;           /*    --   User inputs */
+  //****************************************************************************
+  // Outputs
+  //****************************************************************************
   BSTATE_OUT  output ;          /*    --   User outputs */
   BSTATE_WORK work ;         /*    --   EOM workspace */

After adding just these three lines of comments, the output of trick-ICG will include this error:

In file included from /<path_to>/trick/trick_sims/Ball/SIM_ball_L2/S_source.hh:103:
/<path_to>/trick/trick_sims/Ball/models/ball/L2/include/ball_state.h:61:3: error: expected member name or ';' after declaration specifiers
  //****************************************************************************
  ^

Even though this error is popping up, S_main still compiles. When running ./S_main_Linux_8.5_x86_64.exe RUN_test/input.py we now see these Data Record errors:

|T 0|0.000000| Could not find Data Record variable ball.state.output.position[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.position[1].
|T 0|0.000000| Could not find Data Record variable ball.state.output.velocity[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.velocity[1].
|T 0|0.000000| Could not find Data Record variable ball.state.output.acceleration[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.acceleration[1].
|T 0|0.000000| Could not find Data Record variable ball.state.output.external_force[0].
|T 0|0.000000| Could not find Data Record variable ball.state.output.external_force[1].

Additionally, if you attempt to query the trick memory manager for information about ball.state.<something>, trick_mm.mm.ref_attributes() will return None for any structure defined below the 3 comment lines added above. For example, instrumenting the bottom of RUN_test/input.py with import pdb; pdb.set_trace(), we can see this behavior pretty easily:

  p trick_mm.mm.ref_attributes("ball.state.output.position[0]")  # None
  p trick_mm.mm.ref_attributes("ball.state.output.position")     # None
  p trick_mm.mm.ref_attributes("ball.state.output")              # None
  p trick_mm.mm.ref_attributes("ball.state.input")               # <sim_services.REF2; proxy of <Swig Object of type 'REF2 *' at 0x7f2abf12cc30> >
  p trick_mm.mm.ref_attributes("ball.state.work")                # None   

Additional information

Upon more testing, I have found that only //* above the output structure is needed to see this behavior, the full 3 lines are not necessary to replicate.

This behavior is not seen on CentOS 7.9 with gcc 4.8.5 and clang 3.4.2 for the exact same code state. We've got several model header files using the //**** style of comment blocks which I will go change as an attempted workaround.

Thanks for reporting. Was able to reproduce based on the provided info. LLVM 14 doesn't complain about //* while LLVM 15, 16 & 17 don't like //*
///+* (3 / or more with *) works.

Worth noting here that in the process of clearing these //* comments in our model code, the only clue we had that something was wrong in some of our small unit-level sims was that the regression data (a copy of log*.csv produced by trick) did not match what we expected. After removing these comments and rebuilding, we find the generated log*.csv matches the regression copy.

In other words, a non-fatal error during trick-CP caused by the comment pattern //* in a header file may lead to an apparent change in sim response. This is certainly surprising, I would expect the data logged variables to be missing from log*.csv entirely, but that's not what I'm seeing for one of our small unit sims in our CML project (models/interactions/aero/verif/SIM_Test_table/). Perhaps there's some disconnect between the memory manager and trick data logger.

I feel this is actually an LLVM bug, b/c singling out //* but not /* or ///^n * is kind of an insane thing to do?

@ddj116 Please give it a try with the branch for this issue. The fix will terminate the build if any error from user code so that it won't provide a compiled trick with missing variables potentially. When I was testing without this fix, some cases with //* built successfully and some didn't. This fix will go through all user header files and terminate the build at the end of ICG if any error found in user code.

Confirmed that this new branch detects the errors associated with //* and returns non-zero when the build fails. Also, I can confirm that without any //* our sim builds normally with this new branch as well. Thanks!

Can you elaborate on specifically which errors are being detected now and how this works under the hood? Is this llvm parsing the header files? Were errors deliberately being ignored before or should this have always been killing the sim builds? I'm just trying to understand if we're getting any other error checking from this new approach or if it's purely just looking at problematic comment syntax as described in this specific issue.

Thanks for confirming!

This fix doesn't change anything on how clang detects errors. This is rather a wrapper for doing a bit extra in addition to what is done by clang. These errors are not compilation errors. They are diagnostic info through clang frontend. The error will be always from clang before or after the fix. The fix doesn't check code at all.

Before the fix, regardless what kind of error identified by clang, trick build process will continue. The error was not deliberated being ignored rather if the error didn't cause any compilation error after ICG then the build could be successful. If the error caused any compilation error after ICG, the build could be failed.

The extra work done in this fix is to terminate the trick build if the error is found from user code during ICG. If the error is from system code such as like from wchar.h, it won't cause ICG to terminate the build process. gcc decides whether to terminate the build during compilation.

Hope I didn't make it more confusing.