Importing the same dat file to Controller_Event_Log table causes duplicate table entries
dbrandesky opened this issue · 2 comments
I'm currently running ATSPM 4.2.4 because I have a somewhat older server setup and 4.3.1 wouldn't work. I'm also using my own script to retrieve dat files from EOS controllers, placing them into folders named with the signal ID. I have no issues importing the data with NewDecodeandImportASC3Logs. However, if I forget to remove the dat files from the signal ID directory after they've already been imported and run NewDecodeandImportASC3Logs again over those same files, I start seeing strange behavior in the signal reports. Most notably, the Purdue Phase Termination chart shows "Unknown" for nearly all of the phase terminations that coincide with SPM data imported more than once.
I didn't dive too deep into the ATSPM code since I'm not a C# guy, but I do know that in SQL it's quite possible to INSERT into a table conditionally, checking that the row doesn't already exist in the table before inserting it. Is there a reason this wasn't done in ATSPM? I specifically do not want the dat files deleted from my controllers, which means I have to manually manage which files get brought into my ATSPM instance to avoid duplication.
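Something like this is what I had in mind (just a sketch; the declared types and sample values are my guesses, not taken from the actual ATSPM schema, and I'm assuming these four columns together identify an event):

-- Sketch of a conditional insert: only add the event if an identical
-- row isn't already in the table. Column types and sample values are
-- assumptions, not pulled from the ATSPM schema.
DECLARE @signalId   VARCHAR(10) = '1001',
        @timestamp  DATETIME    = '2019-01-01 00:00:00',
        @eventCode  INT         = 4,
        @eventParam INT         = 2;

INSERT INTO [MOE].[dbo].[Controller_Event_Log]
    ([SignalID], [Timestamp], [EventCode], [EventParam])
SELECT @signalId, @timestamp, @eventCode, @eventParam
WHERE NOT EXISTS
(
    SELECT 1
    FROM [MOE].[dbo].[Controller_Event_Log]
    WHERE [SignalID]   = @signalId
      AND [Timestamp]  = @timestamp
      AND [EventCode]  = @eventCode
      AND [EventParam] = @eventParam
);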
We used to have a unique constraint on the table. That's something you could still try, but our dataset is so large that the database could not perform the uniqueness check and keep up with the inserts.
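If you do want to try it, the constraint would look something like this (the constraint name here is made up, and it assumes those four columns together identify an event):

-- Sketch of a unique constraint over the event key columns. Maintaining
-- the index behind this check is what could not keep up with our insert
-- volume on a large dataset.
ALTER TABLE [MOE].[dbo].[Controller_Event_Log]
    ADD CONSTRAINT UQ_Controller_Event_Log
    UNIQUE ([SignalID], [Timestamp], [EventCode], [EventParam]);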
Using temp tables was also considered, but the solution we looked at was more of a daily combine with the production table. That did not meet the requirement of getting the data as close to real time as possible.
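One way such a combine could work (a sketch only, not what we actually built; the Controller_Event_Log_Staging table is a hypothetical name for wherever the decoder would land raw rows first):

-- Sketch of a staging-table combine: copy over only distinct rows
-- that aren't already in production, then clear the staging table.
-- This is inherently a batch operation, which is why it didn't meet
-- the near-real-time requirement.
INSERT INTO [MOE].[dbo].[Controller_Event_Log]
    ([SignalID], [Timestamp], [EventCode], [EventParam])
SELECT DISTINCT s.[SignalID], s.[Timestamp], s.[EventCode], s.[EventParam]
FROM [MOE].[dbo].[Controller_Event_Log_Staging] AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM [MOE].[dbo].[Controller_Event_Log] AS p
    WHERE p.[SignalID]   = s.[SignalID]
      AND p.[Timestamp]  = s.[Timestamp]
      AND p.[EventCode]  = s.[EventCode]
      AND p.[EventParam] = s.[EventParam]
);

TRUNCATE TABLE [MOE].[dbo].[Controller_Event_Log_Staging];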
Originally, when we were trying to address this in the charts, we determined that an occasional duplicate shouldn't affect a chart too much. At the time we did not have as many charts, and I can see how this would be a bigger issue now because many charts look for events in a specific order. Duplicates cause those lookups to fail, which is why Phase Termination categorizes everything as Unknown. You could write a script to clean out the duplicates, but I don't currently have one.
This will be addressed in version 5.0, where the duplicate check will be done in memory as part of the compressed object.
Thanks for the quick answer. I can appreciate how this gets far more complicated over a large dataset.
For the sake of others with the same issue, I did some searching and found a way to find and delete duplicates using a common table expression (CTE) in a Microsoft SQL Server query:
-- Number each row within its (SignalID, Timestamp, EventCode, EventParam)
-- group; any row with DuplicateCount > 1 is a duplicate of the first row
-- in that group. Deleting from the CTE deletes from the underlying table.
WITH CTE AS
(
    SELECT [SignalID],
           [Timestamp],
           [EventCode],
           [EventParam],
           ROW_NUMBER() OVER (PARTITION BY [SignalID],
                                           [Timestamp],
                                           [EventCode],
                                           [EventParam]
                              ORDER BY [Timestamp]) AS DuplicateCount
    FROM [MOE].[dbo].[Controller_Event_Log]
)
DELETE FROM CTE
WHERE DuplicateCount > 1;
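If you want to sanity-check before deleting anything, the same window function can count what would be removed (a small companion sketch along the same lines):

-- Preview: count how many rows the DELETE above would remove.
WITH CTE AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY [SignalID],
                                           [Timestamp],
                                           [EventCode],
                                           [EventParam]
                              ORDER BY [Timestamp]) AS DuplicateCount
    FROM [MOE].[dbo].[Controller_Event_Log]
)
SELECT COUNT(*) AS RowsToDelete
FROM CTE
WHERE DuplicateCount > 1;

Note that both queries scan the whole table, so on a large dataset it's probably best to run them during a quiet period.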