Event order is incorrect when adjacent events differ by only milliseconds
lmpeiris opened this issue · 3 comments
Summary:
I'm importing a dataframe into pm4py event logs. In some cases, adjacent events in the same case differ by only a few milliseconds. When I check the start activities of the event log, the event that comes second in the case is selected as the start activity. The same wrong order shows up when the events are plotted using the heuristics miner's visualization output.
```python
event_df.dtypes
# id                object
# title             object
# action            object
# user              object
# time      datetime64[ns]
# case              object
# dtype: object

event_log = pm4py.convert_to_event_log(
    event_df.rename(columns={
        'case': 'case:concept:name',
        'time': 'time:timestamp',
        'action': 'concept:name',
    })
)
pm4py.get_start_activities(event_log)
# {'gl_branch_created': 530,
#  'gl_issue_created': 37,
#  'gl_MR_created': 286,
#  'gl_PL_created': 907,
#  'gl_issue_assigned': 11}

problem_logs = pm4py.filter_start_activities(event_log, ['gl_issue_assigned'])
problem_df = pm4py.convert_to_dataframe(problem_logs)
problem_df
```
It can be clearly seen that the actual start action for both of the cases shown in the screenshot is 'gl_issue_created'. However, pm4py considered it to be 'gl_issue_assigned'. The same happens for other actions as well.
Versions:
Reproduced on pm4py versions 2.7.11.6 and 2.5.0.
pandas version is 1.5.3.
Running on Windows 11, Python 3.10.9.
Tested using timestamps with tz information as well; same result.
Also tested using pandas 2.2.2.
On further checking, I found that:
- When the format_dataframe method is used, the issue does not reproduce. It seems this step is mandatory for proper formatting after all. If the documentation does not need to be updated, this issue can be closed as invalid.
- When log_converter is used instead of convert_to_event_log, the issue still reproduces as long as format_dataframe is not used.
Dear @lmpeiris, the format_dataframe method indeed ensures the correct sorting of events based on timestamp (first sorting criterion inside a case) and event index (second sorting criterion inside a case).
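For anyone landing here later, the two sorting criteria named in the reply above can be illustrated with a minimal, standard-library-only sketch. This is not pm4py's implementation; the rows, `sort_events`, and `start_activities` are hypothetical names and data made up for the illustration. It only shows how ordering by timestamp first and original row position second changes which event counts as the start activity.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical rows (not data from this issue); c1 arrives out of time order.
base = datetime(2024, 1, 1, 12, 0, 0)
events = [
    {"case": "c1", "action": "gl_issue_assigned", "time": base + timedelta(milliseconds=5)},
    {"case": "c1", "action": "gl_issue_created", "time": base},
    {"case": "c2", "action": "gl_branch_created", "time": base},
    {"case": "c2", "action": "gl_MR_created", "time": base},  # ties with the row above
]


def sort_events(rows):
    """Order rows by case, then timestamp, then original row position."""
    indexed = list(enumerate(rows))
    indexed.sort(key=lambda pair: (pair[1]["case"], pair[1]["time"], pair[0]))
    return [row for _, row in indexed]


def start_activities(rows):
    """Count the first action of each case; rows must be grouped by case."""
    counts = {}
    for _, group in groupby(rows, key=lambda r: r["case"]):
        first = next(group)["action"]
        counts[first] = counts.get(first, 0) + 1
    return counts


ordered = sort_events(events)
print(start_activities(events))   # start activities in raw row order
print(start_activities(ordered))  # start activities after sorting
```

In the raw row order, case c1 reports 'gl_issue_assigned' as its start activity; after sorting it reports 'gl_issue_created'. The row-position tie-breaker keeps the two c2 events with identical timestamps in their original order. In pm4py itself, going by the reply above, the corresponding step would be to call `pm4py.format_dataframe(event_df, case_id='case', activity_key='action', timestamp_key='time')` before `convert_to_event_log`.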