Infinite Plaso import (spinning wheel) - Duplicate timeline statuses in database
Opened this issue · 0 comments
Describe the bug
When importing a set of Plaso files at the same time (concurrent HTTP importations), one can reach to the following
undesired state: In the Timesketch database, the table timeline_status
contains (at least) 2 lines with the same
parent_id
:
- An "old" one (related to the
created_at
column), not deleted, that has itsstatus
set to processing, - The latest one (related to the
created_at
column) with thestatus
set to ready.
The ready line shows that the import is already done, but the processing line is still there: This is the bug.
At the frontend side, the related timeline permanently shows a spinning wheel, expecting the end of the (already
terminated) process.
Actually, I can identify several cases with the same parent_id
in the timeline_status
table, which is already
problematic
However, the bug becomes visible in the frontend when one of the less recent line's status
is set to processing.
I suspect this is linked to concurrent access to the database from workers when importing Plaso data.
I mean, I think no transaction protects the access to the timeline_status
table.
I think the bug is precisely here (see the TODO
comment) (timesketch/models/annotations.py):
def set_status(self, status):
"""
Set status on object. Although this is a many-to-many relationship
this makes sure that the parent object only has one status set.
Args:
status: Name of the status
"""
# TODO Fix refresh self.status now.
for _status in self.status:
self.status.remove(_status)
self.status.append(self.Status(user=None, status=status))
db_session.add(self)
db_session.commit()
To Reproduce
Steps to reproduce the behavior:
- Create a campaign,
- Create a timeline,
- At the same time (concurrent HTTP requests), import a set of Plaso files to this timeline.
The problem cannot be reproduced systematically and Plaso files sizes can be small.
Expected behavior
The processing line should have been removed from the timeline_status
table when the import is done, and the ready
line should be the only one remaining.
In such a case, the spinning wheel should disappear from the GUI, making data available for this timeline.
More generally in the database, a timeline
should never have more than a single timeline_status
line.
Screenshots
Desktop (please complete the following information):
- OS: Windows
- Browser Firefox
- Version 115.15.0esr (64 bits)
Note: The desktop set-up is not related to the problem as it is a backend bug.
Additional context
GET /api/v1/version
:
{
"meta":{
"plaso_version":"20240308",
"version":"20240508.1"
},
"objects":[]
}
Below is a set of SQL queries to easily identify the bug:
Getting timelines with the bug, that is:
- Having more than one
timeline_status
, - Having a
timeline_status
one with thestatus
set to processing, - This
timeline_status
is not the most recent one.
select
sketch.id,
sketch.updated_at,
sketch.name,
timeline.id,
timeline.updated_at,
timeline.name,
timeline.searchindex_id,
timeline_status.id,
timeline_status.updated_at,
timeline_status.status,
timeline_status.rank
from (
select * from (
select *, row_number() over (partition by parent_id order by created_at desc) rank
from timeline_status
where parent_id is not null
) duplicates
where
duplicates.rank > 1
and duplicates.status = 'processing'
order by created_at
) timeline_status
inner join timeline on timeline_status.parent_id = timeline.id
inner join sketch on timeline.sketch_id = sketch.id;
Fixing the bug, that is, deleting the timeline_status
lines with the status
set to processing when they are not
the most recent one.
delete from timeline_status where id in (
select id
from (
select *, row_number() over (partition by parent_id order by created_at desc) rank
from timeline_status
where parent_id is not null
) duplicates
where
duplicates.rank > 1
and duplicates.status = 'processing'
);
timeline_status
lines for timelines
bound to more than one with the processing bug:
select
sketch.id,
sketch.updated_at,
sketch.name,
timeline.id,
timeline.updated_at,
timeline.name,
timeline.searchindex_id,
timeline_status.id,
timeline_status.updated_at,
timeline_status.status
from (
select
parent_id as id
from (
select
parent_id,
status,
row_number() over (partition by parent_id order by created_at desc) as rank
from timeline_status
where parent_id is not null
) grouped_timeline_status
where
grouped_timeline_status.rank > 1
and grouped_timeline_status.status = 'processing'
group by parent_id
order by parent_id
) problematic_timeline
inner join timeline_status on problematic_timeline.id = timeline_status.parent_id
inner join timeline on problematic_timeline.id = timeline.id
inner join sketch on timeline.sketch_id = sketch.id
order by
sketch.id,
timeline.id,
timeline_status.created_at;
All timeline_status
lines for timelines
bound to more than one (a more general problem):
select
sketch.id,
sketch.updated_at,
sketch.name,
timeline.id,
timeline.updated_at,
timeline.name,
timeline.searchindex_id,
timeline_status.id,
timeline_status.updated_at,
timeline_status.status
from (
select
parent_id as id
from (
select
parent_id,
row_number() over (partition by parent_id order by created_at desc) as rank
from timeline_status
where parent_id is not null
) grouped_timeline_status
where
grouped_timeline_status.rank > 1
group by parent_id
order by parent_id
) problematic_timeline
inner join timeline_status on problematic_timeline.id = timeline_status.parent_id
inner join timeline on problematic_timeline.id = timeline.id
inner join sketch on timeline.sketch_id = sketch.id
order by
sketch.id,
timeline.id,
timeline_status.created_at;
Distribution of timelines with more than a single timeline_status
:
select
sketch.name,
timeline.name,
timeline_status_stat.count
from
sketch
inner join timeline on sketch.id = timeline.sketch_id
inner join (
select parent_id,
count(1) as count
from timeline_status
where parent_id is not null
group by parent_id
having count(1) > 1
) timeline_status_stat on timeline.id = timeline_status_stat.parent_id
order by
timeline_status_stat.count desc,
sketch.name,
timeline.name;