Routing harvesting duplicates channelepochs (and related epochs) in the routing DB
kaestli opened this issue · 4 comments
pre-existing channel-epochs are not recognized - each harvesting of the routing declaration introduces all channel epochs again as new ones, rather than updating the last-seen dates of the pre-existing ones.
(This behaviour is not observed with network epochs and station epochs; however those still have the issue that the lastseen date is not updated; cfr #31
affected are the entities channelepoch, epoch (if referred to by a channel epoch), and routing (if referring to such an epoch, however, this is always the case...)
evidence:
Select network.code as net, station.code as sta, channelepoch.id,
channelepoch.locationcode as loc, channelepoch.code as chan,
epoch.starttime as start,
epoch.endtime as end,
epoch.lastseen as ep_lastseen,
routing.lastseen as rout_lastseen,
service.name
from network inner join channelepoch on network.id = channelepoch.network_ref
inner join station on station.id = channelepoch.station_ref
inner join epoch on channelepoch.epoch_ref = epoch.id
left join routing on routing.epoch_ref = epoch.id
left join endpoint on routing.endpoint_ref = endpoint.id
left join service on endpoint.service_ref = service.id
where station.code = 'HAMIK'
and channelepoch.code = 'HGZ'
order by epoch.starttime, epoch.lastseen, channelepoch.id;
-- 5000
-- note: only channelepochs have routing,
-- but all - channelepoch, epoch and routing are multiplied
Select station.code as sta,
epoch.starttime as start,
epoch.endtime as end,
epoch.lastseen as ep_lastseen,
routing.lastseen as rout_lastseen,
service.name
from station inner join stationepoch on station.id = stationepoch.station_ref
inner join epoch on stationepoch.epoch_ref = epoch.id
left join routing on routing.epoch_ref = epoch.id
left join endpoint on routing.endpoint_ref = endpoint.id
left join service on endpoint.service_ref = service.id
where station.code = 'HAMIK'
order by epoch.starttime, epoch.lastseen, stationepoch.id;
-- 1 epoch, no routing
Select network.code as sta,
epoch.starttime as start,
epoch.endtime as end,
epoch.lastseen as ep_lastseen,
routing.lastseen as rout_lastseen,
service.name
from network inner join networkepoch on network.id = networkepoch.network_ref
inner join epoch on networkepoch.epoch_ref = epoch.id
left join routing on routing.epoch_ref = epoch.id
left join endpoint on routing.endpoint_ref = endpoint.id
left join service on endpoint.service_ref = service.id
where network.code = 'CH'
order by epoch.starttime, epoch.lastseen, networkepoch.id
-- 1 epoch, no routing.
(note that this issue would widely be mediated by the cleanup mechanism for routes which disappeared from EIDA (old lastseen dates), however that process is not automated.
While channelepochs and routes are duplicated rather than the lastseen date updated,
for network epochs and station epochs the lastseen date is not updated (which is probably fine as long as it is not used).
The issue does not prevent the federator from "functioning", however it bloats the database and makes queries slow in the long-run.
Until the code is fixed, the database can be kept under control as follows:
-- do this only once:
CREATE INDEX IF NOT EXISTS epoch_epochtype_ref_idx
ON public.epoch USING btree
(epochtype_ref ASC NULLS LAST)
TABLESPACE pg_default;
CREATE UNIQUE INDEX IF NOT EXISTS epochtype_id_idx
ON public.epochtype USING btree
(id ASC NULLS LAST)
TABLESPACE pg_default;
-- do the following regularly:
delete from channelepoch where epoch_ref in
(select id from epoch where lastseen < (select ((now() - '3 days'::interval)::date)::timestamp));
delete from routing where epoch_ref in
(select id from epoch where lastseen < (select ((now() - '3 days'::interval)::date)::timestamp));
delete from epoch where lastseen < (select ((now() - '3 days'::interval)::date)::timestamp)
and id not in (select epoch_ref from stationepoch)
and id not in (select epoch_ref from networkepoch);
delete from epochtype where id not in (select epochtype_ref from epoch);