LouisK130/IFCB-Annotate

Bin dates and bin sorting in the annotation tool

Closed this issue · 18 comments

When I load files from a large time range, they come into the tool sorted in alphabetical order. This is not ideal: if I get interrupted and want to restart, I cannot just select a shorter time range, because I will see all of the "D" files again. It would be much more convenient to have the bins sorted by the time they were created. MATLAB scripts already exist to parse times from file names, but the times are not part of the annotation database. We have recently found other reasons to need times associated with the bins as well.

At the moment, I have a bin that is crashing on a bad stitch. There isn't a streamlined way to just start on the next bin and continue. I am on bin 300+ of 1912.

A feature that could help work around some of this would be to have the ability to "jump to" a different set of 10 bins. That would be extremely helpful.

Here is an SQL function that parses the bin ID timestamp.

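create function bin_timestamp(bin text)
returns timestamp with time zone as $$
begin
-- interpret the parsed fields as UTC (note: this changes the session time zone)
set time zone UTC;

if left(bin, 1) = 'I' then
    -- IFCB-prefixed IDs, e.g. IFCB5_2018_010_163444: year, day of year, time
    return to_timestamp(substring(bin from 7 for 15), 'YYYY_DDD_HH24MISS');
else
    -- D-prefixed IDs: 8-digit date at position 2, 6-digit time at position 11
    return to_timestamp(substring(bin from 2 for 8) || substring(bin from 11 for 6), 'YYYYMMDDHH24MISS');
end if;

end;
$$ language plpgsql;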

This could be used to populate a timestamp column in the database.
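For sorting outside the database, the same parsing can be sketched in Python. This is only an illustration that mirrors the slice offsets of the SQL function above; the name `bin_timestamp` is reused for convenience, and every sample ID except `IFCB5_2018_010_163444` is made up for the example.

```python
from datetime import datetime, timezone


def bin_timestamp(bin_id: str) -> datetime:
    """Parse the UTC timestamp embedded in a bin ID (offsets mirror the SQL function)."""
    if bin_id.startswith("I"):
        # e.g. IFCB5_2018_010_163444: year, day of year, time, starting at index 6
        parsed = datetime.strptime(bin_id[6:21], "%Y_%j_%H%M%S")
    else:
        # D-prefixed IDs: 8-digit date, one separator character, 6-digit time
        parsed = datetime.strptime(bin_id[1:9] + bin_id[10:16], "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical sample IDs (only the IFCB5 one appears in this thread).
bins = [
    "D20180111T000000_IFCB010",
    "IFCB5_2018_010_163444",
    "IFCB1_2006_158_120000",
]

# Sorting by parsed time instead of by name keeps the "D" bins from coming first.
bins_by_time = sorted(bins, key=bin_timestamp)
```

Sorting with a key function leaves the IDs themselves untouched, so the result can be passed along wherever the unsorted list was used.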

I believe the bins are currently sorted however they come out of the dashboard API, as I don't do anything to them. Is the issue only when the range includes bins of each of the 2 different naming formats?

There are sort of three different naming formats:
IFCB1...
IFCB5...
D...

And yes, that is the issue.

Another example of needing dates: we want to export some PNGs randomly from all of the years of data. But if the bins don't know what year they are from, that is not straightforward.
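Once the year can be read from the bin ID, a per-year random draw is short to write. The following is a rough Python sketch, not part of the tool: `bin_year` and `sample_bins_per_year` are hypothetical helpers, and the slice offsets are taken from the SQL function earlier in the thread.

```python
import random
from collections import defaultdict


def bin_year(bin_id: str) -> int:
    """Year embedded in a bin ID; offsets follow the SQL function in this thread."""
    return int(bin_id[6:10]) if bin_id.startswith("I") else int(bin_id[1:5])


def sample_bins_per_year(bin_ids, per_year, seed=None):
    """Pick up to `per_year` random bins from each year present in `bin_ids`."""
    by_year = defaultdict(list)
    for bin_id in bin_ids:
        by_year[bin_year(bin_id)].append(bin_id)
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return {
        year: rng.sample(ids, min(per_year, len(ids)))
        for year, ids in sorted(by_year.items())
    }
```

Calling `sample_bins_per_year(bin_ids, 5, seed=1)` would return a dictionary mapping each year to at most five bin IDs, which could then drive the PNG export.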

I just confirmed that the dashboard feed API returns bins in time order.

For example, in this output the I bins come before the D bins:

https://ifcb-data.whoi.edu/mvco/api/feed/temperature/start/2018-01-10/end/2018-01-23

@eepeacock Can you give me a time range you're experiencing this issue with? I'm not able to reproduce it. If I search January 10 2018 through January 23 2018, the bins appear in time order just as in the API link Joe sent.

The whole time series. For example: 1 June 2006- 9 March 2019. Then choose a case of a certain class, to weed out bins without existing annotations (for example, Chaetoceros).

I don't think 2018 has any I files. Try all of 2017, taxa Chaetoceros:
[image]

2018 does have I files. For example

https://ifcb-data.whoi.edu/mvco/IFCB5_2018_010_163444.html

For all of 2017, the bins come back from my API in time order.

https://ifcb-data.whoi.edu/mvco/api/feed/temperature/start/2017-01-01/end/2018-01-01

BTW doing a time range query for the entire MVCO time series takes a long time, which will slow down the annotation tool, and it returns a 28MB JSON file containing information for 252k bins.

Perhaps searching by class is the issue?

I would love to have a way to search by type of annotation done, etc., and not query the whole time series, but that is a whole separate development project... of tracking what is completed on each bin!
I think adding the view could be the issue.

Yup, the view selection is the source of the issue. I'll push a fix.

thank you!

Should be resolved. Can you confirm?

Joe, let me know when to try.

ready to go.

That did work for this use case.
I suppose for the current batch I was on, I still need to start back at the beginning, but after that, I will be in a better position. Thank you!