Helioviewer-Project/api

Error for long STA EUVI 195 JPX request

Closed this issue · 29 comments

A user reported the kdu_merge error

Unable to open the box identified by the `jp2_locator' object supplied to
`jp2_input_box::open'.  The server is deliberately preventing access to the the
box or any stream equivalent.

for STEREO_A_SECCHI_EUVI_195_F2020-01-01T00.00.00Z_T2023-03-01T00.00.00ZB864000L.jpx, i.e., a request between 2020-01-01 and 2023-03-01 with a 10-day cadence.
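As a quick sanity check on the numbers in that file name, here is a minimal sketch that splits a JPX name into its parts and converts the cadence to days. The naming pattern is an assumption inferred only from the file names quoted in this thread, and `parse_jpx_name` is a hypothetical helper, not part of the Helioviewer API.

```python
import re

# Assumed pattern, inferred from the names in this thread only:
#   <source>_F<start>_T<end>B<cadence in seconds>L.jpx
JPX_NAME = re.compile(
    r"^(?P<source>.+)_F(?P<start>[0-9T.\-]+Z)"
    r"_T(?P<end>[0-9T.\-]+Z)B(?P<cadence>\d+)L?\.jpx$"
)


def parse_jpx_name(name):
    """Split a JPX movie file name into source, start, end and cadence."""
    match = JPX_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised JPX name: {name}")
    return {
        "source": match.group("source"),
        "start": match.group("start"),
        "end": match.group("end"),
        "cadence_s": int(match.group("cadence")),
    }


info = parse_jpx_name(
    "STEREO_A_SECCHI_EUVI_195_F2020-01-01T00.00.00Z"
    "_T2023-03-01T00.00.00ZB864000L.jpx"
)
# 864000 s divided by 86400 s per day gives the cadence in days.
print(info["source"], info["cadence_s"] / 86400)
```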

This looks to me like a corrupted or missing JP2 file on disk.
That was for IAS. GSFC returns a JPIP internal server error although the JPIP server works in general, which hints that a broken JPX file is being cached.

I believe the problem is with esajpip. The jpx is created and it's accessible through normal HTTP.

esajpip log:

2023-04-13 14:49:30,799: /root/esajpip-SWHV/esajpip/esajpip/src/jpeg2000/file_manager.cc:15: ERROR: The image file '/var/www/jp2/movies/STEREO_A_SECCHI_EUVI_195_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx' can not be read
2023-04-13 14:49:30,799: /root/esajpip-SWHV/esajpip/esajpip/src/client_manager.cc:143: ERROR: The image file '/var/www/jp2/movies/STEREO_A_SECCHI_EUVI_195_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx' can not be read

File exists:

$ ls -l /var/www/jp2/movies/STEREO_A_SECCHI_EUVI_195_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx
-rw-r--r-- 1 redacted redacted 874464 Apr 13 14:42 /var/www/jp2/movies/STEREO_A_SECCHI_EUVI_195_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx

Helioviewer link to file:

http://api.helioviewer.org/jp2/movies/STEREO_A_SECCHI_EUVI_195_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx

Please pay attention to the start time, end time and cadence.

I'm still getting the 500 error with JHelioviewer

Please remove the file from the movies directory.

Brand new request

[13/Apr/2023:15:10:54 -0400] "GET /v2/getJPX/?sourceId=23&startTime=2020-01-01T11:42:33Z&endTime=2023-03-13T11:42:33Z&cadence=1050300&verbose=true&linked=true&jpip=true HTTP/1.1" 200 550 "-" "JHV/SWHV-4.4.2.10777 (aarch64 Mac OS X 12.6.2) Eclipse Adoptium JRE 19.0.2"

http://api.helioviewer.org/jp2/movies/STEREO_A_SECCHI_EUVI_304_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx

After deleting the file and trying again

[13/Apr/2023:15:14:33 -0400] "GET /v2/getJPX/?sourceId=23&startTime=2020-01-01T11:42:33Z&endTime=2023-03-13T11:42:33Z&cadence=1050300&verbose=true&linked=true&jpip=true HTTP/1.1" 200 550 "-" "JHV/SWHV-4.4.2.10777 (aarch64 Mac OS X 12.6.2) Eclipse Adoptium JRE 19.0.2"
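To reproduce the request outside JHelioviewer, the query string from the access log can be rebuilt as in the sketch below. The parameter names and values are copied from the log line. The base URL is an assumption, combining the host from the file link above with the `/v2/getJPX/` path from the log, and `jpx_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: host taken from the file link, path taken from the access log.
API_ROOT = "http://api.helioviewer.org/v2/getJPX/"


def jpx_request_url(source_id, start, end, cadence_s, root=API_ROOT):
    """Build a getJPX URL with the same parameters as the logged request."""
    params = {
        "sourceId": source_id,
        "startTime": start,
        "endTime": end,
        "cadence": cadence_s,  # seconds
        "verbose": "true",
        "linked": "true",
        "jpip": "true",
    }
    # Keep ':' unescaped so the query matches the form seen in the log.
    return root + "?" + urlencode(params, safe=":")


url = jpx_request_url(23, "2020-01-01T11:42:33Z", "2023-03-13T11:42:33Z", 1050300)
print(url)
```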

jpip log:

2023-04-13 15:14:33,328: /root/esajpip-SWHV/esajpip/esajpip/src/client_manager.cc:143: ERROR: The image file '/var/www/jp2/movies/STEREO_A_SECCHI_EUVI_304_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx' can not be read

I think something is wrong with the file, please hold

Never mind, the file seems okay, I think. I don't know why esajpip is saying it can't read it.

To set the cadence precisely: File->New Image Layer, then set the Time step in days.

Ah, I missed a line in the log before

2023-04-13 15:14:33,328: /root/esajpip-SWHV/esajpip/esajpip/src/jpeg2000/file_manager.cc:127: ERROR: The code-stream does not include any PLT marker 
2023-04-13 15:14:33,328: /root/esajpip-SWHV/esajpip/esajpip/src/jpeg2000/file_manager.cc:15: ERROR: The image file '/var/www/jp2/movies/STEREO_A_SECCHI_EUVI_304_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx' can not be read 
2023-04-13 15:14:33,328: /root/esajpip-SWHV/esajpip/esajpip/src/client_manager.cc:143: ERROR: The image file '/var/www/jp2/movies/STEREO_A_SECCHI_EUVI_304_F2020-01-01T11.42.33Z_T2023-03-13T11.42.33ZB1050300L.jpx' can not be read 

The code stream error is why it can't read it.

It looks like a file was not transcoded before ingestion.

kdu_merge appears to run successfully. There's no output.
The codestream problem appears to be that the jp2s are not transcoded, right?

Yes, for the JPIP to work, the files made by IDL have to be transcoded. The merge phase doesn't care about that.

It seems there is a separate problem on the IAS server, where the same request raises an error at the merging stage.

Ok. Is there any way to identify files that have not been transcoded?

I found the range of times that was failing and transcoded that small date range. The long-range requests are working on GSFC now.

Great, I confirm, this specific request now works on GSFC.

The https://github.com/openpreserve/jpylyzer package can show the JP2 file structure. Grep the XML produced and it should have a <pltCount>1</pltCount> element.
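The grep described above could also be scripted. The sketch below only parses XML text that jpylyzer has already produced and looks for `pltCount` elements with a positive value. The sample string is a hand-written fragment for illustration, not real jpylyzer output, and `plt_counts` and `looks_transcoded` are hypothetical helper names. Matching is done on the local tag name so that an XML namespace on the elements does not matter.

```python
import xml.etree.ElementTree as ET


def plt_counts(jpylyzer_xml):
    """Return the integer value of every pltCount element in the XML text."""
    root = ET.fromstring(jpylyzer_xml)
    counts = []
    for element in root.iter():
        # Drop a leading '{namespace}' part, if any, before comparing.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "pltCount" and element.text:
            counts.append(int(element.text))
    return counts


def looks_transcoded(jpylyzer_xml):
    """True if at least one pltCount is present and all of them are positive."""
    counts = plt_counts(jpylyzer_xml)
    return bool(counts) and all(count > 0 for count in counts)


# Hand-written fragment for illustration only (not real jpylyzer output).
sample = "<jpylyzer><properties><pltCount>1</pltCount></properties></jpylyzer>"
print(looks_transcoded(sample))
```

A file whose report has no `pltCount` element at all is treated as not transcoded here, which matches the esajpip complaint about a code-stream without any PLT marker.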

This is how I noticed that you also transcode the JP2 files produced by ROB (SWAP, EUI, SUVI), which is harmless but unnecessary: they are not produced with IDL and already have those markers built in. The original files have <comment>SIDC OpenJPEG v1.99.0</comment>, while those on your server have <comment>Kakadu-v6.4.1</comment>.

@bogdanni is this still a problem on IAS?

Yes, tagging @ebuchlin
They should update the API server.

Sorry, I got confused: they should re-download the STEREO-A data.
The installation update is for the other problem (SUVI, EUI), which is already fixed.

So, if I understand correctly, some JP2 files have to be deleted and redownloaded, do we know which ones? Only STEREO-A 304 (there seems to be no error with 195) from 2020-01-01T11:42:33 to 2023-03-13T11:42:33, or a smaller interval? (I understand that one cannot dichotomize the interval to find one faulty JP2 because of time sampling)

I do have the error with EUVI 195, also the initial report triggered it with 195.

We are now trying to run kdu_merge on the server on the list of files producing an error, and on parts of this list to identify the file(s) producing the error (we are aware that there might be other problematic files, that have not been sampled over the test interval). We are also running jpylyzer on these and on the other files, and indeed there are differences between normal files and files producing an error when included in the list passed to kdu_merge:

  • User warning: ignoring unknown box ''
  • <tests> in the XML contains more False values
  • <properties> in the XML is less complete
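Running the merge on parts of the file list amounts to a bisection. A minimal sketch follows, assuming that a merge fails exactly when the list contains at least one bad file. `merge_ok` is a hypothetical callable that would wrap the real kdu_merge invocation and return True on success. Here it is replaced by a stand-in so the example runs on its own.

```python
def find_faulty(files, merge_ok):
    """Return one file that makes merge_ok fail, or None if the list merges.

    Assumes merge_ok(subset) is False exactly when subset contains a bad file.
    With several bad files, only one of them is returned per call.
    """
    if merge_ok(files):
        return None
    low, high = 0, len(files)
    # Invariant: files[low:high] contains at least one bad file.
    while high - low > 1:
        middle = (low + high) // 2
        if not merge_ok(files[low:middle]):
            high = middle
        else:
            low = middle
    return files[low]


# Stand-in for the real merge: fails if a known-bad name is in the list.
bad_files = {"c.jp2"}


def fake_merge_ok(subset):
    return not any(name in bad_files for name in subset)


file_list = ["a.jp2", "b.jp2", "c.jp2", "d.jp2", "e.jp2", "f.jp2", "g.jp2", "h.jp2"]
print(find_faulty(file_list, fake_merge_ok))
```

Removing the reported file and calling `find_faulty` again would locate further bad files, at the cost of roughly log2(N) merges each time.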

It seems that GSFC has updated these files. Does GSFC have a list of checksums for all files? (By the way, this could perhaps be done with multihash.)

The issue is now to know which ones we should redownload:

  • the whole time interval?
  • files identified as problematic with jpylyzer?

And should this be done by replacing the files, or by deleting them on disk and in the database (rows in the data table), and re-running hvpull?

I think you can simply replace the files on disk.

Those errors from jpylyzer may indicate file data corruption. Can you please give some URLs for files exhibiting those errors?

Indeed corrupted files... (filled with 0s)
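Files filled with zeros can be flagged without a full JPEG 2000 parser. The sketch below is not the script mentioned in this thread. It is a minimal illustration that checks the 12-byte JP2 signature box at the start of the file and measures the fraction of zero bytes. The 0.99 threshold and the function names are assumptions chosen for the example.

```python
# The 12-byte JPEG 2000 signature box that starts every JP2/JPX file.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def has_jp2_signature(path):
    """True if the file begins with the JPEG 2000 signature box."""
    with open(path, "rb") as handle:
        return handle.read(len(JP2_SIGNATURE)) == JP2_SIGNATURE


def zero_fraction(path, chunk_size=1 << 20):
    """Fraction of bytes equal to zero; an empty file counts as 1.0."""
    zeros = total = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            zeros += chunk.count(b"\x00")
            total += len(chunk)
    return zeros / total if total else 1.0


def classify(path, zero_threshold=0.99):
    """Return 'zero-filled', 'bad-signature' or 'ok' (threshold is a guess)."""
    if zero_fraction(path) >= zero_threshold:
        return "zero-filled"
    if not has_jp2_signature(path):
        return "bad-signature"
    return "ok"
```

A file that passes both checks can still be damaged further in, so this is only a cheap first filter before running jpylyzer.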

We have a script to detect the corrupted files and replace them. It has been running for about a week and has not finished yet, but when I try (as in the original post by @bogdanni) to get STEREO-A 195 between 2020-01-01T00:00:00 and 2023-03-01T00:00:00 in JHelioviewer, there is no longer any error. (However, this is not strictly a 10-day cadence; I don't know how you managed to select exactly a 10-day cadence.)

To get a cadence of precisely 10 days: use File->New Image Layer, then set Time step to 10 days. Of course, the server decides, but the result is usually a good approximation, depending on the available observations.

Both GSFC and IAS seem to work for the given instrument and time interval. Could this issue be closed?

Yes, I'm closing it now. It is worth keeping an eye on the file corruption. Maybe a mechanism for computing and storing a hash on ingestion? Also, it would be useful to be able to compare those hashes between servers, via HAPI, maybe?
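The hash-on-ingestion idea could look roughly like the sketch below. It assumes SHA-256 and a simple name-to-digest mapping per server. Neither is an existing Helioviewer mechanism, and `sha256_of` and `diff_manifests` are hypothetical names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_manifests(local, remote):
    """Names present on both servers whose digests differ."""
    shared = local.keys() & remote.keys()
    return sorted(name for name in shared if local[name] != remote[name])


# Illustrative manifests with made-up names and digests.
local_manifest = {"a.jp2": "aa11", "b.jp2": "bb22", "c.jp2": "cc33"}
remote_manifest = {"a.jp2": "aa11", "b.jp2": "ffff", "d.jp2": "dd44"}
print(diff_manifests(local_manifest, remote_manifest))
```

Storing the digest at ingestion time would let a later scan tell on-disk corruption apart from a file that was deliberately updated upstream.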