sat-utils/sat-api

LandsatMetadataProcessorStateMachine not finding index.html

Closed this issue · 4 comments

I'm trying to stand up my own instance of sat-api, and I'm currently stuck: the lambda doesn't seem to be able to find the scene files.

START RequestId: 3234ee9a-790a-4e8b-a357-a591c0b17130 Version: $LATEST
2018-07-16T06:02:40.619Z	3234ee9a-790a-4e8b-a357-a591c0b17130
{
    "bucket": "foobar",
    "key": "sat-api-v1-dev/ingest/landsat/",
    "currentFileNum": 4215,
    "lastFileNum": 4215,
    "arn": "arn:aws:states:us-east-1:791757209086:stateMachine:LandsatMetadataProcessorStateMachine-t6P5ZogX0wF0"
}
2018-07-16T06:02:40.747Z	3234ee9a-790a-4e8b-a357-a591c0b17130	connected to elasticsearch
2018-07-16T06:02:40.750Z	3234ee9a-790a-4e8b-a357-a591c0b17130	Processing s3://foobar/sat-api-v1-dev/ingest/landsat/4215.csv
2018-07-16T06:02:41.955Z	3234ee9a-790a-4e8b-a357-a591c0b17130	error processing LC81870162018194LGN00: https://landsat-pds.s3.amazonaws.com/c1/L8/187/016/LC08_L1TP_187016_20180713_20180714_01_RT/index.html not available: 
2018-07-16T06:02:42.298Z	3234ee9a-790a-4e8b-a357-a591c0b17130	error processing LC81870172018194LGN00: https://landsat-pds.s3.amazonaws.com/c1/L8/187/017/LC08_L1TP_187017_20180713_20180714_01_RT/index.html not available: 
2018-07-16T06:02:42.629Z	3234ee9a-790a-4e8b-a357-a591c0b17130	error processing LC81870182018194LGN00: https://landsat-pds.s3.amazonaws.com/c1/L8/187/018/LC08_L1TP_187018_20180713_20180714_01_RT/index.html not available: 
2018-07-16T06:02:42.975Z	3234ee9a-790a-4e8b-a357-a591c0b17130	error processing LC81870192018194LGN00: https://landsat-pds.s3.amazonaws.com/c1/L8/187/019/LC08_L1TP_187019_20180713_20180714_01_RT/index.html not available: 
...

A copy of the file 4215.csv referenced above can be found here:

https://gist.github.com/metasim/8123689f232e0951c3fedfd616c9fc05

@metasim This happens with some files. Not all Landsat files in the CSV index from USGS are actually on S3, so the landsat lambda checks that the scene's index.html file exists before adding the scene to sat-api. (It could check for any one of the scene's files; I chose index.html.)

However, in this case it looks like the files do in fact exist:
https://landsat-pds.s3.amazonaws.com/c1/L8/187/019/LC08_L1TP_187019_20180713_20180714_01_RT/index.html

So I'm wondering if this was just a latency issue: the scenes were listed in the CSV file but had not yet been uploaded to S3 when you tried yesterday. If you run this CSV again, does it find them this time?
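For anyone who wants to test availability by hand, here is a minimal Python sketch of that kind of existence check. It is not the lambda's actual code: the helper names `scene_index_url` and `is_available` are made up for illustration, and the URL layout is simply inferred from the log lines above.

```python
import urllib.error
import urllib.request

LANDSAT_PDS = "https://landsat-pds.s3.amazonaws.com"


def scene_index_url(path, row, product_id):
    """Build the index.html URL for a scene (layout inferred from the logs)."""
    return f"{LANDSAT_PDS}/c1/L8/{int(path):03d}/{int(row):03d}/{product_id}/index.html"


def is_available(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request for url answers with HTTP 200.

    opener is injectable (hypothetical convenience) so the check can be
    exercised without touching the network.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # S3 answers with an error status when the object is missing.
        return False
```

Calling `is_available(scene_index_url(187, 19, "LC08_L1TP_187019_20180713_20180714_01_RT"))` a day or two after a failed ingest should show whether the scene has since landed on S3.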

It's very strange: none of my ingests are successfully populating the database (ES is empty except for the catalog definition). When I look at the most recent logs (from today), the index.html pages they reference all give errors like this:

<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <Key>
    c1/L8/168/076/LC08_L1GT_168076_20180708_20180717_01_T2/index.html
  </Key>
  <RequestId>D16C5FBADBF102D0</RequestId>
  <HostId>
    eXUUzlNKN4QeA4BDci83nGjk7s+DoIrTcfg7KozWOVpUfjeaObyjLreS9OO4t5lulOpo5Mk7f54=
  </HostId>
</Error>

However, if I go back and look at the logs from 2 days ago, the index.html files they reference are now available.

Is there a way to have the ingest start populating backwards in time? Or some other way to work around this latency problem? I don't want to keep hitting your sat-api instance for my testing ;-)

@metasim Yeah, you can specify numFiles when firing the lambda. If you want to go back further in time, just increase numFiles.
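As a rough illustration of what that means for the numbered CSV chunks (like 4215.csv above), here is a small Python sketch. It assumes numFiles counts back from the newest chunk (lastFileNum) and that chunks are numbered from 1. Neither assumption is confirmed in this thread, and `chunk_numbers` is a hypothetical helper, not part of sat-api.

```python
def chunk_numbers(last_file_num, num_files):
    """List the CSV chunk numbers covered when counting back from the newest.

    Assumes chunks are numbered 1..last_file_num (not confirmed by sat-api).
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    first = max(1, last_file_num - num_files + 1)
    return list(range(first, last_file_num + 1))


print(chunk_numbers(4215, 3))  # [4213, 4214, 4215]
```

Under those assumptions, a larger numFiles reaches older chunks, whose scenes have had more time to appear on S3.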

@matthewhanson Win!

[Screenshot, 2018-07-18 1:56 PM]

Thanks for your help on this!