Esri/geoportal-server-harvester

Debugging Failed Harvest Entries

Closed this issue · 5 comments

What is the best way to debug the cause of failed harvest entries?
After evaluating hrv.log, I get a count of the entries that succeeded and the entries that failed.

Failed entries look like this:

Nov 03, 2016 3:52:41 PM com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess lambda$null$0
WARNING: Failed harvesting REF \\UNCPATH.xml | Mon Aug 15 09:15:28 MDT 2016 | \\UNCPATH.xml during UNC [\\UNCPATH] --> [GPT [http://server:8088/geoportal/]]
Nov 03, 2016 3:52:41 PM com.esri.geoportal.harvester.support.ReportLogger error
SEVERE: Error processing task: PROCESS:: status: working, title: UNC [\\UNCPATH] --> [GPT [http://server:8088/geoportal/]]
com.esri.geoportal.harvester.api.ex.DataOutputException: Error publishing data.
	at com.esri.geoportal.harvester.gpt.GptBroker.publish(GptBroker.java:141)
	at com.esri.geoportal.harvester.api.base.BrokerLinkActionAdaptor.push(BrokerLinkActionAdaptor.java:64)
	at com.esri.geoportal.harvester.api.base.SimpleLink.push(SimpleLink.java:71)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$null$0(DefaultProcessor.java:135)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
	at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1(DefaultProcessor.java:133)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.http.client.HttpResponseException: Bad Request
	at com.esri.geoportal.commons.gpt.client.Client.publish(Client.java:131)
	at com.esri.geoportal.harvester.gpt.GptBroker.publish(GptBroker.java:131)
	... 7 more
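
One way to get past the bare counts is to pair each failed reference with the root cause logged after it. The following is a minimal sketch, not part of Harvester: the function name `failed_entries` and both regular expressions are assumptions modelled only on the excerpt above, so other Harvester versions or log configurations may need different patterns.

```python
import re
from typing import Iterable, List, Tuple

# Patterns modelled on the hrv.log excerpt above (an assumption, not a documented format).
FAILED_RE = re.compile(r"^WARNING: Failed harvesting REF (?P<ref>.+?) \| ")
CAUSE_RE = re.compile(r"^Caused by: (?P<cause>.+)$")

NO_CAUSE = "(no cause logged)"


def failed_entries(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair each failed reference with the first 'Caused by:' line that follows it."""
    results: List[Tuple[str, str]] = []
    pending = None  # failed reference still waiting for its cause
    for raw in lines:
        line = raw.rstrip("\r\n")
        failed = FAILED_RE.match(line)
        if failed:
            # A new failure started before the previous one got a cause.
            if pending is not None:
                results.append((pending, NO_CAUSE))
            pending = failed.group("ref")
            continue
        cause = CAUSE_RE.match(line)
        if cause and pending is not None:
            results.append((pending, cause.group("cause")))
            pending = None
    if pending is not None:
        results.append((pending, NO_CAUSE))
    return results


# Sample lines copied from the excerpt above (stack frames omitted).
SAMPLE = r"""Nov 03, 2016 3:52:41 PM com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess lambda$null$0
WARNING: Failed harvesting REF \\UNCPATH.xml | Mon Aug 15 09:15:28 MDT 2016 | \\UNCPATH.xml during UNC [\\UNCPATH] --> [GPT [http://server:8088/geoportal/]]
SEVERE: Error processing task: PROCESS:: status: working, title: UNC [\\UNCPATH] --> [GPT [http://server:8088/geoportal/]]
com.esri.geoportal.harvester.api.ex.DataOutputException: Error publishing data.
Caused by: org.apache.http.client.HttpResponseException: Bad Request
"""

for ref, cause in failed_entries(SAMPLE.splitlines()):
    print(f"{ref}: {cause}")
```

To run it against a real log, pass an open file object instead of `SAMPLE.splitlines()`. With a Harvester version that only reports "Bad Request", this still shows nothing more specific than that, but it does list which files failed.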

Inside the geoportal.log file I see these entries:

2016-11-03 15:52:34,293 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,316 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,332 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,349 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,471 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,877 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,913 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,939 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:35,091 DEBUG [com.esri.geoportal.base.util.DateUtil] - Bad ISO date: REQUIRED: The year (and optionally month, or month and day) for which the data set corresponds to the ground.
2016-11-03 15:52:35,093 DEBUG [com.esri.geoportal.base.util.DateUtil] - Bad ISO date: REQUIRED: The year (and optionally month, or month and day) for which the data set corresponds to the ground.
2016-11-03 15:52:35,527 DEBUG [com.esri.geoportal.context.AppResponse] - Validation exception.

Basically, there are tons of "Unrecognized metadata type" and "Validation exception" errors. For the validation errors, how would I know what is failing? The "Bad ISO date" error is great, as it clearly tells me what's going wrong.
Thanks!
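
To see how the geoportal.log messages break down by type, they can be tallied per logger. This is a rough sketch under stated assumptions: the line layout (`<date> <time> <LEVEL> [<logger>] - <message>`) is inferred from the excerpt above, and `tally_messages` is a hypothetical helper, not something shipped with Geoportal.

```python
import re
from collections import Counter
from typing import Iterable

# Layout assumed from the geoportal.log excerpt above:
#   <date> <time>,<millis> <LEVEL> [<logger>] - <message>
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\] - (?P<message>.*)$"
)


def tally_messages(lines: Iterable[str], max_len: int = 60) -> Counter:
    """Count log lines by (short logger name, message truncated to max_len)."""
    counts: Counter = Counter()
    for raw in lines:
        match = LINE_RE.match(raw.rstrip("\r\n"))
        if not match:
            continue  # stack-trace lines and anything in another layout are skipped
        short_logger = match.group("logger").rsplit(".", 1)[-1]
        counts[(short_logger, match.group("message")[:max_len])] += 1
    return counts


# A few lines copied from the excerpt above.
GEOPORTAL_SAMPLE = """\
2016-11-03 15:52:34,293 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:34,316 DEBUG [com.esri.geoportal.context.AppResponse] - Unrecognized metadata type.
2016-11-03 15:52:35,091 DEBUG [com.esri.geoportal.base.util.DateUtil] - Bad ISO date: REQUIRED: The year (and optionally month, or month and day) for which the data set corresponds to the ground.
2016-11-03 15:52:35,527 DEBUG [com.esri.geoportal.context.AppResponse] - Validation exception.
"""

for (logger, message), count in tally_messages(GEOPORTAL_SAMPLE.splitlines()).most_common():
    print(f"{count:5d}  {logger}: {message}")
```

The tally only shows which kinds of errors dominate. It cannot say which file triggered a given message, because the log lines in the excerpt carry no file reference.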

I agree that neither "Error publishing data" nor "Bad Request" gives any clue about what went wrong. With the most recent version of Harvester, you would see something like:

failed to parse [apiso_Modified_dt]

This is exactly what comes from the Catalog.

Thanks. Is this the version that has recently been pushed? Or is it unreleased? I just tried it again with the latest version here on git and am getting the same result.

The code is already there (pushed last week).

Well, I would be more than happy to take a look at the metadata file. Could you please attach a single metadata file of yours that fails validation?

It does appear to be working. A nice-to-have would be a reference in the geoportal.log file to the filename/filepath that the error refers to, as I find geoportal.log much easier to sort through than hrv.log. For example:

2016-11-09 17:07:37,744 DEBUG [com.esri.geoportal.base.util.DateUtil] - **INSERT FILEPATH HERE** Bad ISO date: REQUIRED: The year (and optionally month, or month and day) for which the data set corresponds to the ground.

2016-11-09 17:07:37,745 DEBUG [com.esri.geoportal.base.util.DateUtil] - Bad ISO date: REQUIRED: The year (and optionally month, or month and day) for which the data set corresponds to the ground.