Esri/geoportal-server-harvester

Harvest failing-- 'no content to map' at JSON mapping

Closed this issue · 10 comments

I'm trying to harvest ISO XML from a web accessible folder (WAF), with gpt 2 .? and harvester 2.5. The XML files from the WAF seem to reach the harvester, but something fails at a JSON mapping step; the cause is "No content to map due to end-of-input". The log files aren't providing much to go on. I set logging.properties to 'ALL' for the harvester, but I'm still only getting errors and INFO.

Help!

Log extracts from the hrv log:
17-Apr-2017 16:52:18.260 INFO [https-jsse-nio-8443-exec-4] com.esri.geoportal.harvester.engine.defaults.DefaultProcessor.createProcess SUBMITTING: PROCESSOR: DEFAULT[], SOURCE: WAF[waf-host-url=http://get.iedadata.org/metadata/iso/usap/, waf-pattern=, cred-username=, cred-password=], DESTINATIONS: [GPT[gpt-host-url=http://localhost:8080/geoportal, cred-username=gptadmin, cred-password=, gpt-index=, gpt-cleanup=false]], INCREMENTAL: false, IGNOREROBOTSTXT: true

then... (some entries omitted)

com.esri.geoportal.harvester.api.ex.DataOutputException: Error publishing data: id: http://get.iedadata.org/metadata/iso/usap/600025iso.xml, modified: Wed Apr 05 15:43:19 EDT 2017, source URI: http://get.iedadata.org/metadata/iso/usap/600025iso.xml, broker URI: WAF:http://get.iedadata.org/metadata/iso/usap/
at com.esri.geoportal.harvester.gpt.GptBroker.publish(GptBroker.java:164)
at com.esri.geoportal.harvester.api.base.BrokerLinkActionAdaptor.push(BrokerLinkActionAdaptor.java:64)
.... skip some 'at's
at java.lang.Thread.run(Thread.java:745)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
at [Source: ; line: 1, column: 0]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:216)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3833)

Stephen, could you please upload/attach your task definition? You can obtain it from 'Tasks' tab by clicking 'Export' button.

Task definition:
{
  "source": {
    "type": "WAF",
    "label": "USAP from IEDA",
    "properties": {
      "waf-host-url": "http://get.iedadata.org/metadata/iso/usap/",
      "waf-pattern": "",
      "cred-username": "",
      "cred-password": ""
    },
    "keywords": []
  },
  "destinations": [{
    "action": {
      "type": "GPT",
      "label": "geoportal local 8080",
      "properties": {
        "gpt-host-url": "http://localhost:8080/geoportal",
        "cred-username": "gptadmin",
        "cred-password": "MNPl4ysIqOrNyQHDEXzBxg==JTVFaWVkYTEyMzQlMjE=",
        "gpt-index": "",
        "gpt-cleanup": "false"
      },
      "keywords": []
    }
  }],
  "keywords": [],
  "incremental": false,
  "ignoreRobotsTxt": false
}

Possible problem: the geoportal is running on CentOS over https, accessed from outside on port 8443. Does this affect communication between Tomcat servlets using localhost?

Well, my attempt succeeded. Your theory regarding running on https might be correct, since the error occurs during publishing - I am guessing an empty response from the 'Catalog' has been received. I would be glad to see if there is more in the log file.

Thanks Piotr. I can send you the full log if that would be useful. I've tried setting these to 'ALL' in the harvester's logging.properties:
com.esri.geoportal.level = ALL
com.esri.geoportal.harvester.support.ErrorLogger.level = ALL
but I'm still seeing only INFO and WARNING entries in the log. Is there something else I need to modify to get more detail (and how do I keep it all from going to catalina.out...?). I'll also check into opening 8080 on the geoportal VM.

INFO is good enough; I would be looking for severe events which happen before.

Here's the log from the most recent try. Our SysAdmin opened 8080, but from outside it's still redirecting to 8443; not sure if that happens to localhost:8080 requests as well.
zipped log attached
hrv.2017-04-18.zip

The logs tell me that publishing fails every time at the token generation step. That step is a POST request to the http://localhost:8080/geoportal/oauth/token URL with a particular payload. The response should be a JSON document containing a token for later use. I bet this is what is failing.

To investigate this issue, I would get familiar with the latest version of 'Postman'. This is a Google Chrome extension which lets you send any request to the desired endpoint and inspect the response.

The attached file is a zipped so-called 'collection' (a Postman term) with just one request in it: the one that obtains the token. You can import it into your Postman and run that request. Make sure the credentials are correct and the host name is right. This request is exactly the request issued by the harvester. You should expect a response like the one below:

{
  "access_token": "eyJhbGciOiJIUzI1NiJ9.eyJleHAiOjE0OTI1NDM2ODEsInVzZXJfbmFtZSI6ImdwdGFkbWluIiwiYXV0aG9yaXRpZXMiOlsiUk9MRV9BRE1JTiIsIlJPTEVfUFVCTElTSEVSIl0sImp0aSI6IjgyZjA4NjNiLTgzNjItNDQxNy1iN2M4LTc5Y2IwMTljZmI5MSIsImNsaWVudF9pZCI6Imdlb3BvcnRhbC1jbGllbnQiLCJzY29wZSI6WyJyZWFkIiwid3JpdGUiXX0.3yvcjFxBrZnjmNZVHEqOWwmQJynNQjQ7PIJaQSn1N4E",
  "token_type": "bearer",
  "expires_in": 7199,
  "scope": "read write",
  "jti": "82f0863b-8362-4417-b7c8-79cb019cfb91"
}

Anything else indicates a problem. You can tweak that request any way you want and conduct your own experiments.
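If it helps to script that check rather than eyeball it, here is a minimal Python sketch that takes the raw response body and decides whether it looks like the token response above. The function name `parse_token_response` is made up for this example, and the checks (non-empty body, valid JSON, an `access_token` with `token_type` of `bearer`) are my assumption about what counts as a good response, based only on the sample shown. An empty body is reported separately because that is the case that would produce a "no content" style failure when parsed as JSON.

```python
import json


def parse_token_response(body: str) -> str:
    """Return the access token from a token response body, or raise ValueError.

    Hypothetical helper: the accepted shape mirrors the sample response
    in this thread, not any documented geoportal contract.
    """
    # An empty body is the suspected failure case: there is nothing to parse.
    if not body.strip():
        raise ValueError("empty response body: nothing to parse as JSON")
    try:
        doc = json.loads(body)
    except json.JSONDecodeError as exc:
        # For example, an HTML error or redirect page instead of JSON.
        raise ValueError(f"response is not JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise ValueError("response is JSON but not an object")
    token = doc.get("access_token")
    if not token or str(doc.get("token_type", "")).lower() != "bearer":
        raise ValueError("JSON returned, but no usable bearer token in it")
    return token
```

Paste the body you get back from Postman into `parse_token_response(...)`; the error message tells you which of the three cases you hit.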

One more thing: if any 'redirect' response is being sent back, then the Apache HTTP Client used inside the harvester for HTTP communication doesn't handle it (the code would have to be changed to do that). The solution is to register the https endpoint instead of the regular http one, but this would also require dealing with the SSL certificate if your server has only a self-signed certificate (if this is the case, you will notice a certificate error exception in the log file).
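To see whether a redirect is what comes back, a rough Python sketch like the following can be run on the server itself. It sends a form POST without following redirects and reports the status, the `Location` header, and whether the body is empty. The names `NoRedirect`, `classify` and `probe` are invented for this example, and the form fields are not shown here because I am not reproducing the harvester's actual payload; take them from the Postman collection.

```python
import urllib.error
import urllib.parse
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Keep urllib from following redirects so the 3xx response stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None means "do not follow"; urllib then raises HTTPError


def classify(status, location, body):
    """Summarize a response: redirect, empty body, or something with content."""
    if 300 <= status < 400:
        return f"redirect ({status}) to {location}"
    if not body.strip():
        return f"empty body (status {status})"
    return f"status {status}, {len(body)} bytes"


def probe(url, form):
    """POST the given form fields to url and describe what came back."""
    data = urllib.parse.urlencode(form).encode("ascii")
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(urllib.request.Request(url, data=data), timeout=10) as resp:
            return classify(resp.status, resp.headers.get("Location"), resp.read())
    except urllib.error.HTTPError as err:
        # 3xx (not followed), 4xx and 5xx responses all arrive here.
        return classify(err.code, err.headers.get("Location"), err.read())
```

For example, `print(probe("http://localhost:8080/geoportal/oauth/token", fields))`, where `fields` is a dict holding the form fields from the Postman request. A "redirect" result pointing at an https URL would match the theory above.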

Token.postman_collection.json.zip

We turned off all the SSL configuration in the server.xml for the geoportal, and the harvest works now. Is there any way to get the portal to work with the harvester on https?

The POST works, and the harvest is working now as well, with SSL turned off (no https). I have to move on with getting things configured, purging the content in our test instance and reharvesting, so for now I'll forgo https and hopefully figure it out at some point. Thanks for explaining. I'll close for now.