Esri/geoportal-server-harvester

Files not harvested to the geoportal

Durga07 opened this issue · 13 comments

I tried harvesting files from ArcGIS Online to Geoportal 2.6.3 through Harvester 2.6.3. The home section reports the task as 'completed', but the file is not harvested to the Geoportal. (I'm not working on the machine where Geoportal is installed; I'm using my personal computer.)
This is the exported data of the task performed:
{
  "name": "",
  "source": {
    "type": "AGP-IN",
    "label": "ArcgisonlineD",
    "properties": {
      "agp-host-url": "https://arcg.is/0GqLzP",
      "agp-folder-id": "Trail Geoportal",
      "cred-username": "Durga7",
      "cred-password": "zzzzz",
      "agp-emit-xml": "true",
      "agp-emit-xml-fmt": "DEFAULT",
      "agp-emit-json": "false",
      "agp-max-redirects": "5"
    },
    "keywords": [],
    "ref": "4bb5f339-0910-438f-9b19-6a7601ea1a4f"
  },
  "destinations": [
    {
      "action": {
        "type": "GPT",
        "label": "X",
        "properties": {
          "gpt-host-url": "http://49.207.9.178:8080/geoportal",
          "cred-username": "publisher",
          "cred-password": "zzzzz",
          "gpt-index": "",
          "gpt-cleanup": "true",
          "gpt-accept-xml": "false",
          "gpt-accept-json": "false",
          "gpt-translate-pdf": "false"
        },
        "keywords": [],
        "ref": "5cee0cbd-07cb-43fb-8f93-4ec404b26204"
      }
    }
  ],
  "keywords": [],
  "incremental": false,
  "ignoreRobotsTxt": false,
  "ref": "bc376995-ada1-4ac7-9bfe-3be9232c97d6"
}

Hi, the agp-host-url points to a web map. You need to point to the root ArcGIS Online URL instead, as indicated in the sample below the URL input box.

I tried it this way, but the file is still not harvested. Is this correct?
[Screenshot: Harvester1]

The URL should be https (we'll update the example in the next release). Is your account Durga7 an ArcGIS Online user account (and not the geoportal account)?

Yes, Durga7 is an ArcGIS Online user account.

Ah, one other thing. The Folder field is a reference to the folder in ArcGIS Online. To get the proper folder ID (not the name), open your ArcGIS Online organization, go to the Content tab, and select the folder in the list of folders on the left under 'My Content'. You will then see the folder ID in the URL as the folder parameter.
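As a rough illustration of reading the folder ID out of the address bar, here is a minimal Python sketch. The URL, the helper names `folder_id_from_url` and `looks_like_folder_id`, and the 32-character hex shape of an ID are assumptions made for this example; they are not taken from the Harvester or ArcGIS documentation.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed shape of a folder ID: 32 lowercase hex characters.
HEX_ID = re.compile(r"^[0-9a-f]{32}$")

def folder_id_from_url(url):
    """Return the value of the 'folder' parameter, or None if it is absent."""
    parts = urlparse(url)
    # Check the query string first, then the fragment, since the parameter
    # could appear in either place depending on the page.
    for raw in (parts.query, parts.fragment):
        values = parse_qs(raw).get("folder")
        if values:
            return values[0]
    return None

def looks_like_folder_id(value):
    """True if the value has the assumed ID shape rather than a folder name."""
    return bool(HEX_ID.match(value))

# Hypothetical URL and ID, for illustration only.
url = "https://example.maps.arcgis.com/home/content.html?folder=0123456789abcdef0123456789abcdef"
print(folder_id_from_url(url))
print(looks_like_folder_id("Trail Geoportal"))
```

A folder name such as "Trail Geoportal" fails the shape check, which is a quick hint that the name was entered in the Folder field instead of the ID.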

Is this it?
[Screenshot (267)]

The file is still not harvested.

Yes, that would be the ID.

Hi, this is the result. The files are still not published to the Geoportal.
[Screenshot: Harvester2]

When you select the link of the ID on the right, what do you see?
Also, can you check the harvester AND geoportal logs? Some of the validation happens in the geoportal and not in the harvester.

Hi, the ID link doesn't take me anywhere.

Log file of Harvester:
hrv.2020-08-04.log

com.esri.geoportal.harvester.api.ex.DataOutputException: Error publishing data: id: 809f1de4ad424eadb86cb76ab63a8d36, modified: Mon Aug 03 10:47:24 IST 2020, source URI: 809f1de4ad424eadb86cb76ab63a8d36, broker URI: AGP:https://www.arcgis.com
at com.esri.geoportal.harvester.gpt.GptBroker.publish(GptBroker.java:214)
at com.esri.geoportal.harvester.api.base.BrokerLinkActionAdaptor.push(BrokerLinkActionAdaptor.java:64)
at com.esri.geoportal.harvester.api.base.SimpleLink.push(SimpleLink.java:71)
at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$null$0(DefaultProcessor.java:158)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Unknown Source)
at java.util.stream.ReferencePipeline$Head.forEach(Unknown Source)
at com.esri.geoportal.harvester.engine.defaults.DefaultProcessor$DefaultProcess.lambda$new$1(DefaultProcessor.java:156)
at java.lang.Thread.run(Unknown Source)
Caused by: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
at sun.security.ssl.InputRecord.handleUnknownRecord(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:436)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:374)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at com.esri.geoportal.commons.gpt.client.Client.execute(Client.java:700)
at com.esri.geoportal.commons.gpt.client.Client.generateToken(Client.java:754)
at com.esri.geoportal.commons.gpt.client.Client.getAccessToken(Client.java:725)
at com.esri.geoportal.commons.gpt.client.Client.queryIds(Client.java:544)
at com.esri.geoportal.commons.gpt.client.Client.publish(Client.java:256)
at com.esri.geoportal.harvester.gpt.GptBroker.publish(GptBroker.java:199)
... 7 more
05-Aug-2020 14:11:16.747 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportLogger.completed Completed processing task: PROCESS:: status: completed, title: NAME: 04-08, PROCESSOR: DEFAULT[], SOURCE: AGP-IN[agp-host-url=https://www.arcgis.com, agp-folder-id=b63d2f2d73644d00a9aa302dff4c913a, cred-username=Durga7, cred-password=, agp-emit-xml=true, agp-emit-xml-fmt=DEFAULT, agp-emit-json=true, agp-max-redirects=5], DESTINATIONS: [GPT[gpt-host-url=https://49.207.9.178:8080/geoportal, cred-username=publisher, cred-password=, gpt-index=, gpt-cleanup=false, gpt-accept-xml=true, gpt-accept-json=true, gpt-translate-pdf=true]], INCREMENTAL: false, IGNOREROBOTSTXT: true
05-Aug-2020 14:11:16.747 INFO [HARVESTING] com.esri.geoportal.harvester.support.ReportStatistics.completed Harvesting of PROCESS:: status: completed, title: NAME: 04-08, PROCESSOR: DEFAULT[], SOURCE: AGP-IN[agp-host-url=https://www.arcgis.com, agp-folder-id=b63d2f2d73644d00a9aa302dff4c913a, cred-username=Durga7, cred-password=, agp-emit-xml=true, agp-emit-xml-fmt=DEFAULT, agp-emit-json=true, agp-max-redirects=5], DESTINATIONS: [GPT[gpt-host-url=https://49.207.9.178:8080/geoportal, cred-username=publisher, cred-password=, gpt-index=, gpt-cleanup=false, gpt-accept-xml=true, gpt-accept-json=true, gpt-translate-pdf=true]], INCREMENTAL: false, IGNOREROBOTSTXT: true completed at Wed Aug 05 14:11:16 IST 2020. No. succeded: 0, no. failed: 1
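The summary at the end of the ReportStatistics line shows why a task can be marked 'completed' while nothing was published: the status only says the run finished, and the counts say how many records succeeded. A minimal Python sketch for pulling those counts out of a log line follows; the helper name `harvest_counts` is made up for this example, and the sample line is abbreviated from the log above.

```python
import re

# Matches the summary at the end of a ReportStatistics line. The spelling
# "succeded" is copied verbatim from the log output.
SUMMARY = re.compile(r"No\. succeded: (\d+), no\. failed: (\d+)")

def harvest_counts(line):
    """Return (succeeded, failed) from a log line, or None if no summary is found."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Abbreviated copy of the summary line from the harvester log.
line = "completed at Wed Aug 05 14:11:16 IST 2020. No. succeded: 0, no. failed: 1"
counts = harvest_counts(line)
if counts is not None and counts[1] > 0:
    print(f"{counts[1]} record(s) failed; look for DataOutputException entries above")
```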

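Your geoportal URL is not correct: you combined port 8080 with the https protocol. The "Unrecognized SSL message, plaintext connection?" exception in the log means the server on that port answered in plain HTTP. Use http://49.207.9.178:8080/geoportal instead.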
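The two URL problems found in this thread (a source URL that is not the portal root, and https on a port that serves plain http) can be checked before running a task. Here is a minimal Python sketch that works on the exported task definition. The helper name `check_task_urls` and the list of "plain HTTP" ports are assumptions for this example; the port check is a heuristic, since a server can be configured to serve TLS on any port.

```python
from urllib.parse import urlparse

# Ports that conventionally serve plain HTTP. This is a heuristic, not a rule.
PLAIN_HTTP_PORTS = {80, 8080}

def check_task_urls(task):
    """Return a list of warnings for the URL settings in an exported task."""
    warnings = []

    # The source URL should be the portal root, with no path behind the host.
    source_props = task.get("source", {}).get("properties", {})
    source_url = urlparse(source_props.get("agp-host-url", ""))
    if source_url.path.strip("/"):
        warnings.append("agp-host-url has a path; use the root portal URL")

    # Each destination URL should not pair https with a plain-HTTP port.
    for dest in task.get("destinations", []):
        props = dest.get("action", {}).get("properties", {})
        url = urlparse(props.get("gpt-host-url", ""))
        if url.scheme == "https" and url.port in PLAIN_HTTP_PORTS:
            warnings.append(
                f"gpt-host-url uses https on port {url.port}; "
                "that port usually serves plain http"
            )
    return warnings

# Trimmed task definition combining the two URL mistakes from this thread.
task = {
    "source": {"properties": {"agp-host-url": "https://arcg.is/0GqLzP"}},
    "destinations": [
        {"action": {"properties": {"gpt-host-url": "https://49.207.9.178:8080/geoportal"}}}
    ],
}
for warning in check_task_urls(task):
    print(warning)
```

With the corrected values (https://www.arcgis.com as the source and http://49.207.9.178:8080/geoportal as the destination), the function returns an empty list.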

Yes, the files are now harvested to the Geoportal.
But why can't I add this item in the Map Viewer?
[Screenshot: Capture6(Add AO)]

The XML file : http://49.207.9.178:8080/geoportal/rest/metadata/item/02ff69fc5afa473c8e50e7a72184cc2e/xml

That is because the actual resource referenced in the XML is the portal map viewer application. Geoportal can currently only add web services to a map. It looks like your map included the dark grey canvas basemap and the transportation world map service on top of that. If you register that map service separately in your portal and then harvest it into Geoportal, you should be able to add it to the map.