ONSdigital/address-index-data

Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException


When I run the command below:

java -Dconfig.file=application.conf -cp batch/target/scala-2.11/ons-ai-batch-assembly-0.0.1.jar uk.gov.ons.addressindex.Main

I get the following error:

21/07/14 02:17:15 WARN Utils: Your hostname, myimac.local resolves to a loopback address: 127.0.0.1; using 192.168.1.5 instead (on interface en0)
21/07/14 02:17:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/07/14 02:17:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/07/14 02:17:20 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
21/07/14 02:17:20 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: RECORD_IDENTIFIER, CHANGE_TYPE, PRO_ORDER, UPRN, UDPRN, ORGANISATION_NAME, DEPARTMENT_NAME, SUB_BUILDING_NAME, BUILDING_NAME, BUILDING_NUMBER, DEPENDENT_THOROUGHFARE, THROUGHFARE, DOUBLE_DEPENDENT_LOCALITY, DEPENDENT_LOCALITY, POST_TOWN, POSTCODE, POSTCODE_TYPE, DELIVERY_POINT_SUFFIX, WELSH_DEPENDENT_THOROUGHFARE, WELSH_THOROUGHFARE, WELSH_DOUBLE_DEPENDENT_LOCALITY, WELSH_DEPENDENT_LOCALITY, WELSH_POST_TOWN, PO_BOX_NUMBER, PROCESS_DATE, START_DATE, END_DATE, LAST_UPDATE_DATE, ENTRY_DATE
 Schema: recordIdentifier, changeType, proOrder, uprn, udprn, organisationName, departmentName, subBuildingName, buildingName, buildingNumber, dependentThoroughfare, thoroughfare, doubleDependentLocality, dependentLocality, postTown, postcode, postcodeType, deliveryPointSuffix, welshDependentThoroughfare, welshThoroughfare, welshDoubleDependentLocality, welshDependentLocality, welshPostTown, poBoxNumber, processDate, startDate, endDate, lastUpdateDate, entryDate
Expected: recordIdentifier but found: RECORD_IDENTIFIER
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/delivery_point/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, ORGANISATION, LEGAL_NAME
 Schema: uprn, organisation, legalName
Expected: legalName but found: LEGAL_NAME
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/organisation/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, PRIMARY_UPRN, THIS_LAYER, PARENT_UPRN
 Schema: uprn, primaryUprn, thisLayer, parentUprn
Expected: primaryUprn but found: PRIMARY_UPRN
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/hierarchy/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, LPI_KEY, LANGUAGE, LOGICAL_STATUS, START_DATE, END_DATE, LAST_UPDATE_DATE, SAO_START_NUMBER, SAO_START_SUFFIX, SAO_END_NUMBER, SAO_END_SUFFIX, SAO_TEXT, PAO_START_NUMBER, PAO_START_SUFFIX, PAO_END_NUMBER, PAO_END_SUFFIX, PAO_TEXT, USRN, USRN_MATCH_INDICATOR, LEVEL, OFFICIAL_FLAG
 Schema: uprn, lpiKey, language, logicalStatus, startDate, endDate, lastUpdateDate, saoStartNumber, saoStartSuffix, saoEndNumber, saoEndSuffix, saoText, paoStartNumber, paoStartSuffix, paoEndNumber, paoEndSuffix, paoText, usrn, usrnMatchIndicator, level, officialFlag
Expected: lpiKey but found: LPI_KEY
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/lpi/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: USRN, STREET_CLASSIFICATION
 Schema: usrn, streetClassification
Expected: streetClassification but found: STREET_CLASSIFICATION
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/street/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: USRN, STREET_DESCRIPTOR, LOCALITY, TOWN_NAME, LANGUAGE
 Schema: usrn, streetDescriptor, locality, townName, language
Expected: streetDescriptor but found: STREET_DESCRIPTOR
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/street_descriptor/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, CROSS_REFERENCE, SOURCE
 Schema: uprn, crossReference, source
Expected: crossReference but found: CROSS_REFERENCE
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/crossref/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, CLASSIFICATION_CODE, CLASS_SCHEME
 Schema: uprn, classificationCode, classScheme
Expected: classificationCode but found: CLASSIFICATION_CODE
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/classification/ABP_E811a_v111017.csv
21/07/14 02:17:24 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, PRIMARY_UPRN, PARENT_UPRN, ESTAB_TYPE, ADDRESS_TYPE
 Schema: uprn, primaryUprn, parentUprn, addressType, estabType
Expected: primaryUprn but found: PRIMARY_UPRN
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/hierarchy/ABP_E811a_v111017.csv
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
        at org.elasticsearch.spark.rdd.EsSpark$.doSaveToEs(EsSpark.scala:104)
        at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:79)
        at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:76)
        at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:56)
        at uk.gov.ons.addressindex.writers.ElasticSearchWriter$.saveHybridAddresses(ElasticSearchWriter.scala:27)
        at uk.gov.ons.addressindex.Main$.saveHybridAddresses(Main.scala:96)
        at uk.gov.ons.addressindex.Main$.delayedEndpoint$uk$gov$ons$addressindex$Main$1(Main.scala:56)
        at uk.gov.ons.addressindex.Main$delayedInit$body.apply(Main.scala:14)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at uk.gov.ons.addressindex.Main$.main(Main.scala:14)
        at uk.gov.ons.addressindex.Main.main(Main.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [] failed; server[pvr-locations.es.eu-west-2.aws.cloud.es.io:9243] returned [400|Bad Request:]
        at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:469)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
        at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
        ... 17 more

I'm not sure how to resolve this error.

The Spark job has got as far as writing the data to ES, but is unable to connect to the cluster. If you Google org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version you'll see there are a number of possible causes. The simplest is the port number: the value set in the reference.conf file is 9200 (the default for a local ES installation), whereas cloud deployments usually listen on 443 or 9243. Your stack trace shows the cluster at pvr-locations.es.eu-west-2.aws.cloud.es.io:9243, and the exception message itself points at the other common cause: targeting a cloud instance without setting es.nodes.wan.only.
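As a sketch only (how these values are wired through reference.conf in this repo may differ, and the host/credentials below are placeholders), the standard elasticsearch-hadoop settings for a cloud cluster look like this when set directly on a SparkConf:

```scala
import org.apache.spark.SparkConf

// Standard elasticsearch-hadoop configuration keys; the host, user and
// password here are placeholders, not values from this project.
val conf = new SparkConf()
  .set("es.nodes", "pvr-locations.es.eu-west-2.aws.cloud.es.io")
  .set("es.port", "9243")               // cloud port, not the local default 9200
  .set("es.nodes.wan.only", "true")     // required for WAN/cloud clusters
  .set("es.net.ssl", "true")            // cloud endpoints are HTTPS
  .set("es.net.http.auth.user", "<user>")
  .set("es.net.http.auth.pass", "<password>")
```

With es.nodes.wan.only unset, the connector tries to discover and contact individual data nodes, which fails behind a cloud proxy; setting it to true makes all traffic go through the declared endpoint only.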