ONSdigital/address-index-data

Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException


When I run the command below:

java -Dconfig.file=application.conf -cp batch/target/scala-2.11/ons-ai-batch-assembly-0.0.1.jar uk.gov.ons.addressindex.Main

I get the following error:

21/07/14 02:17:15 WARN Utils: Your hostname, myimac.local resolves to a loopback address: 127.0.0.1; using 192.168.1.5 instead (on interface en0)
21/07/14 02:17:15 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/07/14 02:17:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/07/14 02:17:20 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
21/07/14 02:17:20 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: RECORD_IDENTIFIER, CHANGE_TYPE, PRO_ORDER, UPRN, UDPRN, ORGANISATION_NAME, DEPARTMENT_NAME, SUB_BUILDING_NAME, BUILDING_NAME, BUILDING_NUMBER, DEPENDENT_THOROUGHFARE, THROUGHFARE, DOUBLE_DEPENDENT_LOCALITY, DEPENDENT_LOCALITY, POST_TOWN, POSTCODE, POSTCODE_TYPE, DELIVERY_POINT_SUFFIX, WELSH_DEPENDENT_THOROUGHFARE, WELSH_THOROUGHFARE, WELSH_DOUBLE_DEPENDENT_LOCALITY, WELSH_DEPENDENT_LOCALITY, WELSH_POST_TOWN, PO_BOX_NUMBER, PROCESS_DATE, START_DATE, END_DATE, LAST_UPDATE_DATE, ENTRY_DATE
 Schema: recordIdentifier, changeType, proOrder, uprn, udprn, organisationName, departmentName, subBuildingName, buildingName, buildingNumber, dependentThoroughfare, thoroughfare, doubleDependentLocality, dependentLocality, postTown, postcode, postcodeType, deliveryPointSuffix, welshDependentThoroughfare, welshThoroughfare, welshDoubleDependentLocality, welshDependentLocality, welshPostTown, poBoxNumber, processDate, startDate, endDate, lastUpdateDate, entryDate
Expected: recordIdentifier but found: RECORD_IDENTIFIER
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/delivery_point/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, ORGANISATION, LEGAL_NAME
 Schema: uprn, organisation, legalName
Expected: legalName but found: LEGAL_NAME
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/organisation/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, PRIMARY_UPRN, THIS_LAYER, PARENT_UPRN
 Schema: uprn, primaryUprn, thisLayer, parentUprn
Expected: primaryUprn but found: PRIMARY_UPRN
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/hierarchy/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, LPI_KEY, LANGUAGE, LOGICAL_STATUS, START_DATE, END_DATE, LAST_UPDATE_DATE, SAO_START_NUMBER, SAO_START_SUFFIX, SAO_END_NUMBER, SAO_END_SUFFIX, SAO_TEXT, PAO_START_NUMBER, PAO_START_SUFFIX, PAO_END_NUMBER, PAO_END_SUFFIX, PAO_TEXT, USRN, USRN_MATCH_INDICATOR, LEVEL, OFFICIAL_FLAG
 Schema: uprn, lpiKey, language, logicalStatus, startDate, endDate, lastUpdateDate, saoStartNumber, saoStartSuffix, saoEndNumber, saoEndSuffix, saoText, paoStartNumber, paoStartSuffix, paoEndNumber, paoEndSuffix, paoText, usrn, usrnMatchIndicator, level, officialFlag
Expected: lpiKey but found: LPI_KEY
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/lpi/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: USRN, STREET_CLASSIFICATION
 Schema: usrn, streetClassification
Expected: streetClassification but found: STREET_CLASSIFICATION
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/street/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: USRN, STREET_DESCRIPTOR, LOCALITY, TOWN_NAME, LANGUAGE
 Schema: usrn, streetDescriptor, locality, townName, language
Expected: streetDescriptor but found: STREET_DESCRIPTOR
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/street_descriptor/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, CROSS_REFERENCE, SOURCE
 Schema: uprn, crossReference, source
Expected: crossReference but found: CROSS_REFERENCE
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/crossref/ABP_E811a_v111017.csv
21/07/14 02:17:23 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, CLASSIFICATION_CODE, CLASS_SCHEME
 Schema: uprn, classificationCode, classScheme
Expected: classificationCode but found: CLASSIFICATION_CODE
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/classification/ABP_E811a_v111017.csv
21/07/14 02:17:24 WARN CSVDataSource: CSV header does not conform to the schema.
 Header: UPRN, PRIMARY_UPRN, PARENT_UPRN, ESTAB_TYPE, ADDRESS_TYPE
 Schema: uprn, primaryUprn, parentUprn, addressType, estabType
Expected: primaryUprn but found: PRIMARY_UPRN
CSV file: file:///Volumes/REPO/Goran/address-index/batch/src/test/resources/csv/hierarchy/ABP_E811a_v111017.csv
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:340)
        at org.elasticsearch.spark.rdd.EsSpark$.doSaveToEs(EsSpark.scala:104)
        at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:79)
        at org.elasticsearch.spark.rdd.EsSpark$.saveToEs(EsSpark.scala:76)
        at org.elasticsearch.spark.package$SparkRDDFunctions.saveToEs(package.scala:56)
        at uk.gov.ons.addressindex.writers.ElasticSearchWriter$.saveHybridAddresses(ElasticSearchWriter.scala:27)
        at uk.gov.ons.addressindex.Main$.saveHybridAddresses(Main.scala:96)
        at uk.gov.ons.addressindex.Main$.delayedEndpoint$uk$gov$ons$addressindex$Main$1(Main.scala:56)
        at uk.gov.ons.addressindex.Main$delayedInit$body.apply(Main.scala:14)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at uk.gov.ons.addressindex.Main$.main(Main.scala:14)
        at uk.gov.ons.addressindex.Main.main(Main.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [] failed; server[pvr-locations.es.eu-west-2.aws.cloud.es.io:9243] returned [400|Bad Request:]
        at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:469)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:426)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:388)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:392)
        at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:168)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:735)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:330)
        ... 17 more

I'm not sure how to resolve this error.

The Spark job has got as far as writing the data to ES, but is unable to connect to the cluster. If you Google org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version you'll see there are a number of possible causes. The simplest is the port number: the value set in the reference.conf file is 9200 (the default for a local ES installation), whereas cloud deployments usually listen on 443 or 9243. Your stack trace shows the cluster at pvr-locations.es.eu-west-2.aws.cloud.es.io:9243, and the exception message itself points at the other common cause: targeting a cloud instance without setting es.nodes.wan.only.
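As a sketch only (how these values are wired through reference.conf in this repo may differ, and the host/credentials below are placeholders), the standard elasticsearch-hadoop settings for a cloud cluster look like this when set directly on a SparkConf:

```scala
import org.apache.spark.SparkConf

// Standard elasticsearch-hadoop configuration keys; the host, user and
// password here are placeholders, not values from this project.
val conf = new SparkConf()
  .set("es.nodes", "pvr-locations.es.eu-west-2.aws.cloud.es.io")
  .set("es.port", "9243")               // cloud port, not the local default 9200
  .set("es.nodes.wan.only", "true")     // required for WAN/cloud clusters
  .set("es.net.ssl", "true")            // cloud endpoints are HTTPS
  .set("es.net.http.auth.user", "<user>")
  .set("es.net.http.auth.pass", "<password>")
```

With es.nodes.wan.only unset, the connector tries to discover and contact individual data nodes, which fails behind a cloud proxy; setting it to true makes all traffic go through the declared endpoint only.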