opensearch-project/opensearch-spark

[BUG] Nondescript error when EMR roles are misconfigured


What is the bug?
In AWS, when a cluster is set up and a new data source is connected directly with an IAM role that lacks the required permissions, Spark queries against that data source fail with the message:

Failed to verify existing mapping: Failed to get OpenSearch index mapping for query_execution_result_[data source]

A quick code search reveals the exception comes from getIndexMetadata. The solution is to carefully configure a new role with the correct permissions, as described in https://docs.aws.amazon.com/opensearch-service/latest/developerguide/direct-query-s3-creating.html.

It would be helpful to flesh out this error message so that the cause is clearer.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. In AWS, create a new OpenSearch cluster
  2. Create an IAM role with insufficient permissions for modifying the request index, for example by granting it only S3FullAccess.
  3. Create a data source with this role (e.g. example).
  4. Attempt to query the data source. The error is:
Failed to verify existing mapping: Failed to get OpenSearch index mapping for query_execution_result_example

What is the expected behavior?
An error message that hints that access to the index is misconfigured. The current error does describe the problem, but at too low a level to be useful without some familiarity with OS-Spark's internal implementation. Strictly speaking, this message probably wraps a more specific underlying exception, and a permissions hint might not be relevant for every occurrence of it, so some form of pattern matching on the underlying cause may be the solution.
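
As a rough illustration of the pattern-matching idea, here is a minimal sketch in Java. Everything in it is hypothetical: the MappingErrorHints class, its method names, and the list of marker strings are assumptions made for this example and are not taken from the OS-Spark code base. The real underlying exception may carry different text or a typed status code that would be better to match on.

```java
import java.util.Locale;

// Hypothetical helper, not part of opensearch-spark: turns a low-level
// mapping failure into a message that points at access configuration.
final class MappingErrorHints {

    // Assumed markers of an authorization failure; real responses may differ.
    private static final String[] ACCESS_DENIED_MARKERS = {
        "403", "forbidden", "security_exception", "no permissions"
    };

    // Walks the cause chain (bounded, in case of cycles) and reports whether
    // any exception message contains one of the assumed markers.
    static boolean looksLikeAccessDenied(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 20; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            String lower = message.toLowerCase(Locale.ROOT);
            for (String marker : ACCESS_DENIED_MARKERS) {
                if (lower.contains(marker)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Builds the user-facing message: a permissions hint when the cause looks
    // like an access failure, otherwise the original cause as a fallback.
    static String describe(Throwable error, String indexName) {
        String base = "Failed to get OpenSearch index mapping for " + indexName;
        if (looksLikeAccessDenied(error)) {
            return base + ". The request was rejected as unauthorized; check that the IAM role "
                + "attached to the data source has permission to read and write this index.";
        }
        return base + ". Cause: " + error;
    }
}
```

Calling MappingErrorHints.describe(exception, "query_execution_result_example") at the point where the mapping lookup fails would add the hint only when the cause chain looks like an access failure, and would otherwise fall back to printing the underlying cause, which covers the occurrences where permissions are not the problem.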

What is your host/environment?

  • OS: Amazon OS 2.13

Do you have any screenshots?
image

Do you have any additional context?
N/A

[Catch All Triage - 1, 2, 3, 4, 5]