audienceproject/spark-dynamodb

NPE while trying to connect to a LSI of a table from TableIndexConnector

Opened this issue · 1 comment

I am not sure whether the adapter supports reading from an LSI. It tries to call getGlobalSecondaryIndexes (my table has none) and then read from the result, which throws an NPE. Please suggest if there is a way to read from an LSI of my table.

java.lang.NullPointerException: null
at com.audienceproject.spark.dynamodb.connector.TableIndexConnector.&lt;init&gt;(TableIndexConnector.scala:44)
at com.audienceproject.spark.dynamodb.datasource.DynamoTable.&lt;init&gt;(DynamoTable.scala:49)
at com.audienceproject.spark.dynamodb.datasource.DefaultSource.getTable(DefaultSource.scala:36)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:73)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:256)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:203)
at com.audienceproject.spark.dynamodb.implicits$DynamoDBDataFrameReader.dynamodbAs(implicits.scala:52)

Hello Bhumika,
There is currently no support for reading Local Secondary Indexes in the connector. However, I also do not think it would have any value in the absence of "smart query" functionality. The connector does a full table scan no matter what, so reading from the LSI is exactly the same as reading from the base table and projecting the attributes differently. Therefore I suggest you just read from the base table and process the data in Spark.
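The suggested workaround might look like the sketch below: a plain base-table read via the connector's `dynamodb` implicit, followed by projection and sorting in Spark. The table name `MyTable` and the attribute names are placeholders for your own schema, not anything from this issue.

```scala
import org.apache.spark.sql.SparkSession
import com.audienceproject.spark.dynamodb.implicits._

val spark = SparkSession.builder()
  .appName("dynamodb-base-table-read")
  .getOrCreate()

// Full scan of the base table. Because an LSI always contains every item
// of its base table, this yields the same items an LSI read would.
val df = spark.read.dynamodb("MyTable")

// Project only the attributes the LSI would have exposed, and let Spark
// handle any sorting or filtering the LSI sort key was meant to provide.
df.select("partitionKey", "lsiSortKey", "someProjectedAttribute")
  .orderBy("lsiSortKey")
  .show()
```

Since the connector scans the whole table either way, this costs the same read capacity as an LSI scan would; the only difference is that the projection and ordering happen in Spark instead of in DynamoDB.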
The connector has support for Global Secondary Indexes because the semantics are different, and items can be duplicated or not exist at all in the GSI depending on the partition key value.
Does this answer your question?