Read getting stuck at stage 0
arturzangiev opened this issue · 1 comments
arturzangiev commented
I'm trying to read a DataFrame from a Redis instance on AWS, but the job gets stuck at stage 0:
[Stage 0:> (0 + 1) / 1]
self.__spark = SparkSession.builder\
    .config('spark.jars.packages', 'com.redislabs:spark-redis_2.12:3.1.0')\
    .config("spark.redis.host", "AWS-HOST")\
    .config("spark.redis.port", "6379")\
    .getOrCreate()

def __read_redis_keys(self) -> DataFrame:
    df = self.__spark.read.format("org.apache.spark.sql.redis")\
        .option("keys.pattern", "SOME_PATH*")\
        .option("infer.schema", True)\
        .load()
    return df
Spark 3.3.1
Scala 2.12.15
Java 17.0.1
Python 3.8.14
pyspark 3.3.1
Macbook M1
arturzangiev commented
I managed to figure it out. It is clearly a networking issue with AWS ElastiCache: once I deployed to EMR, the job executed successfully. What I can't figure out now is why I can't run it locally. I am on the VPN, and redis-cli can access ElastiCache fine. It looks like Spark running locally can't assign the IP correctly.
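For anyone hitting the same thing, one way to narrow it down is to check plain TCP reachability from the same Python environment that starts the Spark driver. This is a minimal sketch using only the standard library. The helper name can_connect and the 3-second timeout are my own choices for illustration and are not part of spark-redis or Spark. It only shows whether a socket can be opened to the host and port, so it says nothing about how Spark itself picks its addresses.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; not part of spark-redis or Spark.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_connect("AWS-HOST", 6379) with the same placeholder host as in the config above should return True if the VPN route is usable from Python. If it returns True while the Spark job still hangs at stage 0, the problem is more likely in how the local Spark processes reach the network than in the VPN itself.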