RedisLabs/spark-redis

How to load faster from Redis to Spark?

f771216203 opened this issue · 0 comments

I load a txt file with 23,645,053 rows into PySpark and save it to Redis, but loading the table back and calling show() takes about three minutes, and searching within the values is also very slow. Do you have any suggestions for faster loading and searching?

Here is my code:

df = spark.read.csv('/media/yian/666/spark_data/data_end_35.txt', sep='\t')
df = df.select('_c2', '_c6').withColumnRenamed('_c2', 'key').withColumnRenamed('_c6', 'value')
df.write.format("org.apache.spark.sql.redis").option("table", "test").option("key.column", "key").mode('append').save()

df = spark.read.format("org.apache.spark.sql.redis").option("table", "test").option("key.column", "key").load()
df.show()
test = df.filter(df.value.contains('中華民國'))
test.count()
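Not an authoritative fix, but the spark-redis README documents a few throughput knobs that may help here: batching more commands per Redis pipeline on write, storing rows in the binary persistence model, and reading with more parallel SCAN partitions. The sketch below shows how those options could be wired in (option names are taken from the README and should be verified against your installed connector version; it also requires the spark-redis jar on the classpath and a reachable Redis server):

```python
# Throughput-related options documented in the spark-redis README
# (assumed option names; check them against your connector version):
WRITE_OPTS = {
    "table": "test",
    "key.column": "key",
    "model": "binary",            # serialize whole rows instead of one hash field per column
    "max.pipeline.size": "1000",  # commands batched per Redis pipeline
}
READ_OPTS = {
    "table": "test",
    "key.column": "key",
    "partitions.number": "8",     # parallel SCANs, one per Spark task
    "scan.count": "10000",        # keys fetched per SCAN iteration
}

def save_to_redis(df):
    # Write with larger pipeline batches and the binary model.
    (df.write.format("org.apache.spark.sql.redis")
       .options(**WRITE_OPTS)
       .mode("append")
       .save())

def load_from_redis(spark):
    # Read the table back across several partitions in parallel.
    return (spark.read.format("org.apache.spark.sql.redis")
            .options(**READ_OPTS)
            .load())
```

Note that a filter like df.value.contains(...) is evaluated in Spark after a full table scan, so tuning only speeds up the scan itself; if you need fast lookups by substring, a secondary index in Redis (e.g. via RediSearch) would be a different approach.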