RedisLabs/spark-redis

Is it possible to write dataframe into specific redis index without mentioning the table name?

manjudr opened this issue · 1 comment

val spark = SparkSession.builder()
  .appName("redis-df")
  .master("local[*]")
  .config("spark.redis.host", "localhost")
  .config("spark.redis.port", "6379")
  .config("spark.redis.db", 5)
  .config("spark.cassandra.connection.host", "localhost")
  .getOrCreate()

import spark.implicits._

val someDF = Seq(
  (8, "bat"),
  (64, "mouse"),
  (-27, "horse")
).toDF("number", "word")

someDF.write
  .format("org.apache.spark.sql.redis")
  .option("keys.pattern", "*")
  //.option("table", "person") // Is it mandatory?
  .save()

Can I save data into Redis without a table name? I just want to save all the data into Redis database 5 without specifying a table name. Is that possible?

I'm currently using this version of the spark-redis connector:

<dependency>
    <groupId>com.redislabs</groupId>
    <artifactId>spark-redis_2.11</artifactId>
    <version>2.5.0</version>
</dependency>

The error I get if I do not set the table name in the options:

FAILED java.lang.IllegalArgumentException: Option 'table' is not set.
    at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
    at org.apache.spark.sql.redis.RedisSourceRelation$$anonfun$tableName$1.apply(RedisSourceRelation.scala:208)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.redis.RedisSourceRelation.tableName(RedisSourceRelation.scala:208)
    at org.apache.spark.sql.redis.RedisSourceRelation.saveSchema(RedisSourceRelation.scala:245)
    at org.apache.spark.sql.redis.RedisSourceRelation.insert(RedisSourceRelation.scala:121)
    at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:30)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)

fe2s commented

Hi @manjudr ,
The `table` option is mandatory. The idea is that you specify a table name so that the dataframe can later be read back from Redis using that same name.
In your case, an alternative is to convert the dataframe to a key/value RDD and write it with `sc.toRedisKV(rdd)`.
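For completeness, a minimal sketch of the table-based round trip (the table name `"person"` is just illustrative):

```scala
// Write the dataframe under a table name; spark-redis stores rows as
// hashes whose keys are prefixed with this name.
someDF.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .mode("overwrite")
  .save()

// The same table name is what lets you load the dataframe back later.
val loadedDF = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .load()
```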
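A rough sketch of that approach, assuming the spark-redis RDD implicits from `com.redislabs.provider.redis._`; the string encoding of each row here is an illustrative choice, not something the connector prescribes:

```scala
import com.redislabs.provider.redis._

val sc = spark.sparkContext

// Turn each row into a (key, value) pair of plain strings.
val kvRdd = someDF.rdd.map { row =>
  (row.getAs[Int]("number").toString, row.getAs[String]("word"))
}

// Writes raw string keys/values into the configured Redis database
// (spark.redis.db = 5) with no table prefix.
sc.toRedisKV(kvRdd)
```

Note that data written this way has no schema stored alongside it, so it cannot be read back as a dataframe; you would read it with the RDD API (e.g. `sc.fromRedisKV("*")`) instead.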