FINRAOS/HiveQLUnit

IllegalArgumentException on 'mapred.reduce.tasks' property when running hql

Closed this issue · 3 comments

Hi,

I've been trying out HiveQLUnit and I have a test that sets up some tables and data. During this setup I get the following exception:

java.lang.IllegalArgumentException: Setting negative mapred.reduce.tasks for automatically determining the number of reducers is not supported.
    at org.apache.spark.sql.execution.SetCommand.run(commands.scala:92)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)

I've tried setting the property to a positive value with hqlContext.setConf("mapred.reduce.tasks", "2");, but it is not being picked up. I've debugged, and the hqlContext where I set the value is the same object as the one where the value is being checked. I'm not really sure what is going on; any ideas? What's the best way to override this value?
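For context, here is the rough shape of what I'm doing. This is only a minimal sketch using plain Spark classes rather than the HiveQLUnit test resources, and the class and table names are made up:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class ReducerConfRepro {
    public static void main(String[] args) {
        // Local Spark context, similar to what the test harness spins up.
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("hql-unit-repro");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hqlContext = new HiveContext(sc.sc());

        // Set the reducer count on the same context object before any HQL runs.
        hqlContext.setConf("mapred.reduce.tasks", "2");

        // The setup scripts then create tables and load data; this statement is a stand-in.
        hqlContext.sql("CREATE TABLE IF NOT EXISTS example_table (id INT, name STRING)");

        sc.stop();
    }
}
```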

Thanks,
Patrick

This is not an error I've ever encountered before. What system are you running on? How are you executing the tests (i.e. Maven or through an IDE)? Any other context would also be helpful.

The easiest way to set system parameters like that is usually a command line argument, i.e. -Dmapred.reduce.tasks="2", the same way some of the other required params are passed. That might not be the right way to solve this particular issue, though, since as I've said I've never seen it come up before.
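For example, a small sketch of how a test could pick such a property up and forward it to the context (assuming hqlContext is the same HiveContext as above; the fallback "2" is just the value from this thread, and whether -D properties reach the forked test JVM depends on the Maven/IDE setup):

```java
// Read the reducer count passed on the command line, e.g. -Dmapred.reduce.tasks=2,
// and apply it to the Hive context before any scripts execute.
String reducers = System.getProperty("mapred.reduce.tasks", "2");
hqlContext.setConf("mapred.reduce.tasks", reducers);
```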

It seems that this parameter is deprecated:
https://spark.apache.org/docs/latest/sql-programming-guide.html#reducer-number

I'm not sure whether the other property, "spark.sql.shuffle.partitions", will work here instead.
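If it helps, a hedged sketch of what switching to that property might look like (untested; uses the same hqlContext as above):

```java
// spark.sql.shuffle.partitions is the property newer Spark SQL versions read for
// the number of post-shuffle partitions; the API call and the in-script SET
// statement below are two ways of setting the same thing.
hqlContext.setConf("spark.sql.shuffle.partitions", "2");
hqlContext.sql("SET spark.sql.shuffle.partitions=2");
```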

Thanks for the replies.

I found the problem: one of the rather large setup HQL scripts did an override of the argument that slipped by me. Apologies for taking your time. I'll close this.
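For posterity, the offending line was something along these lines (reconstructed, not the exact script), which matches the negative value the exception complains about:

```java
// Buried deep in a large setup script: this resets the property after the earlier
// setConf("mapred.reduce.tasks", "2") call, so the SET command sees -1 and throws.
hqlContext.sql("SET mapred.reduce.tasks=-1");
```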

BTW, I'm running on OS X, executing the tests via Maven and Eclipse.