[BUG] An exception occurred while hive table join tidb table
shanhai3000 opened this issue · 0 comments
Describe the bug
When executing a query of a hive partition table (big one) inner join a tidb table(small one), the hive partition table is auto broadcasted, which leads an error.
The query is somelike
select hive_table.col1,tidb_table.col2 from hive_table inner join tidb_table on hive_table.col2=tidb_table.col3 where ...
== Physical Plan ==
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Project [... 109 more fields]
+- Generate HiveGenericUDTF#udf.json.JsonExtractValueUDTF(xxx), [, ... 101 more fields], false, [...]
+- Project [, ... 102 more fields]
+- BroadcastHashJoin [xxx#94], [xxxx#475], Inner, BuildRight, false
:- TiKV CoprocessorRDD{[table: xxx] TableReader, Columns: xxxx(): { TableRangeScan: { RangeFilter: [], Range: [([t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000], [t\200\000\000\000\000\000\004\253_s\000\000\000\000\000\000\000\000])([t\200\000\000\000\000\000\004\253_r\000\000\000\000\000\000\000\000], [t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000])] } }, startTs: 440854942292115639} EstimatedCount:20837
+- BroadcastExchange HashedRelationBroadcastMode(List(input[107, string, false]),false), [plan_id=32]
+- Filter isnotnull(xxx#475)
+- Scan hive xx.xxxxxxxx [, ... 100 more fields], HiveTableRelation [xx
.xxx
, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [..., Partition Cols: [#520, #521, #522, #523], Pruned Partitions: [(, , , )]], [isnotnull(), (), (xx = xx)]
Here I got some log info maybe helpful.
The plan.stats.sizeInBytes
of the LogicalPlan of the hive table is too small and the plan.stats.sizeInBytes
of LogicalPlan of the tidb table is too big.
The stats of the LogicalPlans of the two seems reversed.
Spark and TiSpark version info
Spark 3.2.3
TiSpark 3.1.2(with a profile of spark-3.2)
Additional context