hbutani/spark-druid-olap

ClassCastException, SQLContext can not be cast to SparkLineDataContext

spencerlisp opened this issue · 4 comments

Hi, we want to use Spark SQL to query Druid with version 0.4, but we encounter a ClassCastException at SparklineDataContext.scala:110.

Our Use Case:
Step 1: register a temp table from a Parquet data source
DataFrame baseDF = sqlContext.read().parquet("...");
baseDF.registerTempTable("BaseTable");

Step 2: create a temporary table for Druid
DruidPlanner planner = new DruidPlanner(sqlContext);
String c_sql = "CREATE TEMPORARY TABLE DruidTable"
    + " USING org.sparklinedata.druid"
    + " OPTIONS (sourceDataFrame \"BaseTable\","
    + " druidDatasource \"test\","
    + " timeDimensionColumn \"timestamp\","
    + " druidHost \"" + DruidHosts + "\","
    + " starSchema '{\"factTable\" : \"DruidTable\", \"relations\" : []}'"
    + ")";
planner.sqlContext().sql(c_sql);

We use Spark SQL to read data from Druid just like the TpchBenchMark cases do. It works, but we found that operations like project, filter, and group by are not pushed down to Druid. Is some component not initialized correctly? The method we use is really the same as in TpchBenchMark.

Yes, the SparklineDataContext and the wrapper for the HiveThriftServer have been there since Jan 2016. They provide a pluggable way to introduce language extensions, runtime views, logical and physical optimizations, and custom functions.
These get enabled when the wrapped thrift server is invoked through start-sparklinedatathriftserver.sh. Take a look at org.apache.spark.sql.hive.sparklinedata.SparklineDataContext: as you can see, the calls for the catalog, optimizer, and parser are enhanced by loading components from the configured ModuleLoaders. An example of a custom module is the mysql-compatibility-sample compatibility module.

Now, for your example, try this:
val sparklineContext = new SparklineDataContext(sparkContext)
Then use the sparklineContext to set up the DruidPlanner, and also use it in place of the sqlContext.
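Applied to the use case above, a minimal sketch (assuming a Spark 1.6 shell with the spark-druid-olap jars on the classpath; the druidHost value is a placeholder, the DruidPlanner constructor call just mirrors the Java example, and its import path should be checked against your build):

import org.apache.spark.sql.hive.sparklinedata.SparklineDataContext

val sparklineContext = new SparklineDataContext(sparkContext)

// Register the base table against the Sparkline context instead of a plain SQLContext.
val baseDF = sparklineContext.read.parquet("...")
baseDF.registerTempTable("BaseTable")

// Same DDL as Step 2 above; the druidHost value is a placeholder.
val c_sql = """CREATE TEMPORARY TABLE DruidTable
              |USING org.sparklinedata.druid
              |OPTIONS (sourceDataFrame "BaseTable",
              |         druidDatasource "test",
              |         timeDimensionColumn "timestamp",
              |         druidHost "localhost:8082",
              |         starSchema '{"factTable" : "DruidTable", "relations" : []}')""".stripMargin

// Set up the DruidPlanner with the same context, then run the DDL through it
// so the Druid rewrites are in effect for subsequent queries.
val planner = new DruidPlanner(sparklineContext)
sparklineContext.sql(c_sql)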

This may work as a test, but the right way is to set up a wrapped Spark shell, just as we did for the HiveThriftServer (see org.apache.spark.sql.hive.thriftserver.sparklinedata.SparklineSQLEnv).
See how the test framework is set up, as an example: org.apache.spark.sql.hive.test.sparklinedata.SparklineTestHiveContext

The TpchBenchMark class is very old; it needs to be updated to use SparklineDataContext. It's much easier to run a benchmark through the SparklineData thrift server now (using something like JMeter), hence we haven't maintained it.

We strongly recommend that you use the wrapped HiveThriftServer, through the start-sparklinedatathriftserver.sh shell script; the JDBC path is the usage path we will support.
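For reference, a minimal JDBC sketch against the wrapped thrift server (the host, port, and user are placeholders, assuming standard HiveServer2 defaults):

import java.sql.DriverManager

// Placeholder connection details; point these at wherever start-sparklinedatathriftserver.sh runs.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT COUNT(*) FROM DruidTable")
while (rs.next()) println(rs.getLong(1))
rs.close(); stmt.close(); conn.close()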

hbutani, thank you very much for the fast reply!
Following your suggestion, I used SparklineSQLEnv.init() to initialize the SparklineDataContext and ran it on a Spark YARN cluster, but something went wrong, as shown below:

Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 21 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 27 more
Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
NestedThrowables:
java.lang.reflect.InvocationTargetException
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 32 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
... 51 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.derby.jdbc.AutoloadedDriver40
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at java.sql.DriverManager.isDriverAllowed(DriverManager.java:556)
at java.sql.DriverManager.getConnection(DriverManager.java:661)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:349)
at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:501)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:298)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
... 59 more
17/03/16 22:56:16 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.reflect.InvocationTargetException)
17/03/16 22:56:16 INFO spark.SparkContext: Invoking stop() from shutdown hook

Can you try adding the jars from the Spark jars folder to your classpath?
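For example, a sketch using Spark's extraClassPath settings (the /opt/spark/lib path is a placeholder; point it at the folder in your Spark distribution that contains the Derby jar):

import org.apache.spark.SparkConf

// Hypothetical paths; the Derby jar backing the embedded metastore must be visible
// on both the driver and executor classpaths.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/opt/spark/lib/*")
  .set("spark.executor.extraClassPath", "/opt/spark/lib/*")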