waterguo/antsdb

Random error: cannot connect to antsdb

iambudi opened this issue · 5 comments

Randomly cannot connect to antsdb from DBeaver, especially after cancelling an ADD INDEX that takes a long time.

ERROR server.SaltedFish - failed to start
com.antsdb.saltedfish.storage.OrcaHBaseException: hbase is currently linked to a different antsdb instance 1114301489047
	at com.antsdb.saltedfish.storage.HBaseStorageService.init(HBaseStorageService.java:248)
	at com.antsdb.saltedfish.storage.HBaseStorageService.open(HBaseStorageService.java:153)
	at com.antsdb.saltedfish.nosql.Humpback.initStorage(Humpback.java:296)
	at com.antsdb.saltedfish.nosql.Humpback.init(Humpback.java:249)
	at com.antsdb.saltedfish.nosql.Humpback.open(Humpback.java:195)
	at com.antsdb.saltedfish.sql.Orca.<init>(Orca.java:132)
	at com.antsdb.saltedfish.server.SaltedFish.startDatabase(SaltedFish.java:137)
	at com.antsdb.saltedfish.server.SaltedFish.start(SaltedFish.java:54)
	at com.antsdb.saltedfish.server.SaltedFish.run(SaltedFish.java:133)
	at com.antsdb.saltedfish.server.SaltedFishMain.run(SaltedFishMain.java:83)
	at com.antsdb.saltedfish.server.SaltedFishMain.main(SaltedFishMain.java:44)

antsdb-check

antsdb home: /Applications/antsdb
antsdb configuration: /Applications/antsdb/conf/conf.properties
storage: hbase
hbase config: /Applications/hbase-2.1.4/conf/hbase-site.xml
zookeeper quorum: localhost
quorum is connected

Running hbase-relink does not help. I can no longer connect from DBeaver.

I can reproduce the issue. When running a long, slow query (like a SUM with a JOIN) over a large data set, cancelling it from DBeaver does not stop antsdb from continuing to run the query in the background. Stopping the antsdb and HBase services and restarting them then dumps the error above: antsdb fails to start.

I did some trial and error to figure out why this happened. It seems I was not aware that I had changed humpback.hbase-system-ns to something else, which is what raised the error above.

These are separate issues that you mentioned.

  1. The error "hbase is currently linked to a different antsdb instance 1114301489047". It looks like you installed a new instance of antsdb but are trying to use the same hbase. You cannot use the same hbase namespace with a different antsdb instance because that will cause data corruption. There are three solutions:
    (a) Empty the hbase, so you can bind the new antsdb instance to it
    (b) Use a different namespace via the humpback.hbase-system-ns setting
    (c) Relink the new antsdb installation to the existing hbase namespace using the hbase-relink command
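As a minimal sketch of option (b), assuming the conf.properties location shown by antsdb-check above; the namespace name "antsdb2" is just a hypothetical example, any namespace not claimed by another antsdb instance works:

```properties
# /Applications/antsdb/conf/conf.properties
# Bind this antsdb instance to its own HBase namespace instead of the one
# owned by the previous instance. "antsdb2" is an example name.
humpback.hbase-system-ns=antsdb2
```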

  2. "cancelling the process from dbeaver does not stop antsdb from continuing to run the query in the background". This is the same behaviour as a MySQL database; AntsDB tries to simulate MySQL's behaviour here. However, you can use the KILL QUERY statement to kill the running query.
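Something like the following should work, assuming antsdb honours the standard MySQL process-management statements it emulates (the id 42 is hypothetical; take the real one from the SHOW PROCESSLIST output):

```sql
-- List current connections to find the id of the runaway query.
SHOW PROCESSLIST;

-- Kill only the statement, leaving the connection open (42 is an example id).
KILL QUERY 42;
```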

  3. AntsDB is not designed to handle long and complex queries on a very large record set efficiently, since existing technologies already handle that very well. Our users currently use SparkSQL/Spark for long and complex queries in HBase directly. One of our users requested the capability to intelligently reroute complex queries to Spark internally, but that is not happening in the near future. In short: if you want real-time queries on a small number of records (less than 1M), do it in antsdb. If you want to run a complex query over a large result set, do it in Spark.

Thank you for the answer.

AntsDB is not designed to handle long and complex queries on a very large record set efficiently, since existing technologies already handle that very well.

How much data can antsdb handle best, performance-wise?
I currently use generated sample data of around 19 million rows, and a simple join to one smaller table never succeeds; it seems to run forever. Querying a single table works nicely and returns results in a few seconds.

In short: if you want real-time queries on a small number of records (less than 1M), do it in antsdb

For a small number of records, MySQL 8 can still be considered competitive with antsdb.

This is interesting. Can you give more explanation about the use cases of antsdb?

AntsDB is designed to complement Hadoop technology so that people can run front-end applications directly on top of Hadoop and reuse existing MySQL applications or components. It brings MySQL compatibility and very low latency, on par with a traditional RDBMS, to Hadoop. It doesn't replace Spark/MapReduce; instead, it synergizes well with them. In the end, you can have your operational applications and complex big-data analytics running in the same Hadoop environment.

AntsDB can work on very big tables - tables with more than 100M rows. But it is not efficient to scan that many rows to come up with a result. For example, if you want to do a GROUP BY on top of 100M rows, you should use Spark/SparkSQL/MapReduce. But if you have a good filter in the WHERE clause that can limit the scan to under 1M rows, AntsDB can produce the result a lot faster thanks to the secondary index.
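As an illustration of the two cases (table, column, and index names here are hypothetical, not from this thread):

```sql
-- Full scan: aggregates all ~100M rows; better done in Spark/SparkSQL/MapReduce.
SELECT customer_id, SUM(amount)
  FROM orders
 GROUP BY customer_id;

-- With a secondary index and a selective WHERE clause, the scan drops to
-- well under 1M rows and AntsDB can answer quickly.
CREATE INDEX idx_orders_date ON orders (order_date);

SELECT customer_id, SUM(amount)
  FROM orders
 WHERE order_date >= '2019-05-01' AND order_date < '2019-06-01'
 GROUP BY customer_id;
```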

There are a couple of benefits.
(1) You can build a much larger operational database than with MySQL. In real life, people start to have performance issues once a MySQL database grows beyond 500G in size. AntsDB scales as well as HBase.
(2) MySQL has a poorly implemented query optimizer and struggles with complex queries on large tables, so people usually build a data warehouse and ETL pipeline for the complex stuff. With AntsDB, you have both at the same time: real-time queries using AntsDB and complex queries using Spark/MR.
(3) Cost savings. With AntsDB, you have a single database with scalability, real-time analytics, big-data analytics, relational queries, and non-relational data processing. It saves a lot of money and time compared to the conventional approach: front-end MySQL, ETL, and Hadoop.

I think the problem you encountered comes from one of the limitations of AntsDB at this stage. Ideally, AntsDB should evaluate the query: if it can be done efficiently in AntsDB, do it there; otherwise hand it to Spark. Unfortunately we don't have this capability yet, so for now the user has to decide whether Spark is the better choice. In your case, with 19M rows and without a good WHERE clause, it would be much better to do it in Spark.

HTH