[MySQL] Retaining a primary key in a scramble
Closed this issue · 34 comments
When creating a scramble, VerdictDB does not keep the primary key information of the original tables. This makes the MySQL engine chooses an inferior join plan when a query involves joins.
Please refer to the following page for the syntax:
https://dev.mysql.com/doc/refman/5.6/en/create-table-select.html
Please add a new method getPrimaryKey()
in DbmsConnection.java; and use that method for retrieving the primary key information in the scrambling plan.
I have the same issue when i running queries with join, using MySQL in verdictdb-tutorial.
the issue is solved ?
No, the code may need to be cleaned and tested more.
Ok, I really wanted to test the join with VerdictDB today.
Then, let me quickly test git fetch origin pull/314/head:joezhong-fix-312
without merging it into master yet.
Thank you very much
According to my tests, the code is not optimized for mysql, so it was even slower. I will request a fix and test again after the fix.
I saw that there is a new code, but it has not been tested yet.
A question, does this error also happen when we use VerdictDB in PostgreSQL?
I can affect PostgreSQL as well.
However, there is no problem for SQL-on-Hadoop databases since they don't have the primary key.
I have confirmed that the code in this pull request reduces the latency for MySQL join queries. For now, the code is not merged until all tests pass. The test errors seem to be due to CircleCI's issues.
Ok, good :) so the code still can not be used today ?
@solangepaz You can still use the code as follows:
git pull
git checkout joezhong-fix-312
mvn -DskipTests -DtestPhase=false -DdeployPhase=false clean install
And change VerdictDB's version to "0.5.7-SNAPSHOT" in your pom file.
Does this is valid in the verdictdb tutorial?
If you look at the tutorial directory, there is pom.xml, which contains verdictdb as one of its dependencies.
The version in the former should be set to "0.5.7-SNAPSHOT". Note that the version includes snapshot.
The bottom one is the tutorial's own version. Conceptually, the tutorial is simply another project that relies on "verdictdb-core".
I did what you said, but I still get worse results with VerdictDB.
Please "mvn package" again and re-create scrambles if you haven't.
This is the latencies I see for TPCH 1G.
>>> verdict.sql('show tables')
10 row(s) in the result (0.016 seconds)
table
0 customer
1 lineitem
2 lineitem_x
3 nation
4 orders
5 orders_x
6 part
7 partsupp
8 region
9 supplier
>>>
>>>
>>> verdict.sql("""
... bypass
... SELECT SUM(l_extendedprice * (1 - l_discount))
... FROM customer, orders, lineitem
... WHERE c_mktsegment = 'BUILDING'
... AND c_custkey = o_custkey
... AND l_orderkey = o_orderkey
... """)
1 row(s) in the result (1.0 mins 24.035 seconds)
SUM(l_extendedprice * (1 - l_discount))
0 44141243552.3543
>>>
>>>
>>> verdict.sql("""
... SELECT SUM(l_extendedprice * (1 - l_discount))
... FROM customer, orders, lineitem
... WHERE c_mktsegment = 'BUILDING'
... AND c_custkey = o_custkey
... AND l_orderkey = o_orderkey
... """)
12:05:26.433 [Thread-2] INFO o.v.sqlreader.ScrambleTableReplacer - Automatic table replacement: tpch1gtest.orders -> tpch1gtest.orders_x
12:05:26.434 [Thread-2] INFO o.v.sqlreader.ScrambleTableReplacer - Automatic table replacement: tpch1gtest.lineitem -> tpch1gtest.lineitem_x
1 row(s) in the result (35.784 seconds)
s4
0 44162870008.78130
>>>
I made this again, but this query has been running for 5 minutes and not show a result.
I have this in my pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.verdictdb</groupId>
<artifactId>verdictdb-tutorial</artifactId>
<version>0.5.7-SNAPSHOT</version>
<properties>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
</properties>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<archive>
<manifest>
<mainClass>VerdictTutorialMain</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id> <!-- this is used for inheritance merges -->
<phase>package</phase> <!-- bind to the packaging phase -->
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.verdictdb</groupId>
<artifactId>verdictdb-core</artifactId>
<version>0.5.6</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.46</version>
</dependency>
</dependencies>
</project>
and this in my run.sh:
#!/bin/bash
java -cp ./target/verdictdb-tutorial-0.5.7-SNAPSHOT-jar-with-dependencies.jar VerdictTutorialMain $@
Is the query blocked on the step of creating scramble tables? I have encountered this problem before. Enlarging MySQL buffer pool size would solve this problem.
The query creates the scramble tables, but does not show a result after
The following should be used in your pom:
<dependency>
<groupId>org.verdictdb</groupId>
<artifactId>verdictdb-core</artifactId>
<version>0.5.7-SNAPSHOT</version>
</dependency>
The version of verdictdb-tutorial does not matter.
I have this error when i used that in my pom:
org.verdictdb.exception.VerdictDBDbmsException: Unknown database 'test'
at org.verdictdb.connection.JdbcConnection.setDefaultSchema(JdbcConnection.java:389)
at org.verdictdb.connection.JdbcConnection.copy(JdbcConnection.java:413)
at org.verdictdb.connection.CachedDbmsConnection.copy(CachedDbmsConnection.java:73)
at org.verdictdb.VerdictContext.createNewExecutionContext(VerdictContext.java:319)
at org.verdictdb.jdbc41.VerdictStatement.<init>(VerdictStatement.java:44)
at org.verdictdb.jdbc41.VerdictConnection.createStatement(VerdictConnection.java:113)
at VerdictTutorialMain.createScrambleTable(VerdictTutorialMain.java:108)
at VerdictTutorialMain.main(VerdictTutorialMain.java:70)
Creating a scrambled table for lineitem...
Exception in thread "main" java.lang.NullPointerException
at org.verdictdb.jdbc41.VerdictStatement.execute(VerdictStatement.java:109)
at VerdictTutorialMain.createScrambleTable(VerdictTutorialMain.java:118)
at VerdictTutorialMain.main(VerdictTutorialMain.java:70)
and my MySQL databases are:
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| tpch1g |
| verdictdbmeta |
| verdictdbtemp |
+--------------------+
I am working on the issue. The fix is there, but I need to merge branches somehow.
Meanwhile, you can just use full table names like "tpch1g.lineitem" in queries.
I've tried this befoI had tried this before and the error is the same, but if I do not use version 0.5.7-SNAPSHOT this error disappears when creating the scramble table.
Sorry for the inconvenience, but doing the following should resolve the issue. FYI, in the tutorial, I have added "run2" for the join query.
In "verdictdb" directory:
git pull
git checkout yongjoo
mvn -DskipTests -DtestPhase=false -DdeployPhase=false clean install
In "verdictdb-tutorial" directory
git pull
Change the version on line 48 of pom.xml to "0.5.7-SNAPSHOT" (as you did before). Then,
mvn package
./run.sh localhost 3306 tpch1g create
./run.sh localhost 3306 tpch1g run
./run.sh localhost 3306 tpch1g run2
Below is what I got
pyongjoo@umich:verdictdb_on_mysql $ ./run.sh localhost 3306 tpch1gtest create
Creating a scrambled table for lineitem...
14:54:36.515 [main] INFO o.v.c.ScramblingCoordinator - Starts to create a new UNIFORM scramble tpch1gtest.lineitem_scramble from tpch1gtest.lineitem
14:54:36.516 [main] INFO o.v.c.ScramblingCoordinator - Method: UNIFORM
14:54:36.517 [main] INFO o.v.c.ScramblingCoordinator - Relative size: 1.000000 (or equivalently, 100.0000 %)
14:57:24.241 [main] INFO o.v.c.ScramblingCoordinator - Finished creating tpch1gtest.lineitem_scramble
Scrambled table for lineitem has been created.
Time Taken = 168 s
Creating a scrambled table for orders...
14:57:24.330 [main] INFO o.v.c.ScramblingCoordinator - Starts to create a new UNIFORM scramble tpch1gtest.orders_scramble from tpch1gtest.orders
14:57:24.331 [main] INFO o.v.c.ScramblingCoordinator - Method: UNIFORM
14:57:24.331 [main] INFO o.v.c.ScramblingCoordinator - Relative size: 1.000000 (or equivalently, 100.0000 %)
14:57:54.467 [main] INFO o.v.c.ScramblingCoordinator - Finished creating tpch1gtest.orders_scramble
Scrambled table for orders has been created.
Time Taken = 30 s
pyongjoo@umich:verdictdb_on_mysql $
pyongjoo@umich:verdictdb_on_mysql $
pyongjoo@umich:verdictdb_on_mysql $ ./run.sh localhost 3306 tpch1gtest run
Without VerdictDB: average(l_extendedprice) = 38255.138485
Time Taken = 62 s
15:01:16.167 [main] INFO o.v.sqlreader.ScrambleTableReplacer - Automatic table replacement: tpch1gtest.lineitem -> tpch1gtest.lineitem_scramble
With VerdictDB: average(l_extendedprice) = 38228.40949764581
Time Taken = 19 s
pyongjoo@umich:verdictdb_on_mysql $
pyongjoo@umich:verdictdb_on_mysql $
pyongjoo@umich:verdictdb_on_mysql $ ./run.sh localhost 3306 tpch1gtest run2
Without VerdictDB: SUM(l_extendedprice * (1 - l_discount)) = 4.41412435523543E10
Time Taken = 85 s
15:03:11.029 [main] INFO o.v.sqlreader.ScrambleTableReplacer - Automatic table replacement: tpch1gtest.orders -> tpch1gtest.orders_scramble
15:03:11.030 [main] INFO o.v.sqlreader.ScrambleTableReplacer - Automatic table replacement: tpch1gtest.lineitem -> tpch1gtest.lineitem_scramble
With VerdictDB: SUM(l_extendedprice * (1 - l_discount)) = 4.41249582491536E10
Time Taken = 33 s
Now i have this error, when i run ./run.sh localhost 3306 tpch1g create:
Creating a scrambled table for lineitem... 20:50:57.727 [main] INFO o.v.c.ScramblingCoordinator - Starts to create a new UNIFORM scramble tpch1g.lineitem_scramble from tpch1g.lineitem 20:50:57.728 [main] INFO o.v.c.ScramblingCoordinator - Method: UNIFORM 20:50:57.729 [main] INFO o.v.c.ScramblingCoordinator - Relative size: 1.000000 (or equivalently, 100.0000 %) org.verdictdb.exception.VerdictDBDbmsException: Issued the following query: create table
tpch1g.
lineitem_scramble (PRIMARY KEY (
L_ORDERKEY,
L_PARTKEY,
verdictdbblock)) partition by list columns (
verdictdbblock) (partition p0 values in (0), partition p1 values in (1), partition p2 values in (2), partition p3 values in (3), partition p4 values in (4), partition p5 values in (5), partition p6 values in (6)) select t.
L_ORDERKEY, t.
L_PARTKEY, t.
L_SUPPKEY, t.
L_LINENUMBER, t.
L_QUANTITY, t.
L_EXTENDEDPRICE, t.
L_DISCOUNT, t.
L_TAX, t.
L_RETURNFLAG, t.
L_LINESTATUS, t.
L_SHIPDATE, t.
L_COMMITDATE, t.
L_RECEIPTDATE, t.
L_SHIPINSTRUCT, t.
L_SHIPMODE, t.
L_COMMENT, 0 as
verdictdbtier, cast(floor(rand() * 7) as unsigned) as
verdictdbblockfrom
tpch1g.
lineitemas t Duplicate entry '2080546-112379-0' for key 'PRIMARY' at org.verdictdb.connection.JdbcConnection.executeSingle(JdbcConnection.java:210) at org.verdictdb.connection.JdbcConnection.execute(JdbcConnection.java:170) at org.verdictdb.connection.CachedDbmsConnection.execute(CachedDbmsConnection.java:49) at org.verdictdb.core.execplan.ExecutableNodeRunner.execute(ExecutableNodeRunner.java:424) at org.verdictdb.core.execplan.ExecutableNodeRunner.executeAndBroadcast(ExecutableNodeRunner.java:407) at org.verdictdb.core.execplan.ExecutableNodeRunner.run(ExecutableNodeRunner.java:335) at java.lang.Thread.run(Thread.java:748) Exception in thread "main" java.lang.RuntimeException: org.verdictdb.exception.VerdictDBDbmsException: Issued the following query: create table
tpch1g.
lineitem_scramble (PRIMARY KEY (
L_ORDERKEY,
L_PARTKEY,
verdictdbblock)) partition by list columns (
verdictdbblock) (partition p0 values in (0), partition p1 values in (1), partition p2 values in (2), partition p3 values in (3), partition p4 values in (4), partition p5 values in (5), partition p6 values in (6)) select t.
L_ORDERKEY, t.
L_PARTKEY, t.
L_SUPPKEY, t.
L_LINENUMBER, t.
L_QUANTITY, t.
L_EXTENDEDPRICE, t.
L_DISCOUNT, t.
L_TAX, t.
L_RETURNFLAG, t.
L_LINESTATUS, t.
L_SHIPDATE, t.
L_COMMITDATE, t.
L_RECEIPTDATE, t.
L_SHIPINSTRUCT, t.
L_SHIPMODE, t.
L_COMMENT, 0 as
verdictdbtier, cast(floor(rand() * 7) as unsigned) as
verdictdbblockfrom
tpch1g.
lineitemas t Duplicate entry '2080546-112379-0' for key 'PRIMARY' at org.verdictdb.core.resulthandler.ExecutionTokenReader.takeOne(ExecutionTokenReader.java:53) at org.verdictdb.core.resulthandler.ExecutionTokenReader.next(ExecutionTokenReader.java:83) at org.verdictdb.core.execplan.ExecutablePlanRunner.runTillEnd(ExecutablePlanRunner.java:61) at org.verdictdb.coordinator.ScramblingCoordinator.scramble(ScramblingCoordinator.java:310) at org.verdictdb.coordinator.ScramblingCoordinator.scramble(ScramblingCoordinator.java:193) at org.verdictdb.coordinator.ExecutionContext.sql(ExecutionContext.java:171) at org.verdictdb.jdbc41.VerdictStatement.execute(VerdictStatement.java:109) at VerdictTutorialMain.createScrambleTable(VerdictTutorialMain.java:179) at VerdictTutorialMain.main(VerdictTutorialMain.java:70) Caused by: org.verdictdb.exception.VerdictDBDbmsException: Issued the following query: create table
tpch1g.
lineitem_scramble (PRIMARY KEY (
L_ORDERKEY,
L_PARTKEY,
verdictdbblock)) partition by list columns (
verdictdbblock) (partition p0 values in (0), partition p1 values in (1), partition p2 values in (2), partition p3 values in (3), partition p4 values in (4), partition p5 values in (5), partition p6 values in (6)) select t.
L_ORDERKEY, t.
L_PARTKEY, t.
L_SUPPKEY, t.
L_LINENUMBER, t.
L_QUANTITY, t.
L_EXTENDEDPRICE, t.
L_DISCOUNT, t.
L_TAX, t.
L_RETURNFLAG, t.
L_LINESTATUS, t.
L_SHIPDATE, t.
L_COMMITDATE, t.
L_RECEIPTDATE, t.
L_SHIPINSTRUCT, t.
L_SHIPMODE, t.
L_COMMENT, 0 as
verdictdbtier, cast(floor(rand() * 7) as unsigned) as
verdictdbblockfrom
tpch1g.
lineitemas t Duplicate entry '2080546-112379-0' for key 'PRIMARY' at org.verdictdb.connection.JdbcConnection.executeSingle(JdbcConnection.java:210) at org.verdictdb.connection.JdbcConnection.execute(JdbcConnection.java:170) at org.verdictdb.connection.CachedDbmsConnection.execute(CachedDbmsConnection.java:49) at org.verdictdb.core.execplan.ExecutableNodeRunner.execute(ExecutableNodeRunner.java:424) at org.verdictdb.core.execplan.ExecutableNodeRunner.executeAndBroadcast(ExecutableNodeRunner.java:407) at org.verdictdb.core.execplan.ExecutableNodeRunner.run(ExecutableNodeRunner.java:335) at java.lang.Thread.run(Thread.java:748)
Not is more simple run the verdictDB in python ? If yes, what changes do I have to make to use 0.5.7-SNAPSHOT ?
Hmm. I think either your lineitem table had declared (L_ORDERKEY, L_PARTKEY) as a primary key or VerdictDB's primary key retrieval is not correctly working for your mysql version... I think the latter is the case.
Can you tell me your MySQL version?
@Beastjoe Please find a docker image that matches the version of @solangepaz, and check the correctness. Btw, if you added a unit test, I think it should be checked with MySQL 5.5 by CircleCI.
@solangepaz I just uploaded a latest version of pyverdict to pypi, which you can install as follows:
pip install pyverdict --upgrade
Please see https://verdictdb.org/documentation/quick_start/ for some examples. (I was working on documentation, but haven't finished it yet.) The only thing is that it requires Python 3.7 (but installing it is easy with miniconda).
I have MySQL in version 5.7.24
The error of primary key was from my table creation and not from the mysql version.
Now i try VerdictDB with python. I can create a scramble table and obtain a result, but a result of query is very differente of the original result.
For example, for this, i have a result 44141243552.3543:
verdict.sql(""" bypass SELECT SUM(l_extendedprice * (1 - l_discount)) FROM customer, orders, lineitem WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey """)
but if I run the same query in MySQL, i have a result 53741292684.6040
It’s probably referring to a different schema or so. If bypass is specified, verdict really doesn’t do anything.