CRDB: unable to run TPC-H
timveil opened this issue · 3 comments
timveil commented
15:29:42,503 (Worker.java:417) WARN - PSQLException thrown when executing 'Q9/09' on 'TPCHWorker<000>' [Message='An I/O error occurred while sending to the backend.', ErrorCode='0', SQLState='08006']
org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:335)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:106)
at com.oltpbenchmark.benchmarks.tpch.procedures.GenericQuery.run(GenericQuery.java:54)
at com.oltpbenchmark.benchmarks.tpch.TPCHWorker.executeWork(TPCHWorker.java:43)
at com.oltpbenchmark.api.Worker.doWork(Worker.java:386)
at com.oltpbenchmark.api.Worker.run(Worker.java:294)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at org.postgresql.core.PGStream.receiveChar(PGStream.java:308)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1952)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
... 9 more
15:29:42,504 (Worker.java:511) DEBUG - TPCHWorker<000> Q9/09 Result: UNKNOWN
15:29:42,504 (ThreadBench.java:769) ERROR - Unexpected fatal, error in 'TPCHWorker<000>' when executing 'Q9/09' [COCKROACHDB]
java.lang.RuntimeException: Unexpected fatal, error in 'TPCHWorker<000>' when executing 'Q9/09' [COCKROACHDB]
at com.oltpbenchmark.api.Worker.doWork(Worker.java:545)
at com.oltpbenchmark.api.Worker.run(Worker.java:294)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:783)
at org.postgresql.jdbc.PgConnection.rollback(PgConnection.java:791)
at com.oltpbenchmark.api.Worker.doWork(Worker.java:428)
... 2 more
timveil commented
this is the problematic query. executing it over CLI and it runs for minutes, does not finish.
SELECT
nation, o_year, sum(amount) AS sum_profit
FROM
(
SELECT
n_name AS nation,
extract('year', o_orderdate) AS o_year,
l_extendedprice * (1 - l_discount)
- ps_supplycost * l_quantity
AS amount
FROM
oltpbench.part,
oltpbench.supplier,
oltpbench.lineitem,
oltpbench.partsupp,
oltpbench.orders,
oltpbench.nation
WHERE
s_suppkey = l_suppkey
AND ps_suppkey = l_suppkey
AND ps_partkey = l_partkey
AND p_partkey = l_partkey
AND o_orderkey = l_orderkey
AND s_nationkey = n_nationkey
AND p_name LIKE '%olive%'
)
AS profit
GROUP BY
nation, o_year
ORDER BY
nation, o_year DESC
explain without tpch indexes...
tree | field | description
+----------------------------------------------+-------------+---------------------------------------------------------------------------+
sort | |
│ | order | +nation,-o_year
└── group | |
│ | aggregate 0 | n_name
│ | aggregate 1 | o_year
│ | aggregate 2 | sum(amount)
│ | group by | @3,@1
└── render | |
└── join | |
│ | type | inner
│ | equality | (l_partkey) = (p_partkey)
├── join | |
│ │ | type | inner
│ │ | equality | (l_suppkey, n_nationkey) = (s_suppkey, s_nationkey)
│ ├── join | |
│ │ │ | type | inner
│ │ │ | equality | (ps_suppkey, ps_partkey, o_orderkey) = (l_suppkey, l_partkey, l_orderkey)
│ │ ├── join | |
│ │ │ │ | type | cross
│ │ │ ├── join | |
│ │ │ │ │ | type | cross
│ │ │ │ ├── scan | |
│ │ │ │ │ | table | orders@primary
│ │ │ │ │ | spans | ALL
│ │ │ │ └── scan | |
│ │ │ │ | table | nation@primary
│ │ │ │ | spans | ALL
│ │ │ └── scan | |
│ │ │ | table | partsupp@primary
│ │ │ | spans | ALL
│ │ └── scan | |
│ │ | table | lineitem@primary
│ │ | spans | ALL
│ └── scan | |
│ | table | supplier@primary
│ | spans | ALL
└── scan | |
| table | part@primary
| spans | ALL
| filter | p_name LIKE '%olive%'
(40 rows)
explain after indexes added...
tree | field | description
+--------------------------------------------------------+-------------+---------------------------------------------------------------------------+
sort | |
│ | order | +nation,-o_year
└── group | |
│ | aggregate 0 | n_name
│ | aggregate 1 | o_year
│ | aggregate 2 | sum(amount)
│ | group by | @3,@1
└── render | |
└── render | |
└── lookup-join | |
│ | type | inner
│ | pred | @18 LIKE '%olive%'
├── lookup-join | |
│ │ | type | inner
│ ├── join | |
│ │ │ | type | inner
│ │ │ | equality | (l_suppkey, n_nationkey) = (s_suppkey, s_nationkey)
│ │ ├── join | |
│ │ │ │ | type | inner
│ │ │ │ | equality | (ps_suppkey, ps_partkey, o_orderkey) = (l_suppkey, l_partkey, l_orderkey)
│ │ │ ├── join | |
│ │ │ │ │ | type | cross
│ │ │ │ ├── join | |
│ │ │ │ │ │ | type | cross
│ │ │ │ │ ├── scan | |
│ │ │ │ │ │ | table | orders@primary
│ │ │ │ │ │ | spans | ALL
│ │ │ │ │ └── scan | |
│ │ │ │ │ | table | nation@primary
│ │ │ │ │ | spans | ALL
│ │ │ │ └── scan | |
│ │ │ │ | table | partsupp@primary
│ │ │ │ | spans | ALL
│ │ │ └── scan | |
│ │ │ | table | lineitem@primary
│ │ │ | spans | ALL
│ │ └── scan | |
│ │ | table | supplier@primary
│ │ | spans | ALL
│ └── scan | |
│ | table | part@p_pk
└── scan | |
| table | part@primary
(43 rows)
timveil commented
the closed/broken connection is actually caused by a 1m
timeout configured in haproxy. that again means the underlying issue is query duration exacerbated by overly aggressive timeout in haproxy
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:783)
at org.postgresql.jdbc.PgConnection.rollback(PgConnection.java:791)
at com.oltpbenchmark.api.Worker.doWork(Worker.java:428)
... 2 more
timveil commented
resolved in v2.2.0-alpha
Also made the following configuration changes:
- increased
haproxy
timeout settings to1h
to be safe - set
-max-sql-memory=4GB
on each node