probcomp/bayeslite

Loom backend is slow in `_store_kind_partition` after estimating a CrossCat model

versar opened this issue · 2 comments

After a CrossCat model is estimated, the Loom backend in bayeslite stores the CrossCat views (kinds) and, for each view, its row partition in the table `bayesdb_loom_row_kind_partition`. This is done in nested loops (over models, over views, and over rows) that issue one `INSERT` per (model, view, row) combination. The number of SQL statements therefore grows with the product of models, views, and rows.

# Runs for a single model `modelno`: one INSERT per (kind_id, rowid) pair.
for kind_id in row_partition.keys():
    for rowid, partition_id in zip(
            range(1, len(row_partition[kind_id]) + 1),
            row_partition[kind_id]):
        bdb.sql_execute('''
            INSERT OR REPLACE INTO bayesdb_loom_row_kind_partition
            (generator_id, modelno, rowid, kind_id, partition_id)
            VALUES (?, ?, ?, ?, ?)
        ''', (generator_id, modelno, rowid, kind_id, partition_id))
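Any bulk approach first needs the nested loop flattened into a list of parameter tuples. The sketch below assumes, as the loop above implies, that `row_partition` maps each `kind_id` to a list of partition ids in row order with rowids starting at 1. The helper name `kind_partition_rows` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int, int, int, int]


def kind_partition_rows(generator_id: int, modelno: int,
                        row_partition: Dict[int, Sequence[int]]) -> List[Row]:
    """Flatten {kind_id: [partition_id for each row]} into INSERT parameters.

    Hypothetical helper: rowids are assumed to start at 1 and follow list
    order, mirroring range(1, len(...) + 1) in the original loop.
    """
    rows: List[Row] = []
    for kind_id, partition_ids in row_partition.items():
        for rowid, partition_id in enumerate(partition_ids, start=1):
            rows.append((generator_id, modelno, rowid, kind_id, partition_id))
    return rows


# Two views over three rows give 2 * 3 = 6 tuples, i.e. 6 separate INSERTs
# for this one model in the original loop.
rows = kind_partition_rows(1, 0, {0: [0, 0, 1], 1: [2, 2, 2]})
print(len(rows))  # 6
print(rows[0])    # (1, 0, 1, 0, 0)
```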

Some possibilities for addressing this issue include:

  • construct a pandas DataFrame and write it into the .bdb file with `df.to_sql`
  • build one large SQL string for a single bulk `INSERT`
  • add the query types that exist in bayeslite but not in Loom to the Loom query server
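The second option can be sketched with the standard `sqlite3` module. This is an illustration, not the code from the actual fix. It builds a multi-row `VALUES` list of `?` placeholders rather than interpolating values into the string, which avoids quoting problems. The table schema here is an assumption, and statements are chunked because SQLite caps the number of bound parameters per statement (999 by default before SQLite 3.32.0). Inside bayeslite the same SQL and parameter list would presumably be passed to `bdb.sql_execute(sql, bindings)`, as in the original snippet.

```python
import sqlite3
from typing import Sequence, Tuple

# Assumed schema for illustration only; the real bayeslite table may differ.
SCHEMA = '''
    CREATE TABLE bayesdb_loom_row_kind_partition (
        generator_id INTEGER NOT NULL,
        modelno      INTEGER NOT NULL,
        rowid        INTEGER NOT NULL,
        kind_id      INTEGER NOT NULL,
        partition_id INTEGER NOT NULL,
        PRIMARY KEY (generator_id, modelno, rowid, kind_id)
    )
'''
COLUMNS = 5
MAX_VARS = 999  # conservative: SQLite's default parameter limit before 3.32.0


def bulk_insert(conn: sqlite3.Connection,
                rows: Sequence[Tuple[int, int, int, int, int]]) -> None:
    """Insert rows using multi-row INSERT statements of at most MAX_VARS params."""
    chunk = MAX_VARS // COLUMNS  # 199 rows per statement
    for start in range(0, len(rows), chunk):
        batch = rows[start:start + chunk]
        # One "(?, ?, ?, ?, ?)" group per row in this batch.
        values = ', '.join(['(?, ?, ?, ?, ?)'] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(
            'INSERT OR REPLACE INTO bayesdb_loom_row_kind_partition '
            '(generator_id, modelno, rowid, kind_id, partition_id) '
            'VALUES ' + values, params)


conn = sqlite3.connect(':memory:')
conn.execute(SCHEMA)
rows = [(1, 0, r, k, r % 3) for k in range(2) for r in range(1, 501)]
with conn:  # also keep all chunks in one transaction
    bulk_insert(conn, rows)
count = conn.execute(
    'SELECT COUNT(*) FROM bayesdb_loom_row_kind_partition').fetchone()[0]
print(count)  # 1000
```

Here 1000 rows go through 6 statements instead of 1000.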
fsaad commented

Also consider wrapping inserts in a transaction.
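A minimal sketch of the transaction suggestion, again using plain `sqlite3` with an assumed schema. `executemany` reuses one prepared statement, and the connection's context manager commits once at the end (or rolls back on error) instead of once per row. In bayeslite itself the analogous wrapper would presumably be `with bdb.savepoint():`, but that is an assumption here.

```python
import sqlite3
from typing import Iterable, Tuple

INSERT_SQL = '''
    INSERT OR REPLACE INTO bayesdb_loom_row_kind_partition
        (generator_id, modelno, rowid, kind_id, partition_id)
    VALUES (?, ?, ?, ?, ?)
'''


def insert_in_one_transaction(
        conn: sqlite3.Connection,
        rows: Iterable[Tuple[int, int, int, int, int]]) -> None:
    # `with conn` commits once when the block succeeds and rolls back if an
    # exception is raised, so every row shares a single transaction.
    with conn:
        conn.executemany(INSERT_SQL, rows)


conn = sqlite3.connect(':memory:')
# Assumed schema for illustration only.
conn.execute('''
    CREATE TABLE bayesdb_loom_row_kind_partition (
        generator_id INTEGER, modelno INTEGER, rowid INTEGER,
        kind_id INTEGER, partition_id INTEGER,
        PRIMARY KEY (generator_id, modelno, rowid, kind_id))
''')
rows = [(1, 0, r, k, 0) for k in range(3) for r in range(1, 101)]
insert_in_one_transaction(conn, rows)
count = conn.execute(
    'SELECT COUNT(*) FROM bayesdb_loom_row_kind_partition').fetchone()[0]
print(count)  # 300
```

On an on-disk database the saving comes mostly from committing once rather than per statement. An in-memory database like the one above hides most of that difference.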

Fixed by @fsaad using a bulk INSERT operation.