replikativ/datahike

max-tx not correct after import-db

Closed this issue · 5 comments

Hi!
Import-db works in batches of 1000 datoms.
It seems that after import max-tx is tx0 incremented by the number of batches and not the biggest imported tx.
This means that new transactions reuse old tx ids.
This conceptually merges old transactions with new ones and messes up history.
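To make the failure mode concrete, here is a minimal sketch of the arithmetic (not Datahike source; the tx0 literal is an assumption taken from DataScript's tx0 constant, on which Datahike is built):

```clojure
;; Sketch of the observed behaviour, not Datahike source. The tx0 value
;; is an assumption (DataScript's tx0 constant, which Datahike builds on).
(let [tx0         536870912              ; assumed first tx id (0x20000000)
      datom-count 3500                   ; example export size
      batch-size  1000
      ;; ceiling division: number of api/transact calls the import makes
      batches     (quot (+ datom-count (dec batch-size)) batch-size)]
  ;; each transact allocates exactly one new tx id, so after the import
  ;; max-tx sits at tx0 + number of batches, not at the biggest imported tx:
  (+ tx0 batches))
;; => 536870916
```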

As far as I can see, the db is a record and :max-tx is simply a field on this record that can be manipulated on import.
Please tell me if this assumption is wrong.

The solution I am cautiously trying for the time being is this:

(defn export-db
  "Export the database in a flat-file of datoms at path."
  [db path]
  (with-open [f (io/output-stream path)
              w (io/writer f)]
    (binding [*out* w]
      (doseq [d (datahike.db/-datoms db :eavt [])]
        (prn d)))))

(defn update-max-tx
  "Find biggest tx in index and update max-tx."
  [db]
  (let [max-tx (reduce #(max %1 (nth %2 3))
                       0
                       (api/datoms db :eavt))]
    (assoc db :max-tx max-tx)))

(defn import-db
  "Import a flat-file of datoms at path into your database."
  [conn path]
  (time
   (doseq [datoms (->> (line-seq (io/reader path))
                       (map read-string)
                       (partition 1000 1000 nil))]
     (api/transact conn (vec datoms))))
  (swap! conn update-max-tx))

Please tell me if I am missing something or doing something stupid here.

The import itself already overwrites existing txs. max-tx will have to be set before the import!

This seems to work correctly:

(defn update-max-tx-from-file
  "Find biggest tx in file and update max-tx of db.
  Note: the last tx might not be the biggest if the db
  has been imported before."
  [db file]
  (let [max-tx (->> (line-seq (io/reader file))
                    (map read-string)
                    (reduce #(max %1 (nth %2 3)) 0))]
    (assoc db :max-tx max-tx)))

(defn import-db
  "Import a flat-file of datoms at path into your database."
  [conn path]
  (swap! conn update-max-tx-from-file path)
  (time
   (doseq [datoms (->> (line-seq (io/reader path))
                       (map read-string)
                       (partition 1000 1000 nil))]
     (api/transact conn (vec datoms)))))
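For reference, a round trip with these functions might look like the following (the connection names and dump path are hypothetical, not from the issue):

```clojure
;; Hypothetical usage, assuming `conn` is an existing Datahike connection
;; and `new-conn` points at a freshly created, empty database:
(export-db @conn "/tmp/datahike-dump")
(import-db new-conn "/tmp/datahike-dump")
;; With max-tx fixed up before the import, subsequent transactions on
;; new-conn allocate fresh tx ids instead of reusing imported ones.
```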

Thanks for your time on the import/export @markusalbertgraf! Could you create a PR for that? For upcoming versions we are working on a more extensive migration tool for the CLI that supports version and backend migrations in Datahike. With your permission we would reuse your code there as well.

Great!
Feel free to use my code.
I created a PR.

Thank you!