replikativ/datahike

[Bug]: delete-database throws NPE for :file backend

Opened this issue · 8 comments

jjttjj commented

What version of Datahike are you using?

io.replikativ/datahike {:mvn/version "0.6.1539"}

What version of Java are you using?

17

What operating system are you using?

Ubuntu

What database EDN configuration are you using?

(def cfg {:store              {:backend :file
                                :id "db"
                                :path    "datahike-cfg"}
           :name               "db"
           :schema-flexibility :write
           :keep-history?      true
           :attribute-refs?    true})

Describe the bug

delete-database throws a null pointer exception. This seems to happen regardless of if the database was previously created or not, or transacted against and used.

{
 :cause nil
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "java.lang.NullPointerException"
   :data {}
   :at [superv.async$throw_if_exception_ invokeStatic "async.cljc" 93]}
  {:type java.lang.NullPointerException
   :message nil
   :at [java.util.Objects requireNonNull "Objects.java" 208]}]
 :trace
 [[java.util.Objects requireNonNull "Objects.java" 208]
  [sun.nio.fs.UnixFileSystem getPath "UnixFileSystem.java" 263]
  [konserve.filestore$sync_base invokeStatic "filestore.clj" 34]
  [konserve.filestore$sync_base invoke "filestore.clj" 31]
  [konserve.filestore$delete_store invokeStatic "filestore.clj" 60]
  [konserve.filestore$delete_store invoke "filestore.clj" 51]
  [datahike.store$eval47429$fn__47431 invoke "store.cljc" 116]
  [clojure.lang.MultiFn invoke "MultiFn.java" 229]
  [datahike.writing$eval52657$fn__52666 invoke "writing.cljc" 207]
  [datahike.writing$eval52608$fn__52609$G__52597__52614 invoke "writing.cljc" 136]
  [datahike.writing$delete_database invokeStatic "writing.cljc" 223]
  [datahike.writing$delete_database invoke "writing.cljc" 217]
  [clojure.lang.AFn applyToHelper "AFn.java" 154]
  [clojure.lang.AFn applyTo "AFn.java" 144]
  [clojure.core$apply invokeStatic "core.clj" 667]
  [clojure.core$apply invoke "core.clj" 662]
  [datahike.writer$eval52982$fn__52983 doInvoke "writer.cljc" 99]
  [clojure.lang.RestFn invoke "RestFn.java" 408]
  [clojure.lang.MultiFn invoke "MultiFn.java" 229]
  [clojure.lang.AFn applyToHelper "AFn.java" 154]
  [clojure.lang.AFn applyTo "AFn.java" 144]
  [clojure.core$apply invokeStatic "core.clj" 667]
  [clojure.core$apply invoke "core.clj" 662]
  [datahike.api$delete_database invokeStatic "api.cljc" 130]
  [datahike.api$delete_database doInvoke "api.cljc" 129]
  [clojure.lang.RestFn invoke "RestFn.java" 408]
...

What is the expected behaviour?

The database should be deleted,

How can the behaviour be reproduced?


  (def cfg {:store              {:backend :file
                                  :id "db"
                                  :path    "datahike-cfg"}
             :name               "db"
             :schema-flexibility :write
             :keep-history?      true
             :attribute-refs?    true})

  (d/create-database cfg)
  (d/delete-database cfg) ;; NPE 
whilo commented

As discussed on Slack, I think it is best for us to require absolute paths in general, since we want to be able to address stores globally if possible. Currently we infer the hostname for the file store backend automatically, but we could also require to set it manually to fully identify the filesystem. That might be a bit less comfortable and still leaves room to use local hostnames that are not uniquely globally identified.

Sorry that this happened. There should be at least a proper exception message that describes the problem and some documentation.

whilo commented

We just need to check whether the user has provided an absolute path to the file store and otherwise throw an informative error message.

I think it would be nicer though to accept relative paths as this is what I would expect and apparently I am not alone with this. Btw, ./store-dir works already for deletion, but even if we leave it like this, we should check the path on store creation.

So I would say, we have following options:

  1. Disallow creation of relative paths
  2. Accept relative paths prefixed with ./
  3. Accept relative paths generally
  4. Accept relative paths but throw a warning if a relative path is being used
    • in this case possibly provide a way to silence the warning, e.g. with a dynamic var or config option
  5. Have different store creation functions for stores that are save for distributed use which only accept absolute paths and simple stores that accept relative paths

None of the solutions above require much work, it is merely a design decision.

Which one would you prefer?

I don't get why we need absolute paths to globally address stores. But I would expect to be able to use relative paths.

So I tend to option 3 because relative paths for java.io.File are not prefixed.

I wouldn't mind only using absolute paths with a proper error message though I currently don't see an easy way to know if a path is absolute or not. We would have to check if the first char of the path is either a slash or a letter and if it's a letter then it needs to be followed by a colon. And what if someone wants to use a network drive like a nas or something, is that possible?

But maybe you've got a better way to solve that depending on the implementation details of the filestore.

The filestore simply uses the Java classes File and Path, so there is actually a method to identify relative paths, but it is just as simple getting the absolute path from a relative path. The big question, which we talked about today, is where it would be best to perform the translation.

whilo commented

The problem is that the database stores the store configuration in its store and it identifies itself through the path for the filestore backend (see store-identity). Relative paths are not uniquely determined even on the same machine and hence can not be disambiguated unless the path is resolved first. This can create problems now if you try to connect to the relative path database from another relative or absolute path configuration. A simple solution is what @jsmassa described, to replace the relative path in the config before Datahike operates on it.

In general we should aim for having DB configurations that allow to connect from anywhere to the database if you have credentials, then we can store the config in Datahike itself and run recursive queries with Datalog over the databases.

whilo commented

The filestore simply uses the Java classes File and Path, so there is actually a method to identify relative paths, but it is just as simple getting the absolute path from a relative path. The big question, which we talked about today, is where it would be best to perform the translation.

I think it is best to resolve the paths when the config is loaded, e.g. in load-config. konserve can still support relative paths as is, but Datahike should internally operate on fully resolved configurations.