gadget-framework/mfdb

Importing logbook data

bthe commented

I think I am running into RAM issues when trying to add logbook data to our mfdb, now that we link the catches to trips, vessels and tows. Whenever I attempt to add it I get:

 mfdb_import_survey(mdb,data_source = 'icelandic.logbooks',
+                    catch %>% filter(!grepl('^0-',trip)))
Error: Failed to fetch row: ERROR:  out of memory
DETAIL:  Failed on request of size 67108864.

Any suggestion on how to proceed? Should I split the data into smaller chunks?

Can you persuade a stacktrace out of it with entrace() or something similar?

Worth seeing if we can fix it rather than having to split it up, although that would help.
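If it does come to splitting, it would look roughly like the sketch below. Purely illustrative: the chunk size is arbitrary, and it assumes each data_source name is treated as a replaceable unit, so every chunk gets its own name rather than all reusing 'icelandic.logbooks'.

library(dplyr)

# Illustrative only: import the logbook catches in batches so each
# mfdb_import_survey() call handles fewer rows (and fewer distinct tows/trips).
logbook <- catch %>% filter(!grepl('^0-', trip))
chunk_size <- 500000   # assumed; tune to whatever the server copes with
chunks <- split(logbook, ceiling(seq_len(nrow(logbook)) / chunk_size))
for (i in seq_along(chunks)) {
    mfdb_import_survey(mdb,
                       data_source = paste0('icelandic.logbooks.', i),
                       chunks[[i]])
}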

bthe commented

Something like this:

rlang::entrace(mfdb_import_survey(mdb,data_source = 'icelandic.logbooks',
+                    catch %>% filter(!grepl('^0-',trip))))
Error: Failed to fetch row: ERROR:  out of memory
DETAIL:  Failed on request of size 67108864.

Is there an output somewhere I should be looking at?

You need to do options(error = rlang::entrace) as in the example at the bottom; then you should be able to run rlang::last_error() once it falls over.

Doing logging::setLevel('FINEST') might also be interesting, possibly more so, since then you'll get the SQL at the point it falls over.
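In case it helps, the whole setup looks something like this (a minimal sketch; the import call is the one from the issue, the rest is plain rlang/logging usage):

options(error = rlang::entrace)   # record a backtrace whenever an error is thrown
logging::setLevel('FINEST')       # have mfdb log each SQL statement it sends

mfdb_import_survey(mdb, data_source = 'icelandic.logbooks',
                   catch %>% filter(!grepl('^0-',trip)))

# once it falls over:
rlang::last_error()   # condensed backtrace
rlang::last_trace()   # full call tree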

bthe commented

I had a hard time replicating the error as the server was constantly crashing, but here is the trace:

rlang::last_error()
<error/rlang_error>
Failed to fetch row: ERROR:  out of memory
DETAIL:  Failed on request of size 134217728.
Backtrace:
  1. mfdb::mfdb_import_survey(...)
  3. mfdb:::sanitise_col(mdb, data_in, "tow", lookup = "tow", default = c(NA))
  4. mfdb:::mfdb_fetch(...)
  5. mfdb:::mfdb_send(mdb, ..., result = "rows")
  7. RPostgres::dbSendQuery(mdb$db, query)
  8. RPostgres:::.local(conn, statement, ...)
 12. RPostgres:::result_create(conn@ptr, statement)
Run `rlang::last_trace()` to see the full context.

and

rlang::last_trace()
<error/rlang_error>
Failed to fetch row: ERROR:  out of memory
DETAIL:  Failed on request of size 134217728.
Backtrace:
     █
  1. └─mfdb::mfdb_import_survey(...)
  2.   ├─base::data.frame(...)
  3.   └─mfdb:::sanitise_col(mdb, data_in, "tow", lookup = "tow", default = c(NA))
  4.     └─mfdb:::mfdb_fetch(...)
  5.       └─mfdb:::mfdb_send(mdb, ..., result = "rows")
  6.         ├─DBI::dbSendQuery(mdb$db, query)
  7.         └─RPostgres::dbSendQuery(mdb$db, query)
  8.           └─RPostgres:::.local(conn, statement, ...)
  9.             ├─methods::new(...)
 10.             │ ├─methods::initialize(value, ...)
 11.             │ └─methods::initialize(value, ...)
 12.             └─RPostgres:::result_create(conn@ptr, statement)
<error/Rcpp::exception>
Failed to fetch row: ERROR:  out of memory
DETAIL:  Failed on request of size 134217728.

and, unsurprisingly, the last bit printed out with logging set to 'FINEST' was the tow-name SQL.

Yup, so this bit is converting the tow column into a factor, mapping tow names to tow IDs. So the fetch is trying to get the name -> numeric ID mapping for all 6 million tows. It could well be that parsing the SQL is what's failing, rather than returning the output.

The patch above means we don't try to filter the tows we select, which should improve matters.
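Roughly, the two query shapes look like this (the table and column names here are assumptions based on the tow taxonomy, not copied from the patch):

# Before: the name -> id lookup was filtered by every tow name in the import,
# so the generated SQL carried millions of literals for the server to parse:
#   SELECT tow_id, name FROM tow WHERE name IN ('t1', 't2', ... millions more ...)

# After (unfiltered): fetch the whole mapping and match it up on the R side.
mapping <- DBI::dbGetQuery(mdb$db, "SELECT tow_id, name FROM tow")
tow_ids <- mapping$tow_id[match(data_in$tow, mapping$name)]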

bthe commented

This solved it, thanks :D

Good stuff. The next option, I think, would be to move the factor conversion into the database, where it'd be easier for it to happen in chunks. But that'd be a fair bit of fiddling and would probably end up losing performance in the general case.
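For what it's worth, that variant might look something like the sketch below: push the raw names into a temporary table and let the server do the name -> id join itself. Entirely hypothetical; the temp-table name and the join are illustrative rather than anything mfdb actually does.

# Hypothetical sketch of doing the name -> id conversion server-side.
# 1. Copy the raw tow names (with their row order) into a temporary table.
DBI::dbWriteTable(mdb$db, 'temp_import_tow',
                  data.frame(row_nr = seq_along(data_in$tow), name = data_in$tow),
                  temporary = TRUE)
# 2. Let the database resolve names to ids, which it can work through in chunks
#    rather than R building one enormous query string.
ids <- DBI::dbGetQuery(mdb$db, '
    SELECT i.row_nr, t.tow_id
      FROM temp_import_tow i
      LEFT JOIN tow t ON t.name = i.name
     ORDER BY i.row_nr')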