hugheylab/pmparser

xml_filename column in pmid_status table created as varchar(1)

Closed · 3 comments

I'm attempting to create MySQL tables with the following:

library(pmparser)
library(RMariaDB)
modifyPubmedDb(
  localDir = ".",
  dbname = "pubmed22",
  dbtype = "mysql",
  nFiles = Inf,
  retry = TRUE,
  nCitations = Inf,
  mode = "create",
  # database connection details
  user = "root",
  password = "password",
  host = "127.0.0.1",
  port = "3306"
)

But the log file shows the following error for every XML file:
datetime xml_filename step status message
2022-08-21T02:11:02.252725Z all start 0
2022-08-21T02:11:02.259140Z pubmed22n0001.xml.gz start 0
2022-08-21T02:11:06.295477Z pubmed22n0001.xml.gz read_xml 0
2022-08-21T02:11:08.865211Z pubmed22n0001.xml.gz pmid_status 1 Error: Data too long for column 'xml_filename' at row 1

It looks like the issue is that the xml_filename column is being created as a varchar(1) rather than something like varchar(50) or varchar(100).

Oh, interesting. I guess we've tested the package most extensively and recently on postgres and sqlite. Does it work for you with either of those, or with MariaDB?

I'll look into fixing this issue. I'm surprised because the package is just using DBI::dbCreateTable() regardless of db type.

You're exactly right. I've got it running in the background using postgres right now and it's chugging through the baseline xml files without issue so far. So, it does seem to be specific to MySQL.

Ok, well that's good. Worst case, this might have to stay a known issue with MySQL. Yet another reason to prefer Postgres, in my opinion.
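
For reference, DBI::dbCreateTable() takes its column types either from a data frame (the backend infers them via dbDataType()) or from a named character vector of explicit SQL types. The sketch below shows both paths. It is only an illustration, not the package's code or its eventual fix: it uses an in-memory RSQLite connection as a stand-in so it runs without a server, the table layout is hypothetical apart from the xml_filename column named in the error, and VARCHAR(255) is an arbitrary choice. The varchar(1) behaviour itself was only seen on MySQL and will not reproduce on SQLite.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the real server so the
# sketch runs anywhere; the varchar(1) problem was reported on MySQL only.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for pmid_status; only xml_filename comes from the
# error message, the pmid column is an assumption.
d <- data.frame(pmid = integer(), xml_filename = character())

# Path 1: the type the backend would infer for each column of a data frame.
inferred <- dbDataType(con, d)
print(inferred)

# Path 2: a named character vector sets the SQL types explicitly,
# bypassing backend inference entirely.
explicit <- c(pmid = "INTEGER", xml_filename = "VARCHAR(255)")
dbCreateTable(con, "pmid_status", fields = explicit)

# A full-length file name now fits in the column.
dbAppendTable(
  con, "pmid_status",
  data.frame(pmid = 1L, xml_filename = "pubmed22n0001.xml.gz")
)
res <- dbReadTable(con, "pmid_status")

dbDisconnect(con)
```

Running dbDataType() against a MySQL connection with the same data frame would show what type that backend picks for xml_filename, which may help narrow down where the varchar(1) comes from.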