sherrillmix/taxonomizr

error: 'Query and SQL mismatch'

Closed this issue · 6 comments

Hi,

This is a follow-up to issue #43 ("get taxID for a huge list of accession numbers"); my apologies for the delayed reply. Thank you for the prompt response; it was very helpful. I am very new to coding and to R itself, so I'm still trying to get the hang of it.

I managed to run the command (taxaId<-accessionToTaxa(myCsv[,1],"accessionTaxa.sql")) on my .csv file; however, I ended up with an error message that says 'Query and SQL mismatch'. I'm not sure where I'm going wrong. I've attached a file with the first 10 lines of my .csv file.
accesion_first10.csv

Thank you so much.

Hmm. That error is a sanity check to make sure nothing unexpected is happening inside the package. You shouldn't really be able to trigger it yourself, so something odd seems to be going on with how your data are processed, which could indicate a bug in the package code. Unfortunately, I can't replicate it, at least with the .csv you included:

> myCsv<-read.csv('accesion_first10.csv',header=FALSE)
> taxaId<-accessionToTaxa(myCsv[,1],"accessionTaxa.sql")
> taxaId
 [1]  39947    573    573 375757    738    680 442491 442491   4498 112509

Do you get the error when running your code on that subset of the data, or only on the full dataset? If only on the full dataset, it would be helpful to look at that, or at a subset that does generate the error.

It might also be useful to post the output of sessionInfo() so we can start narrowing down which OS and R version you're using.

I get the error message when I run the code on the same subset of data (accesion_first10.csv) as well as on the full dataset. Here's my output from sessionInfo():

[Screenshot of sessionInfo() output, taken 2022-06-21]

Ubuntu, an up-to-date RSQLite, and the CRAN version of taxonomizr. Everything seems fine there. Oh, but the R version is old. Older versions of R (before 4.0.0) read strings in a csv as factors by default, and that was a constant pain. I bet that's it. Yep:

> taxonomizr::accessionToTaxa(myCsv[,1],"accessionTaxa.sql")
 [1]  39947    573    573 375757    738    680 442491 442491   4498 112509
> taxonomizr::accessionToTaxa(as.factor(myCsv[,1]),"accessionTaxa.sql")
Error: Query and SQL mismatch
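To see why a factor can behave differently from a character vector, here is a small base-R sketch (the accession strings are made up for illustration). A factor stores integer codes plus a table of levels, so code that expects the strings themselves can end up working with something else. This only illustrates the general pitfall; it is not a trace of exactly where taxonomizr's internals tripped over it.

```r
# Made-up accessions; a factor stores each value as an integer code into sorted levels
ids <- factor(c("XM_002", "AB_001", "XM_002"))

print(levels(ids))        # "AB_001" "XM_002"
print(as.integer(ids))    # 2 1 2 (the codes, not the strings)
print(as.character(ids))  # "XM_002" "AB_001" "XM_002"
print(is.character(ids))  # FALSE: a factor is not a character vector
print(typeof(ids))        # "integer": the underlying storage type
```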

I should fix that on my side by adding an as.character() call to the function to convert factors into ordinary strings. I'll get that up on GitHub in the next couple of days and on CRAN in a week or so.
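The fix described above amounts to coercing the input to character before it is used for any lookup. A minimal sketch of that idea, assuming the coercion happens up front; normalizeAccessions is a hypothetical name and this is not the actual taxonomizr patch:

```r
# Hypothetical helper, not part of taxonomizr: coerce accessions to plain strings.
# as.character() turns a factor back into its labels and leaves a character
# vector as it is, so it is safe to apply unconditionally.
normalizeAccessions <- function(accessions) {
  as.character(accessions)
}

fromFactor <- normalizeAccessions(factor(c("XM_002", "AB_001")))
fromChar <- normalizeAccessions(c("XM_002", "AB_001"))
print(identical(fromFactor, fromChar))  # TRUE

# With taxonomizr installed and the database built, it could be used as:
# taxaId <- accessionToTaxa(normalizeAccessions(myCsv[,1]), "accessionTaxa.sql")
```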

For now, you could read the file in with stringsAsFactors=FALSE:

myCsv<-read.csv('accesion_first10.csv',header=FALSE,stringsAsFactors=FALSE)

or convert the factor to character when you pass it in:

taxaId<-accessionToTaxa(as.character(myCsv[,1]),"accessionTaxa.sql")
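Both workarounds can be checked without the real file or database. This sketch uses read.csv's text argument as a stand-in for the attached csv (the accessions are made up) and forces the old default with stringsAsFactors=TRUE:

```r
# Self-contained stand-in for the attached csv (made-up accessions)
csvText <- "XM_002\nAB_001\nXM_002"

# Old default behaviour (R < 4.0.0): strings become a factor
asFactor <- read.csv(text = csvText, header = FALSE, stringsAsFactors = TRUE)
# Workaround 1: ask read.csv for plain strings
asString <- read.csv(text = csvText, header = FALSE, stringsAsFactors = FALSE)

print(class(asFactor[, 1]))  # "factor"
print(class(asString[, 1]))  # "character"

# Workaround 2: convert the factor column when passing it in
print(identical(as.character(asFactor[, 1]), asString[, 1]))  # TRUE
```

Either way, what reaches accessionToTaxa is the same plain character vector.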

This should be fixed in the GitHub version of the package now.

> devtools::install_github('sherrillmix/taxonomizr')
> taxonomizr::accessionToTaxa(as.factor(myCsv[,1]),"accessionTaxa.sql")
 [1]  39947    573    573 375757    738    680 442491 442491   4498 112509

I'll try to submit to CRAN in the next couple of days. Thanks for catching the bug, and let me know if you have any other difficulties.

I managed to run it now. Thanks a lot for fixing the error; it worked like a charm.

Glad to hear it's working well for you. Thanks for catching that bug. The updated version should be up on CRAN now. Good luck with your research.