mjwestgate/revtools

.bib still failing - consider some other method of reading bibfiles


(Hi @mjwestgate, rather than reopen issue #2 I thought I'd start a new one; reopen that one instead if you'd rather.)

bibfiles are tricky beasts! I wanted to try out your package for a new project, but can't get the bibfiles in. RefManageR reads them in ok as far as I can tell. Can you use that package (or a different one) to read them in, then parse from there into the format you require?

I tried the GitHub version with the following bib entry:

@Article{Grigg2004-tr,
  title = {{An overview of risk-adjusted charts}},
  author = {O Grigg and V Farewell},
  journal = {Journal of the Royal Statistical Society: Series A (Statistics in
             Society)},
  volume = {167},
  number = {3},
  pages = {523--539},
  month = {aug},
  year = {2004},
  url = {http://doi.wiley.com/10.1111/j.1467-985X.2004.0apm2.x},
  issn = {0964-1998, 1467-985X},
  doi = {10.1111/j.1467-985X.2004.0apm2.x},
}

I should note that I called revtools:::read_bib directly, because read_bibliography wouldn't work with a single citation (I assume due to the RIS/BibTeX format check).
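If the RIS/BibTeX check is what trips on a one-citation file, a content-based test avoids depending on how many entries the file holds. The sketch below is a hypothetical helper, not revtools' actual detection code: it assumes BibTeX entries start a line with `@type{` (or `@type(`) and that RIS records start with the `TY  - ` tag, which the RIS format requires as the first tag of each record.

```r
# Hypothetical helper (not part of revtools): classify file contents as
# BibTeX or RIS by looking for a line that opens an entry or record.
detect_bib_format <- function(lines) {
  # Drop leading indentation only, so the RIS "TY  - " spacing is preserved
  lines <- trimws(lines, which = "left")
  # BibTeX: '@' + entry type + opening brace/parenthesis, e.g. "@Article{"
  if (any(grepl("^@[A-Za-z]+[[:space:]]*[{(]", lines))) return("bib")
  # RIS: each record starts with the TY tag, e.g. "TY  - JOUR"
  if (any(grepl("^TY  - ", lines))) return("ris")
  NA_character_
}

# Usage with a single-entry file ("refs.bib" is a placeholder path):
# detect_bib_format(readLines("refs.bib"))
```

Because it scans every line rather than only the first, leading comments or free text before the first entry do not change the result.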

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C              
 [3] LC_TIME=C                  LC_COLLATE=C              
 [5] LC_MONETARY=C              LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] revtools_0.2.2     RefManageR_0.14.12

loaded via a namespace (and not attached):
 [1] NLP_0.1-10           Rcpp_0.12.11         compiler_3.4.1      
 [4] plyr_1.8.4           bindr_0.1            tools_3.4.1         
 [7] digest_0.6.12        memoise_1.1.0        lubridate_1.6.0     
[10] jsonlite_1.5         tibble_1.3.3         gtable_0.2.0        
[13] viridisLite_0.2.0    pkgconfig_2.0.1      rlang_0.1.1         
[16] bibtex_0.4.2         shiny_1.0.3          parallel_3.4.1      
[19] bindrcpp_0.2         withr_1.0.2          dplyr_0.7.1         
[22] httr_1.2.1           stringr_1.2.0        xml2_1.1.1          
[25] devtools_1.13.2      topicmodels_0.2-6    htmlwidgets_0.8     
[28] shinydashboard_0.6.1 stats4_3.4.1         ade4_1.7-6          
[31] grid_3.4.1           glue_1.1.1           data.table_1.10.4   
[34] R6_2.2.2             plotly_4.7.0         ggplot2_2.2.1       
[37] purrr_0.2.2.2        tidyr_0.6.3          magrittr_1.5        
[40] scales_0.4.1         modeltools_0.2-21    htmltools_0.3.6     
[43] assertthat_0.2.0     xtable_1.8-2         mime_0.5            
[46] colorspace_1.3-2     httpuv_1.3.5         stringi_1.1.5       
[49] lazyeval_0.2.0       munsell_0.4.3        slam_0.1-40         
[52] tm_0.7-1

Thanks Steve. You're right that this is a tough problem. I can get to this in a few days, but in the meantime, revtools::start_review_window also accepts a data.frame, so you could import using a different method and just use revtools for visualisation. The columns you would need to include in your data.frame are:

  • 'label' (a unique ID for each row)
  • 'author' (all authors in a single string, separated by ' and ')
  • 'year' (accepts numeric or character)
  • 'title'
  • 'journal'
  • 'abstract' (if available)

Hope this helps for now - more to follow.
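As a minimal sketch of the data.frame route, the rows below are typed in by hand from the two entries quoted in this thread; in practice they would come from whatever import method works for your files. The column names follow the list above; passing the data.frame as the first argument of `revtools::start_review_window` is an assumption about its signature, so that call is left commented out.

```r
# Hand-typed example rows; in practice these would come from any import route
# that works for your files (e.g. a .csv export).
refs <- data.frame(
  label    = c("Grigg2004-tr", "Bruna-2010"),                 # unique ID per row
  author   = c("O Grigg and V Farewell", "Emilio M. Bruna"),  # ' and '-separated
  year     = c(2004, 2010),                                   # numeric or character
  title    = c("An overview of risk-adjusted charts",
               "Scientific Journals Can Advance Tropical Biology and Conservation by Requiring Data Archiving"),
  journal  = c("Journal of the Royal Statistical Society: Series A (Statistics in Society)",
               "Biotropica"),
  abstract = NA_character_,                                   # optional; NA when missing
  stringsAsFactors = FALSE
)

# Every label must be unique
stopifnot(anyDuplicated(refs$label) == 0)

# Not run here because it opens an interactive Shiny app; passing the
# data.frame as the first argument is an assumption about the signature.
# revtools::start_review_window(refs)
```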

No rush from me; I was mainly playing around. I think it's a great idea, and I look forward to updates.

Thanks for the tip, I'll give it a go.

Hi Steve, this took me a while to get back to, but I've updated the package so that (1) read_bibliography detects .bib files more reliably, and (2) read_bib now works for the (fairly basic) cases I've tried. If you get time to check it out and find more bugs, let me know. I'll keep checking this over the next week or so, so I won't close this issue just yet.

Hi Martin, thanks for the great seminar yesterday and for an exciting package. I also hit an error reading in bib files with the CRAN version, but after seeing this issue I installed the latest version from GitHub and was able to read in a bib file and start an analysis.

However, read_bibliography failed on another bib file I tried, one that contained some custom sections and free text. I looked into the failure, and the regular expressions used to parse the file may be producing some unexpected results. This made me wonder: could you use the output of bibtex::read.bib and work from that? As you may know, the resulting bibentry object has fields you can extract, e.g. bib[[1]]$title:

> str(bib[[1]])
Class 'bibentry'  hidden list of 1
 $ Bruna-2010:List of 7
  ..$ title  : chr "Scientific Journals Can Advance Tropical Biology and Conservation by Requiring Data Archiving"
  ..$ volume : chr "42"
  ..$ doi    : chr "10.1111/j.1744-7429.2010.00652.x"
  ..$ journal: chr "Biotropica"
  ..$ author :Class 'person'  hidden list of 1
  .. ..$ :List of 5
  .. .. ..$ given  : chr [1:2] "Emilio" "M."
  .. .. ..$ family : chr "Bruna"
  .. .. ..$ role   : NULL
  .. .. ..$ email  : NULL
  .. .. ..$ comment: NULL
  ..$ year   : chr "2010"
  ..$ pages  : chr "399--401"
  ..- attr(*, "bibtype")= chr "Article"
  ..- attr(*, "key")= chr "Bruna-2010"

At least then you could offload the first step, reading in the bib file, to an existing package.
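To make the suggestion concrete, here is a minimal sketch of flattening a `bibentry` object into the data.frame columns listed earlier in the thread. `bibentry_to_df` is a hypothetical helper, not an existing revtools or bibtex function. It relies on the structure visible in the `str()` output above (one field list per entry, `key` stored as an attribute, authors stored as a `person` object) and uses `utils::bibentry` for the test, so it runs without the bibtex package installed.

```r
# Hypothetical helper: flatten a 'bibentry' object (as returned by
# bibtex::read.bib, or built with utils::bibentry) into the columns
# revtools::start_review_window was described as needing.
bibentry_to_df <- function(bib) {
  entries <- unclass(bib)  # plain list: one named list of fields per entry

  # Collapse a field to one brace-free string, or NA if the entry lacks it
  as_text <- function(x) {
    if (is.null(x)) NA_character_ else gsub("[{}]", "", paste(x, collapse = " "))
  }

  rows <- lapply(seq_along(entries), function(i) {
    fields  <- entries[[i]]
    key     <- attr(fields, "key")
    authors <- fields[["author"]]  # a 'person' object, or NULL if absent
    data.frame(
      label    = if (is.null(key)) paste0("ref_", i) else key,
      author   = if (is.null(authors)) NA_character_ else
        paste(format(authors, include = c("given", "family")), collapse = " and "),
      year     = as_text(fields[["year"]]),
      title    = as_text(fields[["title"]]),
      journal  = as_text(fields[["journal"]]),
      abstract = as_text(fields[["abstract"]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}
```

With a file on disk, the intended use would be `bibentry_to_df(bibtex::read.bib("refs.bib"))`, where the path is a placeholder. Stripping every brace is a deliberately blunt choice that suits display but would discard intentional case protection.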

I think there's going to be issues no matter what method is used to read in the bibfiles...

For example, I tried reading the following bibliography with each of read_bibliography, bibtex::read.bib and RefManageR::ReadBib, and none of them parsed the 'author' field correctly:

@MISC{biosec-act-2015,
  title  = "{Biosecurity Act 2015}",
  author = "{Department of Agriculture and Water Resources}",
  month  =  jun,
  year   =  2015,
  url    = "https://www.legislation.gov.au/Details/C2015A00061"
}
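A plausible failure mode for this entry: the corporate name itself contains the word "and", so any parser that splits the author string on " and " without tracking braces would return two bogus authors, "Department of Agriculture" and "Water Resources". The sketch below is a hypothetical brace-aware splitter, not code from revtools, bibtex or RefManageR. It assumes the field's outer delimiters (the surrounding quotes here) have already been removed, and it only splits on a lowercase " and " that sits outside every brace group.

```r
# Hypothetical helper: split a BibTeX author string on " and ", ignoring any
# " and " that appears inside a {...} group (e.g. a corporate author).
split_bib_authors <- function(x) {
  chars   <- strsplit(x, "")[[1]]
  depth   <- 0             # current brace nesting level
  authors <- character(0)
  current <- ""
  i <- 1
  while (i <= length(chars)) {
    ch <- chars[i]
    if (ch == "{") depth <- depth + 1
    if (ch == "}") depth <- depth - 1
    # Only a top-level " and " separates two authors
    if (depth == 0 && substr(x, i, i + 4) == " and ") {
      authors <- c(authors, current)
      current <- ""
      i <- i + 5           # skip past " and "
      next
    }
    current <- paste0(current, ch)
    i <- i + 1
  }
  authors <- c(authors, current)
  trimws(gsub("[{}]", "", authors))  # drop protective braces for display
}
```

Here `split_bib_authors("{Department of Agriculture and Water Resources}")` keeps the corporate name as a single author, while `split_bib_authors("O Grigg and V Farewell")` still yields two.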

I'm currently running into this issue as well, or a variant of it. My .bib file seems to trigger this error:

Error in if (any(col_n < 3)) { : missing value where TRUE/FALSE needed

That might just be due to a messy .bib file, but I'm not really sure. I'm going to try switching my data exports to .csv or .ris instead.
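The error message itself says the condition inside `if()` evaluated to `NA`. In R, `any()` returns `NA` when no element is `TRUE` but at least one is missing, so `col_n` must have contained an `NA` at that point. What `col_n` holds inside revtools is not visible from the error alone; the vector below is only a stand-in to reproduce the message.

```r
# Stand-in values; the real contents of col_n inside revtools are unknown.
col_n <- c(5, NA, 4)

any(col_n < 3)                 # NA: no element is TRUE, and one is unknown
any(col_n < 3, na.rm = TRUE)   # FALSE once the NA is ignored

# Using the NA result as an if() condition reproduces the reported error
msg <- tryCatch(
  if (any(col_n < 3)) "short" else "ok",
  error = function(e) conditionMessage(e)
)
msg                            # "missing value where TRUE/FALSE needed"
```

So a malformed entry that leaves a missing value in whatever per-entry count the parser computes would be enough to produce this error, which fits the "ugly .bib" suspicion.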