.bib still failing - consider some other method of reading bibfiles
(Hi @mjwestgate - rather than reopen issue #2 I thought I'd start a new one, reopen if you want)
Bibfiles are tricky beasts! I wanted to try out your package for a new project, but can't get the bibfiles in. RefManageR reads them in OK as far as I can tell. Can you use that package (or a different one) to read them in, then parse from there into the format you require?
I tried using the GitHub version, with the following bib entry:
@Article{Grigg2004-tr,
title = {{An overview of risk-adjusted charts}},
author = {O Grigg and V Farewell},
journal = {Journal of the Royal Statistical Society: Series A (Statistics in
Society)},
volume = {167},
number = {3},
pages = {523--539},
month = {aug},
year = {2004},
url = {http://doi.wiley.com/10.1111/j.1467-985X.2004.0apm2.x},
issn = {0964-1998, 1467-985X},
doi = {10.1111/j.1467-985X.2004.0apm2.x},
}
I should note that I tried using revtools:::read_bib directly (as read_bibliography wouldn't work with one citation, I assume due to the ris/bib checking).
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=C LC_COLLATE=C
[5] LC_MONETARY=C LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] revtools_0.2.2 RefManageR_0.14.12
loaded via a namespace (and not attached):
[1] NLP_0.1-10 Rcpp_0.12.11 compiler_3.4.1
[4] plyr_1.8.4 bindr_0.1 tools_3.4.1
[7] digest_0.6.12 memoise_1.1.0 lubridate_1.6.0
[10] jsonlite_1.5 tibble_1.3.3 gtable_0.2.0
[13] viridisLite_0.2.0 pkgconfig_2.0.1 rlang_0.1.1
[16] bibtex_0.4.2 shiny_1.0.3 parallel_3.4.1
[19] bindrcpp_0.2 withr_1.0.2 dplyr_0.7.1
[22] httr_1.2.1 stringr_1.2.0 xml2_1.1.1
[25] devtools_1.13.2 topicmodels_0.2-6 htmlwidgets_0.8
[28] shinydashboard_0.6.1 stats4_3.4.1 ade4_1.7-6
[31] grid_3.4.1 glue_1.1.1 data.table_1.10.4
[34] R6_2.2.2 plotly_4.7.0 ggplot2_2.2.1
[37] purrr_0.2.2.2 tidyr_0.6.3 magrittr_1.5
[40] scales_0.4.1 modeltools_0.2-21 htmltools_0.3.6
[43] assertthat_0.2.0 xtable_1.8-2 mime_0.5
[46] colorspace_1.3-2 httpuv_1.3.5 stringi_1.1.5
[49] lazyeval_0.2.0 munsell_0.4.3 slam_0.1-40
[52] tm_0.7-1
Thanks Steve. You're right that this is a tough problem. I can get to this in a few days, but in the meantime, revtools::start_review_window also accepts a data.frame, so you could import using a different method and just use revtools for visualisation. The columns you would need to include in your data.frame are:
- 'label' (a unique ID for each row)
- 'author' (all authors in a single string, separated by ' and ')
- 'year' (accepts numeric or character)
- 'title'
- 'journal'
- 'abstract' (if available)
Hope this helps for now - more to follow.
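As a rough sketch of that workaround, the data.frame could be assembled by hand in base R. The values below are copied from the Grigg2004-tr entry earlier in the thread; the column names follow the list above, and the check on `required` is only an illustrative sanity test, not something revtools is known to perform.

```r
# One row per reference, using the column names listed in the comment above.
refs <- data.frame(
  label    = "Grigg2004-tr",                 # unique ID for the row
  author   = "O Grigg and V Farewell",       # all authors, separated by ' and '
  year     = 2004,                           # numeric or character both accepted
  title    = "An overview of risk-adjusted charts",
  journal  = "Journal of the Royal Statistical Society: Series A (Statistics in Society)",
  abstract = NA_character_,                  # no abstract in the .bib entry
  stringsAsFactors = FALSE
)

# Illustrative sanity check (an assumption, not part of revtools):
# the mandatory columns exist and labels are unique.
required <- c("label", "author", "year", "title", "journal")
stopifnot(all(required %in% names(refs)), !anyDuplicated(refs$label))

# Per the comment above, this data.frame could then be passed on:
# revtools::start_review_window(refs)   # not run here
```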
No rush from me, was mainly playing. I think it's a great idea, and I look forward to updates.
Thanks for the tip, I'll give it a go.
Hi Steve - this took me a while to get back to, but I've updated this so that 1. read_bibliography detects .bib files more reliably, and 2. read_bib actually functions for the (fairly basic) cases that I've tried. If you get time to check it out and find more bugs then let me know. I'm going to keep checking this over the next week or so, so I won't close this issue just yet.
Hi Martin, thanks for the great seminar yesterday and the exciting package. I also encountered an error (using the CRAN version) reading in bib files, but after seeing this issue I installed the latest from GitHub and was able to read in a bib file and start an analysis.
However, the read_bibliography function failed on another bib file I tried. This one had some custom sections and text in it. I looked into the failure, and the regex-based parsing of the file may have produced some unexpected results. This made me wonder: can you use the results of bibtex::read.bib and work with that? As you may know, the resulting bibentry has fields you can extract, e.g. bib[[1]]$title etc:
> str(bib[[1]])
Class 'bibentry' hidden list of 1
$ Bruna-2010:List of 7
..$ title : chr "Scientific Journals Can Advance Tropical Biology and Conservation by Requiring Data Archiving"
..$ volume : chr "42"
..$ doi : chr "10.1111/j.1744-7429.2010.00652.x"
..$ journal: chr "Biotropica"
..$ author :Class 'person' hidden list of 1
.. ..$ :List of 5
.. .. ..$ given : chr [1:2] "Emilio" "M."
.. .. ..$ family : chr "Bruna"
.. .. ..$ role : NULL
.. .. ..$ email : NULL
.. .. ..$ comment: NULL
..$ year : chr "2010"
..$ pages : chr "399--401"
..- attr(*, "bibtype")= chr "Article"
..- attr(*, "key")= chr "Bruna-2010"
At least then you could offload the challenge of firstly reading in a bibfile?
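A minimal sketch of what "working with that" might look like, assuming the list structure shown in the str() output above. To stay self-contained it builds the bibentry by hand with utils::bibentry() rather than calling bibtex::read.bib; `bibentry_to_df` is a hypothetical helper name, not a function from revtools or bibtex.

```r
# Stand-in for the object bibtex::read.bib() would return; built by hand
# here so the sketch runs with base R only.
bib <- utils::bibentry(
  bibtype = "Article",
  key     = "Bruna-2010",
  title   = "Scientific Journals Can Advance Tropical Biology and Conservation by Requiring Data Archiving",
  author  = utils::person(given = c("Emilio", "M."), family = "Bruna"),
  journal = "Biotropica",
  year    = "2010",
  volume  = "42",
  pages   = "399--401",
  doi     = "10.1111/j.1744-7429.2010.00652.x"
)

# Hypothetical helper: flatten a bibentry into one data.frame row per entry.
bibentry_to_df <- function(bib) {
  rows <- lapply(unclass(bib), function(entry) {
    # `entry` is a plain list of fields carrying "bibtype" and "key" attributes
    field <- function(name) {
      value <- entry[[name]]
      if (is.null(value)) NA_character_ else as.character(value)
    }
    key <- attr(entry, "key")
    # format() on a person object gives "Given Family" for each author
    authors <- format(entry[["author"]], include = c("given", "family"))
    data.frame(
      label    = if (is.null(key)) NA_character_ else key,
      author   = paste(authors, collapse = " and "),
      year     = field("year"),
      title    = field("title"),
      journal  = field("journal"),
      abstract = field("abstract"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

bib_df <- bibentry_to_df(bib)
print(bib_df[, c("label", "author", "year", "journal")])
```

The reading step is delegated entirely to whichever parser produced `bib`, so any parsing quirks of that package carry over unchanged.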
I think there are going to be issues no matter what method is used to read in the bibfiles...
For example, I tried each of read_bibliography, bibtex::read.bib and RefManageR::ReadBib to read in the following bibliography, and none of them could get the 'author' correct:
@MISC{biosec-act-2015,
title = "{Biosecurity Act 2015}",
author = "{Department of Agriculture and Water Resources}",
month = jun,
year = 2015,
url = "https://www.legislation.gov.au/Details/C2015A00061"
}
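One possible explanation, not confirmed anywhere in this thread, is that the " and " inside the braced corporate name gets treated as an author separator. In BibTeX, an "and" nested inside braces does not separate names. The base R sketch below splits an author string only on " and " at brace depth zero; `split_bib_authors` is a hypothetical helper, not part of any of the packages mentioned.

```r
# Hypothetical helper: split a BibTeX author string on " and ",
# ignoring any " and " that sits inside braces.
split_bib_authors <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  # Brace nesting depth after each character
  depth <- cumsum((chars == "{") - (chars == "}"))
  hits <- gregexpr(" and ", x, fixed = TRUE)[[1]]
  hits <- hits[hits > 0]            # gregexpr returns -1 when nothing matches
  hits <- hits[depth[hits] == 0]    # keep only separators outside braces
  starts <- c(1, hits + 5)          # " and " is five characters long
  ends   <- c(hits - 1, nchar(x))
  authors <- trimws(substring(x, starts, ends))
  # Drop one layer of protective outer braces, if present
  sub("^[{](.*)[}]$", "\\1", authors)
}

print(split_bib_authors("{Department of Agriculture and Water Resources}"))
print(split_bib_authors("O Grigg and V Farewell"))
```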
I'm currently running into this issue as well, or a variant of it. My .bib file seems to trigger an error inside a function:
Error in if (any(col_n < 3)) { : missing value where TRUE/FALSE needed
That might just be due to an ugly .bib, but I'm not really sure. I'm going to try switching my data exports to .csv or .ris instead.
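For context on the error message itself: in R, `if` fails this way whenever its condition evaluates to NA, and `any()` returns NA when the vector contains an NA and no TRUE. The sketch below reproduces that behaviour with a made-up `col_n` vector (an assumption; the real values inside revtools are not shown in this thread) and shows one common guard.

```r
# Hypothetical stand-in for col_n: one value is missing.
col_n <- c(4L, NA, 5L)

# any() has no TRUE to report and cannot rule one out, so it returns NA,
# and if (NA) raises "missing value where TRUE/FALSE needed".
result <- tryCatch(
  if (any(col_n < 3)) "short line found" else "ok",
  error = function(e) conditionMessage(e)
)
print(result)

# A common guard: drop NAs before testing.
safe <- any(col_n < 3, na.rm = TRUE)
print(safe)
```

Whether ignoring the NA is the right fix depends on why the value is missing in the first place, which a malformed line in the .bib file could plausibly cause.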