aberHRML/classyfireR

InChIKeys give errors

meier-rene opened this issue · 6 comments

I tried to batch-process a larger number of InChIKeys and found some that give errors:

bad_key <- 
c('QEVGZEDELICMKH-UHFFFAOYSA-N',
'SYLAFCZSYRXBJF-UHFFFAOYSA-N',
'BOPPPUCSDSHZEZ-UHFFFAOYSA-N')

> get_classification(bad_key[1])
✔ QEVGZEDELICMKH-UHFFFAOYSA-N
Error: Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Call `rlang::last_error()` to see a backtrace. 
> get_classification(bad_key[2])
✔ SYLAFCZSYRXBJF-UHFFFAOYSA-N
Error: Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Call `rlang::last_error()` to see a backtrace. 
> get_classification(bad_key[3])
✔ BOPPPUCSDSHZEZ-UHFFFAOYSA-N
Error: Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Call `rlang::last_error()` to see a backtrace.

Looking them up in a web browser works fine:
http://classyfire.wishartlab.com/entities/QEVGZEDELICMKH-UHFFFAOYSA-N
http://classyfire.wishartlab.com/entities/SYLAFCZSYRXBJF-UHFFFAOYSA-N
http://classyfire.wishartlab.com/entities/BOPPPUCSDSHZEZ-UHFFFAOYSA-N

Could you please have a look?

And here is the stack trace, as suggested by @sneumann:

rlang::last_trace()
<error/rlang_error>
Columns `source`, `source_id`, `annotations` must be 1d atomic vectors or lists
Backtrace:
     █
  1. └─base::sapply(inchikeys, get_classification)
  2.   └─base::lapply(X = X, FUN = FUN, ...)
  3.     └─classyfireR:::FUN(X[[i]], ...)
  4.       └─classyfireR:::parse_external_desc(json_res)
  5.         └─tibble::tibble(...)
  6.           ├─tibble::as_tibble(lst_quos(xs, expand = TRUE))
  7.           └─tibble:::as_tibble.list(lst_quos(xs, expand = TRUE))
  8.             └─tibble:::list_to_tibble(x, validate)
  9.               └─tibble:::check_tibble(x)
 10.                 └─tibble:::invalid_df(...)
 11.                   └─tibble:::stopc(...)

Examples of InChIKeys that work:

good_key <- 
c('JIVPVXMEBJLZRO-UHFFFAOYSA-N',
'ZZUFCTLCJUWOSV-UHFFFAOYSA-N',
'QZTKDVCDBIDYMD-UHFFFAOYSA-N')
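While some keys still fail, one way to keep a large batch running is to wrap each call in `tryCatch()` and record the failures instead of stopping at the first error. This is a minimal sketch: `classify_batch` is a hypothetical helper, not part of classyfireR, and its `classify` argument stands in for `get_classification` so the helper can also be exercised with a stub.

```r
# Hypothetical helper, not part of classyfireR: apply `classify` to each key,
# keeping successful results and collecting error messages instead of stopping.
classify_batch <- function(keys, classify) {
  results <- vector("list", length(keys))
  names(results) <- keys
  failed <- character(0)
  for (key in keys) {
    res <- tryCatch(classify(key), error = function(e) e)
    if (inherits(res, "error")) {
      failed[key] <- conditionMessage(res)  # remember why this key failed
    } else {
      results[key] <- list(res)             # list() wrapper also keeps NULL results
    }
  }
  list(results = results, failed = failed)
}
```

With classyfireR installed, something like `out <- classify_batch(c(good_key, bad_key), classyfireR::get_classification)` would return the successful classifications in `out$results` and the error message for each failing key in `out$failed`.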

Attached is the raw JSON returned for one of them: QEVGZEDELICMKH-UHFFFAOYSA-N.json.txt. The error is thrown in

parse_external_desc <- function(x)

I am on a current snapshot of R-devel and get a slightly different error message, almost certainly from the same underlying issue. Checking the JSON, I see that there are no `source`, `source_id` or `annotations` fields.

> get_classification(bad_key[1])
✔ QEVGZEDELICMKH-UHFFFAOYSA-N
Error: All columns in a tibble must be 1d or 2d objects:
* Column `source` is NULL
* Column `source_id` is NULL
* Column `annotations` is NULL
Call `rlang::last_error()` to see a backtrace
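The NULL columns follow from how R represents an empty JSON array. Assuming jsonlite's default simplification, `"external_descriptors":[]` is parsed to an empty `list()`, and extracting a named element from an empty list with `$` returns `NULL` rather than raising an error. The snippet below builds that parsed structure by hand (an assumption standing in for the real `fromJSON()` output), so it runs without network access or jsonlite.

```r
# Hand-built stand-in for the parsed JSON of a "bad" key: jsonlite turns an
# empty JSON array into an empty list.
json_res <- list(external_descriptors = list())

ext <- json_res$external_descriptors
length(ext)          # 0: there is nothing to parse
is.null(ext$source)  # TRUE: `$` on a missing name silently yields NULL
# These NULLs are what reach tibble::tibble() as the `source`, `source_id`
# and `annotations` columns, which tibble then rejects.
```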

Doing things manually

response <- httr::GET("http://classyfire.wishartlab.com/entities/QEVGZEDELICMKH-UHFFFAOYSA-N.json")
text_content <- httr::content(response, 'text')
json_res <- jsonlite::fromJSON(text_content)
classification <- classyfireR:::parse_json_output(json_res)

I get

> classification
# A tibble: 4 x 3
  Level      Classification                     CHEMONT          
  <chr>      <chr>                              <chr>            
1 kingdom    Organic compounds                  CHEMONTID:0000000
2 superclass Organic acids and derivatives      CHEMONTID:0000264
3 class      Carboxylic acids and derivatives   CHEMONTID:0000265
4 subclass   Dicarboxylic acids and derivatives CHEMONTID:0000346

Checking one of the working InChIKeys, http://classyfire.wishartlab.com/entities/JIVPVXMEBJLZRO-UHFFFAOYSA-N.json, shows that they do have

...
"external_descriptors":[
{"source":"CHEBI",
"source_id":"CHEBI:3654",
"annotations":["sulfonamide","monochlorobenzenes","isoindoles"]
}
]
...

whereas the bad keys have "external_descriptors":[].
So, in

object@external_descriptors <- parse_external_desc(json_res)

we need a check for length(json_res$external_descriptors) > 0.
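A length check along those lines might look like the following sketch. It is not the actual classyfireR code: `parse_external_desc_safe` is a hypothetical name, it returns a base R data frame rather than a tibble, and it joins each descriptor's annotations into one string to avoid a list column. It assumes the default `jsonlite::fromJSON()` output, where a non-empty array of descriptor objects becomes a data frame and an empty array becomes `list()`.

```r
# Hypothetical sketch of a guarded parser; not the classyfireR implementation.
parse_external_desc_safe <- function(json_res) {
  ext <- json_res$external_descriptors

  # Bad keys: "external_descriptors": [] (or a missing field) has length 0,
  # so return a zero-row result with the expected columns instead of failing.
  if (length(ext) == 0 || NROW(ext) == 0) {
    return(data.frame(source = character(0),
                      source_id = character(0),
                      annotations = character(0),
                      stringsAsFactors = FALSE))
  }

  # Good keys: one row per descriptor; each annotations entry is a character
  # vector, joined here into a single string per row.
  data.frame(
    source = ext$source,
    source_id = ext$source_id,
    annotations = vapply(ext$annotations, paste, character(1), collapse = "; "),
    stringsAsFactors = FALSE
  )
}
```

Returning an empty but correctly shaped result, rather than `NULL`, means downstream code can treat good and bad keys the same way and simply test `nrow()`.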

Yours, Steffen

Yeah, this happens when no external descriptors are present. I will add a length check and push a new version to the devel branch.

Tom

This is now fixed on the devel branch; you can install it using:

remotes::install_github('aberHRML/classyfireR', ref = 'devel')

I will add length checks for all the other components, in case there are further InChIKeys missing elements of the JSON output. I should be able to get the fixed version onto CRAN by Monday.

Thanks

Tom
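The same idea extends to the other components: call a component's parser only when its field is present and non-empty, and fall back to a default otherwise. This is a minimal sketch under assumptions: `parse_if_present` and its `default` argument are hypothetical, and it follows the `parser(json_res)` calling convention seen in `parse_external_desc(json_res)` above.

```r
# Hypothetical helper: call `parser(json_res)` only when json_res[[field]]
# exists and is non-empty; otherwise return `default`.
parse_if_present <- function(json_res, field, parser, default = NULL) {
  # For a list, [[ ]] with an unknown name returns NULL, and length(NULL) is 0,
  # so a missing field and an empty array are handled the same way.
  if (length(json_res[[field]]) == 0) {
    return(default)
  }
  parser(json_res)
}
```

Because the guard wraps the parser instead of living inside it, each existing parsing function can stay unchanged while every component gets the same check.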

Thanks for the fast fix. It's working now.