Create empty columns in query results
carlganz opened this issue · 4 comments
Issue submission checklist
When filing your issue please make an attempt to understand your query and debug a little bit on your own. Below are a few suggestions on how to troubleshoot and document your issue. You may also refer to the Troubleshooting section of the Supported Queries vignette.
-
I have set
verbose=TRUE
insf_query()
. -
I have tried a few different function call arguments to see if I can workaround and/or isolate the issue (e.g. reviewing the output from the "SOAP" vs "REST" or the "Bulk 1.0" vs "Bulk 2.0" or tinkering with the
control
argument in the function call). -
I have taken a look at the query unit tests test-query.R to see if my type of query has been documented and tested.
-
I have considered making a minimal reproducible example using the reprex package. Details on how to create a reprex are available here: https://www.tidyverse.org/help/#reprex.
-
I have included the version of R and any packages that are used (Hint: Simply copy/paste the result of
devtools::session_info()
at the bottom of your issue).
Thank you for considering these steps. It will speed up the process of resolving your issue.
Issue description
First of all thank you for the amazing package. In general, null values in Salesforce are converted to NAs in R. However, if a field in a query is null for all returned records the column doesn't appear at all instead of appearing as all NAs.
reprex
# hard to reproduce since everyone has their own Salesforce data
sf_query("select id, phone from account limit 10")
# assuming you have some non-null phone values a phone field will appear
sf_query("select id, phone from account where phone=null limit 10")
# no phone field appears
Obviously this is contrived example, but in cases where I query a single record I expect to get an explicit NA instead of a missing field when a value is NULL in Salesforce. I'm guessing this issue has something to do with JSON parsing and dplyr::bind_rows
Session Info
Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.2 (2020-06-22)
os macOS Catalina 10.15.6
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Los_Angeles
date 2020-07-30
─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! package * version date lib source
anytime 0.3.8 2020-07-24 [1] CRAN (R 4.0.2)
askpass 1.1 2019-01-13 [1] CRAN (R 4.0.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.0)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.0)
callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.0)
class 7.3-17 2020-04-26 [1] CRAN (R 4.0.2)
classInt 0.4-3 2020-04-07 [1] CRAN (R 4.0.0)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
clipr 0.7.0 2019-07-23 [1] CRAN (R 4.0.0)
clisymbols 1.2.0 2017-05-21 [1] CRAN (R 4.0.0)
codetools 0.2-16 2018-12-24 [1] CRAN (R 4.0.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.0)
crayon 1.3.4.9000 2020-05-30 [1] Github (r-lib/crayon@dcf6d44)
crosstalk 1.1.0.1 2020-03-13 [1] CRAN (R 4.0.0)
curl 4.3 2019-12-02 [1] CRAN (R 4.0.0)
data.table 1.13.0 2020-07-24 [1] CRAN (R 4.0.2)
DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
dplyr * 1.0.0 2020-05-29 [1] CRAN (R 4.0.0)
DT 0.14 2020-06-24 [1] CRAN (R 4.0.0)
dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 4.0.0)
e1071 1.7-3 2019-11-26 [1] CRAN (R 4.0.0)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
fastmap 1.0.1 2019-10-08 [1] CRAN (R 4.0.0)
fs 1.4.2 2020-06-30 [1] CRAN (R 4.0.0)
furrr 0.1.0 2018-05-16 [1] CRAN (R 4.0.0)
future 1.18.0 2020-07-09 [1] CRAN (R 4.0.0)
generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
ggplot2 3.3.2 2020-06-19 [1] CRAN (R 4.0.0)
globals 0.12.5 2019-12-07 [1] CRAN (R 4.0.0)
glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
P GTSF * 0.1.0 2020-07-03 [?] local
hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.0)
htmlTable 2.0.1 2020-07-05 [1] CRAN (R 4.0.0)
htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 4.0.0)
httpuv 1.5.4 2020-06-06 [1] CRAN (R 4.0.0)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
jsonlite 1.7.0 2020-06-25 [1] CRAN (R 4.0.0)
KernSmooth 2.23-17 2020-04-26 [1] CRAN (R 4.0.2)
knitr 1.29 2020-06-23 [1] CRAN (R 4.0.0)
later 1.1.0.1 2020-06-05 [1] CRAN (R 4.0.0)
lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.2)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.0)
leaflet 2.0.3 2019-11-16 [1] CRAN (R 4.0.0)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.0)
lubridate 1.7.9 2020-06-08 [1] CRAN (R 4.0.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
mime 0.9 2020-02-04 [1] CRAN (R 4.0.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
openssl 1.4.2 2020-06-27 [1] CRAN (R 4.0.0)
packrat 0.5.0 2018-11-14 [1] CRAN (R 4.0.0)
pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.0)
pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
plotly 4.9.2.1 2020-04-04 [1] CRAN (R 4.0.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
processx 3.4.3 2020-07-05 [1] CRAN (R 4.0.0)
promises 1.1.1 2020-06-09 [1] CRAN (R 4.0.0)
ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 4.0.0)
remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
reprex * 0.3.0 2019-05-16 [1] CRAN (R 4.0.2)
rintrojs 0.2.2 2019-05-29 [1] CRAN (R 4.0.0)
rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
rlist 0.4.6.1 2016-04-04 [1] CRAN (R 4.0.2)
rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0)
salesforcer * 0.2.0 2020-07-21 [1] CRAN (R 4.0.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
sendgridr 0.1.0 2020-04-24 [1] Github (mrchypark/sendgridr@8da75ba)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
sf 0.9-5 2020-07-14 [1] CRAN (R 4.0.2)
shiny 1.5.0 2020-06-23 [1] CRAN (R 4.0.0)
shinydashboard 0.7.1 2018-10-17 [1] CRAN (R 4.0.0)
stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
stringr 1.4.0.9000 2020-05-30 [1] local
testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.0)
tidyr 1.1.0 2020-05-20 [1] CRAN (R 4.0.0)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
tsibble 0.9.2 2020-07-24 [1] CRAN (R 4.0.2)
twilio 0.1.0 2020-04-24 [1] Github (carlganz/twilio@ee0cbc5)
units 0.6-7 2020-06-13 [1] CRAN (R 4.0.0)
usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
vctrs 0.3.2 2020-07-15 [1] CRAN (R 4.0.2)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 4.0.0)
whisker 0.4 2019-08-28 [1] CRAN (R 4.0.0)
withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
xfun 0.16 2020-07-24 [1] CRAN (R 4.0.2)
XML 3.99-0.5 2020-07-23 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.0)
xts 0.12-0 2020-01-19 [1] CRAN (R 4.0.0)
zip 2.0.4 2019-09-01 [1] CRAN (R 4.0.0)
zoo 1.8-8 2020-05-02 [1] CRAN (R 4.0.0)
Hi @carlganz – Thanks for noting. At the moment if you need all the columns, then use the Bulk APIs. They will return all columns. For the REST and SOAP APIs, I will look into the XML/JSON. I think it's possible because null elements (at least in the SOAP API) come back as <FieldName xsi:nil=true />
. Currently, they're dropped because the XML element doesn't have a value, but I can use an XPath query to pull out the elements with the attribute xsi:nil
. Admittedly, I was aware, but concerned about parsing performance so haven't implemented. I'm not as sure about the JSON at the moment, but I'll be digging into it some more soon.
Per the Supported Queries Vignette:
Sometimes you may notice that the requested relationship fields do not appear in the query results. This is because the SOAP and REST APIs do not return any related object information if it does not exist on the record and there is no reliable way to extract and rebuild the empty columns based on the query string.
@carlganz I think this can be done with fields on the object. I'd love your feedback on what you might prefer to have returned in a couple more complex scenarios. For example, if a relationship query returns some Contact records with a related Account and some without, then you won't see Account.Name
in the JSON or XML for some records. They contain just a NULL value for the Account
field. (see the raw JSON and XML below).
SOQL: SELECT Id, FirstName, Account.Name FROM Contact
JSON
{
"attributes": {
"type": "Contact",
"url": "/services/data/v48.0/sobjects/Contact/0033s000013ZGRUAA4"
},
"Id": "0033s000013ZGRUAA4",
"FirstName": "Test",
"Account": null
}
XML
<records xsi:type="sf:sObject">
<sf:type>Contact</sf:type>
<sf:Id>0033s000013ZGRUAA4</sf:Id>
<sf:Id>0033s000013ZGRUAA4</sf:Id>
<sf:FirstName>Test</sf:FirstName>
<sf:Account xsi:nil="true"/>
</records>
In cases where all of the Contact records have a null
Account value (no related Account). The parsed result would look like below. I can't return an Account.Name
column unless I parsed the SOQL, which I'm not sure is really a reliable solution.
#> # A tibble: 3 x 3
#> Id Account FirstName
#> <chr> <chr> <chr>
#> 1 0033s000013ZPWmAAO NA Bob
#> 2 0033s000013ZP78AAG NA Jill
#> 3 0033s000013ZBSzAAO NA Sam
In the more common case, some of the Contact records do not have a null
Account value, then I can pull the Account.Name
name field. The only problem is you'll see an extra column called Account
from the records that didn't have an Account and I'm hesitant to remove because there isn't a great way to determine if that is an object reference that's empty or a field. If could remove columns because they contain all NA
values, then that seems like that would defeat the purpose of the change.
#> # A tibble: 3 x 4
#> Id Account FirstName Account.Name
#> <chr> <chr> <chr> <chr>
#> 1 0033s000013ZPWmAAO NA Bob NA
#> 2 0033s000013ZP78AAG NA Jill NA
#> 3 0033s000013ZBSzAAO NA Sam Be$t Account
Ah I see this more complicated than I imagined. For some reason I thought Salesforce API always returned explicit NULL for related fields but as you point out it just gives NULL for the related object. Parsing the SOQL is definitely outside scope so I suspect there isn't a proper solution so feel free to close issue.
@carlganz Thanks for the feedback and looking through the details there. I've grappled with the JSON/XML returned by queries but also the other operations and sometimes I just scratch my head at what the developers were thinking with some of the APIs...
I'll keep this open as a reminder to revisit if there ever is a good solution. Dropping the columns has always bothered me, but the workaround I recommend is just checking that the columns exist before further processing query recordsets in your scripts. If it doesn't exist then create and set to NA
.