StevenMMortimer/salesforcer

Create empty columns in query results

carlganz opened this issue · 4 comments

Issue submission checklist

When filing your issue please make an attempt to understand your query and debug a little bit on your own. Below are a few suggestions on how to troubleshoot and document your issue. You may also refer to the Troubleshooting section of the Supported Queries vignette.

  • I have set verbose=TRUE in sf_query().

  • I have tried a few different function call arguments to see if I can workaround and/or isolate the issue (e.g. reviewing the output from the "SOAP" vs "REST" or the "Bulk 1.0" vs "Bulk 2.0" or tinkering with the control argument in the function call).

  • I have taken a look at the query unit tests test-query.R to see if my type of query has been documented and tested.

  • I have considered making a minimal reproducible example using the reprex package. Details on how to create a reprex are available here: https://www.tidyverse.org/help/#reprex.

  • I have included the version of R and any packages that are used (Hint: Simply copy/paste the result of devtools::session_info() at the bottom of your issue).

Thank you for considering these steps. It will speed up the process of resolving your issue.


Issue description

First of all thank you for the amazing package. In general, null values in Salesforce are converted to NAs in R. However, if a field in a query is null for all returned records the column doesn't appear at all instead of appearing as all NAs.

reprex

# hard to reproduce since everyone has their own Salesforce data
sf_query("select id, phone from account limit 10")
# assuming you have some non-null phone values a phone field will appear
sf_query("select id, phone from account where phone=null limit 10")
# no phone field appears

Obviously this is contrived example, but in cases where I query a single record I expect to get an explicit NA instead of a missing field when a value is NULL in Salesforce. I'm guessing this issue has something to do with JSON parsing and dplyr::bind_rows


Session Info

Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       macOS Catalina 10.15.6      
 system   x86_64, darwin17.0          
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2020-07-30Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 ! package        * version    date       lib source                              
   anytime          0.3.8      2020-07-24 [1] CRAN (R 4.0.2)                      
   askpass          1.1        2019-01-13 [1] CRAN (R 4.0.0)                      
   assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                      
   backports        1.1.8      2020-06-17 [1] CRAN (R 4.0.0)                      
   base64enc        0.1-3      2015-07-28 [1] CRAN (R 4.0.0)                      
   callr            3.4.3      2020-03-28 [1] CRAN (R 4.0.0)                      
   checkmate        2.0.0      2020-02-06 [1] CRAN (R 4.0.0)                      
   class            7.3-17     2020-04-26 [1] CRAN (R 4.0.2)                      
   classInt         0.4-3      2020-04-07 [1] CRAN (R 4.0.0)                      
   cli              2.0.2      2020-02-28 [1] CRAN (R 4.0.0)                      
   clipr            0.7.0      2019-07-23 [1] CRAN (R 4.0.0)                      
   clisymbols       1.2.0      2017-05-21 [1] CRAN (R 4.0.0)                      
   codetools        0.2-16     2018-12-24 [1] CRAN (R 4.0.2)                      
   colorspace       1.4-1      2019-03-18 [1] CRAN (R 4.0.0)                      
   crayon           1.3.4.9000 2020-05-30 [1] Github (r-lib/crayon@dcf6d44)       
   crosstalk        1.1.0.1    2020-03-13 [1] CRAN (R 4.0.0)                      
   curl             4.3        2019-12-02 [1] CRAN (R 4.0.0)                      
   data.table       1.13.0     2020-07-24 [1] CRAN (R 4.0.2)                      
   DBI              1.1.0      2019-12-15 [1] CRAN (R 4.0.0)                      
   desc             1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                      
   devtools         2.3.1      2020-07-21 [1] CRAN (R 4.0.2)                      
   digest           0.6.25     2020-02-23 [1] CRAN (R 4.0.0)                      
   dplyr          * 1.0.0      2020-05-29 [1] CRAN (R 4.0.0)                      
   DT               0.14       2020-06-24 [1] CRAN (R 4.0.0)                      
   dygraphs         1.1.1.6    2018-07-11 [1] CRAN (R 4.0.0)                      
   e1071            1.7-3      2019-11-26 [1] CRAN (R 4.0.0)                      
   ellipsis         0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                      
   evaluate         0.14       2019-05-28 [1] CRAN (R 4.0.0)                      
   fansi            0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                      
   fastmap          1.0.1      2019-10-08 [1] CRAN (R 4.0.0)                      
   fs               1.4.2      2020-06-30 [1] CRAN (R 4.0.0)                      
   furrr            0.1.0      2018-05-16 [1] CRAN (R 4.0.0)                      
   future           1.18.0     2020-07-09 [1] CRAN (R 4.0.0)                      
   generics         0.0.2      2018-11-29 [1] CRAN (R 4.0.0)                      
   ggplot2          3.3.2      2020-06-19 [1] CRAN (R 4.0.0)                      
   globals          0.12.5     2019-12-07 [1] CRAN (R 4.0.0)                      
   glue             1.4.1      2020-05-13 [1] CRAN (R 4.0.0)                      
   gtable           0.3.0      2019-03-25 [1] CRAN (R 4.0.0)                      
 P GTSF           * 0.1.0      2020-07-03 [?] local                               
   hms              0.5.3      2020-01-08 [1] CRAN (R 4.0.0)                      
   htmlTable        2.0.1      2020-07-05 [1] CRAN (R 4.0.0)                      
   htmltools        0.5.0      2020-06-16 [1] CRAN (R 4.0.0)                      
   htmlwidgets      1.5.1      2019-10-08 [1] CRAN (R 4.0.0)                      
   httpuv           1.5.4      2020-06-06 [1] CRAN (R 4.0.0)                      
   httr             1.4.2      2020-07-20 [1] CRAN (R 4.0.2)                      
   jsonlite         1.7.0      2020-06-25 [1] CRAN (R 4.0.0)                      
   KernSmooth       2.23-17    2020-04-26 [1] CRAN (R 4.0.2)                      
   knitr            1.29       2020-06-23 [1] CRAN (R 4.0.0)                      
   later            1.1.0.1    2020-06-05 [1] CRAN (R 4.0.0)                      
   lattice          0.20-41    2020-04-02 [1] CRAN (R 4.0.2)                      
   lazyeval         0.2.2      2019-03-15 [1] CRAN (R 4.0.0)                      
   leaflet          2.0.3      2019-11-16 [1] CRAN (R 4.0.0)                      
   lifecycle        0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                      
   listenv          0.8.0      2019-12-05 [1] CRAN (R 4.0.0)                      
   lubridate        1.7.9      2020-06-08 [1] CRAN (R 4.0.0)                      
   magrittr         1.5        2014-11-22 [1] CRAN (R 4.0.0)                      
   memoise          1.1.0      2017-04-21 [1] CRAN (R 4.0.0)                      
   mime             0.9        2020-02-04 [1] CRAN (R 4.0.0)                      
   munsell          0.5.0      2018-06-12 [1] CRAN (R 4.0.0)                      
   openssl          1.4.2      2020-06-27 [1] CRAN (R 4.0.0)                      
   packrat          0.5.0      2018-11-14 [1] CRAN (R 4.0.0)                      
   pillar           1.4.6      2020-07-10 [1] CRAN (R 4.0.0)                      
   pkgbuild         1.1.0      2020-07-13 [1] CRAN (R 4.0.2)                      
   pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.0.0)                      
   pkgload          1.1.0      2020-05-29 [1] CRAN (R 4.0.0)                      
   plotly           4.9.2.1    2020-04-04 [1] CRAN (R 4.0.0)                      
   prettyunits      1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                      
   processx         3.4.3      2020-07-05 [1] CRAN (R 4.0.0)                      
   promises         1.1.1      2020-06-09 [1] CRAN (R 4.0.0)                      
   ps               1.3.3      2020-05-08 [1] CRAN (R 4.0.0)                      
   purrr            0.3.4      2020-04-17 [1] CRAN (R 4.0.0)                      
   R6               2.4.1      2019-11-12 [1] CRAN (R 4.0.0)                      
   Rcpp             1.0.5      2020-07-06 [1] CRAN (R 4.0.0)                      
   readr            1.3.1      2018-12-21 [1] CRAN (R 4.0.0)                      
   remotes          2.2.0      2020-07-21 [1] CRAN (R 4.0.2)                      
   reprex         * 0.3.0      2019-05-16 [1] CRAN (R 4.0.2)                      
   rintrojs         0.2.2      2019-05-29 [1] CRAN (R 4.0.0)                      
   rlang            0.4.7      2020-07-09 [1] CRAN (R 4.0.2)                      
   rlist            0.4.6.1    2016-04-04 [1] CRAN (R 4.0.2)                      
   rmarkdown        2.3        2020-06-18 [1] CRAN (R 4.0.0)                      
   rprojroot        1.3-2      2018-01-03 [1] CRAN (R 4.0.0)                      
   rstudioapi       0.11       2020-02-07 [1] CRAN (R 4.0.0)                      
   salesforcer    * 0.2.0      2020-07-21 [1] CRAN (R 4.0.2)                      
   scales           1.1.1      2020-05-11 [1] CRAN (R 4.0.0)                      
   sendgridr        0.1.0      2020-04-24 [1] Github (mrchypark/sendgridr@8da75ba)
   sessioninfo      1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                      
   sf               0.9-5      2020-07-14 [1] CRAN (R 4.0.2)                      
   shiny            1.5.0      2020-06-23 [1] CRAN (R 4.0.0)                      
   shinydashboard   0.7.1      2018-10-17 [1] CRAN (R 4.0.0)                      
   stringi          1.4.6      2020-02-17 [1] CRAN (R 4.0.0)                      
   stringr          1.4.0.9000 2020-05-30 [1] local                               
   testthat         2.3.2      2020-03-02 [1] CRAN (R 4.0.0)                      
   tibble           3.0.3      2020-07-10 [1] CRAN (R 4.0.0)                      
   tidyr            1.1.0      2020-05-20 [1] CRAN (R 4.0.0)                      
   tidyselect       1.1.0      2020-05-11 [1] CRAN (R 4.0.0)                      
   tsibble          0.9.2      2020-07-24 [1] CRAN (R 4.0.2)                      
   twilio           0.1.0      2020-04-24 [1] Github (carlganz/twilio@ee0cbc5)    
   units            0.6-7      2020-06-13 [1] CRAN (R 4.0.0)                      
   usethis          1.6.1      2020-04-29 [1] CRAN (R 4.0.0)                      
   utf8             1.1.4      2018-05-24 [1] CRAN (R 4.0.0)                      
   vctrs            0.3.2      2020-07-15 [1] CRAN (R 4.0.2)                      
   viridisLite      0.3.0      2018-02-01 [1] CRAN (R 4.0.0)                      
   whisker          0.4        2019-08-28 [1] CRAN (R 4.0.0)                      
   withr            2.2.0      2020-04-20 [1] CRAN (R 4.0.0)                      
   xfun             0.16       2020-07-24 [1] CRAN (R 4.0.2)                      
   XML              3.99-0.5   2020-07-23 [1] CRAN (R 4.0.2)                      
   xml2             1.3.2      2020-04-23 [1] CRAN (R 4.0.0)                      
   xtable           1.8-4      2019-04-21 [1] CRAN (R 4.0.0)                      
   xts              0.12-0     2020-01-19 [1] CRAN (R 4.0.0)                      
   zip              2.0.4      2019-09-01 [1] CRAN (R 4.0.0)                      
   zoo              1.8-8      2020-05-02 [1] CRAN (R 4.0.0)                      

Hi @carlganz – Thanks for noting. At the moment if you need all the columns, then use the Bulk APIs. They will return all columns. For the REST and SOAP APIs, I will look into the XML/JSON. I think it's possible because null elements (at least in the SOAP API) come back as <FieldName xsi:nil=true />. Currently, they're dropped because the XML element doesn't have a value, but I can use an XPath query to pull out the elements with the attribute xsi:nil. Admittedly, I was aware, but concerned about parsing performance so haven't implemented. I'm not as sure about the JSON at the moment, but I'll be digging into it some more soon.

Per the Supported Queries Vignette:

Sometimes you may notice that the requested relationship fields do not appear in the query results. This is because the SOAP and REST APIs do not return any related object information if it does not exist on the record and there is no reliable way to extract and rebuild the empty columns based on the query string.

@carlganz I think this can be done with fields on the object. I'd love your feedback on what you might prefer to have returned in a couple more complex scenarios. For example, if a relationship query returns some Contact records with a related Account and some without, then you won't see Account.Name in the JSON or XML for some records. They contain just a NULL value for the Account field. (see the raw JSON and XML below).

SOQL: SELECT Id, FirstName, Account.Name FROM Contact
JSON

{
    "attributes": {
        "type": "Contact",
        "url": "/services/data/v48.0/sobjects/Contact/0033s000013ZGRUAA4"
    },
    "Id": "0033s000013ZGRUAA4",
    "FirstName": "Test",
    "Account": null
}

XML

<records xsi:type="sf:sObject">
    <sf:type>Contact</sf:type>
    <sf:Id>0033s000013ZGRUAA4</sf:Id>
    <sf:Id>0033s000013ZGRUAA4</sf:Id>
    <sf:FirstName>Test</sf:FirstName>
    <sf:Account xsi:nil="true"/>
</records>

In cases where all of the Contact records have a null Account value (no related Account). The parsed result would look like below. I can't return an Account.Name column unless I parsed the SOQL, which I'm not sure is really a reliable solution.

#> # A tibble: 3 x 3
#>   Id                   Account     FirstName
#>   <chr>                <chr>       <chr>
#> 1 0033s000013ZPWmAAO   NA          Bob
#> 2 0033s000013ZP78AAG   NA          Jill
#> 3 0033s000013ZBSzAAO   NA          Sam

In the more common case, some of the Contact records do not have a null Account value, then I can pull the Account.Name name field. The only problem is you'll see an extra column called Account from the records that didn't have an Account and I'm hesitant to remove because there isn't a great way to determine if that is an object reference that's empty or a field. If could remove columns because they contain all NA values, then that seems like that would defeat the purpose of the change.

#> # A tibble: 3 x 4
#>   Id                   Account     FirstName       Account.Name
#>   <chr>                <chr>       <chr>           <chr>        
#> 1 0033s000013ZPWmAAO   NA          Bob             NA
#> 2 0033s000013ZP78AAG   NA          Jill            NA
#> 3 0033s000013ZBSzAAO   NA          Sam             Be$t Account

Ah I see this more complicated than I imagined. For some reason I thought Salesforce API always returned explicit NULL for related fields but as you point out it just gives NULL for the related object. Parsing the SOQL is definitely outside scope so I suspect there isn't a proper solution so feel free to close issue.

@carlganz Thanks for the feedback and looking through the details there. I've grappled with the JSON/XML returned by queries but also the other operations and sometimes I just scratch my head at what the developers were thinking with some of the APIs...

I'll keep this open as a reminder to revisit if there ever is a good solution. Dropping the columns has always bothered me, but the workaround I recommend is just checking that the columns exist before further processing query recordsets in your scripts. If it doesn't exist then create and set to NA.