table identifier escaping causing dplyr interface errors

Question

table identifier escaping causing dplyr interface errors

alistaire47 opened this issue 4 years ago · 1 comments

Not sure exactly what changed, but some version somewhere changed so I'm now getting errors when using the dplyr interface because it's putting double-quotes around the table identifier. Reprex:

> library(sergeant)
> tbl(src_drill('localhost'), 'cp.`employee.json`')
Request failed [500]. Retrying in 1 seconds...
Request failed [500]. Retrying in 2 seconds...
Request failed [500]. Retrying in 1.8 seconds...
Request failed [500]. Retrying in 1.7 seconds...
# Source:   table<cp.`employee.json`> [?? x 8]
# Database: DrillConnection
Warning message:
PARSE ERROR: Encountered "FROM \"" at line 1, column 1.
SQL Query: SELECT *
FROM "cp.`employee.json`"
^
LIMIT 11

Original Query:

  1: SELECT *
  2: FROM "cp.`employee.json`"
  3: LIMIT 11

Query Profile Error Link:
http://localhost:8047/profiles/772b3c22-8d04-43d5-b733-9e2ae0bcbb50
> con <- drill_connection('localhost')

If I try to run that query directly, it fails:

> drill_query(
+     con,
+     'SELECT *
+      FROM "cp.`employee.json`"
+      LIMIT 11'
+ )
  |======================================================================| 100%
Request failed [500]. Retrying in 1.4 seconds...
  |======================================================================| 100%
Request failed [500]. Retrying in 1.9 seconds...
  |======================================================================| 100%
Query ==> SELECT *      FROM "cp.`employee.json`"      LIMIT 11
PARSE ERROR: Encountered "FROM \"" at line 1, column 6.

SQL Query: SELECT *
     FROM "cp.`employee.json`"
     ^
     LIMIT 11

[Error Id: dc495019-3219-4b88-b5f8-9a4fd657dc0f ]

whereas if I remove the quotes around cp.`employee.json` it works:

> drill_query(
+     con,
+     'SELECT *
+      FROM cp.`employee.json`
+      LIMIT 11'
+ )
  |======================================================================| 100%
# A tibble: 11 x 16
   employee_id full_name first_name last_name position_id position_title
         <dbl> <chr>     <chr>      <chr>           <dbl> <chr>
 1           1 Sheri No… Sheri      Nowmer              1 President
 2           2 Derrick … Derrick    Whelply             2 VP Country Ma…
 3           4 Michael … Michael    Spence              2 VP Country Ma…
 4           5 Maya Gut… Maya       Gutierrez           2 VP Country Ma…
 5           6 Roberta … Roberta    Damstra             3 VP Informatio…
 6           7 Rebecca … Rebecca    Kanagaki            4 VP Human Reso…
 7           8 Kim Brun… Kim        Brunner            11 Store Manager
 8           9 Brenda B… Brenda     Blumberg           11 Store Manager
 9          10 Darren S… Darren     Stanz               5 VP Finance
10          11 Jonathan… Jonathan   Murraiin           11 Store Manager
11          12 Jewel Cr… Jewel      Creek              11 Store Manager
# … with 10 more variables: store_id <dbl>, department_id <dbl>,
#   birth_date <chr>, hire_date <chr>, salary <dbl>, supervisor_id <dbl>,
#   education_level <chr>, marital_status <chr>, gender <chr>,
#   management_role <chr>

Results are the same in the Drill shell. I assume this is a result of a version or setting changing somewhere, but so far I can't locate it. What I'm running:

software	version
MacOS	10.14.6
Apache Drill	1.18.0
R	4.0.3
sergeant	0.9.0
dbplyr	2.0.0
dplyr	1.0.2

Happy to write a PR to fix this if this is something that can be fixed here, but so far I can't figure out what has to change.

Answer 1 · 2021-02-10T17:17:25.000Z

Thank you for this! I use the caffeinated {sergeant} one b/c I have large resultsets and neglected to fix this {dplyr} regression in the REST package.