hrbrmstr/sergeant

drill_version(ds) failed

hermandr opened this issue · 6 comments

Environment

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.1252  LC_CTYPE=English_Singapore.1252    LC_MONETARY=English_Singapore.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Singapore.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.3.0   stringr_1.3.1   purrr_0.2.5     readr_1.1.1     tidyr_0.8.1     tibble_1.4.2   
 [7] ggplot2_3.0.0   tidyverse_1.2.1 sergeant_0.5.2  dbplyr_1.2.2    dplyr_0.7.6     DBI_1.0.0      
[13] rJava_0.9-10   

How to replicate

ds <- src_drill(drill_ip) 
ds
>src:  DrillConnection
>tbls: cp.default, dfs.default, dfs.root, dfs.tmp, INFORMATION_SCHEMA, postgres.information_schema,
postgres.pg_catalog, postgres.postgres, postgres.public, postgres, sys

db <- tbl(ds, "cp.`employee.json`")
db
store_id gender department_id birth_date supervisor_id last_name position_title hire_date          
<int> <chr>          <int> <date>             <int> <chr>     <chr>          <dttm>             
1        0 F                  1 1961-08-26             0 Nowmer    President      1994-12-01 00:00:00
2        0 M                  1 1915-07-03             1 Whelply   VP Country Ma~ 1994-12-01 00:00:00
3        0 M                  1 1969-06-20             1 Spence    VP Country Ma~ 1998-01-01 00:00:00
4        0 F                  1 1951-05-10             1 Gutierrez VP Country Ma~ 1998-01-01 00:00:00
5        0 F                  2 1942-10-08             1 Damstra   VP Informatio~ 1994-12-01 00:00:00
6        0 F                  3 1949-03-27             1 Kanagaki  VP Human Reso~ 1994-12-01 00:00:00
7        9 F                 11 1922-08-10             5 Brunner   Store Manager  1998-01-01 00:00:00
8       21 F                 11 1979-06-23             5 Blumberg  Store Manager  1998-01-01 00:00:00
9        0 M                  5 1949-08-26             1 Stanz     VP Finance     1994-12-01 00:00:00
10        1 M                 11 1967-06-20             5 Murraiin  Store Manager  1998-01-01 00:00:00
# ... with more rows, and 8 more variables: management_role <chr>, salary <dbl>, marital_status <chr>,
#   full_name <chr>, employee_id <int>, education_level <chr>, first_name <chr>, position_id <int>


drill_version(ds)
Error in is.url(url) : length(url) == 1 is not TRUE

Investigation

sergeant:::make_server
function (drill_con) 
{
  sprintf("%s://%s:%s", ifelse(drill_con$ssl[1], "https", "http"), 
          drill_con$host, drill_con$port)
}
<bytecode: 0x0000000022b58b98>
<environment: namespace:sergeant>
str(ds)  
List of 1
$ con:Formal class 'DrillConnection' [package "sergeant"] with 5 slots
.. ..@ host    : chr "172.27.141.128"
.. ..@ port    : int 8047
.. ..@ ssl     : logi FALSE
.. ..@ username: chr(0) 
.. ..@ password: chr(0) 
- attr(*, "class")= chr [1:3] "src_drill" "src_sql" "src"

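Accessing the ssl field with `$` directly on the src_drill object returns nothing, because the field is a slot of the S4 object stored in ds$con: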

ds$ssl[1]
NULL

# Correct way
ds$con@ssl
FALSE
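Why this shows up as a failed length check can be reproduced without a Drill server. The sketch below is only an illustration: `MockDrillConnection`, `ds_mock`, and `make_server_copy` are made-up names, the mock class mirrors just the slots seen in the str(ds) output above, and the function body is copied from the make_server printout. Since `$` finds no `ssl`, `host`, or `port` element on the list, every argument to sprintf() is zero-length, and sprintf() then returns a zero-length result.

```r
library(methods)

# Stand-in for sergeant's DrillConnection class (hypothetical mock; only the
# slots needed here, matching the str(ds) output above).
setClass("MockDrillConnection",
         representation(host = "character", port = "integer", ssl = "logical"))

# Mimic the shape of a src_drill object: a list with a single `con` element.
ds_mock <- structure(
  list(con = new("MockDrillConnection",
                 host = "172.27.141.128", port = 8047L, ssl = FALSE)),
  class = c("src_drill", "src_sql", "src")
)

# Same body as the sergeant:::make_server printout above.
make_server_copy <- function(drill_con) {
  sprintf("%s://%s:%s", ifelse(drill_con$ssl[1], "https", "http"),
          drill_con$host, drill_con$port)
}

# `$` on the list finds no ssl/host/port element, so each argument is
# zero-length and the resulting URL is zero-length as well.
bad_url <- make_server_copy(ds_mock)
length(bad_url)

# Reading the S4 slots through `con` gives a length-one URL.
good_url <- sprintf("%s://%s:%s", ifelse(ds_mock$con@ssl, "https", "http"),
                    ds_mock$con@host, ds_mock$con@port)
good_url
```

A zero-length URL is exactly what a `length(url) == 1` assertion rejects, which matches the error message reported above.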

This code needs to be fixed in the make_server function.

function (drill_con) 
{
  sprintf("%s://%s:%s", ifelse(drill_con$con@ssl, "https", "http"), 
          drill_con$con@host, drill_con$con@port)
}

It looks like this causes errors in other functions as well:
drill_show_files - failed
drill_stats - failed
drill_storage - failed
drill_status - failed

yep. b/c you're using it improperly.

There are 3 distinct interfaces in the package:

  • direct REST API (setup with drill_connection())
  • dplyr (via src_drill())
  • JDBC (via drill_jdbc() or more granular use of DrillJDBC() if one is an expert in JDBC)
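As a rough sketch of the REST route for this particular case: the object handed to drill_version() should come from drill_connection(), not from src_drill(). The host below is a placeholder, and the guards are only there so the snippet does nothing harmful when sergeant is not installed or no Drill server is reachable; they are not part of the package's usual idiom.

```r
# Hypothetical address; replace with your own Drill host.
drill_ip <- "localhost"

if (requireNamespace("sergeant", quietly = TRUE)) {
  # REST API interface: the drill_*() helpers take this object.
  dc <- sergeant::drill_connection(drill_ip)

  # Only query the server if it answers; treat any error as "not reachable".
  active <- tryCatch(isTRUE(sergeant::drill_active(dc)),
                     error = function(e) FALSE)

  if (active) {
    print(sergeant::drill_version(dc))
  }
}
```

The same `dc` object would be the one to pass to drill_show_files(), drill_stats(), drill_storage(), and drill_status().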

From TFM on drill_version():

[image: screenshot of the drill_version() help page]

So, drill_version() is part of the REST API functions. I believe 2-years-ago "me" figured that having it take a drill_connection()-created object was sufficient documentation, but I think 2-years-ago "me" just didn't want to write more documentation at the time :-) I'll see what I can do to remedy that before the official 7.0 release.

However, it was pretty straightforward to modify it to return this information. I'm not convinced it should and it's highly unlikely any other REST API wrapper functions will be augmented to support dual-use (I don't know of any other dplyr-DB wrapper packages that veer so wildly from the expectations of a dplyr interface; messing with idiom expectations is not a great idea IMO).
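For anyone curious what dual-use could look like in principle, here is a minimal sketch. It is not the change made in the package: `rest_fields`, `make_server_dual`, and the mock objects are hypothetical, and the "drill_conn" class name for a drill_connection() result is an assumption. The only things taken from this thread are that make_server reads `$host`, `$port`, and `$ssl`, and that a src_drill object keeps those as slots of `$con`.

```r
library(methods)

# Hypothetical mock of the S4 class held inside a src_drill object.
setClass("MockDrillConnection",
         representation(host = "character", port = "integer", ssl = "logical"))

# Hypothetical helper: pull host/port/ssl out of either kind of object.
rest_fields <- function(x) {
  if (inherits(x, "src_drill")) {
    con <- x$con                      # dplyr source: fields are S4 slots
    list(host = con@host, port = con@port, ssl = con@ssl)
  } else {
    list(host = x$host, port = x$port, ssl = x$ssl)  # REST object: list fields
  }
}

# Hypothetical variant of make_server that fails early on missing fields.
make_server_dual <- function(x) {
  f <- rest_fields(x)
  stopifnot(length(f$host) == 1L, length(f$port) == 1L)
  scheme <- if (isTRUE(f$ssl[1])) "https" else "http"
  sprintf("%s://%s:%s", scheme, f$host, f$port)
}

# Assumed shape of a drill_connection() result: a plain classed list.
rest_mock <- structure(list(host = "localhost", port = 8047L, ssl = FALSE),
                       class = "drill_conn")

# Shape of a src_drill object, as shown by str(ds) earlier in the thread.
dplyr_mock <- structure(
  list(con = new("MockDrillConnection",
                 host = "localhost", port = 8047L, ssl = FALSE)),
  class = c("src_drill", "src_sql", "src")
)

url_rest  <- make_server_dual(rest_mock)
url_dplyr <- make_server_dual(dplyr_mock)
```

Note that replacing every `$` access with `$con@` unconditionally would fix the src_drill case but break objects that expose the fields directly, which is why the sketch branches on the class.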

dplyr::tbl(src_drill(), dplyr::sql("(SELECT version FROM sys.version)"))

FWIW would be the idiomatic way of getting the Drill version via the dplyr interface.

If you're OK with your name being in the DESCRIPTION (this was a very helpful issue #ty) would you (when you have time) add a response to the issue with your first/last names and (optional) email so I can add you as a contributor (or feel free to make the change yourself and file a PR…I can walk you through that if you've not done it before).

I made a quick hack to the documentation to group each family of functions together. So, for drill_status() it now shows:

[image: screenshot of the regrouped drill_status() help page]

(I omitted the Examples section in the screen capture for brevity)

Hopefully that's a decent interim clarification solution.

Thanks for the info.
I would like to help, as I am using Drill and recommending it to students.

Herman Tan
hermandr@gmail.com

definitely send me any feedback they or you have. will be glad to tailor the package and/or ancillary resources to make it easier for folks to adopt Drill.

Not sure if they're all useful, but I also tag any blog posts that are about or use Apache Drill with a separate tag, and they're accumulated here: https://rud.is/b/category/apache-drill/