cloudyr/googleComputeEngineR

Cron does not work in Rstudio tidyverse build

axel-analyst opened this issue · 6 comments

Describe the bug
After building an image from a Dockerfile based on rocker/tidyverse, cron cannot be found or started, and apt-get update does not run either.

To Reproduce

  1. Save a Dockerfile based on 'rocker/tidyverse' to Google Cloud Storage.
  2. Build the Docker image from the terminal, following https://cloud.google.com/cloud-build/docs/building/build-containers#build_using_dockerfile :

gcloud builds submit --tag gcr.io/bigquery-for/crontab

  3. Launch a VM from the built image in R and schedule the script with cronR:

tag <- gce_tag_container("crontab", project = "bigquery-for")

vm2 <- gce_vm("rstudio-cron", 
              predefined_type = "n1-standard-1",
              template = "rstudio", 
              dynamic_image = tag, 
              username = "master", 
              password = "1234")

library(cronR)
cron_add(paste0("Rscript ", normalizePath("report.R")), frequency = "minutely", at='15:10')

report.R itself runs fine, with all credentials, when executed directly without cron.

Expected behavior
I expected the scheduled job to run. Instead, cron does not work: no log file is created, the cronR addin window is inactive (I cannot add, delete, or manage jobs), and the job never runs.
Also, I cannot run apt-get update over SSH on the VM.

Session Info

R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] shiny_1.4.0 cronR_0.4.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3       crayon_1.3.4     digest_0.6.24    later_1.0.0      mime_0.8         R6_2.4.1         jsonlite_1.6
 [8] xtable_1.8-4     magrittr_1.5     pillar_1.4.3     rlang_0.4.4      rstudioapi_0.10  fs_1.3.1         miniUI_0.1.1.1
[15] promises_1.1.0   shinyFiles_0.7.5 tools_3.6.2      httpuv_1.5.2     fastmap_1.0.1    compiler_3.6.2   pkgconfig_2.0.3
[22] htmltools_0.4.0  tibble_2.1.3

Which Dockerfile are you building? You would need to install cron in it.

I would also suggest googleCloudRunner as a simpler way to schedule R scripts, if that is your goal. https://code.markedmondson.me/googleCloudRunner/articles/usecases.html#run-r-code-in-a-background-task

As I understand it, the Dockerfile does have cron in it (I took it from the tutorial for the package):

FROM rocker/tidyverse
MAINTAINER Mark Edmondson (r@sunholo.com)

# install cron and R package dependencies
RUN apt-get update && apt-get install -y \
    cron \
    nano \
    ## clean up
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/ \
    && rm -rf /tmp/downloaded_packages/ /tmp/*.rds

## Install packages from CRAN
RUN install2.r --error \
    -r 'http://cran.rstudio.com' \
    googleAuthR shinyFiles googleCloudStorageR bigQueryR gmailr googleAnalyticsR \
    ## install Github packages
    && Rscript -e "devtools::install_github(c('bnosac/cronR'))" \
    ## clean up
    && rm -rf /tmp/downloaded_packages/ /tmp/*.rds

RUN sudo service cron start

Yes, my goal is just to run scheduled scripts, but ideally I would like to activate particular VMs only during particular hours to run them (honestly, I need one high-CPU VM and one standard one). What is the best solution?
Thanks in advance.

OK, it does look like cron should be there. Checking the cron logs could help work out why it's not starting. Is there anything in the build log?
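
A minimal sketch of that kind of check, meant to be run in a shell inside the running container. It assumes a Debian-based image such as rocker/tidyverse; the function name check_cron is illustrative and not part of any package. One thing worth keeping in mind while reading the output: a RUN instruction in a Dockerfile only executes while the image is being built, so a daemon started that way is not running when a container is later started from the image.

```shell
#!/bin/sh
# Report whether cron is installed, whether a daemon is running,
# and which jobs are registered for the current user.
# check_cron is a hypothetical helper name used for illustration.
check_cron() {
    # Look for the cron binary on PATH, then in the usual Debian location
    # (/usr/sbin is often not on PATH for non-root users).
    cron_path=$(command -v cron 2>/dev/null)
    if [ -z "$cron_path" ] && [ -x /usr/sbin/cron ]; then
        cron_path=/usr/sbin/cron
    fi
    echo "cron binary: ${cron_path:-missing}"

    # Check for a running daemon; pgrep may be absent in slim images.
    if command -v pgrep >/dev/null 2>&1 && pgrep -x cron >/dev/null 2>&1; then
        echo "cron daemon: running"
    else
        echo "cron daemon: not detected"
    fi

    # List the current user's crontab, if the crontab command exists.
    if command -v crontab >/dev/null 2>&1; then
        crontab -l 2>/dev/null || echo "crontab: no entries for $(id -un)"
    else
        echo "crontab command: missing"
    fi
}

check_cron
```

If the binary is present but no daemon is detected, the image was probably built correctly and the daemon simply was not started at container runtime.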

But for your use case, I do recommend googleCloudRunner as the best solution.

Thanks, but is it really simpler? Do I need to make a Docker build again for the googleCloudRunner solution?
I completed all the setup steps and tried it, but got an error:

2020-02-19 17:38:29> Request Status Code: 400
2020-02-19 17:38:29> API returned error: invalid build: invalid .steps field: build step 0 arg 2 too long (max: 4000)
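
The message points at a size limit of 4000 on a single build step argument. A rough way to see whether a script could run into it is to measure the script file. This sketch assumes the script body is passed inline as one argument and that the limit can be approximated by the byte count; neither is confirmed in this thread, and check_script_size is an illustrative helper, not part of any package.

```shell
#!/bin/sh
# Limit taken from the API error message ("max: 4000").
MAX_ARG_LEN=4000

# Print the size of a script file and whether it fits within the limit.
# Returns 0 if it fits, 1 if it is too long, 2 if the file is missing.
check_script_size() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "no such file: $script" >&2
        return 2
    fi
    # wc -c counts bytes; tr strips the padding some wc versions add.
    size=$(wc -c < "$script" | tr -d ' ')
    if [ "$size" -gt "$MAX_ARG_LEN" ]; then
        echo "$script: $size bytes, over the $MAX_ARG_LEN limit"
        return 1
    fi
    echo "$script: $size bytes, within the $MAX_ARG_LEN limit"
}

# Example usage: check_script_size report.R
```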

Hi Mark, thanks for the tip. However, I still hit a problem when I try to run the code:

2020-02-20 14:31:22> Deploy R script cr_rscript_2020021582209082143122 to Cloud Build
2020-02-20 14:31:22> Scheduling R script on cron schedule: 6 5 * * *
2020-02-20 14:31:22> Request Status Code: 400
Error: API returned: Request contains an invalid argument.

I ran the script manually and everything is OK. How can I diagnose the error?
Thanks in advance.