treeverse/lakeFS-samples

R notebooks fail with s3saveRDS error "Option seekfunction (20167) has unknown or unsupported type"

Closed this issue · 8 comments

rmoff commented

Two of the R notebooks have started failing:

  • R-weather.ipynb
  • R-nyc.ipynb

The error occurs when calling s3saveRDS:

aws.s3::s3saveRDS(x = nyc_data,
                  object = paste0(branch,"/nyc/","nyc_permits.R"), 
                  bucket = repo_name, 
                  region="",
                  use_https=useHTTPS)
Error in curl::handle_setopt(handle, .list = req$options): Option seekfunction (20167) has unknown or unsupported type.
Traceback:

1. s3saveRDS(x = df, bucket = "quickstart", object = paste0(branch, 
 .     "/weather/", "data.R"), base_url = baseurl, region = "", 
 .     use_https = FALSE)
2. put_object(file = tmp, bucket = bucket, object = object, ...)
3. s3HTTP(verb = "PUT", bucket = bucket, path = paste0("/", object), 
 .     headers = headers, request_body = file, verbose = verbose, 
 .     show_progress = show_progress, ...)
4. httr::PUT(url, H, body = httr::upload_file(request_body), query = query, 
 .     show_progress, ...)
5. request_perform(req, hu$handle$handle)
6. curl::handle_setopt(handle, .list = req$options)
rmoff commented

The error comes from curl here. I have no idea what it means or why it's happening now.

Can't reproduce it locally either. Different curl versions??

rmoff commented

These notebooks were working fine in earlier test runs against main (the failures there were in other notebooks).

The notebooks also ran fine in the PR https://github.com/treeverse/lakeFS-samples/actions/runs/5866557042.

So nothing's been merged to main that would account for this failure.

rmoff commented

Looking at the R pieces on the container that's built:

#12 [jupyter-notebook  7/11] RUN conda install --quiet --yes     'r-aws.s3'     'r-arrow'     'r-httr' &&     conda clean --all -f -y &&     fix-permissions "/opt/conda"
#12 1.919 Collecting package metadata (current_repodata.json): ...working... done
#12 21.86 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#12 198.4 Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
#12 440.9 Collecting package metadata (repodata.json): ...working... done
#12 539.7 Solving environment: ...working... done
#12 585.7 
#12 585.7 ## Package Plan ##
#12 585.7 
#12 585.7   environment location: /opt/conda
#12 585.7 
#12 585.7   added / updated specs:
#12 585.7     - r-arrow
#12 585.7     - r-aws.s3
#12 585.7     - r-httr
#12 585.7 
#12 585.7 
#12 585.7 The following packages will be downloaded:
#12 585.7 
#12 585.7     package                    |            build
#12 585.7     ---------------------------|-----------------
#12 585.7     ca-certificates-2023.7.22  |       hbcca054_0         146 KB  conda-forge
#12 585.7     certifi-2023.7.22          |     pyhd8ed1ab_0         150 KB  conda-forge
#12 585.7     openssl-3.1.2              |       hd590300_0         2.5 MB  conda-forge
#12 585.7     r-arrow-11.0.0             |    r42hcb278e6_1         3.4 MB  conda-forge
#12 585.7     r-aws.s3-0.3.22            |    r42hc72bb7e_2         210 KB  conda-forge
#12 585.7     r-aws.signature-0.6.0      |    r42hc72bb7e_2          89 KB  conda-forge
#12 585.7     r-bit-4.0.5                |    r42h57805ef_1         1.0 MB  conda-forge
#12 585.7     r-bit64-4.0.5              |    r42h57805ef_2         477 KB  conda-forge
#12 585.7     r-httr-1.4.7               |    r42hc72bb7e_0         460 KB  conda-forge
#12 585.7     ------------------------------------------------------------
#12 585.7                                            Total:         8.4 MB
#12 585.7 
#12 585.7 The following NEW packages will be INSTALLED:
#12 585.7 
#12 585.7   r-arrow            conda-forge/linux-64::r-arrow-11.0.0-r42hcb278e6_1 
#12 585.7   r-aws.s3           conda-forge/noarch::r-aws.s3-0.3.22-r42hc72bb7e_2 
#12 585.7   r-aws.signature    conda-forge/noarch::r-aws.signature-0.6.0-r42hc72bb7e_2 
#12 585.7   r-bit              conda-forge/linux-64::r-bit-4.0.5-r42h57805ef_1 
#12 585.7   r-bit64            conda-forge/linux-64::r-bit64-4.0.5-r42h57805ef_2 
#12 585.7 
#12 585.7 The following packages will be UPDATED:
#12 585.7 
#12 585.7   ca-certificates                      2022.12.7-ha878542_0 --> 2023.7.22-hbcca054_0 
#12 585.7   certifi                            2022.12.7-pyhd8ed1ab_0 --> 2023.7.22-pyhd8ed1ab_0 
#12 585.7   openssl                                  3.1.0-h0b41bf4_0 --> 3.1.2-hd590300_0 
#12 585.7   r-httr                                1.4.5-r42hc72bb7e_0 --> 1.4.7-r42hc72bb7e_0 
#12 585.7 
#12 585.7 
#12 585.7 Preparing transaction: ...working... done
#12 585.7 Verifying transaction: ...working... done
#12 585.7 Executing transaction: ...working... done
#12 593.3 Will remove 1 package cache(s).
#12 DONE 595.8s

The container is the same one that is run locally, where these notebooks work fine (built on jupyter/all-spark-notebook:notebook-6.5.3).


The notebook R.ipynb also uses s3saveRDS but doesn't throw this error.

rmoff commented

Courtesy of @alexellis, I tried out Actuated and got a shell on the GH runner itself, plus an SSH port forward to try out Jupyter:

ssh -L 18888:localhost:8888 -p 35809 runner@<actuated host>

From here I ran Jupyter Notebook interactively, and R-nyc.ipynb ran just fine.


Since the notebook runs fine interactively in both the local and runner environments, attention turns to the test harness itself: papermill.

This is where the problem shows up, both locally and on the GH runner:

docker exec $(docker ps -q --filter expose=8888) \
              papermill --cwd /home/jovyan/notebooks/ \
                        "/home/jovyan/notebooks/R-nyc.ipynb" \
                        "/home/jovyan/notebooks/papermill-out/R-nyc.ipynb"
Input Notebook:  /home/jovyan/notebooks/R-nyc.ipynb
Output Notebook: /home/jovyan/notebooks/papermill-out/R-nyc.ipynb
Working directory: /home/jovyan/notebooks/
Executing:   0%|          | 0/68 [00:00<?, ?cell/s]Executing notebook with kernel: ir
Executing:  41%|████      | 28/68 [00:01<00:01, 24.06cell/s]
Execution halted
Executing:  46%|████▌     | 31/68 [00:02<00:02, 14.95cell/s]
Traceback (most recent call last):
  File "/opt/conda/bin/papermill", line 8, in <module>
    sys.exit(papermill())
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/papermill/cli.py", line 250, in papermill
    execute_notebook(
  File "/opt/conda/lib/python3.10/site-packages/papermill/execute.py", line 128, in execute_notebook
    raise_for_execution_errors(nb, output_path)
  File "/opt/conda/lib/python3.10/site-packages/papermill/execute.py", line 232, in raise_for_execution_errors
    raise error
papermill.exceptions.PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [15]":
Error in curl::handle_setopt(handle, .list = req$options): Option seekfunction (20167) has unknown or unsupported type.
Traceback:

1. aws.s3::s3saveRDS(x = nyc_data, object = paste0(branch, "/nyc/",
 .     "nyc_permits.R"), bucket = repo_name, region = "", use_https = useHTTPS)
2. put_object(file = tmp, bucket = bucket, object = object, ...)
3. s3HTTP(verb = "PUT", bucket = bucket, path = paste0("/", object),
 .     headers = headers, request_body = file, verbose = verbose,
 .     show_progress = show_progress, ...)
4. httr::PUT(url, H, body = httr::upload_file(request_body), query = query,
 .     show_progress, ...)
5. request_perform(req, hu$handle$handle)
6. curl::handle_setopt(handle, .list = req$options)
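
To repeat this check across several notebooks, the papermill invocation above could be wrapped in a small loop. This is only a sketch: it reuses the container lookup and paths from the command above, and the function name `run_notebooks` and the PASS/FAIL output format are made up for illustration.

```shell
#!/bin/sh
# Run each named notebook through papermill inside the running Jupyter
# container and print one PASS/FAIL line per notebook.
# Returns non-zero if any notebook failed.
run_notebooks() {
    # Same lookup as the manual command: the container exposing port 8888.
    container=$(docker ps -q --filter expose=8888)
    rc=0
    for nb in "$@"; do
        # papermill output is discarded here; drop the redirect to see
        # the full traceback for a failing notebook.
        if docker exec "$container" \
               papermill --cwd /home/jovyan/notebooks/ \
                         "/home/jovyan/notebooks/$nb" \
                         "/home/jovyan/notebooks/papermill-out/$nb" \
               >/dev/null 2>&1; then
            echo "PASS: $nb"
        else
            echo "FAIL: $nb"
            rc=1
        fi
    done
    return "$rc"
}

# Example (not run automatically):
# run_notebooks R.ipynb R-weather.ipynb R-nyc.ipynb
```

Including R.ipynb gives a control case, since it also calls s3saveRDS but does not throw the error.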
rmoff commented

Comparing the container build logs from runs of the action when it worked and when it didn't, I spotted one difference in the R packages:

r-httr-1.4.6 build r42hc72bb7e_1
vs
r-httr-1.4.7 build r42hc72bb7e_0
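
A comparison like this can be scripted rather than done by eye. The sketch below assumes two saved build logs (the file names `good.log` and `bad.log` are hypothetical) whose conda download table has the same row layout as the build log pasted earlier in this thread, i.e. `#12 <time> <package-version> | <build> ...`. The helper names are invented for illustration.

```shell
#!/bin/sh
# Read a Docker build log on stdin and print "package-version build"
# for each row of the conda download table, sorted.
conda_pkgs() {
    # Table rows have "|" as the 4th whitespace-separated field;
    # skip the header row whose 3rd field is the literal word "package".
    awk '$4 == "|" && $3 != "package" { print $3, $5 }' | sort
}

# Print the package rows that differ between two build logs.
# Exit status follows diff: 0 if identical, 1 if they differ.
pkg_diff() {
    good=$(mktemp)
    bad=$(mktemp)
    conda_pkgs < "$1" > "$good"
    conda_pkgs < "$2" > "$bad"
    status=0
    diff "$good" "$bad" || status=$?
    rm -f "$good" "$bad"
    return "$status"
}

# Example (hypothetical file names):
# pkg_diff good.log bad.log
```

Only packages that appear in the download table are compared, so anything already baked into the base image would not show up in this diff.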

Looking at the release log for httr, I noticed this commit, which includes seekfunction, the option named in this error.

So, something smells fishy - let's hope it's not a red herring … 🐟

(still unclear why it works interactively but not through papermill)

rmoff commented

Logged an issue on the httr project: r-lib/httr#746

Glad actuated was of help to you @rmoff