r-lib/keyring

R can read keyrings set by Python, but not vice-versa

awong234 opened this issue · 15 comments

Problem

Python does not correctly decode keys set by R. I suspect this is because R writes the password with charToRaw, which produces UTF-8 bytes, but I'm not sure. I know you mentioned previously that always using UTF-16 would cause compatibility issues, but I was wondering whether it might fix this and whether that concern still applies.

I am trying to convince my organization to make use of this package, but the argument is weaker when password retrieval is not interoperable. Might it be possible to replace charToRaw with iconv specifying 'utf-16le' in the source? Where might this be problematic?

Thank you very much!

Reprex

Steps to reproduce:

library(keyring)
library(reticulate)

use_condaenv("keyring", required = TRUE)

keyring::key_set_with_value(service = "testR", username = "testUser", password = "test123")
keyring::key_get(service = "testR", username = "testUser")

## [1] "test123"

pyring = reticulate::import("keyring")

# Confirm that python can read python
pyring$set_password("testPython", "testUser", "test123")
pyring$get_password("testPython", "testUser")

## [1] "test123"

# Now test the one set by R. No it cannot.
pyring$get_password(":testR:testUser", "testUser")

## Error in py_call_impl(callable, dots$args, dots$keywords) : 
##  UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x33 in position 6: truncated data 

In python REPL:

import keyring

keyring.get_password(":testR:testUser", "testUser")
##    ...: 
## ---------------------------------------------------------------------------
## UnicodeDecodeError                        Traceback (most recent call last)
## <ipython-input-2-2ec93bfc04c2> in <module>
## ----> 1 keyring.get_password(":testR:testUser", "testUser")
## 
## C:\Anaconda37\lib\site-packages\keyring\core.py in get_password(service_name, username)
##      55     """Get password from the specified service.
##      56     """
## ---> 57     return _keyring_backend.get_password(service_name, username)
##      58
##      59
## 
## C:\Anaconda37\lib\site-packages\keyring\backends\Windows.py in get_password(self, service, username)
##      69             return None
##      70         blob = res['CredentialBlob']
## ---> 71         return blob.decode('utf-16')
##      72
##      73     def _get_password(self, target):
## 
## UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x33 in position 6: truncated data

Fix (?)

A possible solution -- use iconv instead of charToRaw ?

# Encoding issues. In some instances the result is returned as kanji characters due to improper decoding.
utf8 = charToRaw('test')
utf_8_py = reticulate::r_to_py(utf8, convert = FALSE)
utf_8_py$decode(encoding="utf-8")
## test
utf_8_py$decode(encoding="utf-16-le")
## 整瑳

utf16le = iconv(x = 'test', from = '', to = 'utf-16le', toRaw = TRUE)[[1]]
utf16le_py = reticulate::r_to_py(utf16le, convert = FALSE)
utf16le_py$decode(encoding="utf-8")
## Error in py_str_impl(object) : Embedded NUL in string.
utf16le_py$decode(encoding="utf-16-le")
## test
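The same failure can be reproduced in Python alone, without reticulate. This is a minimal sketch that assumes the stored blob is simply the UTF-8 bytes of the password, which is what charToRaw yields for an ASCII string; `try_decode` is a helper made up for illustration and is not part of either keyring package.

```python
def try_decode(blob: bytes, encoding: str) -> str:
    """Decode blob, returning the error text instead of raising (illustrative helper)."""
    try:
        return blob.decode(encoding)
    except UnicodeDecodeError as err:
        return f"error: {err.reason} at byte {err.start}"


utf8_blob = "test123".encode("utf-8")       # 7 bytes, one per ASCII character
utf16_blob = "test123".encode("utf-16-le")  # 14 bytes, two per character

# Odd byte count: the last byte cannot form a whole 16-bit unit.
print(try_decode(utf8_blob, "utf-16-le"))
# Bytes written as UTF-16LE round-trip to the original string.
print(try_decode(utf16_blob, "utf-16-le"))
# Even byte count: decoding succeeds, but yields the wrong characters.
print(try_decode(b"test", "utf-16-le"))
```

This also explains why "test" and "test123" fail differently: a 4-byte blob silently decodes to two unrelated characters, while a 7-byte blob leaves a dangling byte and raises.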

R Configuration

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reticulate_1.16 keyring_1.1.0  

loaded via a namespace (and not attached):
[1] compiler_3.6.1   Matrix_1.2-18    assertthat_0.2.1 R6_2.4.1         tools_3.6.1      Rcpp_1.0.5       grid_3.6.1      
[8] jsonlite_1.7.0   lattice_0.20-41 

Conda env .yml

name: keyring
channels:
  - defaults
dependencies:
  - ca-certificates=2020.6.24=0
  - certifi=2020.6.20=py37_0
  - entrypoints=0.3=py37_0
  - importlib-metadata=1.7.0=py37_0
  - importlib_metadata=1.7.0=0
  - keyring=21.2.1=py37_0
  - openssl=1.1.1g=he774522_0
  - pip=20.1.1=py37_1
  - python=3.7.7=h81c818b_4
  - pywin32-ctypes=0.2.0=py37_1001
  - setuptools=49.2.0=py37_0
  - sqlite=3.32.3=h2a8f88b_0
  - vc=14.1=h0510ff6_4
  - vs2015_runtime=14.16.27012=hf0eaf9b_3
  - wheel=0.34.2=py37_0
  - wincertstore=0.2=py37_0
  - zipp=3.1.0=py_0
  - zlib=1.2.11=h62dcd97_4
prefix: C:\Anaconda37\envs\keyring

This is the current logic:

keyring/R/backend-wincred.R

Lines 240 to 248 in e24752d

if (any(password == 0)) {
  password <- iconv(list(password), from = "UTF-16LE", to = "")
  if (is.na(password)) {
    stop("Key contains embedded null bytes, use get_raw()")
  }
  password
} else {
  rawToChar(password)
}

So if the stored value contains a zero byte, we decode it as UTF-16LE; otherwise we return the bytes as a string unchanged.
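For readers more at home in Python, the read-side heuristic can be sketched as follows. This is an illustration only, not the package's code: `decode_auto` is a hypothetical name, and the non-UTF-16 branch is assumed to be UTF-8, whereas rawToChar simply passes the bytes through.

```python
def decode_auto(blob: bytes) -> str:
    """Guess the encoding of a credential blob from its bytes (illustrative)."""
    if 0 in blob:
        # Text in UTF-8 has no zero bytes unless it embeds a NUL character,
        # so a zero byte is taken as a sign of UTF-16LE.
        try:
            return blob.decode("utf-16-le")
        except UnicodeDecodeError:
            raise ValueError(
                "Key contains embedded null bytes, use get_raw()"
            ) from None
    # No zero bytes: assume UTF-8 (an assumption of this sketch).
    return blob.decode("utf-8")


print(decode_auto("test123".encode("utf-16-le")))  # detected via the zero high bytes
print(decode_auto(b"test123"))                     # no zero bytes, read as UTF-8
```

One limit of this kind of detection: UTF-16LE text made up entirely of characters without a zero byte (for example "整瑳", whose bytes are the same as the ASCII string "test") cannot be told apart from single-byte text.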

We could also add an encoding parameter to the get/set functions.

Another thing you might be interested in is #84, which would add functions that do the raw wincred (and then macOS keychain) operations. These could also have an encoding argument, defaulting to UTF-16LE.

Interesting, thank you for clarifying -- I must have misunderstood the call order. I understood it as:

keyring::key_set_with_value          calls default_backend()$set_with_value
default_backend()$set_with_value     calls b_wincred_set_with_value
keyring:::b_wincred_set_with_value   calls b_wincred_set_with_raw_value, passing charToRaw(password) as the password argument

At this step, it appeared to me that the password is converted using charToRaw before being passed to b_wincred_set_with_raw_value:

keyring/R/backend-wincred.R

Lines 273 to 274 in e24752d

b_wincred_set_with_raw_value(self, private, service, username,
                             charToRaw(password), keyring)

The password is then eventually stored by the following call within b_wincred_set_with_raw_value:

b_wincred_i_set(target, password = password, username = username)

My thought process was that the charToRaw call could be replaced by iconv specifying the UTF-16LE encoding, which I think would then be able to be read by Python -- of course I can give this a go and get back to you!

I tested the edit in 4d5211a. All existing tests pass, and in addition Python is now able to read a key set by R, i.e. the following works, based on the first reprex:

library(keyring)
library(reticulate)

use_condaenv("keyring", required = TRUE)

keyring::key_set_with_value(service = "testR", username = "testUser", password = "test123")
keyring::key_get(service = "testR", username = "testUser")

pyring = reticulate::import("keyring")
pyring$get_password(":testR:testUser", "testUser")
## [1] "test123"

The critical difference is that the last call to python's API pyring$get_password is now able to read from the service, as opposed to the current state where it returns an error.

How does this edit look to you?

This is a breaking change, so it is not that simple.

The encoding of the secrets is apparently application dependent: Python chose UTF-16, and we chose UTF-8. Some apps do one, some the other. E.g. git uses UTF-16, but Slack uses UTF-8 AFAICT. The node.js client also uses UTF-8.

FWIW, the Python package is also considering UTF8 support: jaraco/keyring#438

Got it, I understand now. I'll close this issue now, thank you!

Don't close it please, we can still "fix" this here. :) How about the following?

  • Add an encoding option to the get/set functions.
  • Its default can be given using an env var and an option.
  • If these are not set, that is the same as 'auto', which is the current behavior, for compatibility.
  • If set, it has to be a valid encoding, and we use iconv to encode and decode.
  • On Windows, when we run the set method, we also give a message that explains this issue a bit, so if you forgot to set the envvar/option/argument, you'd get some feedback.

Is this good enough for you?
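The bullet points above could be sketched roughly like this, written in Python only to keep the examples in this thread in one language. Everything named here is hypothetical: the env var `KEYRING_ENCODING`, the option key `keyring.encoding`, and the precedence (argument, then option, then env var) are assumptions for illustration, and 'auto' on the write side is assumed to mean plain UTF-8 bytes.

```python
import codecs
import os

ENV_VAR = "KEYRING_ENCODING"      # hypothetical name, not taken from the thread
OPTION_KEY = "keyring.encoding"   # hypothetical name, not taken from the thread


def resolve_encoding(encoding=None, options=None, environ=None):
    """Pick the encoding: explicit argument, then option, then env var, else 'auto'."""
    options = {} if options is None else options
    environ = os.environ if environ is None else environ
    chosen = encoding or options.get(OPTION_KEY) or environ.get(ENV_VAR) or "auto"
    if chosen != "auto":
        codecs.lookup(chosen)  # raises LookupError for an unknown encoding name
    return chosen


def encode_secret(password: str, encoding: str = "auto") -> bytes:
    """Encode a password for storage; 'auto' is assumed to mean UTF-8 here."""
    return password.encode("utf-8" if encoding == "auto" else encoding)


enc = resolve_encoding(environ={ENV_VAR: "utf-16le"})
blob = encode_secret("test123", enc)
print(enc, len(blob))
```

With nothing set the sketch returns 'auto', so existing behavior would stay unchanged; setting the variable to a UTF-16LE name produces a blob that a UTF-16 reader can decode.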

I misunderstood your comment, sorry! I am happy to continue testing out solutions in a more robust way, and to continue this conversation. I will work on implementing your recommendations in the next few days and get back to you.

Well, we would need to implement those bullet points in keyring first, before you can try anything, I am afraid. :)

Thank you! I apologize, I am new to open-source development -- I meant that I would like to try contributing those bullet points to this package. That is, unless you prefer to work on it, in which case I greatly appreciate your work in resolving this issue!

Oh, right, contributions are by all means welcome, so go ahead please. :)

Hi! In case it still can help, UTF-8 decoding for the Python library was added in jaraco/keyring#482

Yes, this does improve things quite a bit, thanks for the heads-up Francesco @ftruzzi! I just tried the example at the top with the latest Python keyring version 21.8.0 (which includes the merge you linked), and Python can now read the UTF-8 encoded credential set by R.
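For context, a reader that accepts both encodings could look roughly like the sketch below. This is an assumption about the general approach (try UTF-16 first, fall back to UTF-8), not a copy of the code merged in the linked PR, and `read_password` is a made-up name.

```python
def read_password(blob: bytes) -> str:
    """Decode a credential blob, preferring UTF-16LE and falling back to UTF-8."""
    try:
        return blob.decode("utf-16-le")
    except UnicodeDecodeError:
        # Reached when the bytes are not valid UTF-16LE, e.g. an odd byte count.
        return blob.decode("utf-8")


print(read_password("test123".encode("utf-16-le")))  # native UTF-16LE credential
print(read_password(b"test123"))                     # UTF-8 credential, via the fallback
```

A fallback of this shape only helps when the UTF-16 decode actually fails. In this sketch, a UTF-8 blob with an even byte count, such as b"test", still decodes without error to the wrong characters.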

@gaborcsardi I have a PR out and while the CI isn't passing I was wondering if you could comment on the approach, whether you think the way I've gone about it is valid, and if you think it still needs to be implemented now that this issue is technically resolved. Thank you!

@awong234 If you want a quick solution, look at the oskeyring R package. That implements "raw" access to the credential store on Windows and macOS, and probably solves your issues.

Thanks for the suggestion, Gábor! I hadn't seen that one, I will take a look at that.

Fixed by #88.