marklogic/java-client-api

Non-US-ASCII URIs are mangled when reading a multipart response from v1/documents


See #1687 for a failing test that demonstrates this from the user perspective.

Reasons why this is happening:

  1. Java Mail (javax.mail and jakarta.mail behave the same way) has an InternetHeaders class that requires multipart header fields to adhere to RFC 822, which requires US-ASCII characters - see https://docs.oracle.com/javaee/7/api/javax/mail/internet/InternetHeaders.html (not the latest link, but the docs are the same in the latest jakarta.mail version of this class).
  2. MarkLogic URIs of course do not require US-ASCII characters.
  3. If a MarkLogic URI does have non-US-ASCII characters, those get mangled when the Java Client fetches a multipart response from v1/documents, where each body part has the URI in a header.
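
To illustrate the kind of mangling described above, here is a minimal sketch using only the JDK. It assumes the URI arrives in the part header as UTF-8 bytes and that an ASCII-oriented header parser decodes those bytes one byte per character; both are assumptions for illustration, not a confirmed description of Java Mail internals. The class name, method names, and the sample URI are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class HeaderDecodingSketch {

    // Assumption: an ASCII-oriented parser maps each byte to one char,
    // which is equivalent to decoding the bytes as ISO-8859-1.
    public static String decodeBytePerChar(byte[] rawHeader) {
        return new String(rawHeader, StandardCharsets.ISO_8859_1);
    }

    // Decoding the same bytes as UTF-8 recovers the original URI.
    public static String decodeUtf8(byte[] rawHeader) {
        return new String(rawHeader, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical URI containing one non-US-ASCII character (e with acute accent).
        String uri = "/docs/caf\u00e9.json";
        byte[] onTheWire = uri.getBytes(StandardCharsets.UTF_8);

        // The UTF-8 decode round-trips; the byte-per-char decode does not.
        System.out.println(decodeUtf8(onTheWire).equals(uri));
        System.out.println(decodeBytePerChar(onTheWire).equals(uri));
    }
}
```

Under these assumptions the single accented character becomes two unrelated characters, so the URI read from the header no longer matches the URI stored in MarkLogic.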

I verified that switching to OkHttp's new MultipartReader - see https://square.github.io/okhttp/5.x/okhttp/okhttp3/-multipart-reader/index.html - avoids this issue, because MultipartReader does not enforce RFC 822.

Testing has shown that setting the mail.mime.allowutf8 System property has no effect with the com.sun.mail:javax.mail:1.6.2 dependency, but it does work with the newer Jakarta Mail dependencies:

implementation "jakarta.mail:jakarta.mail-api:2.1.3"
implementation "org.eclipse.angus:angus-mail:2.0.3"

Ideally, the Java Client would fix this by switching to OkHttp's MultipartReader, but that is a significant change right now. For a connector or an application like Flux, we can fix this by switching to jakarta-mail and eagerly setting the mail.mime.allowutf8 System property.
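
A minimal sketch of eagerly setting the property in an application, using only the JDK. The class and method names are hypothetical, and the assumption is that the property must be set before any Jakarta Mail class reads it, so the call belongs as early as possible in startup. The sketch leaves an existing value untouched so a user can still override it with -Dmail.mime.allowutf8=false.

```java
class AllowUtf8Sketch {

    // Property name taken from the issue text above.
    static final String ALLOW_UTF8 = "mail.mime.allowutf8";

    // Sets the property to "true" only when no value has been chosen yet,
    // so an explicit -D flag on the command line still wins.
    public static void enableUtf8Headers() {
        if (System.getProperty(ALLOW_UTF8) == null) {
            System.setProperty(ALLOW_UTF8, "true");
        }
    }

    public static void main(String[] args) {
        // Call this first, before any code that parses multipart responses.
        enableUtf8Headers();
        System.out.println(ALLOW_UTF8 + "=" + System.getProperty(ALLOW_UTF8));
    }
}
```

Whether the property takes effect still depends on the mail implementation on the classpath, as noted above for javax.mail 1.6.2 versus the Jakarta Mail dependencies.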

Resolved via #1689