daisy/pipeline

Support running behind a proxy


Expected Behavior

It should be easy to configure proxy settings. Most non-Java applications read the http_proxy environment variable, while Java applications use the http.proxyHost (and related) system properties: https://docs.oracle.com/javase/7/docs/api/java/net/doc-files/net-properties.html#Proxies

Setting http.proxyHost, http.proxyPort, https.proxyHost and https.proxyPort doesn't work for me. I'm not quite sure if I'm doing something wrong, or if Pipeline 2 simply cannot be configured to use a proxy for its HTTP(S) requests (made by p:http-request etc.).
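To check whether the JVM itself picks up these properties, you can ask the default java.net.ProxySelector which proxy it would choose for a given URL. The sketch below is a small diagnostic, not Pipeline 2 code. The class name ProxyCheck is made up, and proxy.example.com:3128 is a placeholder. It only shows what java.net would do. The stack trace below shows Apache HttpClient connecting straight to www.nlb.no:80. In HttpClient 4.x, HttpClientBuilder only consults the default ProxySelector (and therefore these properties) when the client is built with useSystemProperties() or obtained from HttpClients.createSystem(). Whether XML Calabash's p:http-request builds its client that way is an assumption to check, not a confirmed cause.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyCheck {

    /** Returns the proxies the JVM-wide default ProxySelector would use for the given URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Placeholder values; normally passed as -Dhttp.proxyHost=... -Dhttp.proxyPort=... to java.
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "3128");

        for (Proxy p : proxiesFor("http://www.nlb.no/")) {
            if (p.type() == Proxy.Type.DIRECT) {
                // The JVM would connect to the target host directly.
                System.out.println("DIRECT");
            } else {
                // The JVM would send the request to this proxy instead.
                InetSocketAddress a = (InetSocketAddress) p.address();
                System.out.println(p.type() + " " + a.getHostString() + ":" + a.getPort());
            }
        }
    }
}
```

If this prints an HTTP proxy but requests still time out against the real host, the properties reach the JVM but are not used by the HTTP client making the request.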

Actual Behavior

org.daisy.common.xproc.XProcErrorException: org.apache.http.conn.HttpHostConnectException: Connect to www.nlb.no:80 [www.nlb.no/13.53.103.156] failed: Connection timed out (Connection timed out)
	at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:255) ~[na:na]
	at org.daisy.pipeline.job.Job.run(Job.java:218) ~[na:na]
	at org.daisy.pipeline.job.impl.DefaultJobExecutionService$1.run(DefaultJobExecutionService.java:111) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: com.xmlcalabash.core.XProcException$3: org.apache.http.conn.HttpHostConnectException: Connect to www.nlb.no:80 [www.nlb.no/13.53.103.156] failed: Connection timed out (Connection timed out)
	at com.xmlcalabash.core.XProcException.rebaseOnto(XProcException.java:336) ~[na:na]
	at com.xmlcalabash.core.XProcException.rebaseOnto(XProcException.java:305) ~[na:na]
	at com.xmlcalabash.runtime.XStep.handleException(XStep.java:313) ~[na:na]
	at com.xmlcalabash.runtime.XPipelineCall.run(XPipelineCall.java:103) ~[na:na]
	at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:236) ~[na:na]
	at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:136) ~[na:na]
	at org.daisy.common.xproc.calabash.impl.CalabashXProcPipeline.run(CalabashXProcPipeline.java:251) ~[na:na]
	... 3 common frames omitted
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to www.nlb.no:80 [www.nlb.no/13.53.103.156] failed: Connection timed out (Connection timed out)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:158) ~[na:na]
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353) ~[na:na]
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380) ~[na:na]
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) ~[na:na]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184) ~[na:na]
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88) ~[na:na]
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[na:na]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184) ~[na:na]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[na:na]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) ~[na:na]
	at com.xmlcalabash.library.HttpRequest.run(HttpRequest.java:346) ~[na:na]
	at com.xmlcalabash.runtime.XAtomicStep.run(XAtomicStep.java:396) ~[na:na]
	at com.xmlcalabash.runtime.XCompoundStep.run(XCompoundStep.java:264) ~[na:na]
	at com.xmlcalabash.runtime.XOtherwise.run(XOtherwise.java:31) ~[na:na]
	at com.xmlcalabash.runtime.XChoose.run(XChoose.java:142) ~[na:na]
	at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:236) ~[na:na]
	at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:136) ~[na:na]
	at com.xmlcalabash.runtime.XPipelineCall.run(XPipelineCall.java:101) ~[na:na]
	... 6 common frames omitted
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
	at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
	at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
	at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
	at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
	at java.base/java.net.Socket.connect(Socket.java:591) ~[na:na]
	at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74) ~[na:na]
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141) ~[na:na]
	... 23 common frames omitted

Environment

  • Operating system: Docker instance of the NLB branch
  • DAISY Pipeline 2 version: 1.12.1-SNAPSHOT (note: version from NLB branch)
  • Interface: Any

Hmm, I wonder why it is not already working.

Just to make sure I understand, what is the use case?

The use case is that we've moved our servers to the @NationalLibraryOfNorway, and to reach the internet from their servers we need to route all requests through their proxy.

In another Java application we use, I had to add some code to make it work: nlbdev/quickbase-dump@1b0a0cd

But why do you need to reach the internet? (Just curious.)

We produce our newsletter with a Pipeline 2 script. That script retrieves a list of all new books from our library system and news from our website, and then posts the results to Mailchimp. I can also imagine other scripts making HTTP requests (for example, resolving DTDs referenced from uploaded XML files), although it's the newsletter that currently fails.

I'm also getting a connection timeout when reading a file URI with Saxon (invoked directly, not through Pipeline 2), so there's definitely something I haven't configured correctly. It seems to be mainly a Java issue, as Bash, Python, etc. seem to respect the http_proxy environment variable.
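Since Bash and Python read http_proxy but the JVM does not, one workaround is to translate the variable into the Java system properties at startup. The sketch below is hypothetical. The class name EnvProxyBridge is made up, and this is not the code from the quickbase-dump commit mentioned above. It makes several assumptions. The variable uses the curl-style form host:port or http://host:port/. Port 80 is used when none is given. Any user:password@ credentials are ignored. The same proxy serves HTTP and HTTPS. no_proxy entries such as .nlb.no map to Java's *.nlb.no wildcard syntax in http.nonProxyHosts, which the JDK also applies to HTTPS.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EnvProxyBridge {

    /**
     * Translates curl-style http_proxy / no_proxy values into Java networking
     * system properties. Returns an empty map when httpProxy is unset or empty.
     */
    static Map<String, String> toJavaProperties(String httpProxy, String noProxy) {
        Map<String, String> props = new LinkedHashMap<>();
        if (httpProxy == null || httpProxy.trim().isEmpty()) {
            return props;
        }
        String value = httpProxy.trim();
        if (!value.contains("://")) {
            value = "http://" + value; // accept a bare "host:port"
        }
        URI uri = URI.create(value);
        String host = uri.getHost(); // any "user:password@" part is ignored
        if (host == null) {
            throw new IllegalArgumentException("Cannot parse proxy URL: " + httpProxy);
        }
        // Assumption: fall back to port 80 when the variable has no port.
        String port = String.valueOf(uri.getPort() == -1 ? 80 : uri.getPort());
        // Assumption: the same proxy handles plain HTTP and HTTPS traffic.
        props.put("http.proxyHost", host);
        props.put("http.proxyPort", port);
        props.put("https.proxyHost", host);
        props.put("https.proxyPort", port);

        if (noProxy != null) {
            StringJoiner hosts = new StringJoiner("|");
            for (String entry : noProxy.split(",")) {
                String h = entry.trim();
                if (h.isEmpty()) {
                    continue;
                }
                // no_proxy writes ".example.com" for subdomains; Java expects "*.example.com".
                hosts.add(h.startsWith(".") ? "*" + h : h);
            }
            if (hosts.length() > 0) {
                // The JDK's HTTPS handler reuses http.nonProxyHosts, so one property covers both.
                props.put("http.nonProxyHosts", hosts.toString());
            }
        }
        return props;
    }

    /** Reads an environment variable, trying the lowercase name first, then uppercase. */
    static String env(String name) {
        String v = System.getenv(name);
        return v != null ? v : System.getenv(name.toUpperCase());
    }

    public static void main(String[] args) {
        Map<String, String> props = toJavaProperties(env("http_proxy"), env("no_proxy"));
        // Must run before the first HTTP connection is opened.
        props.forEach(System::setProperty);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The printed values could equally be passed as -D flags on the java command line. This sketch does not cover how to inject those into the Pipeline 2 launcher or Docker image. Like any system-property approach, it only helps HTTP clients that actually read these properties.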

It seems we've found a solution for our particular case: opening access to just the specific domains we need. So this is no longer a priority for us.

OK good. It would still be nice to find a proper solution to this problem, but I have no idea how I could simulate being behind a proxy. With a VM maybe, but I don't know if I want to go through so much trouble...
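One way to approximate "being behind a proxy" without a VM is a throwaway local server that plays the proxy's role. The sketch below is hypothetical test scaffolding; the class FakeProxy and its response text are made up. It listens on 127.0.0.1 on a free port and answers every request itself instead of forwarding it. It also records the request target. A client that honors http.proxyHost/http.proxyPort sends it absolute URLs such as http://www.nlb.no/, while a client that ignores those properties tries to reach the real host. The server does not handle HTTPS (CONNECT) and does not block direct connections. It can therefore show whether a request goes through the proxy, but it does not reproduce the firewall that made direct connections time out.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class FakeProxy {

    /** Request targets seen so far; a proxied request carries the absolute URL. */
    static final List<String> seen = new CopyOnWriteArrayList<>();

    /** Starts a loopback-only server that answers every request itself instead of forwarding it. */
    static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            seen.add(exchange.getRequestURI().toString());
            byte[] body = "answered by fake proxy".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Fetches a URL with HttpURLConnection, which consults the http.proxyHost/http.proxyPort properties. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(5000); // fail fast instead of hanging if the proxy is bypassed
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer proxy = start();
        try {
            System.setProperty("http.proxyHost", "127.0.0.1");
            System.setProperty("http.proxyPort", String.valueOf(proxy.getAddress().getPort()));
            System.out.println(fetch("http://www.nlb.no/"));
            System.out.println("proxy saw: " + seen);
        } finally {
            proxy.stop(0);
        }
    }
}
```

To try this against Pipeline 2, you would start the fake proxy somewhere reachable from the Pipeline 2 JVM. You would then launch Pipeline 2 with the matching -Dhttp.proxyHost and -Dhttp.proxyPort values and check whether the p:http-request calls show up in seen.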