Dynamic example fails on OpenShift instance
I am using scraperjs@0.3.4 on an OpenShift instance. I am trying the examples on the README page. The static example works, but the dynamic one fails with a strange error. Any hints?
I suspect the restricted environment on OpenShift instances is to blame, and I would like to find the cause so I can try to fix it.
> cat > static.js
var scraperjs = require('scraperjs');
scraperjs.StaticScraper.create('https://news.ycombinator.com/')
    .scrape(function($) {
        return $(".title a").map(function() {
            return $(this).text();
        }).get();
    }, function(news) {
        console.log(news);
    })
> node static.js
[ 'Show HN: My SSH server knows who you are',
'Show HN: JAWS – A JavaScript and AWS Stack',
'Federal Judge Strikes Down Idaho ‘Ag-Gag Law’',
...
> cat > dynamic.js
var scraperjs = require('scraperjs');
scraperjs.DynamicScraper.create('https://news.ycombinator.com/')
    .scrape(function() {
        return $(".title a").map(function() {
            return $(this).text();
        }).get();
    }, function(news) {
        console.log(news);
    })
> node dynamic.js
events.js:72
throw er; // Unhandled 'error' event
^
Error: listen EACCES
at errnoException (net.js:905:11)
at Server._listen2 (net.js:1024:19)
at listen (net.js:1065:10)
at net.js:1147:9
at asyncCallback (dns.js:68:16)
at Object.onanswer [as oncomplete] (dns.js:121:9)
It seems that scraperjs tries to open a port on the host. When I run the dynamic example on a different host, I capture this with netstat:
tcp 0 0 127.0.0.1:45873 0.0.0.0:* LISTEN 355/node
The port is a random port that a user account should be able to open, but I guess this is not allowed on OpenShift instances. Could the library achieve the required functionality without opening a port?
Actually, it seems that ports can be opened, but only within this range [1]:
It is possible to bind to the internal IP with port range: 15000 - 35530.
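To check whether a given host and port can be bound at all, a small probe along these lines may help. It is only a sketch using Node's built-in `net` module; `probePort` is a name made up for this example, and 29999 is just one port inside the quoted range.

```javascript
var net = require('net');

// Hypothetical helper: try to bind a TCP server to host:port and report the
// outcome through callback(err). The server is closed again once bound.
function probePort(host, port, callback) {
  var server = net.createServer();
  // Without an 'error' listener, a failed listen() surfaces as the
  // "Unhandled 'error' event" seen in the stack trace above.
  server.once('error', function (err) {
    callback(err);
  });
  server.listen(port, host, function () {
    server.close(function () {
      callback(null);
    });
  });
}

// Assumption: OPENSHIFT_NODEJS_IP holds the internal IP; fall back to loopback.
var host = process.env.OPENSHIFT_NODEJS_IP || '127.0.0.1';
probePort(host, 29999, function (err) {
  if (err) {
    console.log('cannot bind ' + host + ':29999 (' + err.code + ')');
  } else {
    console.log('bound ' + host + ':29999 successfully');
  }
});
```

If the probe reports EACCES for a port, the dynamic example would presumably hit the same error on that port.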
I think I fixed this. DynamicScraper uses Phantom, which accepts an options argument where a port can be specified. I added a matching options argument to DynamicScraper. For example:
scraperjs.DynamicScraper.create(url, {port: 29999})
See #42 for a pull request.
For OpenShift, since it only allows opening ports on the internal IPs [1], a complete example is:
scraperjs.DynamicScraper.create(url, {
    port: 29999,
    hostname: process.env.OPENSHIFT_NODEJS_IP || '127.0.0.1'
})
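If the port should be configurable rather than hard-coded, the options object could be built by a small helper. This is a rough sketch, not part of scraperjs: `scraperOptions` and the `SCRAPER_PORT` variable are invented for illustration, and the range check simply applies the 15000 - 35530 limits quoted above to every environment, which may be stricter than needed outside OpenShift.

```javascript
// Port range quoted earlier in this thread for OpenShift internal IPs.
var MIN_PORT = 15000;
var MAX_PORT = 35530;

// Hypothetical helper: build the options object from an environment-like
// object. SCRAPER_PORT is an invented variable name; 29999 is the default
// used in the example above.
function scraperOptions(env) {
  var port = parseInt(env.SCRAPER_PORT, 10) || 29999;
  if (port < MIN_PORT || port > MAX_PORT) {
    throw new RangeError('port ' + port + ' is outside ' + MIN_PORT + '-' + MAX_PORT);
  }
  return {
    port: port,
    hostname: env.OPENSHIFT_NODEJS_IP || '127.0.0.1'
  };
}

// Intended use, assuming a scraperjs build that includes the change from #42:
//   scraperjs.DynamicScraper.create(url, scraperOptions(process.env))
console.log(scraperOptions(process.env));
```

Rejecting an out-of-range port up front gives a clearer message than the bare `listen EACCES` thrown at bind time.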
I'm glad that you were able to solve the issue. It's related to node-phantom, which needs a network connection to communicate with Phantom, hence the need to define a port.