marcomontalbano/html-miner

Empty array with multiple sites

Closed this issue · 2 comments

Hey!
I have the following code:

var sites = [
    {
      "url": "https://www.origo.hu/index.html",
      "selector": "a.news-title",
      "side": "right"
    },
    {
      "url": "https://index.hu",
      "selector": "h1.cikkcim a",
      "side": "left"
    }
  ];
for (site of sites) {
	superagent
	.get(site.url)
	.end((err, res) => {

		htmlMiner(res.text,{
			_each_: site.selector,
			text: function(arg) {
				return arg.$scope.text();
			},
			href: function(arg) {
				return arg.$scope.attr('href');
			}
		})
	});

} 

Parsing a single site always works. But with two sites, it returns the results for one site and an empty array for the other.
Why? Is it not parsing fast enough? How can I make it wait and then print the results?
Can you help me?
Thank you

Hi @daaniiieel,
htmlMiner is synchronous: the call returns only once it has parsed the full HTML, so parsing speed is not the problem.

The issue here is the combination of a for-of loop with an asynchronous task.

for (site of sites) {
    superagent
    .get(site.url)
    .end((err, res) => {
        console.log(site.selector)
    });
}

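You might expect this code to print a.news-title and then h1.cikkcim a, but it prints h1.cikkcim a twice. Without a declaration, site is a single implicit global shared by every iteration, and the callbacks only run after the loop has finished, by which point site refers to the last entry.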
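The effect can be reproduced without any network calls. The sketch below is an illustration only: fakeGet is a hypothetical stand-in for superagent's .end() callback, and var is used in place of the undeclared variable because it shares a single binding in the same way while also working in strict mode.

```javascript
// Hypothetical stand-in for superagent's .end(): it stores the callback
// so it can be run later, after the loops have finished.
const pending = [];
function fakeGet(url, callback) {
  pending.push(callback);
}

const sites = [
  { url: 'https://www.origo.hu/index.html', selector: 'a.news-title' },
  { url: 'https://index.hu', selector: 'h1.cikkcim a' }
];

const shared = [];
// `var` (like an undeclared variable) creates ONE binding for the whole loop.
for (var site of sites) {
  fakeGet(site.url, () => shared.push(site.selector));
}

const perIteration = [];
// `const` creates a fresh binding on every iteration.
for (const s of sites) {
  fakeGet(s.url, () => perIteration.push(s.selector));
}

// Simulate the responses arriving after both loops are done.
pending.forEach((callback) => callback());

console.log(shared);       // [ 'h1.cikkcim a', 'h1.cikkcim a' ]
console.log(perIteration); // [ 'a.news-title', 'h1.cikkcim a' ]
```

The callbacks in the first loop all read the same variable, so they all see its final value. The callbacks in the second loop each captured their own copy.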

If you can use ES2015 syntax in your Node.js project, you can easily solve the issue by declaring the loop variable with let or const instead of relying on the implicit declaration, so that each iteration gets its own binding.

- for (site of sites) {
+ for (const site of sites) {
    superagent
    .get(site.url)
    .end((err, res) => {
        const json = htmlMiner(res.text,{
            _each_: site.selector,
            text: function(arg) {
                return arg.$scope.text();
            },
            href: function(arg) {
                return arg.$scope.attr('href');
            }
        })

        console.log(json)
    });
}
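If you also want to wait until every site has been parsed before printing anything, one option is to turn each request into a promise and combine them with Promise.all. This is only a minimal sketch: collectAll, fetchHtml and parse are hypothetical names, with fetchHtml standing in for the superagent request and parse for the htmlMiner call. The stubs exist so the example runs without network access.

```javascript
// collectAll is a hypothetical helper, not part of html-miner or superagent.
// fetchHtml(url) must return a promise that resolves to the page's HTML.
// parse(html, selector) stands in for the synchronous htmlMiner call.
function collectAll(sites, fetchHtml, parse) {
  return Promise.all(
    sites.map((site) =>
      fetchHtml(site.url).then((html) => ({
        url: site.url,
        items: parse(html, site.selector)
      }))
    )
  );
}

// Stubs (assumptions for illustration) so the sketch runs offline.
const fakePages = {
  'https://www.origo.hu/index.html': 'origo html',
  'https://index.hu': 'index html'
};
const fetchHtml = (url) => Promise.resolve(fakePages[url]);
const parse = (html, selector) => [`${selector} in ${html}`];

const sites = [
  { url: 'https://www.origo.hu/index.html', selector: 'a.news-title' },
  { url: 'https://index.hu', selector: 'h1.cikkcim a' }
];

// Promise.all resolves once every site is done, in the same order as `sites`.
const done = collectAll(sites, fetchHtml, parse).then((results) => {
  console.log(results);
  return results;
});
```

In a real project, fetchHtml would wrap the superagent request and resolve with res.text (assuming your superagent version lets you await or .then() the request), and parse would call htmlMiner with the same _each_ options as above.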

Let me know if this solves your issue.