akrennmair/newsbeuter

Fetch feeds in parallel?

kaihendry opened this issue · 24 comments

newsbeuter 2.9 - http://www.newsbeuter.org/
Copyright (C) 2006-2010 Andreas Krennmair

newsbeuter is free software and licensed under the MIT/X Consortium License.
Type `newsbeuter -vv' for more information.

Compilation date/time: Mar  8 2017 21:56:57
System: Linux 4.11.9-1-ARCH (x86_64)
Compiler: g++ 6.3.1 20170306
ncurses: ncurses 6.0.20170527 (compiled with 6.0)
libcurl: libcurl/7.54.1 OpenSSL/1.1.0f zlib/1.2.11 libpsl/0.17.0 (+libicu/59.1) libssh2/1.8.0 nghttp2/1.23.1 (compiled with 7.53.1)
SQLite: 3.19.3 (compiled with 3.17.0)
libxml2: compiled with 2.9.4

It would appear at least from the UI that refreshing feeds goes one by one. Why? It should do them all in parallel, no?

reload-threads (parameters: <number>; default value: 1)
The number of parallel reload threads that shall be started when all feeds are reloaded. (example: reload-threads 3)

Why is this a parameter? If I have 20 feeds, I expect 20 reload threads, no?

It depends on your network's capabilities.

I'm pretty sure my network can handle many hundreds of parallel connections like any other network...

Really like the reload-threads 0 idea, though the default should still be kept low (perhaps 20?) because each thread takes up memory, and not all of us have gigabytes of it lying around.

And we should limit the number of connections per host like browsers do, though in our case we can set it to 1. I wager that:

  • it's rare to fetch multiple feeds from one host; and
  • in such cases, the host itself is well-provisioned (Google's Feedburner, the BBC etc.), so it'll respond quickly enough anyway.

Anyone willing to draw up a PR?

The issue fixed by this commit appears to be in another bugtracker.

Yes, it's on Google Code. No details there, though.

should limit the number of connections per host like browsers do, though in our case we can set it to 1

An option like reload-threads-per-server would be nice for this. My feeds include ~20 YouTube feeds and ~30 GitHub feeds. Firefox still defaults to 6 here, but newsbeuter seems happy fetching these all simultaneously.

Edit: I do see a quick CPU spike across my 4 cores when reloading all feeds.

Anyone willing to draw up a PR?

If I could. :)

An option like reload-threads-per-server would be nice for this.

I think users will just jack it up real high in hopes of getting their feeds a second earlier. Which might not even happen; it could actually be slower, because of TCP slow start.

newsbeuter seems happy fetching these all simultaneously

I consider this a bug :)

FWIW, it appears to check them all in under a second.

Actually I've found it misses several feeds when fetching all at once.

What do you mean by "misses"? How do you check that?

I had enabled delete-read-articles-on-quit, which I didn't know would cause most (if not all) of my feeds to re-add articles, and some were missing until I lowered reload-threads to 30. Have not played with reload-threads since then.

After launching just now (with auto-reload enabled), I've found at least three more feeds that "weren't previously fetched".

I had enabled delete-read-articles-on-quit, which I didn't know would cause most (if not all) of my feeds to re-add articles

I just pushed a fix to master, can you please test? Commit ID is cc5d03e

I'm not sure what's up with articles missing if you have reload-threads set to some high(-ish) value. If you enable error-log, would you see any errors there? (I assume the problem is intermittent and running with -dnewsbeuter.log -l6 isn't an option due to the size of the log.)
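
For reference, a minimal config sketch for capturing these errors (option names as documented in newsbeuter's man page; the file paths are just examples):

```
# ~/.newsbeuter/config (illustrative)
error-log "~/.newsbeuter/error.log"
reload-threads 10
auto-reload yes
```

With error-log set, fetch failures are appended there with a timestamp, which is much lighter than a full -l6 debug log.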

@Minoru, thank you, it does work.

Will keep an eye on the error log.

[2017-08-10 15:28:22] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UCR-QYzXrZF8yFarK8wZbHog: HTTP response code said error
[2017-08-10 15:28:22] Error while retrieving https://protonmail.com/blog/feed/: SSL connect error
[2017-08-10 15:28:22] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UC2DjFE7Xf11URZqWBigcVOQ: HTTP response code said error
[2017-08-10 15:28:23] Error while retrieving https://www.youtube.com/feeds/videos.xml?channel_id=UC7pp40MU_6rLK5pvJYG3d0Q: HTTP response code said error

Edit: Got basically the same results at the next auto-reload. ProtonMail apparently borked SSL for their feed. No errors when I :set reload-threads 30 and manually reload-all.

Edit2: 8 YT feed errors after setting reload-threads to 30 in the config and restarting newsbeuter. None after lowering it to 20 and a subsequent restart. Only 1 error after setting it back to 30 and restarting. Not getting errors for ProtonMail anymore.

Edit3: Another auto-reload: 2 more YT errors and the ProtonMail error is back.

Edit4: I receive YT errors nearly every auto-reload. reload-threads has been set to 10 for the last two auto-reloads.

I'll try to add the actual HTTP code to the log tomorrow; right now the message is not very useful. Knowing what kind of errors these are will help us understand what we can/should do about them. I wager it's 429, because you're fetching a lot of stuff simultaneously.

Today with reload-threads set to 30:

[2017-08-11 18:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 19:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 20:33:37] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 21:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates
[2017-08-11 22:33:36] Error while retrieving https://lwn.net/headlines/newrss: Peer certificate cannot be authenticated with given CA certificates

(Edited your comment to move the log excerpt from the external Pastebin into the comment itself. I've had enough Paste- and imagebin links expire on me not to trust any of them. Nothing short of GitHub's death should take this issue tracker down :)

The SSL errors are easily explained if you look at LWN's certificate: it was issued just yesterday, on August 11th. Apparently they forgot to renew it in time, and their HTTPS was down for a few hours. No worries.

@polyzen, I just added the HTTP code to the error message. Let's see what causes those fetch failures for you!

The error log was empty all day, until 3 404's from YT about an hour ago.

Hmm. A fluke on YouTube's side? That's weird, but I can't do anything about it.

I'm not even sure what we're looking for anymore. Apparently a high reload-threads value doesn't produce a ton of errors, so... problem solved?

Thank you. Will continue to play around with the setting.

Not seeing any noteworthy errors while fetching all feeds at once.