eafer/rdrview

The futex facility returned an unexpected error code

m040601 opened this issue · 6 comments

I get the message

The futex facility returned an unexpected error code

on pages like this, http://www.softpanorama.org/Admin/Monitoring/sar.shtml

With the "--disable-sandbox" flag, rdrview does work and does the job.

The only "abnormal" things I notice are that this page is served over http, not https, that it is .shtml, not .html,
and that its document charset is cp1252 Latin1, not Unicode.

Is this expected?

eafer commented

The problem is indeed with the encoding of the document.

Got it.

...which should fix your problem

It did. The error message is gone.

Funny thing. I've been using these "readability" scripts written in python and other scripting languages for years.

And I had never thought about the security implications of parsing possibly "foreign evil" html that you "stuff" into my system.
But that's what "real" browsers have to do all the time, right?

Now I remember, there's even a page in the ArchWiki on how to run Firefox "sandboxed" in Firejail.

The only other command line program that I feed loads of external foreign html to on my system is Pandoc. But it is written in Haskell and so, I hear, should be secure.

So, as an end user, not a developer, I'm now beginning to understand better why a "simple" C program parsing "foreign" html has to be somehow "sandboxed" to shield the system from an attack.

But it makes sense.
Just like you should never do

curl https://github.com/some_script.sh | sh

Who knows what could be put in an html page.
But this is a very improbable attack vector, right?
Modern linux systems shield me as an end user from C programs parsing "crazy" html, right?

This is all "obvious" to you of course. But not for an end user.

Could you add just one or two lines to the README about these security implications? Just so that end users can somehow understand where and why these security-related error messages come from.

I did notice the "security" section,

> This tool is young and written in C, so it's reasonable to wonder about the potential for memory issues. To be safe, all HTML parsing happens inside a sandboxed subprocess. Seccomp is used for this purpose on Linux, Pledge on OpenBSD, and Capsicum on FreeBSD.

but took it for a message for programmers.

And who is this "security sandbox" anyway? Is it a feature of the kernel itself? Or is it some tightening/tuning made by my linux distro? Or some big library? glib?

PS: Found this

https://en.wikipedia.org/wiki/Seccomp

that you could add to the README

eafer commented

> And I had never thought about the security implications of parsing possibly "foreign evil" html that you "stuff" into my system.
> But that's what "real" browsers have to do all the time, right?

Real browsers have their own sandboxes like rdrview, but far more complicated. This means that, once in a while, a security researcher will find a way to bypass them. You can set up your system to sandbox them further if you want, but it isn't always practical.

Rdrview's sandbox is very simple and tight, so I don't think it's likely to be bypassable, unless there are bugs in the kernel.

> So, as an end user, not a developer, I'm now beginning to understand better why a "simple" C program parsing "foreign" html has to be somehow "sandboxed" to shield the system from an attack.

A sandbox of some sort is a good idea for code written in any language. It means that you only need to audit a small fraction of my code to confirm that it's not doing anything stupid or malicious; without a sandbox you would have to read the whole thing. C and C++ just have the additional issue of memory bugs, but I'm not sure if the risk is big in practice, for something like rdrview.

> But this is a very improbable attack vector, right?

For rdrview, yes, but it doesn't hurt to be careful. For high value targets with huge codebases like Firefox or Chrome, the risk is much bigger.

> Modern linux systems shield me as an end user from C programs parsing "crazy" html, right?

You can't assume that in general: security is usually in the hands of the developer of the program. But it's rare to parse html in C these days. I think some distros are moving towards doing some sandboxing themselves (via AppArmor or the like) but we aren't there yet.

> This is all "obvious" to you of course. But not for an end user.

I guess I assumed that most people who are willing to build a command-line tool from source would know this stuff already, or research it on their own.

> And who is this "security sandbox" anyway? Is it a feature of the kernel itself? Or is it some tightening/tuning made by my linux distro? Or some big library? glib?

Seccomp is a service provided by the kernel that the program needs to set up and request. You can't do sandboxing in a library because it runs with the same privileges as the program, so it could be bypassed easily.
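
To make "set up and request" a bit more concrete, here is a minimal sketch. It is not rdrview's code: it is Python calling the kernel through ctypes, it uses seccomp's simplest "strict" mode, and the function name run_in_strict_sandbox is made up for illustration. The two constants are the values from the Linux headers. In strict mode the kernel lets the process use only read, write, _exit and sigreturn, and kills it with SIGKILL on any other system call.

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22       # from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1   # from <linux/seccomp.h>


def run_in_strict_sandbox():
    """Fork a child, ask the kernel to sandbox it, and report what happened.

    Returns (message, term_signal): the bytes the child managed to send back,
    and the signal that killed it (0 if it exited on its own).
    Hypothetical helper, for illustration only.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    prctl = getattr(libc, "prctl", None)  # not present on non-Linux systems
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: this is where parsing of untrusted input would happen.
        os.close(read_fd)
        requested = prctl is not None and prctl(
            PR_SET_SECCOMP, ctypes.c_ulong(SECCOMP_MODE_STRICT)
        ) == 0
        if not requested:
            os.write(write_fd, b"seccomp unavailable")
            os._exit(2)
        # From here on the kernel enforces the sandbox.
        os.write(write_fd, b"parsed")          # allowed: write to an open pipe
        os.open("/etc/hostname", os.O_RDONLY)  # forbidden: the child is killed here
        os._exit(1)                            # not reached once the sandbox is active

    # Parent: it stays unsandboxed and only trusts what arrives on the pipe.
    os.close(write_fd)
    message = os.read(read_fd, 64)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    term_signal = os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
    return message, term_signal


if __name__ == "__main__":
    print(run_in_strict_sandbox())
```

On a Linux kernel with seccomp support, the call should return the child's message together with signal 9 (SIGKILL): the write on the already-open pipe went through, the attempt to open a file did not. Strict mode is too restrictive for most real programs, which is why seccomp also offers a filter mode where the program supplies its own list of allowed system calls.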

> that you could add to the README

I like to keep the readme short, just quick installation instructions. Most of the usage information is in the man page, which I prefer to keep small too so that users can actually read it and there are no surprises. There is an endless amount of extra information that could be added, about how rdrview works, or about different ways to use it (like the w3m shortcuts you mentioned in the other post). I might add some of that in time, but in the meantime it might be more practical if I just start a wiki here on Github, and you can write your findings yourself for other users.

Thanks for taking the time to give this feedback and these insights.
I learned more from it today than in years of browsing with command line apps.

> ... But it's rare to parse html in C these days....
> .... I guess I assumed that most people who are willing to build a command-line tool from source would know this stuff already, or research it on their own....

Just before you close this issue, in case you're curious, here is why someone who's not a developer is so interested in this.

I've been super proficient in "modern" browsers (Firefox/qutebrowser) for years. I can bend their customizations to my needs, hack user.js and twist them to be keyboard-based to my liking, without touching the GUI or pressing buttons.

But, the way the "modern" web seems to be going, I am actually in the process of dumping those "modern" tools completely and pushing hard to go back to simple tools. Very inspired also by gemini this year.

In my personal case, as a heavy command line and unix user, w3m (and elinks, lynx, newsbeuter, etc.) are not just a gimmick or an occasional tool for browsing the web. I actually use them daily as my main tools.

Command line html parsing is my firefox.

Its associates (youtube-dl, tmux, newsbeuter, readability, pandoc, mpv, vim, git, etc.) are my desktop. They suck in, parse and consume thousands of http/html lines coming from the outside "evil" internet. Daily.

One of my raspberry pis swallows thousands of rss and html items daily, parses, trims and filters that soup with rdrview and many others, and produces simple epub ebooks that I can read later, offline, away from the internet.

And it's not only for work or serious stuff. Not just reading/dumping "simple" websites for productive/professional use.

It's even for fun and time wasters. To do the same, other people need a $1000 desktop mac or an expensive smartphone.

I'm talking about heavy, javascript-infested stuff like youtube, twitter, facebook, reddit, etc. Pure entertainment. With the help of youtube-dl, invidious, nitter.net, teddit.net, etc. Many just can't believe that you can also consume them on the command line with tools that were written some 30 or 40 years ago.

I got an error from rdrview:

The futex facility returned an unexpected error code.

How do I fix it, please?

@Phantasimay sorry for the long delay, I haven't been paying much attention to rdrview. If you still care about this, I would need to see the url for the page that's giving you trouble.