dash doesn't like the yahoo page

Question

gamag opened this issue 7 years ago · 4 comments

When using scrape_yahoo on Debian with sh linked to dash 0.5.7, the output file is empty most of the time.

Testing with
sh -c 'echo "$(wget https://help.yahoo.com/kb/SLN23997.html -q -O -)"'
mostly gives an incomplete page (the output stops somewhere in the CSS styles). Without the echo and $(), nothing is missing.

I assume this is due to some Unicode characters or null bytes that dash doesn't like. Changing the shebang to bash solves the problem for me. Using a pipe directly between wget and grep seems to work too, but maybe there is a cleaner solution.
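
One other cause may be worth ruling out. This is an assumption, not something confirmed in this thread. dash's built-in echo interprets backslash escapes, and \c tells it to stop printing, whereas bash's echo leaves backslashes alone by default. If the page's CSS happens to contain such a sequence, echo "$(...)" would cut the output off at that point. A minimal sketch using printf, which prints its argument verbatim in both shells (print_verbatim and the sample string are made up for illustration):

```sh
#!/bin/sh
# print_verbatim is a hypothetical helper, not part of scrape_yahoo.
# printf '%s\n' writes its argument unchanged; backslash sequences in
# the data are not interpreted, unlike dash's built-in echo.
print_verbatim() {
    printf '%s\n' "$1"
}

# Made-up sample standing in for page content with a backslash escape.
print_verbatim 'before \c after'
```

In the test command above, that would mean writing printf '%s\n' "$(...)" in place of echo "$(...)".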

Answer 1 · 2017-08-19T08:52:35.000Z

I can confirm that this happens consistently. The shebang interpreter should be changed to /bin/bash to solve this issue. In the meantime, if anybody else is looking for a solution, you can run the scrape_yahoo script from cron by invoking it as bash <path-to>/scrape_yahoo.

Answer 2 · 2017-09-12T17:40:33.000Z

Hi, @drybjed. The original script was re-written a while ago to rely on /bin/sh instead of Bash so that it could run on more systems. I can't replicate the issue, since I'm running the script on a CentOS box and /bin/sh seems to be parsing the Yahoo data fine.

Any other suggestions for fixing the problem for Debian users while still keeping the script as "universal" as possible?

Answer 3 · 2017-10-12T14:46:31.000Z

Downloading to a temporary file might be an option: wget/curl can write directly to files and grep can read from files, so no redirection from the shell would be needed.
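
A rough sketch of that idea, under a few assumptions: fetch_and_grep is a made-up helper name, the URL and pattern are whatever the caller passes in, and mktemp is available (it is not required by POSIX, but it ships with GNU coreutils and the BSDs). The page body goes from wget to a file to grep and never passes through a shell variable or echo.

```sh
#!/bin/sh
# fetch_and_grep is a hypothetical helper, not part of scrape_yahoo.
# Usage: fetch_and_grep URL PATTERN
fetch_and_grep() {
    url=$1
    pattern=$2

    # Create a private temporary file; give up if that fails.
    tmpfile=$(mktemp) || return 1

    status=1
    # wget writes the page straight into the file ...
    if wget -q -O "$tmpfile" "$url"; then
        # ... and grep reads it straight from the file.
        grep -- "$pattern" "$tmpfile"
        status=$?
    fi

    rm -f "$tmpfile"
    return "$status"
}
```

It could be called as fetch_and_grep "https://help.yahoo.com/kb/SLN23997.html" "some pattern", where the pattern is a placeholder for whatever the script actually searches for.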

Answer 4 · 2017-10-13T12:43:26.000Z

If /bin/sh is not bash, scrape_yahoo does not work, because it is a bash script and not, for example, a dash script. If you use the bash dialect in your script, you should use /bin/bash as the interpreter.

On CentOS, /bin/sh is a symbolic link to /bin/bash.
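
To see which shell /bin/sh points to on a particular machine, something like the following can help. This is a sketch that assumes GNU readlink (the -f flag), as shipped on Debian and CentOS; sh_target is a made-up name.

```sh
#!/bin/sh
# sh_target is a hypothetical helper name.
# Resolve /bin/sh through any symbolic links and print the real path.
sh_target() {
    readlink -f /bin/sh
}

sh_target
```

Going by this thread, the result should end in dash on the affected Debian system and in bash on CentOS.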