stevejenkins/postwhite

dash doesn't like the yahoo page

gamag opened this issue · 4 comments

gamag commented

When using scrape_yahoo on Debian with /bin/sh linked to dash 0.5.7, the output file is empty most of the time.

Testing
sh -c 'echo "$(wget https://help.yahoo.com/kb/SLN23997.html -q -O -)"'
mostly gives an incomplete page (the output stops somewhere in the CSS styles). Without the echo and $(), nothing is missing.
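A rough way to quantify the truncation is to compare how many bytes a command produces directly with how many survive a round trip through $(...). The sketch below is only an illustration: the helper names bytes_direct and bytes_via_substitution are hypothetical and not part of scrape_yahoo, and printf is used instead of echo so that backslashes in the page are not reinterpreted.

```sh
#!/bin/sh
# Hypothetical helpers for reproducing the report; not part of scrape_yahoo.

# bytes_direct CMD [ARGS...]: byte count of CMD's output sent straight down a pipe.
bytes_direct() {
    "$@" | wc -c | tr -d ' '
}

# bytes_via_substitution CMD [ARGS...]: byte count after the output has been
# captured with $(...) and printed again.
# $(...) strips trailing newlines and printf adds one back, so a difference of
# a few bytes is expected; a large difference points to truncation.
bytes_via_substitution() {
    printf '%s\n' "$("$@")" | wc -c | tr -d ' '
}

# Example (needs network access, so it is left commented out):
# url=https://help.yahoo.com/kb/SLN23997.html
# echo "direct:     $(bytes_direct wget "$url" -q -O -)"
# echo "via \$(...): $(bytes_via_substitution wget "$url" -q -O -)"
```

Running the commented example under both dash and bash should show whether only one of them loses a large part of the page.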

I assume this is due to some Unicode characters or null bytes that dash doesn't like. Changing the shebang to bash solves the problem for me. Using a pipe directly between wget and grep seems to work too, but maybe there is a cleaner solution.
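A minimal sketch of the pipe approach, assuming the goal is to pull IPv4 addresses out of the page. The function name extract_ipv4 and the pattern are illustrative assumptions, not taken from scrape_yahoo.

```sh
#!/bin/sh
# extract_ipv4: hypothetical filter; reads a page on stdin and prints each
# IPv4-looking string (optionally with a /prefix) once, sorted.
# -a and -o are GNU/BSD grep options rather than POSIX ones; -a keeps grep
# from treating input that contains null bytes as binary.
extract_ipv4() {
    grep -a -o -E '[0-9]{1,3}(\.[0-9]{1,3}){3}(/[0-9]{1,2})?' | sort -u
}

# Example (needs network access, so it is left commented out):
# wget -q -O - https://help.yahoo.com/kb/SLN23997.html | extract_ipv4
```

Because the page goes from wget to grep through the pipe, it is never stored in a shell variable or captured with $(...), so the shell's handling of unusual bytes should not matter.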

I can confirm that this happens consistently. The shebang interpreter should be changed to /bin/bash to solve this issue. In the meantime, if anybody else is looking for a solution, you can run the scrape_yahoo script from cron via bash <path-to>/scrape_yahoo.

Hi, @drybjed. The original script was re-written a while ago to rely on /bin/sh instead of Bash so that it could run on more systems. I can't replicate the issue, since I'm running the script on a CentOS box and /bin/sh seems to be parsing the Yahoo data fine.

Any other suggestions for fixing the problem for Debian users while still keeping the script as "universal" as possible?

gamag commented

Downloading to a temporary file might be an option: wget and curl can write directly to files, and grep can read from files, so no redirection through the shell would be needed.
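The temporary-file idea could look roughly like this. It is a sketch under assumptions: fetch_to_file and matches_in_file are hypothetical names, the pattern in the example is illustrative, and mktemp is assumed to be available (it is common but not required by POSIX).

```sh
#!/bin/sh
# Hypothetical sketch; names and the example pattern are illustrative only.

# fetch_to_file URL FILE: let wget write the page straight to FILE.
fetch_to_file() {
    wget -q -O "$2" "$1"
}

# matches_in_file PATTERN FILE: let grep read FILE itself and print matching
# lines. -a keeps grep from giving up on input it considers binary
# (a GNU/BSD option, not POSIX).
matches_in_file() {
    grep -a -E -e "$1" -- "$2"
}

# Example (needs network access, so it is left commented out):
# tmp=$(mktemp) || exit 1
# trap 'rm -f "$tmp"' EXIT
# fetch_to_file https://help.yahoo.com/kb/SLN23997.html "$tmp" &&
#     matches_in_file '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' "$tmp"
```

The only command substitution left is the one around mktemp, which captures a short path rather than the page contents.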

If /bin/sh is not bash, then scrape_yahoo does not work, because it is a bash script and not, e.g., a dash script. If you are using the bash dialect in your script, you should use /bin/bash as the interpreter.

On CentOS /bin/sh is a symbolic link to /bin/bash.
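To check which shell a #!/bin/sh shebang actually runs on a given system, something like the following may help. The helper name resolve_interpreter is hypothetical, and readlink -f is assumed to be available (it is a GNU coreutils option, not POSIX).

```sh
#!/bin/sh
# resolve_interpreter PATH: hypothetical helper; follows symbolic links and
# prints the file that PATH finally points to.
resolve_interpreter() {
    readlink -f "$1"
}

# Example:
# resolve_interpreter /bin/sh
```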