dash doesn't like the yahoo page
gamag opened this issue · 4 comments
When using scrape_yahoo on Debian with sh linked to dash 0.5.7, the output file is empty most of the time.
Testing
sh -c 'echo "$(wget https://help.yahoo.com/kb/SLN23997.html -q -O -)"'
mostly gives an incomplete page (output stops somewhere in the CSS styles). Without the echo and $() there is nothing missing.
I assume this is due to some Unicode characters or null bytes that dash doesn't like; changing the shebang to bash solves the problem for me. Piping wget directly into grep seems to work too, but maybe there is a cleaner solution.
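For reference, the direct-pipe variant mentioned above could look roughly like the sketch below. The function name fetch_and_filter and the search pattern are made up for illustration and are not taken from scrape_yahoo. Because the page never passes through $() or echo, the shell itself does not touch the data; it also sidesteps echo, whose treatment of backslash sequences differs between shells.

```shell
#!/bin/sh
# Hypothetical helper, not part of scrape_yahoo:
#   fetch_and_filter URL PATTERN
# Streams the page from wget straight into grep, so the shell never
# stores the page in a variable or passes it through echo.
fetch_and_filter() {
    wget -q -O - "$1" | grep -- "$2"
}
```

It would be called as fetch_and_filter "https://help.yahoo.com/kb/SLN23997.html" "some-pattern", where the pattern is a placeholder.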
I can confirm that this happens consistently. The shebang interpreter should be changed to /bin/bash to solve this issue. In the meantime, if anybody else is looking for a solution, you can run the scrape_yahoo script from cron by running it via bash <path-to>/scrape_yahoo.
Hi, @drybjed. The original script was re-written a while ago to rely on /bin/sh instead of Bash so that it could run on more systems. I can't replicate the issue, since I'm running the script on a CentOS box and /bin/sh seems to be parsing the Yahoo data fine.
Any other suggestions for fixing the problem for Debian users while still keeping the script as "universal" as possible?
Downloading to a temporary file might be an option - wget/curl can write directly to files, grep can read from files, so no redirection from the shell would be needed.
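A minimal sketch of the temporary-file idea, assuming mktemp is available (it is not required by POSIX, but Debian and CentOS both ship it). The function name scrape_via_tempfile and the pattern argument are hypothetical and not taken from the actual script.

```shell
#!/bin/sh
# Hypothetical helper, not part of scrape_yahoo:
#   scrape_via_tempfile URL PATTERN
# wget writes the page to a temporary file and grep reads that file,
# so the page content never goes through shell redirection or $().
scrape_via_tempfile() {
    tmp=$(mktemp) || return 1
    if wget -q -O "$tmp" "$1"; then
        grep -- "$2" "$tmp"
        status=$?
    else
        # Download failed; report failure without running grep.
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

The return status is that of grep when the download succeeds, so a caller can tell "no match" apart from output being produced.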
If /bin/sh != bash, then scrape_yahoo does not work, as it is a bash script and not, e.g., a dash script. If you are using the bash dialect in your script, you should use /bin/bash as the interpreter.
On CentOS /bin/sh is a symbolic link to /bin/bash.
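To see which shell /bin/sh actually resolves to on a given box, something like the following sketch may help. It assumes readlink supports -f (true for GNU coreutils on Debian and CentOS, not guaranteed elsewhere), and the helper name resolve_interpreter is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the real file behind an interpreter path,
# following any chain of symbolic links.
resolve_interpreter() {
    readlink -f "$1"
}

# Per this thread, the result should point at bash on CentOS and at
# dash on the Debian setup described above.
resolve_interpreter /bin/sh
```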