Difference between WebAPI and API
GoogleCodeExporter opened this issue · 1 comments
GoogleCodeExporter commented
I am using Boilerpipe through both the web API and the API. For example, for the page
http://www.davidicke.com/forum/showthread.php?page=2&t=72909 the Boilerpipe
web API works properly, while the Boilerpipe API returns the error
"java.io.IOException: Server returned HTTP response code: 403 for URL:
http://boilerpipe-web.appspot.com/extract?url=http://www.davidicke.com/forum/showthread.php?page%3D2%26t%3D72909&extractor=KeepEverythingExtractor&output=htmlFragment"
Please help! I am not using any proxy.
Original issue reported on code.google.com by lopiccol...@gmail.com
on 28 Mar 2013 at 4:37
GoogleCodeExporter commented
I think the problem is that no User-Agent header is sent when the HTML is
requested, which makes some websites respond with a 403 error. You could try
downloading the HTML yourself and then passing it to
ArticleExtractor.INSTANCE.getText(String text), but I am not sure this will work.
Original comment by jorgec...@gmail.com
on 17 Aug 2013 at 12:35
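The workaround suggested in the comment above (download the HTML yourself with a User-Agent header, then hand the string to the extractor) could look roughly like the sketch below. It assumes the 403 really is caused by the missing User-Agent, which the thread does not confirm. The class name HtmlFetcher, its method names, the User-Agent value, and the UTF-8 charset are all illustrative assumptions, not part of Boilerpipe. The Boilerpipe call is left as a comment so the sketch compiles with the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Boilerpipe.
class HtmlFetcher {
    // Assumed User-Agent value; any browser-like string may do.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)";

    // Prepares a connection with an explicit User-Agent header.
    // No network traffic happens until the response is read.
    public static HttpURLConnection open(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    // Reads the whole stream and decodes it with the given charset.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    // Downloads a page as a string. UTF-8 is an assumption; the real
    // charset should be taken from the response if it differs.
    public static String fetchHtml(String address) throws IOException {
        HttpURLConnection conn = open(URI.create(address).toURL(), USER_AGENT);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        String html = fetchHtml(args[0]);
        // With Boilerpipe on the classpath, the downloaded HTML could then be
        // passed to the extractor named in the comment above:
        // String text = de.l3s.boilerpipe.extractors.ArticleExtractor.INSTANCE.getText(html);
        System.out.println(html.length() + " characters downloaded");
    }
}
```

Run it with the page URL as the only argument. If the site still answers 403 with a User-Agent set, the cause lies elsewhere and this workaround will not help.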