ruippeixotog/scala-scraper

How to check for status code?

mjenczmyk opened this issue · 3 comments

Hi,

Is there any way to check the status code when making an HTTP request? For instance, when executing

browser.post(loginURL, Map(
  "email" -> email,
  "password" -> password,
  "op" -> "Login",
  "form_build_id" -> getLoginFormID,
  "form_id" -> "packt_user_login_form"
))

how can one check whether the status code is 200 (successful login) or not? One way would be to write a custom HtmlValidator to validate the returned Document, but can I do it explicitly by checking the status code?

Hi @mjenczmyk! The behavior of a Browser when a request returns a non-200 status code is implementation-dependent. In the case of jsoup, for example, an HttpStatusException is thrown:

scala> JsoupBrowser().get("https://example.com/non_existing_page")
org.jsoup.HttpStatusException: HTTP error fetching URL
  at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:760)
  at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:706)
  at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:299)
  (...)

You can catch this exception to check the specific status code that was returned.
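
A minimal sketch of that approach, assuming the default jsoup-backed browser. The `StatusCheck` object and its `withStatus` helper are hypothetical names introduced here for illustration, not part of scala-scraper; `HttpStatusException#getStatusCode` is the jsoup accessor for the returned code.

```scala
import org.jsoup.HttpStatusException

object StatusCheck {
  // Hypothetical helper: evaluates the request and returns either the
  // failing HTTP status code (Left) or the request's result (Right).
  // Any other exception (e.g. a connection failure) propagates unchanged.
  def withStatus[A](request: => A): Either[Int, A] =
    try Right(request)
    catch { case e: HttpStatusException => Left(e.getStatusCode) }
}
```

With the login call from the question, this could be used as `StatusCheck.withStatus(browser.post(loginURL, form))`, where `form` stands for the `Map` of fields; a `Left(code)` then carries the status code that jsoup rejected.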

Note that scala-scraper does not intend to provide full-fledged HTTP clients. While the Browser class surely helps in the most common use cases, dealing with more complex HTTP responses may require using a proper external HTTP client and passing the response body to Browser#parseString manually.
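
As a rough sketch of that second route, the snippet below uses the JDK 11+ `java.net.http.HttpClient` as the external client (an assumption made here; any HTTP client would do) and only hands the body to `parseString` when the status is 200. `ExternalClient`, `fetch` and `toDocument` are hypothetical names.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.model.Document

object ExternalClient {
  private val client = HttpClient.newHttpClient()
  private val browser = JsoupBrowser()

  // Parses the body only for a 200 response; otherwise returns the status code.
  def toDocument(status: Int, body: String): Either[Int, Document] =
    if (status == 200) Right(browser.parseString(body)) else Left(status)

  // Performs a GET with the JDK client, which does not throw on 4xx/5xx,
  // so the status code can be inspected directly.
  def fetch(url: String): Either[Int, Document] = {
    val request = HttpRequest.newBuilder(URI.create(url)).GET().build()
    val response = client.send(request, HttpResponse.BodyHandlers.ofString())
    toDocument(response.statusCode(), response.body())
  }
}
```

Keeping the status check in `toDocument`, separate from the network call, means the decision logic can be exercised without making a request.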

Thanks, that'll help. I was using scalaj-http when making more complex requests (as you've suggested), but now I'll be able to use plain scala-scraper more. Thanks for the help!

Feel free to close the issue if you want to.

For me changing the user agent to "Mozilla/5.0" alone fixed the issue.
Document doc = Jsoup.connect(url)
    .userAgent("Mozilla/5.0")
    .timeout(30000)
    .get();
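
A possible scala-scraper counterpart of the user agent change, sketched under the assumption that the `JsoupBrowser` constructor accepts a `userAgent` parameter (worth checking against the version in use). The timeout is left out because no equivalent setting is confirmed here. `UserAgentExample` is a hypothetical name.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

object UserAgentExample {
  // Assumption: JsoupBrowser takes the user agent as a constructor argument.
  val browser = new JsoupBrowser(userAgent = "Mozilla/5.0")
}
```

Requests made through `UserAgentExample.browser.get(url)` would then be sent with that user agent.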