wojtekmach/req

Req exits with :badarg when redirecting to invalid URL

bfolkens opened this issue · 5 comments

When redirect encounters an invalid URL, Req will exit with :badarg due to Finch exiting with the same. The following error is an example:

iex> Req.new(method: :get, url: "http://mob.muchong.com/html/201502/8544958.html") |> Req.Request.run()

 [debug] redirecting to //muchong.com/html/201502/8544958.html
** (exit) :badarg
    (finch 0.18.0) lib/finch/http1/pool.ex:96: Finch.HTTP1.Pool.request/6
    (finch 0.18.0) lib/finch.ex:472: anonymous fn/4 in Finch.request/3
    (req 0.4.13) lib/req/steps.ex:979: Req.Steps.run_finch_request/3
    (req 0.4.13) lib/req/steps.ex:837: Req.Steps.run_finch/4
    (req 0.4.13) lib/req/request.ex:993: Req.Request.run_request/1
    (req 0.4.13) lib/req/request.ex:938: Req.Request.run/1
    (req 0.4.13) lib/req/steps.ex:1977: Req.Steps.redirect/1
    (req 0.4.13) lib/req/request.ex:1010: anonymous fn/2 in Req.Request.run_response/2
    (elixir 1.15.6) lib/enum.ex:4830: Enumerable.List.reduce/3
    (elixir 1.15.6) lib/enum.ex:2564: Enum.reduce_while/3
    (req 0.4.13) lib/req/request.ex:938: Req.Request.run/1

The request above fails after it redirects to //muchong.com/html/201502/8544958.html. As it turns out, After URI.merge/2 in Req.Steps.build_redirect_request/2 - the URL port may become nil if merged with relative scheme ("//somehost.com").

URI.merge(URI.parse("https://good.url/"), URI.parse("//test")).port == nil

In other words, the following triggers the failure:

Req.new(method: :get, url: %URI{scheme: "http", host: "muchong.com", path: "/html/201502/8544958.html"}) |> Req.Request.run()

And the following succeeds:

Req.new(method: :get, url: %URI{scheme: "http", port: 80, host: "muchong.com", path: "/html/201502/8544958.html"}) |> Req.Request.run()

Initially I thought to just monkeypatch URI.merge/2, however, since there are other cases where it seems we're encountering invalid URLs, what I did internally for our use case was to add a URL validation step around line 2005 after the merge, and then halt the pipe with an error if there is an invalid URL.

Another thought would be if we should just add a URL validation "step" instead of putting this into the redirect step explicitly.

Happy to submit a patch if that's acceptible for the library...

Thank you for the report.

I'm not sure if it's standardised but I believe skipping the scheme, //example.com as opposed to http[s]://example.com, implies the scheme that is currently used. I think I maybe saw it in HTML referencing assets like CSSes and images.

In other words, if it's standardised (ideally in https://www.rfc-editor.org/rfc/rfc9110.html) that redirects without scheme should imply current scheme, we should fix it in Req. For example, if scheme and port is nil but host is set, fill them in.

Could you investigate?

The function URI.merge/2 correctly merges the URI scheme, but leaves port nil which is what causes the :badarg downstream.

iex(1)> URI.merge(URI.parse("https://good.url/"), URI.parse("//relative"))
%URI{
  scheme: "https",
  authority: "relative",
  userinfo: nil,
  host: "relative",
  port: nil,
  path: nil,
  query: nil,
  fragment: nil
}

In other words, if it's standardised (ideally in https://www.rfc-editor.org/rfc/rfc9110.html) that redirects without scheme should imply current scheme, we should fix it in Req. For example, if scheme and port is nil but host is set, fill them in.

Yes, RFC 9110 15.4 Redirection 3xx step 1 indicates that a relative Location header should be resolved relative to the original request URI.

However, RFC 3986 5.2 Relative Resolution doesn't seem to prescribe any steps for including the port in the merge steps. It looks like that's covered in RFC 3986 6.2.3 Scheme-Based Normalization, where the spec indicates an empty port should be assigned that of the default one for the scheme (as we would expect).

Since the URI.merge/2 documentation suggests that they're trying to follow RFC 3986 5.2 Relative Resolution, it seems that we would then need some sort of "normalize URI" step prior to exiting Req.Steps.build_redirect_step/2.

While something of the above will solve the nil port problem, we still experience issues with redirects where URI's contain characters that fail :inet_parse.visible_string/1, so I think we should also include some sort of halt and exception instead of allowing the :bardarg from Finch once we solve the above issue.

Something like the following solves the nil port problem:

diff --git a/lib/req/steps.ex b/lib/req/steps.ex
index 712a8a5..222c28c 100644
--- a/lib/req/steps.ex
+++ b/lib/req/steps.ex
@@ -1964,7 +1964,10 @@ defmodule Req.Steps do
           request.options[:redirect_trusted]
       end

-    location_url = URI.merge(request.url, URI.parse(location))
+    location_url =
+      request.url
+      |> URI.merge(URI.parse(location))
+      |> normalize_redirect_uri()

     request
     # assume put_params step already run so remove :params option so it's not applied again
@@ -1980,6 +1983,10 @@ defmodule Req.Steps do
     Logger.log(level, ["redirecting to ", location])
   end

+  defp normalize_redirect_uri(%URI{scheme: "http", port: nil} = uri), do: %URI{uri | port: 80}
+  defp normalize_redirect_uri(%URI{scheme: "https", port: nil} = uri), do: %URI{uri | port: 443}
+  defp normalize_redirect_uri(%URI{} = uri), do: uri
+
   # https://www.rfc-editor.org/rfc/rfc9110#name-301-moved-permanently and 302:
   #
   # > Note: For historical reasons, a user agent MAY change the request method from

This seems related sneako/finch#186

Fixed in #328