sneako/finch

Improve error messages on invalid URLs

wojtekmach opened this issue · 3 comments

In wojtekmach/req#81, we noticed Finch exits on an invalid URL:

Mix.install([
  {:finch, "~> 0.11.0"}
])

{:ok, _} = Finch.start_link(name: MyFinch)
Finch.build(:get, "http://") |> Finch.request(MyFinch) |> IO.inspect()
** (exit) :badarg
    (finch 0.11.0) lib/finch/http1/pool.ex:62: Finch.HTTP1.Pool.request/5
    (finch 0.11.0) lib/finch.ex:294: Finch.request/3
    bug.exs:6: (file)

I did a quick survey of a couple of similar edge cases and here are the result:

defmodule Main do
  def main do
    {:ok, _} = Finch.start_link(name: MyFinch)

    get("bad")
    get("http:/")
    get("http://")
    get("https://")
  end

  def get(url) do
    Finch.build(:get, url)
    |> Finch.request(MyFinch)
    |> IO.inspect(label: inspect(url))
  catch
    kind, reason ->
      {kind, reason}
      |> IO.inspect(label: inspect(url))
  end
end
"bad": {:error, %ArgumentError{message: "scheme is required for url: bad"}}
"http:/": {:error,
 %ArgumentError{
   message: "the :hostname option is required when address is not a binary"
 }}
"http://": {:exit, :badarg}
"https://": {:error,
 %Mint.TransportError{
   reason: {:options,
    {:socket_options,
     [
       nodelay: true,
       keepalive: true,
       packet_size: 0,
       packet: 0,
       header: 0,
       active: false,
       mode: :binary
     ]}}
 }}

(The "https://" one is especially odd!)

All these errors are slightly different, perhaps it would be worthwhile to make them a bit more uniform.

Great catch! I agree we should clean these up and probably return %Finch.Error{}s instead.

I did some investigations and found that the reason of this :exit, :badarg seems to have its root in OTP's inet_parse:visible_string/1, which obliges the given string consists of only characters in the range 0x21..0x7e, otherwise gen_tcp:connect/4 returns :badarg which we see.

iex(22)> Mint.HTTP.connect(:http, "", 80, [])                               
** (exit) :badarg
    (kernel 8.3.1) gen_tcp.erl:218: :gen_tcp.connect/4
    (mint 1.4.0) lib/mint/core/transport/tcp.ex:41: Mint.Core.Transport.TCP.connect/3
    (mint 1.4.0) lib/mint/http1.ex:115: Mint.HTTP1.connect/4
iex(22)> Mint.HTTP.connect(:http, "über.de", 80, [])
** (exit) :badarg
    (kernel 8.3.1) gen_tcp.erl:218: :gen_tcp.connect/4
    (mint 1.4.0) lib/mint/core/transport/tcp.ex:41: Mint.Core.Transport.TCP.connect/3
    (mint 1.4.0) lib/mint/http1.ex:115: Mint.HTTP1.connect/4

I'm interested to address this issue, but I'm also wondering how you'd like this :badarg to be taken care of. My idea is either reimplementing visible_string/1, or leaving it at all and letting it exit. I've thought of catching :exit and returning %Finch.Error{}, but I feel it's not going to be comfortable to live with.

Because inet_parse:visible_string/1 is part of kernel, I bet it is ok to assume it does not change too soon, but it goes against the fail-fast philosophy. On the other hand letting it return the terse :badarg is not only unhelpful but also intimidating. Of these two I'd prefer the former, maybe reflecting my habit of defensive programming 💭

What do you think? Or do you have any other ideas?

ndrean commented

I would insert a filter is_valid_url/1 before using Finch.

def get(url) do
  with {:check_url, true} <- {:check_url, is_valid_url?(string)},
          req <- Finch.build(:get,url),
         {:ok, response} <- Finch.request(req, MyFinch) do
      .....   
  else
      {:check_url, false} ->
         {:error, "invalid url"}
   end
end

def is_valid_url?(string) do
    map = URI.parse(string)
    Enum.map([:scheme, :host, :port], &Map.get(map, &1))
    |> Enum.all?()
end