philss/floki

Unhandled error for Floki.parse_fragment/2

fireproofsocks opened this issue · 1 comment

Description

Floki.parse_fragment/2 raises an unhandled ArgumentError when parsing a malformed HTML fragment.

To Reproduce

Steps to reproduce the behavior:

  • Using Floki v0.34.0
  • Using Elixir v1.13.4
  • Using Erlang OTP v24.1.7
  • With this code:
iex> input =  "<div style=\"text-align:center;width:100%;margin:22px 0;height:1px;border-top:1px solid #DDDDDD\"></div> <center><div class=\"transparency-container aplus-content-container\"> <a href=\"/b?node=12691228011\"><h3><img src=\"https://images-na.ssl-images-amazon.com/images/G/01/img16/pc/easychoice/landing/easychoice_landing_header.jpg\" width=\"65%\"/></h3></a></center></div> <div style=\"text-align:center;width:100%;margin:22px 0;height:1px;border-top:1px solid #DDDDDD\"></div><B>Internal Modem</B><br> NETGEAR's DG814.<B>Comprehensive</B><br> DG814s (such as NetMeeting).<B>Protective</B><br>. NAT (Network Address Translation).<B>Powerful</B><br> Ultra-fast 10/100 m (328 ft). a 50&#-37;me.<B>Uncomplicated</B><br>"

iex> Floki.parse_fragment(input)

** (ArgumentError) argument error
    (floki 0.34.0) lib/floki/entities.ex:16: Floki.Entities.decode/1
    (floki 0.34.0) src/floki_mochi_html.erl:700: :floki_mochi_html.tokenize_charref_raw/3
    (floki 0.34.0) src/floki_mochi_html.erl:650: :floki_mochi_html.tokenize_charref/2
    (floki 0.34.0) src/floki_mochi_html.erl:298: :floki_mochi_html.tokens/3
    (floki 0.34.0) src/floki_mochi_html.erl:83: :floki_mochi_html.parse/1
    (floki 0.34.0) lib/floki/html_parser/mochiweb.ex:10: Floki.HTMLParser.Mochiweb.parse_document/1

Expected behavior

I would expect Floki.parse_fragment/2 to return an error tuple ({:error, reason}) instead of raising.
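Until a fixed release is available, a caller could approximate this by rescuing the exception. The following is a minimal sketch, assuming the crash surfaces as an ArgumentError as in the trace above. `SafeFragment` and `parse/2` are hypothetical names, not part of Floki, and the parser function is passed in so the sketch runs without Floki installed.

```elixir
defmodule SafeFragment do
  # Hypothetical caller-side wrapper (not part of Floki).
  # Calls `parse_fun` on `html` and passes its normal return value through
  # unchanged; an ArgumentError raised while parsing is turned into
  # {:error, message} instead of crashing the caller.
  def parse(html, parse_fun) when is_function(parse_fun, 1) do
    parse_fun.(html)
  rescue
    # Only ArgumentError is rescued, so unrelated exceptions still surface.
    e in ArgumentError -> {:error, Exception.message(e)}
  end
end
```

With Floki installed, it would be called as `SafeFragment.parse(input, &Floki.parse_fragment/1)`. Rescuing only ArgumentError is a deliberate choice: it covers the crash reported here without hiding other bugs.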

philss commented

In this case, the malformed text will not be parsed; it will be kept as is.

I should release a new version soon, but it is already fixed in the main branch. Thanks!
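The trace points at `Floki.Entities.decode/1`, and the likely trigger is `&#-37;`, the only character reference in the input, which has a negative code point. A minimal sketch of the "keep it as is" behavior described above might look like the following. It is not Floki's actual fix; `CharrefSketch.decode/1` is a hypothetical helper that takes the text between `&` and `;` and returns the original reference unchanged when it does not name a valid Unicode scalar value.

```elixir
defmodule CharrefSketch do
  # Hypothetical helper, not Floki's code. `body` is the text between
  # "&" and ";", e.g. "#37", "#x25", or the malformed "#-37".
  def decode(body) do
    case parse_code_point(body) do
      {:ok, cp} -> <<cp::utf8>>
      # Malformed reference: keep the original text instead of raising.
      :error -> "&" <> body <> ";"
    end
  end

  # Hexadecimal form: "#x25" or "#X25".
  defp parse_code_point(<<"#", x, hex::binary>>) when x in [?x, ?X],
    do: to_code_point(hex, ~r/\A[0-9A-Fa-f]+\z/, 16)

  # Decimal form: "#37". A sign such as "-" fails the digit check.
  defp parse_code_point("#" <> dec), do: to_code_point(dec, ~r/\A[0-9]+\z/, 10)
  defp parse_code_point(_), do: :error

  # Accept only digit strings that map to a Unicode scalar value
  # (0..0x10FFFF, excluding surrogates), which <<cp::utf8>> can encode.
  defp to_code_point(digits, pattern, base) do
    with true <- Regex.match?(pattern, digits),
         cp = String.to_integer(digits, base),
         true <- cp <= 0x10FFFF and cp not in 0xD800..0xDFFF do
      {:ok, cp}
    else
      _ -> :error
    end
  end
end
```

For example, `CharrefSketch.decode("#37")` and `CharrefSketch.decode("#x25")` both return `"%"`, while `CharrefSketch.decode("#-37")` returns `"&#-37;"` unchanged.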