rflechner/ScrapySharp

Issue parsing empty select

balexandrov opened this issue · 4 comments

If a page contains a select input with no options (they are populated later by script), the form parser throws an exception in PageWebForm.cs, ParseFormFields.
Here I've put some null checks when computing the value:

var selects = from @select in node.CssSelect("select")
              let name = @select.GetAttributeValue("name")
              // Prefer the option marked "selected", else the first one; null if there are none.
              let option =
                  @select.CssSelect("option").FirstOrDefault(o => o.Attributes["selected"] != null) ??
                  @select.CssSelect("option").FirstOrDefault()
              let value = (option == null) ? null : option.GetAttributeValue("value")
              select new FormField
              {
                  Name = name,
                  // Fall back to the option text, or "" when the select is empty.
                  Value = string.IsNullOrEmpty(value) ? (option == null ? "" : option.InnerText) : value
              };
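On C# 6 or later, the same fallback chain can be written more compactly with null-conditional operators. This is a minimal self-contained sketch, not ScrapySharp code: Opt is a hypothetical stand-in for an option node (not an HtmlAgilityPack type), with Selected, Value and Text playing the roles of the selected attribute, the value attribute and InnerText.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an <option> node (not an HtmlAgilityPack type).
public class Opt
{
    public bool Selected { get; set; }   // "selected" attribute present
    public string Value { get; set; }    // "value" attribute, null if absent
    public string Text { get; set; }     // inner text
}

public static class FormFieldSketch
{
    // Same fallback chain as the patched query:
    // selected option, else first option, else none (empty <select>).
    public static string SelectValue(IEnumerable<Opt> options)
    {
        var list = options.ToList();
        var option = list.FirstOrDefault(o => o.Selected) ?? list.FirstOrDefault();
        var value = option?.Value;   // null instead of throwing when option is null
        return string.IsNullOrEmpty(value)
            ? (option?.Text ?? "")   // option text, or "" for an empty <select>
            : value;
    }

    public static void Main()
    {
        Console.WriteLine("[" + SelectValue(new Opt[0]) + "]");   // prints "[]"
        Console.WriteLine(SelectValue(new[]
        {
            new Opt { Value = "1", Text = "One" },
            new Opt { Selected = true, Value = "", Text = "Two" },
        }));                                                     // prints "Two"
    }
}
```

The ?. operator yields null when option is null, and ?? then supplies the empty string, so an empty select produces an empty value instead of an exception.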

Could you please provide an HTML sample?

I can't find the exact page that triggered this now, but the HTML was an empty select element with no options in it, i.e.:
<select></select>

Sorry, I cannot reproduce it with a test:

        [Test]
        public void When_parsing_empty_select_tag()
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(@"<html><body><select></select></body></html>");
            var node = doc.DocumentNode;

            var selects = (from @select in node.CssSelect("select")
                let name = @select.GetAttributeValue("name")
                let option =
                    @select.CssSelect("option").FirstOrDefault(o => o.Attributes["selected"] != null) ??
                    @select.CssSelect("option").FirstOrDefault()
                let value = (option == null) ? null : option.GetAttributeValue("value")
                select new 
                {
                    Name = name,
                    Value = string.IsNullOrEmpty(value) ? option == null ? "" : option.InnerText : value
                }).ToArray();
            
            Assert.AreEqual(1, selects.Length);
        }

I've just checked: this code is present in two files, PageWebForm.cs and WebForm.cs. I hit it in PageWebForm when loading a page containing such HTML, but it must be fixed in the other file too.
Your test uses the query with my null checks, so it passes. The original code is missing the null check "let value = (option == null)", and option is always null when the select has no options, because FirstOrDefault() returns null for an empty sequence.

This is the original code:

var selects = from @select in node.CssSelect("select")
              let name = @select.GetAttributeValue("name")
              let option =
                  @select.CssSelect("option").FirstOrDefault(o => o.Attributes["selected"] != null) ??
                  @select.CssSelect("option").FirstOrDefault()
              // option is null for an empty <select>, so this line throws.
              let value = option.GetAttributeValue("value")
              select new FormField
              {
                  Name = name,
                  Value = string.IsNullOrEmpty(value) ? option.InnerText : value
              };
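The failure can be reproduced without HtmlAgilityPack, since it comes from FirstOrDefault() returning null on an empty sequence and the unpatched query then dereferencing that null. In this minimal sketch, Option is a hypothetical stand-in for an option node and OriginalValue mirrors only the option/value part of the original query.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for an <option> node (not an HtmlAgilityPack type).
public class Option
{
    public bool Selected { get; set; }   // "selected" attribute present
    public string Value { get; set; }    // "value" attribute
}

public static class EmptySelectRepro
{
    // Mirrors the unpatched query: no null check before dereferencing option.
    public static string OriginalValue(Option[] options)
    {
        var option = options.FirstOrDefault(o => o.Selected) ?? options.FirstOrDefault();
        return option.Value;   // throws NullReferenceException when options is empty
    }

    public static void Main()
    {
        try
        {
            OriginalValue(new Option[0]);   // models <select></select>
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException for an empty <select>");
        }
    }
}
```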