zzzprojects/html-agility-pack

Creating empty value for attribute

joffremota opened this issue · 8 comments

1. Description

I've got the following value
<a download href=\"/downloads/arquivo.zip\">Download do Arquivo</a>

When I pass this into the following method in order to add the target="_blank" attribute, I get this:
<a download="" href=\"/downloads/arquivo.zip\" target=\"_blank\">Download do Arquivo</a>

How can I prevent the library from adding the empty value to download?

Here is the method.

        private string UpdateAnchorTagsWithTargetBlank(string html)
        {
            var doc = new HtmlDocument();

            doc.LoadHtml(html);
            var anchorNodes = doc.DocumentNode.SelectNodes("//a[@href]");
            if (anchorNodes != null)
            {
                foreach (var node in anchorNodes)
                {
                    if (node.GetAttributeValue("target", "") != "_blank")
                        node.SetAttributeValue("target", "_blank");
                }
            }

            return doc.DocumentNode.OuterHtml;
        }

2. Exception

Not applicable

3. Fiddle or Project

Not applicable

4. Any further technical details

  • HtmlAgilityPack (1.11.40)
  • SDK Version: 6.0.404

It's not so obvious, but to get the desired behavior you have to configure HtmlDocument.GlobalAttributeValueQuote to use AttributeValueQuote.Initial, i.e.:

var doc = new HtmlDocument()
{
    GlobalAttributeValueQuote = AttributeValueQuote.Initial
};

(This could also be done after loading an HTML document.)
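
For illustration, here is a minimal sketch of the "after loading" variant mentioned above. It uses only the calls already shown in this thread and assumes the setting is consulted when the document is written out rather than when it is parsed, which is what the note implies. It is an untested sketch, not a verified recipe.

```csharp
using HtmlAgilityPack;

var html = "<a download href=\"/downloads/arquivo.zip\">Download do Arquivo</a>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Assumption: the quote setting is read at serialization time,
// so it can still be changed once the HTML has been parsed.
doc.GlobalAttributeValueQuote = AttributeValueQuote.Initial;

// Serialize the document with the setting applied.
var outputHtml = doc.DocumentNode.OuterHtml;
```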



EDIT: I just noticed that the newly created target attribute will have single-quotes when setting up the HtmlDocument instance with AttributeValueQuote.Initial and it's impossible to change this by fiddling with the HtmlAttribute's QuoteType property. Dang! If you can't tolerate single quotes, my suggested solution isn't acceptable, unfortunately.

The problem is the internal field HtmlAttribute.InternalQuoteType being left untouched for newly created attributes and therefore initialized with the default value (which is SingleQuote). Either the cause is the untouched HtmlAttribute.InternalQuoteType field itself or this if expression is borked:

if (quoteType == AttributeValueQuote.Initial && !(att._isFromParse && !att._hasEqual && string.IsNullOrEmpty(att.XmlValue)))
{
    quoteType = att.InternalQuoteType;
}

Thank you @elgonzo ,

Indeed, to keep the attribute as it is, your proposed solution is perfect: doc.GlobalAttributeValueQuote = AttributeValueQuote.Initial;

As for the SingleQuote problem, I guess the only way at this moment is to use reflection to set the value to DoubleQuote.

Such as:

var html = "<a download href=\"/downloads/arquivo.zip\">Download do Arquivo</a>";
var doc = new HtmlDocument();
doc.GlobalAttributeValueQuote = AttributeValueQuote.Initial;
doc.LoadHtml(html);

var anchorNodes = doc.DocumentNode.SelectNodes("//a[@href]");
if (anchorNodes != null)
{
	foreach (var node in anchorNodes)
	{
		if (node.GetAttributeValue("target", "") != "_blank")
		{
			node.SetAttributeValue("target", "_blank");
			var targetAttribute = node.GetAttributes("target").Single();
			var internalQuoteTypeProperty = typeof(HtmlAgilityPack.HtmlAttribute).GetProperty("InternalQuoteType", System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance);
			internalQuoteTypeProperty.SetValue(targetAttribute, AttributeValueQuote.DoubleQuote);
		}                            
	}
}

var outputHtml = doc.DocumentNode.OuterHtml;
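
If the reflection workaround above is used in more than one place, it may be worth wrapping it in a small guard. The sketch below is only a suggestion: the class name HtmlAttributeQuoteHelper and the method TryForceDoubleQuote are hypothetical names, not part of HtmlAgilityPack. It relies on the same non-public InternalQuoteType member used above, which could be renamed or removed in a later release, so the helper returns false instead of throwing when the member cannot be found.

```csharp
using System.Reflection;
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack) around the reflection workaround.
public static class HtmlAttributeQuoteHelper
{
    // Look the non-public member up once; this is null if the library no longer exposes it.
    private static readonly PropertyInfo InternalQuoteTypeProperty =
        typeof(HtmlAttribute).GetProperty(
            "InternalQuoteType",
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);

    // Returns true if the quote type was set, false if the workaround could not be applied.
    public static bool TryForceDoubleQuote(HtmlAttribute attribute)
    {
        if (attribute == null || InternalQuoteTypeProperty == null || !InternalQuoteTypeProperty.CanWrite)
        {
            return false;
        }

        InternalQuoteTypeProperty.SetValue(attribute, AttributeValueQuote.DoubleQuote);
        return true;
    }
}
```

Inside the loop above, the two reflection lines would then become a single call such as HtmlAttributeQuoteHelper.TryForceDoubleQuote(targetAttribute), and the caller can decide what to do when it returns false.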

Best Regards,

Jon

Hi @JonathanMagnan ,

I just submitted a PR proposing a fix for this issue. Could you please take a look? I am facing the same issue and need this fixed in my system :).

Thanks in advance and best regards
POFerro

Thank you @POFerro for your PR.

I will try to look at it very soon.

Best Regards,

Jon

Hi @JonathanMagnan ,

Any news? :)

Hello @POFerro ,

Sorry for the delay. I didn't say it, but I have been on vacation since June 25 (a few days after your PR).

I'm returning tomorrow, so I will look at it and merge it if accepted next week.

Best Regards,

Jon

Hello @POFerro ,

Thank you again for your pull request. It has been merged and released in version v1.11.62.

Honestly, I'm always afraid of side effects for other developers, since the download attribute no longer gets a double-quoted empty value, but I guess we will see in the following weeks whether anyone reports this new behavior as an issue.

@joffremota , could you confirm it indeed fixed your issue as well? It seems to work flawlessly on my side.

Best Regards,

Jon

Hi @JonathanMagnan

Thanks for accepting the PR.
I already tested it in my case and it works like a charm.

Thanks and best regards ;)
POFerro