sparklemotion/nokogiri

Replace `Reader#attribute_nodes` with a rough equivalent

Opened this issue · 3 comments

Context

Issue #2598 describes at length an incompatibility between Ruby's compacting garbage collector and libxml2's xmlTextReaderExpand which re-uses C structs previously returned. The solution chosen and implemented in #2599 was to deprecate Reader#attribute_nodes. This method was removed entirely in Nokogiri v1.16.0.

In #2599 I acknowledged that there may be use cases the needed the semantics of #attribute_nodes (e.g., returning all attributes) and for which #attribute_hash may not suffice (because the non-namespaced hash keys used may collide). I intentionally deferred introducing a replacement until/unless we had a real-world use case that could drive out the API design.

In #2599 (comment), @ccutrer brought a real-world use case: https://github.com/instructure/ratom/blob/v0.10.11/lib/atom/xml/parser.rb#L103

So let's do this!

API design

I would like to play with a few options for retrieving the namespaced attributes from a Reader cursor.

Some ideas:

  1. Mimic Node#attribute_nodes with a new Reader#attribute_nodes and return an Array<XML::Attr>. This might be a challenging implementation because we'll be re-complicating the XML::Attr lifecycle again, but it's worth attempting to see if something simple like an ephemeral XML::Document would work.
  2. Mirror Reader#attribute_hash with Reader#namespaced_attribute_hash which returns a Hash<String => String> in which the hash key is the attribute name including the namespace prefix if any, and the hash value is the attribute value.

@ccutrer I'm curious what you think the API might look like?

Cleanup

While we're here, we should make sure to delete the additional GC lifecycle handling in xml_node.c for attributes. I deleted the #attribute_nodes method but didn't go further than that.

#3 is fine with me, as long as I also have any easy way to get from the namespace prefix to the namespace's href. Though maybe the key should be a pair of strings? So I don't have to split on ":" immediately to get back to the prefix itself? Or maybe not. Gotta consider attributes without a namespace. The ratom code actually re-combines it and joins with ":" for one of its branches.

@ccutrer I'm not sure I understand - you say #3 but I only listed two ideas?

It was meant as a "or an alternative you didn't mention." Sorry for any confusion