ruby/psych

TypeError: allocator undefined for Nokogiri::HTML5::Document on YAML.unsafe_load_file

MatzFan opened this issue · 3 comments

I am trying to write a Nokogiri::XML::NodeSet object to a file and read it back, using YAML.

I can serialize it OK, but when trying to deserialize it psych raises this error.

MRE:

require 'nokogiri'
require 'yaml'

nodeset = Nokogiri::HTML5.parse('<html><head></head><body></body></html>').xpath('//body')

File.write 'data.dump', YAML.dump(nodeset)
YAML.unsafe_load_file 'data.dump'

Stack trace from the last line:

/home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:408:in `allocate': allocator undefined for Nokogiri::HTML5::Document (TypeError)
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:408:in `revive'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:215:in `visit_Psych_Nodes_Mapping'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:30:in `visit'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:6:in `accept'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:347:in `block in revive_hash'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:345:in `each'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:345:in `each_slice'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:345:in `revive_hash'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:409:in `revive'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:215:in `visit_Psych_Nodes_Mapping'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:30:in `visit'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:6:in `accept'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:320:in `visit_Psych_Nodes_Document'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:30:in `visit'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:6:in `accept'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:35:in `accept'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/nodes/node.rb:50:in `to_ruby'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:274:in `unsafe_load'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:649:in `block in unsafe_load_file'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:648:in `open'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:648:in `unsafe_load_file'
	from yaml_test.rb:8:in `<main>'
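
The top frame of the trace shows Psych calling `allocate` on the class named in the YAML tag, and that call is what raises. The same TypeError can be reproduced without Nokogiri or YAML by calling `allocate` on a core class that has no allocator. This is only a sketch: `Integer` is a stand-in, on the assumption that Nokogiri's native-backed document classes behave the same way, and `allocate_error` is a hypothetical helper name.

```ruby
# Returns the TypeError message raised by klass.allocate,
# or nil if the class can be allocated normally.
def allocate_error(klass)
  klass.allocate
  nil
rescue TypeError => e
  e.message
end

# Integer has no allocator, so this reports the same kind of error as the trace.
puts allocate_error(Integer).inspect

# A plain Ruby class allocates fine, so no error message is returned.
puts allocate_error(Object).inspect
```

If that assumption holds, no Psych loading option would get around the error, since reviving a `!ruby/object:` mapping always starts with `allocate`.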

If I replace the last line with

YAML.load_file 'data.dump', permitted_classes: Nokogiri::XML::NodeSet

I get this:

/home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:326:in `safe_load': undefined method `map' for Nokogiri::XML::NodeSet:Class (NoMethodError)

    class_loader = ClassLoader::Restricted.new(permitted_classes.map(&:to_s),
                                                                ^^^^
Did you mean?  tap
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:369:in `load'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:671:in `block in load_file'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:670:in `open'
	from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:670:in `load_file'
	from yaml_test.rb:7:in `<main>'

The serialized object looks like this:

--- !ruby/object:Nokogiri::XML::NodeSet
document: !ruby/object:Nokogiri::HTML5::Document
  decorators:
  errors: []
  node_cache:
  - !ruby/object:Nokogiri::XML::Element {}
  - !ruby/object:Nokogiri::XML::Element {}
  namespace_inheritance: false
  url:
  quirks_mode: 1

Is this a bug or is there some reason I can't deserialize this object?

The argument should be given as an array, like: permitted_classes: [ Nokogiri::XML::NodeSet ].
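
A minimal sketch of the array form, using a hypothetical `Point` class instead of Nokogiri so it runs with the standard library alone. `YAML.safe_load` is used here rather than `YAML.load_file` only so that the example does not need a file on disk; the `permitted_classes:` keyword is the same.

```ruby
require 'yaml'

# Hypothetical class for illustration; not part of Nokogiri or Psych.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Round-trips a Point through YAML, permitting only the Point class.
def round_trip_point(point)
  yaml = YAML.dump(point)
  # permitted_classes must be an Array, even when it holds a single class.
  YAML.safe_load(yaml, permitted_classes: [Point])
end

restored = round_trip_point(Point.new(1, 2))
puts restored.x
puts restored.y
```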

@olleolleolle my bad. At least the error is clearer now; it looks like it can't be done:

Tried to load unspecified class: Nokogiri::HTML5::Document (Psych::DisallowedClass)

If I can't deserialize Nokogiri Documents or NodeSets I'll store the HTML as text. Closing & thanks.
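
A rough sketch of the store-as-text approach. Plain strings are allowed by the safe loader, so no `permitted_classes` are needed. The literal HTML string stands in for what the script would get from something like `nodeset.map(&:to_html)`, and `round_trip_html` is a hypothetical helper name.

```ruby
require 'yaml'

# Dumps a list of HTML strings to YAML and loads them back.
# Strings are a basic YAML type, so the safe loader accepts them as-is.
def round_trip_html(fragments)
  YAML.safe_load(YAML.dump(fragments))
end

# Stand-in for nodeset.map(&:to_html), so the sketch runs without Nokogiri.
fragments = ['<body><p>Hello</p></body>']

restored = round_trip_html(fragments)
puts restored.first

# Each restored string can then be parsed again with Nokogiri,
# for example Nokogiri::HTML5.parse(restored.first).
```

The reloaded nodes belong to freshly parsed documents, not to the original one.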

For clarity and completeness: the permitted_classes list can name as many classes as you need, so you can add every class you know the document contains.
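
To illustrate with a sketch: the hypothetical `Outer` and `Inner` classes below mirror the NodeSet-contains-Document situation. Listing only the outer class raises `Psych::DisallowedClass` for the nested one, and listing both lets the load succeed. This works only for classes Ruby can allocate normally; `load_with` is a made-up helper.

```ruby
require 'yaml'

# Hypothetical classes for illustration only.
class Inner
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

class Outer
  attr_reader :inner

  def initialize(inner)
    @inner = inner
  end
end

# Loads the YAML with the given permitted classes. Returns the loaded object,
# or the Psych::DisallowedClass exception if a class in the YAML was not listed.
def load_with(yaml, classes)
  YAML.safe_load(yaml, permitted_classes: classes)
rescue Psych::DisallowedClass => e
  e
end

yaml = YAML.dump(Outer.new(Inner.new(42)))

# Only Outer is permitted, so the nested Inner is rejected.
puts load_with(yaml, [Outer]).class

# Both classes are permitted, so the whole object graph loads.
puts load_with(yaml, [Outer, Inner]).inner.value
```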