TypeError: allocator undefined for Nokogiri::HTML5::Document on YAML.unsafe_load_file
MatzFan opened this issue · 3 comments
I am trying to write a Nokogiri::XML::NodeSet object to a file and read it back using YAML.
Serializing works, but when I try to deserialize it, Psych raises the error in the title.
MRE:
require 'nokogiri'
require 'yaml'
nodeset = Nokogiri::HTML5.parse('<html><head></head><body></body></html>').xpath('//body')
File.write 'data.dump', YAML.dump(nodeset)
YAML.unsafe_load_file 'data.dump'
Stack trace from the last line:
/home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:408:in `allocate': allocator undefined for Nokogiri::HTML5::Document (TypeError)
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:408:in `revive'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:215:in `visit_Psych_Nodes_Mapping'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:30:in `visit'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:6:in `accept'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:35:in `accept'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:347:in `block in revive_hash'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:345:in `each'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:345:in `each_slice'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:345:in `revive_hash'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:409:in `revive'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:215:in `visit_Psych_Nodes_Mapping'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:30:in `visit'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:6:in `accept'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:35:in `accept'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:320:in `visit_Psych_Nodes_Document'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:30:in `visit'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/visitor.rb:6:in `accept'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/visitors/to_ruby.rb:35:in `accept'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych/nodes/node.rb:50:in `to_ruby'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:274:in `unsafe_load'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:649:in `block in unsafe_load_file'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:648:in `open'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:648:in `unsafe_load_file'
from yaml_test.rb:8:in `<main>'
If I replace the last line with
YAML.load_file 'data.dump', permitted_classes: Nokogiri::XML::NodeSet
I get this:
/home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:326:in `safe_load': undefined method `map' for Nokogiri::XML::NodeSet:Class (NoMethodError)
class_loader = ClassLoader::Restricted.new(permitted_classes.map(&:to_s),
^^^^
Did you mean? tap
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:369:in `load'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:671:in `block in load_file'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:670:in `open'
from /home/me/.rbenv/versions/3.2.2/lib/ruby/3.2.0/psych.rb:670:in `load_file'
from yaml_test.rb:7:in `<main>'
The serialized object looks like this:
--- !ruby/object:Nokogiri::XML::NodeSet
document: !ruby/object:Nokogiri::HTML5::Document
  decorators:
  errors: []
  node_cache:
  - !ruby/object:Nokogiri::XML::Element {}
  - !ruby/object:Nokogiri::XML::Element {}
  namespace_inheritance: false
  url:
  quirks_mode: 1
Is this a bug or is there some reason I can't deserialize this object?
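The stack trace above shows the TypeError coming from an `allocate` call inside Psych's `revive`. As a rough illustration of that failure mode, and not a statement about Nokogiri's internals, the sketch below uses the core `Integer` class as a stand-in, since it also has no allocator. Any class in that situation cannot be instantiated through `allocate`, whoever the caller is.

```ruby
# Integer is used here only as a stand-in for a class with no allocator;
# it is an assumption for illustration and unrelated to Nokogiri itself.
begin
  Integer.allocate
  message = nil
rescue TypeError => e
  # Same error family as in the stack trace: "allocator undefined for ..."
  message = e.message
end

puts message
```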
The argument should be given as an array, like: permitted_classes: [Nokogiri::XML::NodeSet].
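A minimal sketch of the array form, assuming a hypothetical `Point` class in place of the Nokogiri types so that it runs with the standard library alone. The `load_permitting` helper is also hypothetical. It uses `Kernel#Array` so that a bare class and an array of classes are both accepted.

```ruby
require 'yaml'

# Hypothetical stand-in class; any plain Ruby class behaves the same way.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

# Hypothetical helper: Kernel#Array wraps a bare class in an Array and
# returns an existing Array unchanged, so permitted_classes always
# receives an Array.
def load_permitting(yaml, classes)
  YAML.safe_load(yaml, permitted_classes: Array(classes))
end

yaml = YAML.dump(Point.new(1, 2))

from_array = load_permitting(yaml, [Point]) # explicit Array
from_class = load_permitting(yaml, Point)   # bare class, wrapped by the helper
```

Whether such a wrapper is worth having is a matter of taste. Passing the array literal directly is the simpler fix.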
@olleolleolle my bad. At least the error is clearer now. It looks like it can't be done:
Tried to load unspecified class: Nokogiri::HTML5::Document (Psych::DisallowedClass)
If I can't deserialize Nokogiri Documents or NodeSets I'll store the HTML as text. Closing & thanks.
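A possible shape for the "store the HTML as text" approach, sketched with the standard library only. The record layout (the `html` and `xpath` keys) is an assumption made for illustration. Because the record contains only strings, the safe loader needs no permitted classes. The Nokogiri re-parse is left as a comment so the sketch runs without the gem.

```ruby
require 'yaml'
require 'tmpdir'

html = '<html><head></head><body></body></html>'

# Store plain strings instead of Nokogiri objects (hypothetical layout).
record = { 'html' => html, 'xpath' => '//body' }

loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.dump')
  File.write(path, YAML.dump(record))
  # Only core types are present, so no permitted_classes are required.
  YAML.safe_load(File.read(path))
end

# With Nokogiri installed, the node set can be rebuilt from the text,
# using the same calls as the MRE:
#   Nokogiri::HTML5.parse(loaded['html']).xpath(loaded['xpath'])
```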
For clarity and completeness: the permitted_classes list can hold as many classes as you need, so you can list every class you know appears in the data.
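To illustrate listing several classes, here is a small sketch with two hypothetical classes, `Outer` and `Inner`, standing in for a container and the object it references. Permitting only the outer class gives the same kind of Psych::DisallowedClass error quoted above, while listing both lets the load succeed. This only shows how the allow-list works. It does not imply that Nokogiri documents can be revived this way.

```ruby
require 'yaml'

# Hypothetical classes for illustration only.
class Inner
  attr_reader :label

  def initialize(label)
    @label = label
  end
end

class Outer
  attr_reader :inner

  def initialize(inner)
    @inner = inner
  end
end

yaml = YAML.dump(Outer.new(Inner.new('body')))

# Permitting only the outer class: the nested class is rejected.
begin
  YAML.safe_load(yaml, permitted_classes: [Outer])
  error_message = nil
rescue Psych::DisallowedClass => e
  error_message = e.message
end

# Listing every class that appears in the data lets the load succeed.
outer = YAML.safe_load(yaml, permitted_classes: [Outer, Inner])
```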