kaitai-io/kaitai_struct

Ruby: `enum.to_i` implementation is buggy due to string replacement

generalmimon opened this issue · 0 comments

When stumbling upon this line of code in KSC, I got suspicious because I'm generally convinced is that any kind of string manipulation on translated expressions (i.e. whenever you don't treat the translated expressions as opaque strings) is naive, flawed and can cause hard-to-detect bugs:

RubyTranslator.scala:67

    enumDirectMap.replace(enumNameDirect, enumNameInverse)

It turns out that this use of String.replace() does indeed cause an observable bug. See reproduction code below. For comparison, I wrote two .ksy specs that are very similar, one where the bug isn't triggered (enum_nested_no_bug.ksy) and one where the bug is triggered (enum_nested_bug.ksy):

meta:
  id: enum_nested_no_bug
instances:
  pet_1_to_i:
    value: a_b::abc::cat.to_i
types:
  a_b:
    enums:
      abc:
        42: cat
meta:
  id: enum_nested_bug
instances:
  pet_1_to_i:
    value: a_b_c::abc::cat.to_i
types:
  a_b_c:
    enums:
      abc:
        42: cat
diff --git 1/enum_nested_no_bug.ksy 2/enum_nested_bug.ksy
index 1d88414..c5bb137 100644
--- 1/enum_nested_no_bug.ksy
+++ 2/enum_nested_bug.ksy
@@ -1,10 +1,10 @@
 meta:
-  id: enum_nested_no_bug
+  id: enum_nested_bug
 instances:
   pet_1_to_i:
-    value: a_b::abc::cat.to_i
+    value: a_b_c::abc::cat.to_i
 types:
-  a_b:
+  a_b_c:
     enums:
       abc:
         42: cat

I also added a debug print in KSC to make it easy to see how the bug happens (this is at RubyTranslator.scala:67):

--- i/shared/src/main/scala/io/kaitai/struct/translators/RubyTranslator.scala
+++ w/shared/src/main/scala/io/kaitai/struct/translators/RubyTranslator.scala
@@ -64,7 +64,9 @@ class RubyTranslator(provider: TypeProvider) extends BaseTranslator(provider)
     val enumNameDirect = Utils.upperUnderscoreCase(enumTypeAndName.last)
     val enumNameInverse = RubyCompiler.inverseEnumName(enumNameDirect)
 
-    enumDirectMap.replace(enumNameDirect, enumNameInverse)
+    val res = enumDirectMap.replace(enumNameDirect, enumNameInverse)
+    println("replacing " + enumNameDirect + " with " + enumNameInverse + " in " + enumDirectMap + " -> " + res)
+    res
   }
 
   override def arraySubscript(container: Ast.expr, idx: Ast.expr): String =
$ kaitai-struct-compiler -- --verbose file -t ruby enum_nested_no_bug.ksy enum_nested_bug.ksy
parsing enum_nested_no_bug.ksy...
reading enum_nested_no_bug.ksy...
... compiling it for ruby...
replacing ABC with I__ABC in AB::ABC -> AB::I__ABC
.... writing enum_nested_no_bug.rb
parsing enum_nested_bug.ksy...
reading enum_nested_bug.ksy...
... compiling it for ruby...
replacing ABC with I__ABC in ABC::ABC -> I__ABC::I__ABC
.... writing enum_nested_bug.rb

You can already see the problem - the valid reference to the "enum inverse map" would be ABC::I__ABC, but the compiler ended up with I__ABC::I__ABC because of the naive string substitution. This is reflected in the generated code:

diff --git 1/enum_nested_no_bug.rb 2/enum_nested_bug.rb
index 4ddbb4e..8a8b84f 100644
--- 1/enum_nested_no_bug.rb
+++ 2/enum_nested_bug.rb
@@ -6,7 +6,7 @@ unless Gem::Version.new(Kaitai::Struct::VERSION) >= Gem::Version.new('0.11')
   raise "Incompatible Kaitai Struct Ruby API: 0.11 or later is required, but you have #{Kaitai::Struct::VERSION}"
 end
 
-class EnumNestedNoBug < Kaitai::Struct::Struct
+class EnumNestedBug < Kaitai::Struct::Struct
   def initialize(_io, _parent = nil, _root = nil)
     super(_io, _parent, _root || self)
     _read
@@ -15,7 +15,7 @@ class EnumNestedNoBug < Kaitai::Struct::Struct
   def _read
     self
   end
-  class AB < Kaitai::Struct::Struct
+  class ABC < Kaitai::Struct::Struct
 
     ABC = {
       42 => :abc_cat,
@@ -32,7 +32,7 @@ class EnumNestedNoBug < Kaitai::Struct::Struct
   end
   def pet_1_to_i
     return @pet_1_to_i unless @pet_1_to_i.nil?
-    @pet_1_to_i = AB::I__ABC[:abc_cat]
+    @pet_1_to_i = I__ABC::I__ABC[:abc_cat]
     @pet_1_to_i
   end
 end

It's not hard to imagine that the enum_nested_no_bug.rb file works, whereas enum_nested_bug.rb doesn't:

$ ruby -I /c/temp/kaitai_struct/runtime/ruby/lib -e 'require_relative "enum_nested_no_bug"' -e 'r = EnumNestedNoBug.new(Kaitai::Struct::Stream.new(""))' -e 'pp r.pet_1_to_i'
42

pp@DESKTOP-89OPGF3 MINGW64 /c/temp/ks-experiments/ruby-enum-str-replace-bug
$ ruby -I /c/temp/kaitai_struct/runtime/ruby/lib -e 'require_relative "enum_nested_bug"' -e 'r = EnumNestedBug.new(Kaitai::Struct::Stream.new(""))' -e 'pp r.pet_1_to_i'
C:/temp/ks-experiments/ruby-enum-str-replace-bug/enum_nested_bug.rb:35:in `pet_1_to_i': uninitialized constant EnumNestedBug::I__ABC (NameError)
        from -e:3:in `<main>'

This bug well demonstrates that string manipulation on translated expressions should be avoided as much as possible, as it almost always has unintended consequences (this is because it's inherently the wrong tool for the task).

Git blames kaitai-io/kaitai_struct_compiler#159 authored by @ams-tschoening and merged by @GreyCat. I'm not mentioning this because I want to be mean, but I wanted to let you know about this pitfall so we can avoid it in the future. Everyone makes mistakes, but I think this kind can be easily avoided if you know what to look out for.