kaitai-io/kaitai_struct

Allow "repeat-expr" and "repeat-until" keys without "repeat" key

Mingun opened this issue · 8 comments

Requiring repeat key when repeat-expr or repeat-until key is specified a bit annoying. These keys are enough to understand what type of repetition is required. So I propose to make repeat key non-mandatory when repeat-expr or repeat-until is specified.

All kinds of correct input
seq:
  - id: name
    size: 1
    repeat-expr: 3
seq:
  - id: name
    size: 1
    repeat: expr
    repeat-expr: 3
seq:
  - id: name
    size: 1
    repeat-until: false
seq:
  - id: name
    size: 1
    repeat: until
    repeat-until: false
seq:
  - id: name
    size: 1
    repeat: eos
All kinds of incorrect input
seq:
  - id: name
    size: 1
    repeat: eos
    repeat-expr: 3
seq:
  - id: name
    size: 1
    repeat: until
    repeat-expr: 3
seq:
  - id: name
    size: 1
    repeat: eos
    repeat-until: false
seq:
  - id: name
    size: 1
    repeat: expr
    repeat-until: false

#128 is related

I actually agree on the idea. When things like repeat / repeat-expr / repeat-eos were designed, Kaitai Struct was still a very early PoC, not even having any kind of proper data schema parser or anything class structure backing it up.

Nowadays, it you'll see into implementation, attribute is structured like that:

sealed trait RepeatSpec
case class RepeatExpr(expr: Ast.expr) extends RepeatSpec
case class RepeatUntil(expr: Ast.expr) extends RepeatSpec
case object RepeatEos extends RepeatSpec
case object NoRepeat extends RepeatSpec

case class ConditionalSpec(ifExpr: Option[Ast.expr], repeat: RepeatSpec)

trait AttrLikeSpec extends MemberSpec {
  def dataType: DataType
  def cond: ConditionalSpec
  def valid: Option[ValidationSpec]
  def doc: DocSpec
}

So, ultimately, it would have made sense to have RepeatSpec a clear separate item with its own structure, something like:

- repeat:
    expr: 42
- repeat:
    until: some_expression
- repeat: eos

If everyone's ok with something like that, we can try to phase this in, compatible with current implementation, as phase out old syntax post-1.0.

See also #55, which collects radical changes.

I like that idea. But I also think it will be wise to interpret repeat: something as

repeat:
  expr: something

and also use repeat: _io.eos instead of repeat: eos (but also syntax also can be left for backward compatibility, under some flag in meta). Using expression for repetition is most used type of repetition.

I made some calculations over formats repository and here is the result:

  • eos: 113
  • expr: 302
  • until: 30
Computation code
use ksc_rs::parser::{Repeat, Attribute, TypeSpec};
use std::path::PathBuf;
#[derive(Debug, Default, PartialEq)]
struct Counts {
  eos: usize,
  expr: usize,
  until: usize,
}
impl Counts {
  fn inc_attr(&mut self, a: &Attribute) {
    match a.repeat {
      Some(Repeat::Eos) => self.eos+=1,
      Some(Repeat::Expr) => self.expr+=1,
      Some(Repeat::Until) => self.until+=1,
      _ => {},
    }
  }
  fn inc_type(&mut self, t: &TypeSpec) {
    if let Some(seq) = &t.seq {
      for a in seq {
        self.inc_attr(&a);
      }
    }
    if let Some(seq) = &t.instances {
      for (_, i) in seq {
        self.inc_attr(&i.common);
      }
    }
  }
  fn inc(&mut self, t: &TypeSpec) {
    self.inc_type(&t);
    if let Some(types) = &t.types {
      for (_, t) in types {
        self.inc(&t);
      }
    }
  }
  fn count_file(&mut self, resource: PathBuf) {
    let file = File::open(resource).unwrap();
    let ksy: Ksy = serde_yaml::from_reader(file).unwrap();

    self.inc(&ksy.root);
  }
}
#[test]
fn count() {
  let mut counts = Counts::default();
  for file in glob::glob("formats/**/*.ksy").unwrap() {
    counts.count_file(file.unwrap());
  }
  assert_eq!(counts, Counts::default());
}

@Mingun:

But I also think it will be wise to interpret repeat: something

Whoa, not so fast. A simple instance of undefined behavior with this change:

seq:
  - id: foo
    type: s1
    repeat: eos
instances:
  eos:
    value: 3

What is repeat: eos supposed to mean?

Yes, that's exactly the reason I suggest using _io.eos instead of eos. Among other things, the new syntax also fits well into the existing paradigm (and will also allow the use of strange things like repetition until some other stream is exhausted, and not the current). As a last resort, as I mentioned above, you can enter special flags in the compiler/ksy file, which defines how to interpret eos -- as a reference to a variable or a keyword in the expression language

Yes, that's exactly the reason I suggest using _io.eos instead of eos. Among other things, the new syntax also fits well into the existing paradigm (and will also allow the use of strange things like repetition until some other stream is exhausted, and not the current).

I think you've meant _io.eof instead (repeat: until + repeat-until: _io.eof translates to something like do {...} while (!this._io.isEof()); currently), eos is merely a special enum value in context of repeat key.

If I understand what you suggest, you want to drop repeat: eos in favor of repeat-until: _io.eof. And then replace repeat-expr and repeat-until with a new key repeat, which can be either an integer or a boolean. The repeat key shall then switch its behavior according to the type of its value - if the value has type integer, it'll mean repeat-expr, if the value is boolean, the repeat key will behave as repeat-until.

This doesn't seem like a good idea to me - a single key should not switch its behavior according to the type of the value. Renaming repeat-expr to just repeat could be tolerated, that doesn't look so bad:

seq:
  - id: num_items
    type: u4
  - id: items
    type: item
    repeat: num_items

But the repeat: <until-condition> looks wrong to me. OK, repeat: _io.eof can be understood correctly, but about repeat: false, repeat: true or repeat: _.sign == -1? The naive understanding would be that it translates into a while (<while-cond>) {...}, but it would actually be do {...} while(not <until-cond>);, so the actual logic would be inverted. repeat: false would actually be an infinite loop, and repeat: true would stop after the first parsed element (due to the do...while loop).

From this you can actually see that repeat: eos and repeat-until: _io.eof actually doesn't work the same, consider these examples:

seq:
  - id: cont
    type: items_type
    size: 0
types:
  items_type:
    seq:
      - id: items
        type: s1
        repeat: eos
seq:
  - id: cont
    type: items_type
    size: 0
types:
  items_type:
    seq:
      - id: items
        type: s1
        repeat: until
        repeat-until: _io.eof

They actually doesn't work the same, the first example will run fine (with cont.items.size == 0), but the second example will end up with an EOS exception (because the parsing code for the items array looks like do { items.push(_io.read_s1()) } while(!_io.is_eof()). To make the second example work the same as the first, one has to add another if: not _io.eof into the items attribute so that the first element is not parsed.

So we can't really replace repeat: eos with repeat: _io.eof until we have the repeat-while key (described in #156 and #625). The repeat-while: not _io.eof would be finally an appropriate equivalent to repeat: eos.

Another problem with a single repeat key instead of the separate repeat-expr and repeat-until keys (or something like the repeat-while key, that doesn't really matter in this concern) is that it isn't obvious that the type-dependent behavior decision of the repeat key must happen at compile time, not at run time. So repeat: 'is_repeated_till_eof ? _io.eof : 3' won't work.

So I wouldn't introduce the "short syntax" like repeat: _.is_last instead of the "long syntax" like repeat/until: _.is_last (or repeat: true instead of repeat/while: true, if we had the repeat-while key) as @GreyCat suggested for anything except the repeat/expr case. According to your statistics, repeat: until is still used far less frequently than repeat: eos or repeat: expr, so writing the full repeat syntax for repeat/until like

repeat:
  until: _.is_last_item

doesn't matter so much.

Yes, as I said, even the short syntax for repeat/expr brings the question about the repeat: eos ambiguity. But it's quite a marginal case, and it can be solved by prohibiting any attributes (seq fields, instances and params) named eos in the scope where the repeat: eos is used, which doesn't hurt much.

I think you've meant _io.eof instead

Wow, that property already exists. I just didn't assume it, so I called it _io.eos.

And then replace repeat-expr and repeat-until with a new key repeat, which can be either an integer or a boolean.

No, I'm just against mixing types. It will not be obvious where the number is and where the boolean expression is. In my proposal, I simply did not notice that hypotetical _io.eos (i.e. actual _io.eof) would be a boolean expression, not a integer expression, so really, the whole idea of my proposal loses its meaning. You absolutely right in your analysis

I'd like to request addition to this feature, and that is combination of eos and expr/until condition.
Hopefuly example will be helpful

This way, the field is expected if might be present but if eos condition is met, it's not parsed (result is empty array)

seq:
  - id: optional_fields_might_be_present
    type: b1
  - id: dummy
    type: b7
  - id: optional_fields
    type: optional_field
    if: optional_field_might_be_present == true
    repeat:
      eos: true
      until: _.has_next_item == true
types:
  optional_field:
    seq:
      - id: has_next_item
        type: b1
      - id: data
        type: b7