Allow "repeat-expr" and "repeat-until" keys without "repeat" key
Mingun opened this issue · 8 comments
Requiring repeat
key when repeat-expr
or repeat-until
key is specified a bit annoying. These keys are enough to understand what type of repetition is required. So I propose to make repeat
key non-mandatory when repeat-expr
or repeat-until
is specified.
All kinds of correct input
seq:
- id: name
size: 1
repeat-expr: 3
seq:
- id: name
size: 1
repeat: expr
repeat-expr: 3
seq:
- id: name
size: 1
repeat-until: false
seq:
- id: name
size: 1
repeat: until
repeat-until: false
seq:
- id: name
size: 1
repeat: eos
All kinds of incorrect input
seq:
- id: name
size: 1
repeat: eos
repeat-expr: 3
seq:
- id: name
size: 1
repeat: until
repeat-expr: 3
seq:
- id: name
size: 1
repeat: eos
repeat-until: false
seq:
- id: name
size: 1
repeat: expr
repeat-until: false
I actually agree on the idea. When things like repeat
/ repeat-expr
/ repeat-eos
were designed, Kaitai Struct was still a very early PoC, not even having any kind of proper data schema parser or anything class structure backing it up.
Nowadays, it you'll see into implementation, attribute is structured like that:
sealed trait RepeatSpec
case class RepeatExpr(expr: Ast.expr) extends RepeatSpec
case class RepeatUntil(expr: Ast.expr) extends RepeatSpec
case object RepeatEos extends RepeatSpec
case object NoRepeat extends RepeatSpec
case class ConditionalSpec(ifExpr: Option[Ast.expr], repeat: RepeatSpec)
trait AttrLikeSpec extends MemberSpec {
def dataType: DataType
def cond: ConditionalSpec
def valid: Option[ValidationSpec]
def doc: DocSpec
}
So, ultimately, it would have made sense to have RepeatSpec a clear separate item with its own structure, something like:
- repeat:
expr: 42
- repeat:
until: some_expression
- repeat: eos
If everyone's ok with something like that, we can try to phase this in, compatible with current implementation, as phase out old syntax post-1.0.
See also #55, which collects radical changes.
I like that idea. But I also think it will be wise to interpret repeat: something
as
repeat:
expr: something
and also use repeat: _io.eos
instead of repeat: eos
(but also syntax also can be left for backward compatibility, under some flag in meta
). Using expression for repetition is most used type of repetition.
I made some calculations over formats repository and here is the result:
- eos: 113
- expr: 302
- until: 30
Computation code
use ksc_rs::parser::{Repeat, Attribute, TypeSpec};
use std::path::PathBuf;
#[derive(Debug, Default, PartialEq)]
struct Counts {
eos: usize,
expr: usize,
until: usize,
}
impl Counts {
fn inc_attr(&mut self, a: &Attribute) {
match a.repeat {
Some(Repeat::Eos) => self.eos+=1,
Some(Repeat::Expr) => self.expr+=1,
Some(Repeat::Until) => self.until+=1,
_ => {},
}
}
fn inc_type(&mut self, t: &TypeSpec) {
if let Some(seq) = &t.seq {
for a in seq {
self.inc_attr(&a);
}
}
if let Some(seq) = &t.instances {
for (_, i) in seq {
self.inc_attr(&i.common);
}
}
}
fn inc(&mut self, t: &TypeSpec) {
self.inc_type(&t);
if let Some(types) = &t.types {
for (_, t) in types {
self.inc(&t);
}
}
}
fn count_file(&mut self, resource: PathBuf) {
let file = File::open(resource).unwrap();
let ksy: Ksy = serde_yaml::from_reader(file).unwrap();
self.inc(&ksy.root);
}
}
#[test]
fn count() {
let mut counts = Counts::default();
for file in glob::glob("formats/**/*.ksy").unwrap() {
counts.count_file(file.unwrap());
}
assert_eq!(counts, Counts::default());
}
But I also think it will be wise to interpret
repeat: something
Whoa, not so fast. A simple instance of undefined behavior with this change:
seq:
- id: foo
type: s1
repeat: eos
instances:
eos:
value: 3
What is repeat: eos
supposed to mean?
Yes, that's exactly the reason I suggest using _io.eos
instead of eos
. Among other things, the new syntax also fits well into the existing paradigm (and will also allow the use of strange things like repetition until some other stream is exhausted, and not the current). As a last resort, as I mentioned above, you can enter special flags in the compiler/ksy file, which defines how to interpret eos
-- as a reference to a variable or a keyword in the expression language
Yes, that's exactly the reason I suggest using
_io.eos
instead ofeos
. Among other things, the new syntax also fits well into the existing paradigm (and will also allow the use of strange things like repetition until some other stream is exhausted, and not the current).
I think you've meant _io.eof
instead (repeat: until
+ repeat-until: _io.eof
translates to something like do {...} while (!this._io.isEof());
currently), eos
is merely a special enum value in context of repeat
key.
If I understand what you suggest, you want to drop repeat: eos
in favor of repeat-until: _io.eof
. And then replace repeat-expr
and repeat-until
with a new key repeat
, which can be either an integer or a boolean. The repeat
key shall then switch its behavior according to the type of its value - if the value has type integer, it'll mean repeat-expr
, if the value is boolean, the repeat
key will behave as repeat-until
.
This doesn't seem like a good idea to me - a single key should not switch its behavior according to the type of the value. Renaming repeat-expr
to just repeat
could be tolerated, that doesn't look so bad:
seq:
- id: num_items
type: u4
- id: items
type: item
repeat: num_items
But the repeat: <until-condition>
looks wrong to me. OK, repeat: _io.eof
can be understood correctly, but about repeat: false
, repeat: true
or repeat: _.sign == -1
? The naive understanding would be that it translates into a while (<while-cond>) {...}
, but it would actually be do {...} while(not <until-cond>);
, so the actual logic would be inverted. repeat: false
would actually be an infinite loop, and repeat: true
would stop after the first parsed element (due to the do...while loop).
From this you can actually see that repeat: eos
and repeat-until: _io.eof
actually doesn't work the same, consider these examples:
seq:
- id: cont
type: items_type
size: 0
types:
items_type:
seq:
- id: items
type: s1
repeat: eos
seq:
- id: cont
type: items_type
size: 0
types:
items_type:
seq:
- id: items
type: s1
repeat: until
repeat-until: _io.eof
They actually doesn't work the same, the first example will run fine (with cont.items.size == 0
), but the second example will end up with an EOS exception (because the parsing code for the items
array looks like do { items.push(_io.read_s1()) } while(!_io.is_eof())
. To make the second example work the same as the first, one has to add another if: not _io.eof
into the items
attribute so that the first element is not parsed.
So we can't really replace repeat: eos
with repeat: _io.eof
until we have the repeat-while
key (described in #156 and #625). The repeat-while: not _io.eof
would be finally an appropriate equivalent to repeat: eos
.
Another problem with a single repeat
key instead of the separate repeat-expr
and repeat-until
keys (or something like the repeat-while
key, that doesn't really matter in this concern) is that it isn't obvious that the type-dependent behavior decision of the repeat
key must happen at compile time, not at run time. So repeat: 'is_repeated_till_eof ? _io.eof : 3'
won't work.
So I wouldn't introduce the "short syntax" like repeat: _.is_last
instead of the "long syntax" like repeat/until: _.is_last
(or repeat: true
instead of repeat/while: true
, if we had the repeat-while
key) as @GreyCat suggested for anything except the repeat/expr
case. According to your statistics, repeat: until
is still used far less frequently than repeat: eos
or repeat: expr
, so writing the full repeat syntax for repeat/until
like
repeat:
until: _.is_last_item
doesn't matter so much.
Yes, as I said, even the short syntax for repeat/expr
brings the question about the repeat: eos
ambiguity. But it's quite a marginal case, and it can be solved by prohibiting any attributes (seq
fields, instances
and params
) named eos
in the scope where the repeat: eos
is used, which doesn't hurt much.
I think you've meant
_io.eof
instead
Wow, that property already exists. I just didn't assume it, so I called it _io.eos
.
And then replace
repeat-expr
andrepeat-until
with a new key repeat, which can be either an integer or a boolean.
No, I'm just against mixing types. It will not be obvious where the number is and where the boolean expression is. In my proposal, I simply did not notice that hypotetical _io.eos
(i.e. actual _io.eof
) would be a boolean expression, not a integer expression, so really, the whole idea of my proposal loses its meaning. You absolutely right in your analysis
I'd like to request addition to this feature, and that is combination of eos and expr/until condition.
Hopefuly example will be helpful
This way, the field is expected if might be present but if eos condition is met, it's not parsed (result is empty array)
seq:
- id: optional_fields_might_be_present
type: b1
- id: dummy
type: b7
- id: optional_fields
type: optional_field
if: optional_field_might_be_present == true
repeat:
eos: true
until: _.has_next_item == true
types:
optional_field:
seq:
- id: has_next_item
type: b1
- id: data
type: b7