Proposal: Generator / Iterator Syntactic Sugar

Question


Closed this issue 3 years ago · 10 comments

I know, I know, but hear me out; I have an argument to be made beyond "it looks pretty".

So consider the following premise: there is a need to generatively sub-string an input string according to some delimiters. In this scenario, we want to split up the string "device.title" into "device" and "title".

The initial tack I took for solving this was:

var tokenizer = Tokenizer.init("device.title");

while (tokenizer.next(".")) |token| {
  // Utilize token...
}
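The post does not show the `Tokenizer` type itself. As a rough sketch only, assuming a struct that holds the input and a cursor and takes the delimiter set on every `next` call (the field names and the `isDelimiter` helper are hypothetical), it might look like this:

```zig
const std = @import("std");

// Hypothetical Tokenizer matching the calls in the snippet above.
// It is not taken from any library; the field names are assumptions.
const Tokenizer = struct {
    buffer: []const u8,
    index: usize = 0,

    pub fn init(buffer: []const u8) Tokenizer {
        return .{ .buffer = buffer };
    }

    // Returns the next non-empty token, or null once the input is exhausted.
    // Any byte contained in `delimiters` separates tokens.
    pub fn next(self: *Tokenizer, delimiters: []const u8) ?[]const u8 {
        // Skip any delimiters in front of the next token.
        while (self.index < self.buffer.len and isDelimiter(self.buffer[self.index], delimiters)) {
            self.index += 1;
        }
        if (self.index == self.buffer.len) return null;

        // Consume bytes up to the next delimiter or the end of the input.
        const start = self.index;
        while (self.index < self.buffer.len and !isDelimiter(self.buffer[self.index], delimiters)) {
            self.index += 1;
        }
        return self.buffer[start..self.index];
    }

    fn isDelimiter(byte: u8, delimiters: []const u8) bool {
        return std.mem.indexOfScalar(u8, delimiters, byte) != null;
    }
};

pub fn main() void {
    var tokenizer = Tokenizer.init("device.title");
    while (tokenizer.next(".")) |token| {
        std.debug.print("{s}\n", .{token});
    }
}
```

Note that `tokenizer` has to be declared with `var` because `next` advances the cursor through a mutable pointer, which is the state the rest of the post is concerned about.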

However, I was unhappy with the resulting code and I decided to see if there were any other ways of expressing this in Zig by looking around online, and I stumbled upon #809 .

So basically, my gripe with the above example is that it exposes state at the callsite that should not be managed by the user of Tokenizer.

// C#
foreach (var token in Token.Tokenize("device.title", "."))
{
  // Utilize token...
}

// D
foreach (token; tokenize("device.title", ".")) {
  // Utilize token...
}

C# of course implements the IEnumerable intrinsic behavior, which is far from ideal, and D has "voldemort structs" which can use opApply, an operator overload, which is in my opinion far worse for self-documenting how the hell it iterates.

I'm not sold on either of the solutions that C# or D bring forth, for the reasons mentioned above, but I do think there needs to be a way to combine the initialization and the iterative behavior into a single statement. That would limit the scope of the generator / iterator and better define user-exposed APIs in a way that's also arguably more functional, at no overhead cost.

Interested to hear others' thoughts on this, though, as this has been raised before for similar reasons.

Answer 1 · 2020-05-12T23:40:05.000Z

If I understand correctly, the problem is that "." needs to be specified by the iteration code? That's easy to fix:

const MyTokenizer = struct {
    inner: Tokenizer,
    delimiter: []const u8,

    pub fn init(parse_string: []const u8, delimiter: []const u8) MyTokenizer { 
        return .{
            .inner = Tokenizer.init(parse_string),
            .delimiter = delimiter,
        };
    }

    pub fn next(self: *MyTokenizer) ?[]const u8 {
        return self.inner.next(self.delimiter);
    }
};

var tokenizer = MyTokenizer.init("device.title", ".");

while (tokenizer.next()) |token| {
  // Utilize token...
}

Now the iteration site is using only the information that's necessary.

Answer 2 · 2020-05-13T00:07:13.000Z

The issue I was more so getting at is that the iterable instance is not functionally constant.

Perhaps I'm trying to ad-hoc in too many assumptions given to me by other languages, but it's generally not ideal to expose things that shouldn't be mutated beyond the way they're designed to be interfaced.

tokenizer has its members exposed to the rest of the scope of the loop's callsite. I suppose being able to limit tokenizer to the scope of the loop would be sufficient to make it more "functionally constant", but that will require additional scope guards unless there's a way of supplying it in a for or while loop that I'm unfamiliar with?
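For reference, the "additional scope guards" mentioned here amount to a bare block around the declaration and the loop. This is only a sketch: `FieldIterator` is a hypothetical stand-in defined to keep the example self-contained, not a type from the thread or the standard library.

```zig
const std = @import("std");

// Hypothetical minimal iterator that splits on '.'.
const FieldIterator = struct {
    rest: ?[]const u8,

    pub fn next(self: *FieldIterator) ?[]const u8 {
        const rest = self.rest orelse return null;
        if (std.mem.indexOfScalar(u8, rest, '.')) |dot| {
            // Keep everything after the dot for the next call.
            self.rest = rest[dot + 1 ..];
            return rest[0..dot];
        }
        // No dot left: hand out the remainder and finish.
        self.rest = null;
        return rest;
    }
};

pub fn main() void {
    {
        var fields = FieldIterator{ .rest = "device.title" };
        while (fields.next()) |field| {
            std.debug.print("{s}\n", .{field});
        }
    }
    // `fields` is out of scope here; referring to it would be a compile error.
}
```

The block keeps the iterator from leaking into the code after the loop, but the iterator is still a mutable variable inside the loop body.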

Answer 3 · 2020-05-13T10:29:56.000Z

I've been taking a closer look at while and for loops and can't see a difference between how they operate and are written. Am I missing something?

Answer 4 · 2020-05-13T10:35:42.000Z

@kayomn
for operates on slices and arrays, while operates on arbitrary conditions and can do optional unwrapping.

You cannot do something like for (iterator.next()) |val| {...} since iterator.next() is not a slice.
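A small side-by-side may help with the distinction. This is a sketch only, and `countdown` is a hypothetical helper written for the example:

```zig
const std = @import("std");

// Hypothetical helper: yields the current count and decrements it,
// returning null once the count reaches zero.
fn countdown(remaining: *u8) ?u8 {
    if (remaining.* == 0) return null;
    const value = remaining.*;
    remaining.* -= 1;
    return value;
}

pub fn main() void {
    // `for` walks an array or slice whose length is known up front.
    const parts = [_][]const u8{ "device", "title" };
    for (parts) |part| {
        std.debug.print("{s}\n", .{part});
    }

    // `while` re-evaluates its condition on every pass. When the condition
    // is an optional, the payload is unwrapped and the loop ends at null.
    var remaining: u8 = 3;
    while (countdown(&remaining)) |value| {
        std.debug.print("{d}\n", .{value});
    }
}
```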

Answer 5 · 2020-05-22T20:56:44.000Z

This is kind of just a thing with Zig, we technically have while loops and for loops, but really you should consider it as if we just have while loops and nothing else.

See #3110 for a long discussion on this. My own opinion is that iteration is one of those things that Zig kind of makes a mess out of, mainly because I think it's a tad odd to use a bunch of optional syntax to do iteration with while loops.

Anyway if you don't want to leak variables around you just have to wrap stuff in curly braces to create scopes.

So doing a C style for loop in Zig would be like this:

// Code
for (int i = 0; i<20; i++){
    for (int j = 0; j<20; j++){
        // Do stuff with i and j
    }
}
// More Code

// Code
{
    var i: usize = 0; // a mutable counter needs an explicit integer type
    while (i < 20) : (i += 1) {
        var j: usize = 0;
        while (j < 20) : (j += 1) {
            // Do stuff with i and j
        }
    }
}
// More code (i and j are not leaked)

Answer 6 · 2020-05-22T23:02:05.000Z

I don't necessarily think that's exposing internals unnecessarily. While iterators in many languages have their next method sugared out, that might be a little too implicit compared to the rest of Zig. Also, having the next method exposed is really useful if you are writing something like a parser by hand.

Answer 7 · 2020-05-23T03:45:58.000Z

I think I understand what you mean; there is always a mutable alias to the iterator available within the body of the loop. You cannot get around this currently with scope blocks, so code within the loop might do nasty things with the state of the iterator.
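To make the concern concrete, here is a sketch in which the loop body advances the iterator behind the loop header's back, even though the iterator is wrapped in a scope block. `ByteIterator` is a hypothetical type defined only for this illustration.

```zig
const std = @import("std");

// Hypothetical iterator over a byte slice.
const ByteIterator = struct {
    items: []const u8,
    index: usize = 0,

    pub fn next(self: *ByteIterator) ?u8 {
        if (self.index >= self.items.len) return null;
        const item = self.items[self.index];
        self.index += 1;
        return item;
    }
};

pub fn main() void {
    const data = [_]u8{ 1, 2, 3, 4 };
    {
        var it = ByteIterator{ .items = &data };
        while (it.next()) |value| {
            std.debug.print("{d}\n", .{value});
            // Nothing prevents the body from consuming an element itself,
            // so every second value is silently skipped.
            _ = it.next();
        }
    }
}
```

The block limits where `it` is visible, but inside the loop it is still an ordinary mutable variable.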

Maybe something like this would be appropriate:

while ( { Tokenizer.init("device.name") }.next(".")) |token| {
// ...
}

where the expression in { ... } is only evaluated once. I don't know what a good syntax for something like this would actually look like (or if it's really protecting you from much), but I think something like this accomplishes what you are saying is needed without hiding the call to next() or imposing an interface on iteration.

Answer 8 · 2020-05-23T21:17:14.000Z

Yeah that's pretty much my thoughts @jessrud . I wouldn't say I'm a particularly strong advocate of functional programming, but it's nice to have something that is "functionally const" to the rest of the program due to how it implicitly scopes itself.

Answer 9 · 2020-11-02T11:08:29.000Z

@RUSshy

Do you mean opApply? The issue I think with that is it's a form of abstraction that goes against Zig's larger philosophy.

Answer 10 · 2021-07-02T22:33:55.000Z

Although #3110 was a much broader proposal than this one, I think the comment Andrew left as he closed that one is relevant here too.

I realize this is controversial but I'm a big fan of status quo iteration. What I like about it is that there is no hidden control flow. You don't have to know the type of anything to understand how iteration works. If anything, I'm tempted to delete for loops from the language.

One thing is clear, we can't make next a magic method name baked into the compiler, that would be crossing a boundary not usually crossed in Zig. So any proposal needs to include the user typing next in their for clause or whatever it is.

IMO the most realistic proposal addressing this problem is #8019 (expression-scoped variables).