Proposal: Generator / Iterator Syntactic Sugar
I know, I know, but hear me out; I have an argument to be made beyond "it looks pretty".
So consider the following premise: there is a need to split an input string into sub-strings according to some delimiters. In this scenario, we want to split up the string `"device.title"` into `"device"` and `"title"`.
The initial tack I took for solving this was:
var tokenizer = Tokenizer.init("device.title");
while (tokenizer.next(".")) |token| {
// Utilize token...
}
However, I was unhappy with the resulting code, so I looked around online for other ways of expressing this in Zig and stumbled upon #809.
My gripe with the above example is that it exposes state at the call site that should not be managed by the user of `Tokenizer`. For comparison, here is how C# and D express the same loop:
// C#
foreach (var token in Token.Tokenize("device.title", "."))
{
    // Utilize token...
}

// D
foreach (token; tokenize("device.title", ".")) {
    // Utilize token...
}
C# of course implements the `IEnumerable` intrinsic behavior, which is far from ideal, and D has "Voldemort structs", which can use `opApply`, an operator overload, which is in my opinion far worse for self-documenting how the hell it iterates.
I'm not sold on either of the solutions that C# or D bring forth, for the reasons mentioned above. I do think there needs to be a way to combine the initialization and the iterative behavior into a single statement, so as to limit the scope of the generator / iterator and better define user-exposed APIs in a way that's also arguably more functional, at no overhead cost.
Interested to hear others' thoughts on this though, as this has been raised before for similar reasons.
If I understand correctly, the problem is that `"."` needs to be specified by the iteration code? That's easy to fix:
const MyTokenizer = struct {
    inner: Tokenizer,
    delimiter: []const u8,

    pub fn init(parse_string: []const u8, delimiter: []const u8) MyTokenizer {
        return .{
            .inner = Tokenizer.init(parse_string),
            .delimiter = delimiter,
        };
    }

    pub fn next(self: *MyTokenizer) ?[]const u8 {
        return self.inner.next(self.delimiter);
    }
};

var tokenizer = MyTokenizer.init("device.title", ".");
while (tokenizer.next()) |token| {
    // Utilize token...
}
Now the iteration site is using only the information that's necessary.
The issue I was more so getting at is that the iterable instance is not functionally constant.
Perhaps I'm trying to bring in too many assumptions from other languages, but it's generally not ideal to expose things that shouldn't be mutated outside the interface they were designed to be used through.
`tokenizer` has its members exposed to the rest of the scope of the loop's call site. I suppose being able to limit `tokenizer` to the scope of the loop would be sufficient to make it more "functionally constant", but that would require additional scope guards unless there's a way of supplying it in a `for` or `while` loop that I'm unfamiliar with?
I've been taking a closer look at `while` and `for` loops and can't see a difference between how they operate and are written. Am I missing something?
@kayomn `for` operates on slices and arrays; `while` operates on arbitrary conditions and can do optional unwrapping.
You cannot do something like `for (iterator.next()) |val| {...}` since `iterator.next()` is not a slice.
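To make the distinction concrete, here is a minimal sketch contrasting the two loop forms. `Countdown`, `sumSlice`, and `sumIterator` are hypothetical names invented for this illustration; they are not part of the thread's `Tokenizer` or of the standard library.

```zig
const std = @import("std");

// Hypothetical iterator used only for illustration: yields n-1, n-2, ..., 0.
const Countdown = struct {
    remaining: u32,

    pub fn next(self: *Countdown) ?u32 {
        if (self.remaining == 0) return null;
        self.remaining -= 1;
        return self.remaining;
    }
};

// `for` walks the elements of a slice (or array) directly.
fn sumSlice(items: []const u32) u32 {
    var sum: u32 = 0;
    for (items) |item| sum += item;
    return sum;
}

// `while` re-evaluates `it.next()` on every pass and unwraps the optional;
// the loop ends when `next` returns null.
fn sumIterator(it: *Countdown) u32 {
    var sum: u32 = 0;
    while (it.next()) |value| sum += value;
    return sum;
}

pub fn main() void {
    const items = [_]u32{ 10, 20, 30 };
    var it = Countdown{ .remaining = 3 };
    std.debug.print("for: {d}, while: {d}\n", .{ sumSlice(&items), sumIterator(&it) });
}
```

Note that the `while` version has to name the iterator variable and call `next` explicitly, which is exactly the part the proposal would like to hide.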
This is kind of just a thing with Zig: we technically have `while` loops and `for` loops, but really you should consider it as if we just have while loops and nothing else.
See #3110 for a long discussion on this. My own opinion is that iteration is one of those things that Zig kind of makes a mess out of, mainly because I think it's a tad odd to use a bunch of optional syntax to do iteration with while loops.
Anyway if you don't want to leak variables around you just have to wrap stuff in curly braces to create scopes.
So a C-style for loop and its Zig equivalent would look like this:
// Code (C)
for (int i = 0; i < 20; i++) {
    for (int j = 0; j < 20; j++) {
        // Do stuff with i and j
    }
}
// More Code

// Code (Zig)
{
    var i: usize = 0;
    while (i < 20) : (i += 1) {
        var j: usize = 0;
        while (j < 20) : (j += 1) {
            // Do stuff with i and j
        }
    }
}
// More code (i and j are not leaked)
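The same scoping trick applies to the tokenizer from the original post. The sketch below is an assumption-laden illustration: `Tokenizer` here is a hypothetical stand-in written for this example (it takes a single-byte delimiter rather than a string), and `countTokens` is likewise invented.

```zig
const std = @import("std");

// Hypothetical stand-in for the thread's Tokenizer; not a std API.
const Tokenizer = struct {
    rest: []const u8,

    pub fn init(input: []const u8) Tokenizer {
        return .{ .rest = input };
    }

    // Returns the next non-empty token, or null once the input is exhausted.
    pub fn next(self: *Tokenizer, delimiter: u8) ?[]const u8 {
        // Skip any leading delimiters.
        var start: usize = 0;
        while (start < self.rest.len and self.rest[start] == delimiter) : (start += 1) {}
        if (start == self.rest.len) {
            self.rest = self.rest[start..];
            return null;
        }
        // Scan to the end of the token.
        var end = start;
        while (end < self.rest.len and self.rest[end] != delimiter) : (end += 1) {}
        const token = self.rest[start..end];
        self.rest = self.rest[end..];
        return token;
    }
};

fn countTokens(input: []const u8, delimiter: u8) usize {
    var count: usize = 0;
    {
        // `tokenizer` only exists inside this block.
        var tokenizer = Tokenizer.init(input);
        while (tokenizer.next(delimiter)) |token| {
            _ = token; // Utilize token...
            count += 1;
        }
    }
    // Referring to `tokenizer` here would be a compile error.
    return count;
}

pub fn main() void {
    std.debug.print("tokens: {d}\n", .{countTokens("device.title", '.')});
}
```

The block keeps `tokenizer` from leaking into the code after the loop, but the loop body itself can still reach it.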
I don't necessarily think that's exposing internals unnecessarily. While iterators in many languages have their `next` method sugared out, that might be a little too implicit compared to the rest of Zig. Also, having the `next` method exposed is really useful if you are writing something like a parser by hand.
I think I understand what you mean; there is always a mutable alias to the iterator available within the body of the loop. You cannot get around this currently with scope blocks, so code within the loop might do nasty things with the state of the iterator.
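As a rough illustration of that concern, the sketch below shows a loop body tampering with the iterator it is looping over. `Counter` and `visitedWithSkip` are hypothetical names made up for this example.

```zig
const std = @import("std");

// Hypothetical iterator for illustration: yields 0, 1, 2, ... below `limit`.
const Counter = struct {
    current: u32 = 0,
    limit: u32,

    pub fn next(self: *Counter) ?u32 {
        if (self.current >= self.limit) return null;
        const value = self.current;
        self.current += 1;
        return value;
    }
};

// Returns how many values the loop actually visits.
fn visitedWithSkip(limit: u32) u32 {
    var visited: u32 = 0;
    var counter = Counter{ .limit = limit };
    while (counter.next()) |value| {
        visited += 1;
        // The body still holds a mutable alias to `counter`, so nothing
        // stops it from changing the iteration state mid-loop.
        if (value == 0) counter.current = limit;
    }
    return visited;
}

pub fn main() void {
    std.debug.print("visited: {d}\n", .{visitedWithSkip(5)});
}
```

Wrapping the loop in a scope block would not prevent this, since `counter` has to be in scope for the `while` condition to call `next` on it.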
Maybe something like this would be appropriate:
while ( { Tokenizer.init("device.name") }.next(".")) |token| {
// ...
}
where the expression in `{ ... }` is only evaluated once. I don't know what a good syntax for something like this would actually look like (or if it's really protecting you from much), but I think something like this accomplishes what you are saying is needed without hiding the call to `next()` or imposing an interface on iteration.
Yeah, those are pretty much my thoughts, @jessrud. I wouldn't say I'm a particularly strong advocate of functional programming, but it's nice to have something that is "functionally const" to the rest of the program due to how it implicitly scopes itself.
Do you mean `opApply`? The issue I think with that is it's a form of abstraction that goes against Zig's larger philosophy.
Although #3110 was a much broader proposal than this one, I think the comment Andrew left as he closed that one is relevant here too.
I realize this is controversial but I'm a big fan of status quo iteration. What I like about it is that there is no hidden control flow. You don't have to know the type of anything to understand how iteration works. If anything, I'm tempted to delete `for` loops from the language.
One thing is clear: we can't make `next` a magic method name baked into the compiler; that would be crossing a boundary not usually crossed in Zig. So any proposal needs to include the user typing `next` in their `for` clause or whatever it is.
IMO the most realistic proposal addressing this problem is #8019 (expression-scoped variables).