Additional basic functions beyond .map to allow for more functional programming
hhoeflin opened this issue · 1 comment
🚀 The feature
For IterDataPipe, the .map method maps a function over the items of an iterable, where the function has the form f: Any -> Any.
Other basic building blocks could be .pipe, .iter_map and .consume, where
.pipe would take f: Iterable -> Iterable
.iter_map would take f: Any -> Iterable
.consume would take f: Iterable -> Any
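To make the three signatures concrete, here is a minimal sketch in plain Python. The Pipeline class is a hypothetical stand-in for an IterDataPipe and is not part of torchdata; it also assumes that .iter_map flattens the iterables returned by f, which is my reading of the proposal rather than something it states.

```python
from itertools import chain
from typing import Any, Callable, Iterable, Iterator


class Pipeline:
    """Hypothetical stand-in for an IterDataPipe (not the torchdata API)."""

    def __init__(self, make_iter: Callable[[], Iterable[Any]]) -> None:
        # Store a factory so the pipeline can be iterated more than once.
        self._make_iter = make_iter

    def __iter__(self) -> Iterator[Any]:
        return iter(self._make_iter())

    def map(self, f: Callable[[Any], Any]) -> "Pipeline":
        # f: Any -> Any, applied lazily to each item.
        return Pipeline(lambda: map(f, self))

    def pipe(self, f: Callable[[Iterable[Any]], Iterable[Any]]) -> "Pipeline":
        # f: Iterable -> Iterable, applied to the whole stream.
        return Pipeline(lambda: f(self))

    def iter_map(self, f: Callable[[Any], Iterable[Any]]) -> "Pipeline":
        # f: Any -> Iterable; the results are flattened (an assumption).
        return Pipeline(lambda: chain.from_iterable(map(f, self)))

    def consume(self, f: Callable[[Iterable[Any]], Any]) -> Any:
        # f: Iterable -> Any; this ends the pipeline and returns a value.
        return f(self)


dp = Pipeline(lambda: ["a", "b", "c"])
enumerated = list(dp.pipe(enumerate))
expanded = list(dp.iter_map(lambda s: [s, s.upper()]))
total_length = dp.map(len).consume(sum)
```

In this sketch every method except .consume returns a new lazy Pipeline, so calls can be chained freely.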
Motivation, pitch
Such an approach would allow for more flexible functional programming and would reduce most currently provided IterDataPipe classes to a simple functional call. For example, the Enumerator class would become
dp.pipe(enumerate)
This would immediately make all itertools functions usable in this context.
The TarArchiveLoader could become
def iter_from_tar_archive(fd):
    <code to yield files from tar archive>
dp.iter_map(iter_from_tar_archive)
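As an illustration only, the generator could look roughly like the sketch below, which uses the standard-library tarfile module. Yielding (name, bytes) pairs is a simplifying assumption and is not necessarily what TarArchiveLoader yields; make_demo_archive is a hypothetical helper that builds an in-memory archive so the example is self-contained.

```python
import io
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_from_tar_archive(fd: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, contents) for each regular file in the archive."""
    with tarfile.open(fileobj=fd, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other special members
            extracted = archive.extractfile(member)
            if extracted is not None:
                yield member.name, extracted.read()


def make_demo_archive() -> io.BytesIO:
    """Hypothetical helper: build a small tar archive in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


files = list(iter_from_tar_archive(make_demo_archive()))
```

The function has the shape f: Any -> Iterable, so it is the kind of function the proposed .iter_map would accept.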
I believe that with this approach, almost all provided classes could be written with less boilerplate using generator functions (essentially just writing the code inside __iter__ as a standalone generator function, possibly curried for convenience if other parameters are used).
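A minimal sketch of that idea, assuming a batching step as the example: the body that would otherwise live in __iter__ becomes a standalone generator function, and functools.partial curries the extra parameter. The names batched and batch_by_two are hypothetical and not taken from torchdata.

```python
from functools import partial
from typing import Any, Iterable, Iterator, List


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Group items into lists of length `size`; the last list may be shorter."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Curry the extra parameter so the result has the form Iterable -> Iterable,
# which is what the proposed (hypothetical) .pipe method would accept.
batch_by_two = partial(batched, size=2)

batches = list(batch_by_two(range(5)))
```

With the proposed API this would presumably be written as dp.pipe(partial(batched, size=2)).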
It would be great to hear whether this has been considered. Thanks!
Alternatives
The .pipe example can already be written as
dp2 = IterableWrapper(enumerate(dp))
but I believe this is a lot less nice than the version above:
dp.pipe(enumerate)
Additional context
No response
Just wanted to ping about this issue. It would be great to hear the development team's perspective. Even after looking into it more, it still appears to me that most of the functionality provided could be exposed as individual functions.
It would be great to know if I am missing or misunderstanding something about the functionality of torchdata.
Thanks