Documentation for applying to new dataset
tomhosking opened this issue · 4 comments
Hi, I'm interested in applying GECA to a new dataset - could you provide some brief documentation or examples on how I might augment an arbitrary list of utterances using your implementation? Thanks!
Hi Jacob,
I've tried to put together a minimum working example to then export to my own project + framework, but I'm finding it difficult. With compute adjacency set to True (I think otherwise it doesn't actually do anything?), I tried the following:
from data.builder import OneShotDataset
train_data = [
    ((), tuple('red lorry'.split())),
    ((), tuple('red car'.split())),
    ((), tuple('yellow lorry'.split())),
]
ds = OneShotDataset(train_data, [], [])
print(ds.multiplicity)
defaultdict(<function OneShotDataset._compute_adjacency.<locals>.<lambda> at 0x7f1604c318c8>, {(1, 6, 5, 8, 2): 2, (1, 6, 5, 9, 2): 1, (1, 6, 7, 5, 2): 2, (1, 6, 10, 5, 2): 1, (1, 6, 5, 2): 3, 1: 0, 2: 0, 5: 0, 6: 0, 8: 0, 9: 0, 7: 0, 10: 0})
print(ds.templ_to_templ)
defaultdict(<class 'set'>, {1: {(1, 6, 10, 5, 2), (1, 6, 5, 2), (1, 6, 7, 5, 2), (1, 6, 5, 9, 2), (1, 6, 5, 8, 2)}, 2: {(1, 6, 10, 5, 2), (1, 6, 5, 2), (1, 6, 7, 5, 2), (1, 6, 5, 9, 2), (1, 6, 5, 8, 2)}, 5: {(1, 6, 10, 5, 2), (1, 6, 5, 2), (1, 6, 7, 5, 2), (1, 6, 5, 9, 2), (1, 6, 5, 8, 2)}, 6: {(1, 6, 10, 5, 2), (1, 6, 5, 2), (1, 6, 7, 5, 2), (1, 6, 5, 9, 2), (1, 6, 5, 8, 2)}, 8: {(1, 6, 5, 8, 2), (1, 6, 5, 9, 2)}, 9: {(1, 6, 5, 8, 2), (1, 6, 5, 9, 2)}, 7: {(1, 6, 7, 5, 2), (1, 6, 10, 5, 2)}, 10: {(1, 6, 7, 5, 2), (1, 6, 10, 5, 2)}})
print(ds.comp_pairs)
[]
Iterating through ds.sample_comp_train() then throws an error, since comp_pairs is empty.
My understanding is that this should at the very least add 'yellow car' to the dataset?
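To make that expectation concrete, here is a toy sketch of the kind of recombination being asked for. It is not the GECA implementation: `toy_augment` is a hypothetical helper that only swaps single tokens, on the assumption that two tokens are interchangeable whenever they occur in at least one shared environment (the same sentence with one position blanked out).

```python
from collections import defaultdict


def toy_augment(utterances):
    """Toy single-token recombination; an illustration, not GECA's code."""
    sents = [tuple(u.split()) for u in utterances]
    seen = set(sents)

    # Map each environment (a sentence with one slot set to None)
    # to the set of tokens observed filling that slot.
    fillers = defaultdict(set)
    for sent in sents:
        for i, tok in enumerate(sent):
            env = sent[:i] + (None,) + sent[i + 1:]
            fillers[env].add(tok)

    new = set()
    for env, toks in fillers.items():
        for other_env, other_toks in fillers.items():
            if other_env == env:
                continue
            # The two environments must share a filler for their
            # remaining fillers to count as interchangeable.
            if not (toks & other_toks):
                continue
            slot = other_env.index(None)
            for tok in toks - other_toks:
                cand = other_env[:slot] + (tok,) + other_env[slot + 1:]
                if cand not in seen:
                    new.add(cand)
    return sorted(" ".join(s) for s in new)


augmented = toy_augment(["red lorry", "red car", "yellow lorry"])
print(augmented)  # ['yellow car']
```

Here 'car' and 'lorry' both fill the environment 'red _', and 'lorry' also fills 'yellow _', so 'car' is substituted into 'yellow _' to give 'yellow car'.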
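If I understand these lines correctly, comp_pairs will never get populated, since the keys in templ_to_templ have a different structure from the keys in multiplicity. In the output above, templ_to_templ is keyed by single token ids, while the only non-zero entries in multiplicity are keyed by whole templates (tuples), so every lookup returns 0 and hits the continue: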
comp_pairs = []
for templ1 in self.templ_to_templ:
    if self.multiplicity[templ1] <= 1:
        continue
    for templ2 in self.templ_to_templ[templ1]:
        comp_pairs.append((templ1, templ2))
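The behaviour can be reproduced without the repository by replaying that loop over the values printed above. This is a sketch under two assumptions: the default factory of multiplicity returns 0 (the printed repr only shows a lambda, so `defaultdict(int)` stands in for it), and the integer keys with value 0 in the printed multiplicity were created by these lookups rather than being present beforehand.

```python
from collections import defaultdict

# Template counts as printed by ds.multiplicity, tuple keys only.
# Assumption: the original default factory returns 0, like int().
multiplicity = defaultdict(int, {
    (1, 6, 5, 8, 2): 2,
    (1, 6, 5, 9, 2): 1,
    (1, 6, 7, 5, 2): 2,
    (1, 6, 10, 5, 2): 1,
    (1, 6, 5, 2): 3,
})

all_templates = set(multiplicity)
noun_templates = {(1, 6, 5, 8, 2), (1, 6, 5, 9, 2)}
adj_templates = {(1, 6, 7, 5, 2), (1, 6, 10, 5, 2)}

# Keys as printed by ds.templ_to_templ: single token ids, not tuples.
templ_to_templ = defaultdict(set, {
    1: set(all_templates),
    2: set(all_templates),
    5: set(all_templates),
    6: set(all_templates),
    8: set(noun_templates),
    9: set(noun_templates),
    7: set(adj_templates),
    10: set(adj_templates),
})

comp_pairs = []
for templ1 in templ_to_templ:
    # templ1 is an int here, so this lookup misses every tuple key,
    # returns the default 0, and inserts templ1 with value 0.
    if multiplicity[templ1] <= 1:
        continue
    for templ2 in templ_to_templ[templ1]:
        comp_pairs.append((templ1, templ2))

int_keys = sorted(k for k in multiplicity if isinstance(k, int))
print(comp_pairs)  # []
print(int_keys)    # [1, 2, 5, 6, 7, 8, 9, 10]
```

After the loop, multiplicity has gained one zero-valued entry per token id, which matches the trailing `1: 0, 2: 0, ...` entries in the printed output.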
A standalone MWE would be really helpful for using GECA in other research!
Thanks
This is extremely late, but there's now a minimal example under data/colors.py. Hope you got it working!