Bug in REaLTabFormer.sample() when relational model generates no data
efstathios-chatzikyriakidis opened this issue · 3 comments
Hi @avsolatorio!
I have a super urgent bug ticket that I need to solve asap and I have solve it here. I wonder if you could just push it and create a new PyPI release.
It is related to the library in the def _processes_sample()
method.
Bug:
I have trained a relational model and some times the data are not so many to learn the relationship and the sampling of the relational model generates an empty synth_sample
and as a consequence the following line if synth_sample[col].iloc[0].startswith(col):
fails with an exception because it assumes that at least one line at iloc 0 will always exist: https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/rtf_sampler.py#L449
The function def _processes_sample()
is used also when sampling a tabular model and probably this can happen also in this tabular model sampling. Regardless the case (tabular/relational) 2-3 lines below the bug line (L449) the function already (and wisely) has some logic to check if the sampled data synth_df
is empty and if that's the case it throws the SampleEmptyError
exception.
Resolution:
So, I think we can add also a similar check above the line https://github.com/worldbank/REaLTabFormer/blob/main/src/realtabformer/rtf_sampler.py#L443 checking synth_sample
dataframe instead, something like:
if synth_sample.empty:
# Handle this exception in the sampling function.
raise SampleEmptyError(in_size=len(sample_outputs))
Then in my application I can catch the specific exception:
from realtabformer.rtf_exceptions import SampleEmptyError
try:
// run some REalTabFormer sampling code
except SampleEmptyError as e:
// handle the specific exception
What do you think?
Can you do the fix and build a new PyPI package?
Thanks!
Hello @efstathios-chatzikyriakidis, thank you for sharing this issue. I have made the changes and also published a release with that patch. Please try installing the new version.
In the future, I recommend that you submit a pull request. This is entirely your solution, and I want people like you who find issues and implement solutions to be part of the official contributors! 🤗
Thank you @avsolatorio!
I appreciate your effort and the work you did for resolving this. Yes, in the future for any issue if I have a resolution I will send a PR.
Regarding the commit changes I see that also FrozenSeq2SeqTrainer
is replaced with Seq2SeqTrainer
. Is this something that could cause any problem or what was the intuition behind it, does it optimize anything?
Thanks!
Regarding the commit changes I see that also
FrozenSeq2SeqTrainer
is replaced withSeq2SeqTrainer
. Is this something that could cause any problem or what was the intuition behind it, does it optimize anything?
No, this should work the same. I needed to implement the FrozenSeq2SeqTrainer
before because the Seq2SeqTrainer
on the transformers library had a bug. I reported it here: huggingface/transformers#21182.
It was already fixed, so the FrozenSeq2SeqTrainer
is no longer needed.