Massive Label Leakage in GAM/GAM* Implementation

Question


VijayLingam95 opened this issue 5 years ago · 5 comments

Answer 1 · 2021-04-10T12:53:19.000Z

Hi GAM Authors,

I have noticed a massive label leakage bug in your implementation of GAM/GAM*. Both edge_iterator and pair_iterator use true labels instead of predicted labels. Below are more details on the label leakage bug in each of these iterators.

Bug in edge_iterator (defined in trainer_classification_gcn.py)
Source of label leakage:
elif labeling == 'lu':
    edges = (
        data.get_edges(src_labeled=True, tgt_labeled=False) +  # Adds edges where the src node is labeled and the tgt node is unlabeled
        data.get_edges(src_labeled=False, tgt_labeled=True))   # Adds edges where the src node is unlabeled and the tgt node is labeled

I have bolded the line of concern (the second get_edges call, which adds the UL edges).

At line 692, the LU edges and the UL edges are concatenated. Note that in the second get_edges call, unlabeled nodes are added as source nodes.
This can also be seen by printing the edges variable at line 710.
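To make the concatenation concrete, here is a minimal sketch on a hypothetical four-node graph. The get_edges function below is an illustrative re-implementation of the filtering described above, not the repository's code; it only shows that every unlabeled source node in the LU setting comes from the UL half of the concatenation.

```python
# Hypothetical toy graph: nodes 0 and 1 are labeled, nodes 2 and 3 are unlabeled.
labeled = {0, 1}
edges = [(0, 2), (1, 3), (2, 0), (3, 1), (0, 1), (2, 3)]


def get_edges(src_labeled, tgt_labeled):
    """Illustrative stand-in for data.get_edges(): filter edges by endpoint status."""
    return [(s, t) for s, t in edges
            if (s in labeled) == src_labeled and (t in labeled) == tgt_labeled]


lu_edges = get_edges(src_labeled=True, tgt_labeled=False)  # labeled -> unlabeled
ul_edges = get_edges(src_labeled=False, tgt_labeled=True)  # unlabeled -> labeled
lu_setting = lu_edges + ul_edges                           # the concatenation at issue

# Source nodes that are actually unlabeled; they come only from the UL half.
unlabeled_sources = [s for s, _ in lu_setting if s not in labeled]
print(lu_setting)         # [(0, 2), (1, 3), (2, 0), (3, 1)]
print(unlabeled_sources)  # [2, 3]
```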

While iterating through the edges at line 718, true labels are assigned to unlabeled indices. I have pasted the relevant lines below; the lines of concern are the data.get_labels() calls.

for edge in iterator:
    indices_src = edge[:, 0]
    indices_tgt = edge[:, 1]
    features_src = data.get_features(indices_src)
    features_tgt = data.get_features(indices_tgt)
    # get_labels() returns ground-truth labels, even for unlabeled indices.
    labels_src = data.get_labels(indices_src)
    labels_tgt = data.get_labels(indices_tgt)
    yield (indices_src, indices_tgt, features_src, features_tgt, labels_src,
           labels_tgt)

data.get_labels() returns true labels for unlabeled indices, thus producing the massive improvements reported in the paper.
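A minimal sketch of why this leaks, assuming plain dicts stand in for the repository's data object: get_labels_leaky mirrors the behaviour described above, while get_labels_safe is a hypothetical alternative that reveals ground truth only for labeled nodes and falls back to model predictions otherwise. Neither function is part of the GAM codebase, and the predictions are hard-coded for illustration.

```python
# Hypothetical stand-ins: ground truth for every node and a classifier's guesses.
true_labels = {0: "A", 1: "B", 2: "A", 3: "B"}
predicted = {2: "B", 3: "B"}  # predictions for the unlabeled nodes only
labeled = {0, 1}


def get_labels_leaky(indices):
    # Behaviour described in the issue: ground truth for every index.
    return [true_labels[i] for i in indices]


def get_labels_safe(indices):
    # Reveal ground truth only for labeled nodes; use predictions otherwise.
    return [true_labels[i] if i in labeled else predicted[i] for i in indices]


src = [2, 3]  # sources of UL edges, i.e. unlabeled nodes
print(get_labels_leaky(src))  # ['A', 'B']  (node 2's hidden label leaks)
print(get_labels_safe(src))   # ['B', 'B']
```

In a real training loop the predictions would come from the current classifier rather than a fixed dict.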

After fixing this label-leakage bug (either by removing the UL edges or by reversing them), we observe only marginal improvements over the baselines.
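The two fixes mentioned above can be sketched on a toy edge list. This is an illustrative sketch only, not the change made in any actual pull request; reversing the UL edges assumes the agreement model treats its two inputs symmetrically, which is an assumption here.

```python
# Hypothetical toy data: nodes 0 and 1 are labeled, nodes 2 and 3 are not.
labeled = {0, 1}
lu_edges = [(0, 2), (1, 3)]  # labeled source, unlabeled target
ul_edges = [(2, 0), (3, 1)]  # unlabeled source, labeled target

# Option 1: drop the UL edges entirely.
fix_drop = list(lu_edges)

# Option 2: reverse the UL edges so the labeled endpoint is always the source.
fix_reverse = lu_edges + [(t, s) for s, t in ul_edges]

for name, edge_list in [("drop", fix_drop), ("reverse", fix_reverse)]:
    # After either fix, every source node is labeled.
    assert all(s in labeled for s, _ in edge_list), name

print(fix_reverse)  # [(0, 2), (1, 3), (0, 2), (1, 3)]
```

Reversing can create duplicate pairs when the graph stores both directions of an edge, as in this toy example; deduplicating them (for example with list(dict.fromkeys(fix_reverse))) may be worth considering.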

Bug in pair_iterator:

pair_iterator is defined in trainer_classification_gcn.py at line 633.
Printing the variables labels_src and labels_tgt at lines 668 and 669 shows that true labels, instead of predicted labels, are assigned for the LU and UU pair iterators.
The _select_from_pool() method invoked by pair_iterator assigns labels using data.get_labels(indices_batch). This call returns true labels for unlabeled indices instead of predicted labels.
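One way to check a claim like this empirically is a perturbation test: build a batch once with the real labels and once with the labels of unlabeled nodes replaced by wrong ones, and compare. The sketch below assumes a toy graph and hypothetical leaky_batch/masked_batch functions standing in for the pair sampler; it is not GAM's _select_from_pool(). A difference in the batch only shows that hidden labels are read; whether they affect training also depends on how the batch is consumed downstream.

```python
import random

labeled = {0, 1}
true_labels = {0: 0, 1: 1, 2: 0, 3: 1}
pairs = [(0, 2), (2, 3)]  # one LU pair and one UU pair


def scramble_hidden(labels, labeled, num_classes, seed=0):
    """Copy of labels where each unlabeled node gets a label different from its true one."""
    rng = random.Random(seed)
    out = dict(labels)
    for node, y in labels.items():
        if node not in labeled:
            out[node] = rng.choice([c for c in range(num_classes) if c != y])
    return out


def reads_hidden_labels(batch_fn, labels, labeled, num_classes):
    """True if batch_fn's output depends on the labels of unlabeled nodes."""
    return batch_fn(labels) != batch_fn(scramble_hidden(labels, labeled, num_classes))


def leaky_batch(labels):
    # Mimics the reported behaviour: labels are looked up for every endpoint.
    return [(labels[s], labels[t]) for s, t in pairs]


def masked_batch(labels):
    # Hides labels of unlabeled endpoints (None) before they reach the batch.
    return [(labels[s] if s in labeled else None,
             labels[t] if t in labeled else None) for s, t in pairs]


print(reads_hidden_labels(leaky_batch, true_labels, labeled, 2))   # True
print(reads_hidden_labels(masked_batch, true_labels, labeled, 2))  # False
```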

Answer 2 · 2021-04-12T03:59:41.000Z

Hi Vijay,

Thanks for discovering this issue! I have made a pull request with a fix for the edge_iterator; see pull request #82. Given this change, we need to re-tune the hyperparameters, so we are rerunning some experiments and will post updates on the GAM repository.

However, I am not sure I understand the issue with "pair_iterator". You are right that it returns the true labels for unlabeled nodes, but these are not used. If you check the function "_construct_feed_dict" line 610 in "trainer_classification_gcn.py", we do not use the labels of the targets of LU edges (which are the unlabeled ones). Based on this, GAM* results should not be affected.
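The argument here is that leaked labels are harmless as long as the code consuming the batch never reads them. A minimal sketch of that idea, using a hypothetical construct_feed_dict_sketch (its keys and the 'll'/'lu'/'uu' settings are assumptions, not GAM's actual _construct_feed_dict): labels go into the feed only for endpoints that are labeled in that setting, so the result is identical whatever values the unlabeled targets carry. By the same reasoning, the argument relies on the labeled node always being the source of an LU pair.

```python
def construct_feed_dict_sketch(labels_src, labels_tgt, labeling):
    """Hypothetical feed-dict builder that only reads labels of labeled endpoints."""
    feed = {}
    if labeling in ("ll", "lu"):
        # The source is labeled in both settings, so its label may be used.
        feed["labels_src"] = list(labels_src)
    if labeling == "ll":
        # Only when the target is also labeled is its label read.
        feed["labels_tgt"] = list(labels_tgt)
    return feed


labels_src = [0, 1]   # labels of the labeled sources
true_tgt = [1, 0]     # hidden ground truth of the unlabeled targets
garbage_tgt = [7, 7]  # arbitrary values in place of the hidden labels

same = (construct_feed_dict_sketch(labels_src, true_tgt, "lu")
        == construct_feed_dict_sketch(labels_src, garbage_tgt, "lu"))
print(same)  # True: the LU feed dict does not depend on target labels
```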

Answer 3 · 2022-01-04T04:03:50.000Z

Hi, Otilia and Krishna. Can this bug be closed, now that PR #82 has been merged? Thanks.

Answer 4 · 2022-01-11T18:30:42.000Z

+Otilia Stretcu: Hi Otilia, can we close this? (Replying to aheydon-google's message of Mon, Jan 3, 2022, 8:04 PM.)

Answer 5 · 2022-01-11T19:03:06.000Z

Sure. I will update the README and push a change by the end of the week. (Replying to Krishnamurthy Viswanathan's message of Tue, Jan 11, 2022, 10:30 AM.)