datasig-ac-uk/nlpsig

Check plotEmbedding after `dsig.create_features`

Issue closed · 1 comment

In the notebook, after encoding the text data, we can plot the embeddings:

[Image: Screenshot 2022-10-12 at 14 57 35]

This seems reasonable, since we have three classes. Strictly speaking, there are four classes, but we map them into three:

            "label":
                {"economy": 2,
                 "obama": 1,
                 "microsoft": 0,
                 "palestine": 0
                }
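To make the mapping concrete, here is a minimal sketch of how four topics collapse into three classes. The `topics` list is made up for illustration and is not taken from the notebook; only the dictionary comes from the config above.

```python
# Label mapping from the config above: four topics, three classes.
label_map = {"economy": 2, "obama": 1, "microsoft": 0, "palestine": 0}

# Hypothetical topic column, for illustration only.
topics = ["economy", "microsoft", "obama", "palestine", "microsoft"]

# "microsoft" and "palestine" both end up in class 0.
labels = [label_map[t] for t in topics]
n_classes = len(set(label_map.values()))

print(labels)     # [2, 0, 1, 0, 0]
print(n_classes)  # 3
```

So a plot coloured by `labels` should show three groups, with class 0 containing two topics.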

However, after time injection and

x_data = dsig.create_features(path, sig_combined, last_index_dt_all, bert_embeddings, time_feature)

the results look very different (see the notebook). Am I missing anything?

Interesting point from a meeting with @kasra-hosseini: one potential reason is that, in this example, we are adding random timestamps / time IDs. That could explain why the results look strange and don't cluster very well.
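
A toy sketch of that hypothesis, not the nlpsig pipeline: it builds three well-separated 2-D clusters as a stand-in for the embeddings, then appends a random "timestamp" feature that carries no class information. The cluster centres, the 0 to 100 timestamp range, and the `separation` score are all assumptions chosen for illustration.

```python
import math
import random

random.seed(0)


def make_points(center, n=100, std=0.5):
    """Draw n Gaussian points around a 2-D center."""
    return [(random.gauss(center[0], std), random.gauss(center[1], std))
            for _ in range(n)]


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def separation(clusters):
    """Mean centroid-to-centroid distance divided by the mean distance
    of each point to its own centroid (higher means cleaner clusters)."""
    cents = [centroid(c) for c in clusters]
    between = [math.dist(a, b)
               for i, a in enumerate(cents) for b in cents[i + 1:]]
    within = [math.dist(p, cen)
              for c, cen in zip(clusters, cents) for p in c]
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Toy stand-in for the embeddings: three classes, clearly separated.
clusters = [make_points(c) for c in [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]]

# Append a random "timestamp" that is unrelated to the class (assumed range).
with_time = [[p + (random.uniform(0.0, 100.0),) for p in c] for c in clusters]

before = separation(clusters)
after = separation(with_time)
print(f"separation without random time feature: {before:.2f}")
print(f"separation with random time feature:    {after:.2f}")
```

In this toy setting the score drops once the random feature is appended, because the unscaled noise dimension dominates the distances. Whether the same mechanism explains the plots in the notebook would need to be checked with real or properly scaled timestamps.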