datasig-ac-uk/nlpsig

Model specifications


I changed the model_specifics dictionary to a nested one with the following entries. Does this make sense?

model_specifics = {
    "encoder_args": {
        "col_name_text": "content",
        "model_name": "all-MiniLM-L6-v2",
        "model_args": {
            "batch_size": 64,
            "show_progress_bar": True,
            "output_value": "sentence_embedding",
            "convert_to_numpy": True,
            "convert_to_tensor": False,
            "device": None,
            "normalize_embeddings": False,
        },
    },
    "dim_reduction": {
        "method": "ppapca",  # options: ppapca, ppapcappa, umap
        "num_components": 10,  # options: any integer between 1 and the embedding dimension
    },
    "time_injection": {
        "history_tp": "timestamp",  # options: timestamp, None
        "post_tp": "timestamp",  # options: timestamp, timediff, None
    },
    "embedding": {
        "global_embedding_tp": "SBERT",  # options: SBERT, BERT_cls, BERT_mean, BERT_max
        "post_embedding_tp": "sentence",  # options: sentence, reduced
        "feature_combination_method": "attention",  # options: concatenation, attention
    },
    "signature": {
        "dimensions": 3,  # options: any integer larger than 1
        "method": "log",  # options: log, sig
        "interval": 1 / 12,
    },
    "classifier": {
        "classifier_name": "FFN2hidden",  # options: FFN2hidden (more classifiers may be added in the future)
        "classes_num": "3class",  # options: 3class (5class to be added in the future)
    },
}
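One way to sanity-check the nested dictionary is to validate each entry against the options listed in its comments. The sketch below is a hypothetical helper, not part of nlpsig: ALLOWED_OPTIONS and validate_model_specifics are illustrative names, and the option sets are copied from the comments above, so they would need updating whenever a classifier or class count is added.

```python
from typing import List

# Hypothetical table (not nlpsig's API): options copied from the comments in model_specifics.
# None means "no time feature" for the time_injection entries.
ALLOWED_OPTIONS = {
    ("dim_reduction", "method"): {"ppapca", "ppapcappa", "umap"},
    ("time_injection", "history_tp"): {"timestamp", None},
    ("time_injection", "post_tp"): {"timestamp", "timediff", None},
    ("embedding", "global_embedding_tp"): {"SBERT", "BERT_cls", "BERT_mean", "BERT_max"},
    ("embedding", "post_embedding_tp"): {"sentence", "reduced"},
    ("embedding", "feature_combination_method"): {"concatenation", "attention"},
    ("signature", "method"): {"log", "sig"},
    ("classifier", "classifier_name"): {"FFN2hidden"},
    ("classifier", "classes_num"): {"3class"},
}

_MISSING = object()  # marks a key that is absent, as opposed to explicitly set to None


def _is_int(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def validate_model_specifics(specs: dict, embedding_dim: int) -> List[str]:
    """Return a list of problems; an empty list means every checked entry is valid."""
    problems = []
    for (section, key), allowed in ALLOWED_OPTIONS.items():
        value = specs.get(section, {}).get(key, _MISSING)
        if value is _MISSING:
            problems.append(f"{section}.{key} is missing")
        elif value not in allowed:
            problems.append(
                f"{section}.{key}={value!r} is not one of {sorted(map(str, allowed))}"
            )
    # Numeric entries whose valid range is described in the comments
    n = specs.get("dim_reduction", {}).get("num_components")
    if not (_is_int(n) and 1 <= n <= embedding_dim):
        problems.append(
            f"dim_reduction.num_components={n!r} must be an int in [1, {embedding_dim}]"
        )
    d = specs.get("signature", {}).get("dimensions")
    if not (_is_int(d) and d > 1):
        problems.append(f"signature.dimensions={d!r} must be an int larger than 1")
    return problems
```

Called on the dictionary above with embedding_dim=384 (the output dimension of all-MiniLM-L6-v2), this should return an empty list. Keeping all allowed values in one table means that adding, say, the 5class option only requires a single edit.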

Notes from a meeting with @kasra-hosseini:

  • Perhaps rename encoder_args to text_encoder_args
    • Could add an option to load precomputed embeddings if the user doesn't want to compute new ones (this could perhaps be combined with the embedding entry)
  • Make DyadicSignature() only compute the signatures, and handle everything signature-related there
  • Refactor embedding so that it works with the text features (obtained from BERT or another encoder) and the signature features (obtained from path signatures)
    • This could also handle alternative methods for combining these two sets of features
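The proposed split, with signature computation in one place and feature combination in another, could look roughly like the sketch below. It is an illustration under stated assumptions, not nlpsig's code: path_signature and combine_features are hypothetical names, the signature is truncated at depth 2 and computed directly with NumPy for piecewise-linear paths, the dyadic-interval logic of DyadicSignature() is not reproduced, and only concatenation is implemented because attention would need a trained model.

```python
import numpy as np


def path_signature(path: np.ndarray, method: str = "sig") -> np.ndarray:
    """Hypothetical helper: depth-2 signature ("sig") or log-signature ("log").

    path has shape (T, d) and is treated as piecewise linear; the result holds
    the d level-1 terms followed by the d*d level-2 terms.
    """
    increments = np.diff(path, axis=0)                    # (T-1, d)
    level1 = increments.sum(axis=0)                       # total increment, (d,)
    earlier = np.cumsum(increments, axis=0) - increments  # sum of increments before each step
    # Level 2 via Chen's identity for linear pieces:
    # sum_{i<j} dX_i (x) dX_j + 1/2 sum_i dX_i (x) dX_i
    level2 = earlier.T @ increments + 0.5 * increments.T @ increments
    if method == "log":
        # Truncated tensor logarithm: level 2 becomes the antisymmetric (Levy area) part
        level2 = level2 - 0.5 * np.outer(level1, level1)
    elif method != "sig":
        raise ValueError(f"unknown signature method: {method!r}")
    return np.concatenate([level1, level2.ravel()])


def combine_features(text_features: np.ndarray,
                     signature_features: np.ndarray,
                     method: str = "concatenation") -> np.ndarray:
    """Hypothetical helper: combine text embeddings (n, k) with signature features (n, m)."""
    if method == "concatenation":
        return np.concatenate([text_features, signature_features], axis=1)
    # "attention" (the other listed option) would need a learned model, so it is omitted here
    raise NotImplementedError(f"combination method {method!r} is not sketched here")


# Usage: 4 posts, each with a 384-dim text embedding and a 5-step path in 3 dimensions
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 384))
paths = rng.normal(size=(4, 5, 3))
sigs = np.stack([path_signature(p, method="log") for p in paths])  # (4, 3 + 9)
combined = combine_features(text, sigs)                            # (4, 396)
```

Keeping combine_features separate means a new combination method only touches one function, while path_signature stays unaware of how its output is used. In practice, a dedicated signature library such as esig, iisignature or signatory would replace the hand-written depth-2 computation.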