Model specifications

Question

Closed this issue a year ago · 1 comment

I changed the model_specifics dictionary to a nested one with the following entries. Does this make sense?

model_specifics = {
    # Arguments for the text encoder
    "encoder_args": {
        "col_name_text": "content",
        "model_name": "all-MiniLM-L6-v2",
        "model_args": {
            "batch_size": 64,
            "show_progress_bar": True,
            "output_value": "sentence_embedding",
            "convert_to_numpy": True,
            "convert_to_tensor": False,
            "device": None,
            "normalize_embeddings": False,
        },
    },
    "dim_reduction": {
        "method": "ppapca",  # options: ppapca, ppapcappa, umap
        "num_components": 10,  # options: any int between 1 and the embedding dimension
    },
    "time_injection": {
        "history_tp": "timestamp",  # options: timestamp, None
        "post_tp": "timestamp",  # options: timestamp, timediff, None
    },
    "embedding": {
        "global_embedding_tp": "SBERT",  # options: SBERT, BERT_cls, BERT_mean, BERT_max
        "post_embedding_tp": "sentence",  # options: sentence, reduced
        "feature_combination_method": "attention",  # options: concatenation, attention
    },
    "signature": {
        "dimensions": 3,  # options: any int larger than 1
        "method": "log",  # options: log, sig
        "interval": 1 / 12,
    },
    "classifier": {
        "classifier_name": "FFN2hidden",  # options: FFN2hidden (any future classifiers added)
        "classes_num": "3class",  # options: 3class (5class to be added in the future)
    },
}
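
One way to sanity-check a nested dictionary like this is a small validator that compares each entry against the options listed in the comments. The sketch below is only an illustration: `ALLOWED_OPTIONS` and `validate_model_specifics` are hypothetical names, not part of the existing code, and the allowed values are copied from the comments above.

```python
# Hypothetical helper, not part of the project: the allowed values below are
# copied from the "options" comments in the model_specifics dictionary.
ALLOWED_OPTIONS = {
    ("dim_reduction", "method"): {"ppapca", "ppapcappa", "umap"},
    ("time_injection", "history_tp"): {"timestamp", None},
    ("time_injection", "post_tp"): {"timestamp", "timediff", None},
    ("embedding", "global_embedding_tp"): {"SBERT", "BERT_cls", "BERT_mean", "BERT_max"},
    ("embedding", "post_embedding_tp"): {"sentence", "reduced"},
    ("embedding", "feature_combination_method"): {"concatenation", "attention"},
    ("signature", "method"): {"log", "sig"},
    ("classifier", "classifier_name"): {"FFN2hidden"},
    ("classifier", "classes_num"): {"3class"},
}


def validate_model_specifics(specs):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    def report(message):
        # Avoid repeating the same message (e.g. one missing section, many keys).
        if message not in problems:
            problems.append(message)

    for (section, key), allowed in ALLOWED_OPTIONS.items():
        if section not in specs:
            report(f"missing section: {section}")
        elif key not in specs[section]:
            report(f"missing key: {section}.{key}")
        elif specs[section][key] not in allowed:
            options = sorted(allowed, key=str)
            report(f"{section}.{key}={specs[section][key]!r} is not one of {options}")

    # Numeric constraints taken from the comments on the two integer entries.
    num_components = specs.get("dim_reduction", {}).get("num_components")
    if not isinstance(num_components, int) or num_components < 1:
        report("dim_reduction.num_components must be an int >= 1")

    dimensions = specs.get("signature", {}).get("dimensions")
    if not isinstance(dimensions, int) or dimensions <= 1:
        report("signature.dimensions must be an int > 1")

    return problems
```

Calling `validate_model_specifics(model_specifics)` before building the model would return an empty list for the dictionary above and a list of messages otherwise. The upper bound on `num_components` (the embedding dimension) is not checked here because it depends on the chosen encoder.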

Answer 1 · 2022-10-26T13:54:27.000Z

Notes from meeting with @kasra-hosseini

- Could rename encoder_args to text_encoder_args.
  - Could add an option to load existing embeddings if the user doesn't want to compute new ones (this could perhaps be combined with embedding).
- Make DyadicSignature() just compute the signatures and deal with them there.
- Refactor embedding to work with the text features obtained by BERT (or otherwise) and the signature features obtained from path signatures.
  - This could handle alternative methods for combining these two sets of features.
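
As a rough illustration of the last point, a single function could dispatch on the combination method named in the config. This is a minimal sketch under stated assumptions: `combine_features` is a hypothetical name, features are represented as plain lists of rows (one row per sample), and only the concatenation option is implemented; the attention option is left as a placeholder because the notes do not describe how it should work.

```python
# Hypothetical sketch: combine per-sample text features with per-sample
# signature features. Names and data layout are assumptions for illustration.
def combine_features(text_features, signature_features, method="concatenation"):
    """Combine two equally long lists of feature rows into one list of rows."""
    if len(text_features) != len(signature_features):
        raise ValueError("both feature sets must have the same number of samples")

    if method == "concatenation":
        # Append each sample's signature features to its text features.
        return [
            list(text_row) + list(sig_row)
            for text_row, sig_row in zip(text_features, signature_features)
        ]
    if method == "attention":
        # Placeholder: the attention-based combination is not specified here.
        raise NotImplementedError("attention combination is not sketched here")
    raise ValueError(f"unknown feature combination method: {method!r}")
```

For example, `combine_features([[0.1, 0.2]], [[1.0]])` gives one row holding the two text features followed by the signature feature. Keeping the dispatch in one place would let new combination methods be added without touching the signature computation.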