Model specifications
Closed this issue · 1 comments
kasra-hosseini commented
I changed the `model_specifics` dictionary to a nested one with the following entries. Does this make sense?
model_specifics = {
    "encoder_args": {
        "col_name_text": "content",
        "model_name": "all-MiniLM-L6-v2",
        "model_args": {
            "batch_size": 64,
            "show_progress_bar": True,
            "output_value": 'sentence_embedding',
            "convert_to_numpy": True,
            "convert_to_tensor": False,
            "device": None,
            "normalize_embeddings": False
        }
    },
    "dim_reduction": {
        "method": 'ppapca',  # options: ppapca, ppapcappa, umap
        "num_components": 10,  # options: any int number between 1 and embedding dimensions
    },
    "time_injection": {
        "history_tp": 'timestamp',  # options: timestamp, None
        "post_tp": 'timestamp',  # options: timestamp, timediff, None
    },
    "embedding": {
        "global_embedding_tp": 'SBERT',  # options: SBERT, BERT_cls, BERT_mean, BERT_max
        "post_embedding_tp": 'sentence',  # options: sentence, reduced
        "feature_combination_method": 'attention',  # options: concatenation, attention
    },
    "signature": {
        "dimensions": 3,  # options: any int number larger than 1
        "method": 'log',  # options: log, sig
        "interval": 1/12
    },
    "classifier": {
        "classifier_name": 'FFN2hidden',  # options: FFN2hidden (any future classifiers added)
        "classes_num": '3class',  # options: 3class (5class to be added in the future)
    }
}
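Since the allowed values currently live only in the comments, one way to check a nested dictionary like this is a small validator. The following is only a sketch: the function name `validate_model_specifics` and the `ALLOWED` table are hypothetical and not part of the codebase, and the option sets are simply copied from the comments above.

```python
from numbers import Integral

# Allowed values copied from the "options" comments in model_specifics.
# This table is an illustrative assumption, not an existing part of the project.
ALLOWED = {
    ("dim_reduction", "method"): {"ppapca", "ppapcappa", "umap"},
    ("time_injection", "history_tp"): {"timestamp", None},
    ("time_injection", "post_tp"): {"timestamp", "timediff", None},
    ("embedding", "global_embedding_tp"): {"SBERT", "BERT_cls", "BERT_mean", "BERT_max"},
    ("embedding", "post_embedding_tp"): {"sentence", "reduced"},
    ("embedding", "feature_combination_method"): {"concatenation", "attention"},
    ("signature", "method"): {"log", "sig"},
    ("classifier", "classifier_name"): {"FFN2hidden"},
    ("classifier", "classes_num"): {"3class"},
}


def _is_int(value):
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, Integral) and not isinstance(value, bool)


def validate_model_specifics(specs):
    """Return a list of problems found in specs; an empty list means none were found."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        if section not in specs or key not in specs[section]:
            problems.append(f"missing {section}.{key}")
            continue
        value = specs[section][key]
        if value not in allowed:
            options = sorted(allowed, key=str)
            problems.append(f"{section}.{key}={value!r} is not one of {options}")

    # Numeric entries: only the lower bounds stated in the comments are checked here.
    num_components = specs.get("dim_reduction", {}).get("num_components")
    if not _is_int(num_components) or num_components < 1:
        problems.append("dim_reduction.num_components must be an int >= 1")
    dimensions = specs.get("signature", {}).get("dimensions")
    if not _is_int(dimensions) or dimensions <= 1:
        problems.append("signature.dimensions must be an int larger than 1")
    return problems
```

Calling `validate_model_specifics(model_specifics)` on the dictionary above should return an empty list. The upper bound on `num_components` (the embedding dimension) is not checked, because it depends on the chosen model.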
rchan26 commented
Notes from meeting with @kasra-hosseini
- Perhaps could rename `encoder_args` to `text_encoder_args`
- Can maybe add option to load in embeddings if user doesn't want to obtain new ones (can maybe combine this with `embedding`)
- Make `DyadicSignature()` just compute the signatures and deal with them there
  - Refactor `embedding` to work with the text and signature features obtained by BERT (or otherwise) and path signatures respectively
    - This could deal with alternative methods for combining these two sets of features
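The first two notes could look roughly like the sketch below. Everything here is an assumption for illustration: the helper names `get_text_encoder_args` and `obtain_embeddings`, the `embeddings_path` key, and the injected `encode_fn` / `load_fn` callables are hypothetical and do not exist in the codebase. In practice `encode_fn` might be something like the `encode` method of a sentence-transformers model, since the `model_args` keys look like its keyword arguments, but that wiring is not shown.

```python
def get_text_encoder_args(model_specifics):
    """Read encoder settings from the proposed key, falling back to the current one.

    "text_encoder_args" is the proposed new name; "encoder_args" is the existing one.
    """
    if "text_encoder_args" in model_specifics:
        return model_specifics["text_encoder_args"]
    return model_specifics["encoder_args"]


def obtain_embeddings(texts, encoder_args, encode_fn, load_fn):
    """Load stored embeddings if a path is configured, otherwise compute new ones.

    encode_fn(texts, **model_args) and load_fn(path) are supplied by the caller,
    so this sketch does not depend on any particular library or file format.
    """
    path = encoder_args.get("embeddings_path")  # hypothetical key, not in the original dict
    if path is not None:
        return load_fn(path)
    return encode_fn(texts, **encoder_args.get("model_args", {}))
```

Keeping the fallback to `encoder_args` means existing configurations would keep working during a rename. Passing the encoder and loader in as callables is just one design choice; it keeps the "load or compute" decision separate from how embeddings are produced or stored.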