zjunlp/Relphormer

About structure-enhanced self-attention

mqEpiphany opened this issue · 20 comments

I saw in your paper that an adjacency matrix generated from the center triple is used as structural information and added to self-attention. Your code is hard to follow and I can't find the corresponding code. Can you tell me where this part is implemented?

Hi, in this code version we directly put the structural information into the self-attention module as an adjacency matrix, for simplicity.
It is easy to add the structural encoder (an extra linear layer that encodes powers of the attention matrix).
The structural encoder is actually more like a hyper-parameter: whether to use this module depends on the actual knowledge graph you choose.
We will complete this part in a future version and also release the code for relation prediction.
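As a rough illustration of that optional structural encoder, here is a minimal sketch; the module name, shapes, and number of powers are assumptions, not the repo's code:

import torch
import torch.nn as nn

class StructuralEncoder(nn.Module):
    # Hypothetical sketch: learn a linear combination of powers of the
    # adjacency matrix A (A^1, ..., A^K) and use the result as an additive
    # bias on the attention scores.
    def __init__(self, num_powers=3):
        super().__init__()
        self.combine = nn.Linear(num_powers, 1, bias=False)
        self.num_powers = num_powers

    def forward(self, adj):
        # adj: (batch, seq_len, seq_len) adjacency of the contextual sub-graph
        powers = [adj]
        for _ in range(self.num_powers - 1):
            powers.append(torch.bmm(powers[-1], adj))
        stacked = torch.stack(powers, dim=-1)      # (batch, L, L, K)
        return self.combine(stacked).squeeze(-1)   # (batch, L, L) attention bias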

You can find the implementation in the BertSelfAttention module (around line 321) in huggingface_relformer.py.
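For intuition, the score computation in a BERT-style self-attention layer with such an additive structural mask looks roughly like this simplified sketch (the real module also handles multiple heads, dropout, and head masking):

import math
import torch

def structure_enhanced_attention(query, key, value, structural_mask):
    # query/key/value: (batch, heads, seq_len, head_dim)
    # structural_mask: (batch, 1, seq_len, seq_len), 0 for allowed pairs and a
    # large negative value (e.g. -10000.0) for pairs outside the sub-graph.
    scores = torch.matmul(query, key.transpose(-1, -2)) / math.sqrt(query.size(-1))
    scores = scores + structural_mask  # structural information enters here
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, value)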

Hi, thank you for your patient reply. I also want to ask why I can't step into BertSelfAttention in huggingface_relformer.py when I single-step debug starting from the main function.

Did you first pre-train the model for initialization and then load the pre-trained model?
The hyper-parameters of the two stages are different, and the second stage imports huggingface_relformer.py.

When performing pre-training or entity prediction, some files need to be downloaded, but the download is very slow, so I downloaded them from https://huggingface.co/bert-base-uncased. Is that right? The following are the hyperparameter settings I used for entity prediction.
[screenshot of hyperparameter settings]

Yes, that is right. I think I have found the issue.
You should run "main.py" in the "Relphormer" directory instead of the file with the same name in the "pretrain" directory.

Sorry, I didn't express myself clearly. I just put the pre-trained model bert-base-uncased in the "pretrain" directory; I am running "main.py" in the "Relphormer" directory.

OK, can you check the program by putting a breakpoint on the line "from models.huggingface_relformer import BertForMaskedLM"?
It imports the "models.huggingface_relformer" file.

Sorry, I can only find the class BertForMaskedLM, but I can't find the line "from models.huggingface_relformer import BertForMaskedLM".

Can you directly import the huggingface_relformer.py file? We rewrote the BertForMaskedLM class in this file.

Do you mean to create a new file and try to import the huggingface_relformer.py file?

Yes, you can try doing that in the second training stage and it should also work.

When I single-step debug the main function down to trainer.fit(lit_model, datamodule=data), training starts, but I am still unable to step into huggingface_relformer.py.

You can find the line which imports the model class.

model_class = _import_class(f"models.{temp_args.model_class}")
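
For context, _import_class is a small helper that resolves the dotted name at runtime; it typically looks like the following (a sketch of the usual pattern, the repo's version may differ slightly):

import importlib

def _import_class(module_and_class_name):
    # Resolve a string such as "models.BertKGC" to the actual class object.
    module_name, class_name = module_and_class_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)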

The hyperparameter setting of args.model_class is BertKGC. Will the model in huggingface_relformer.py be called during the execution of BertKGC?

# from transformers.models.bert.modeling_bert import BertForMaskedLM
from models.huggingface_relformer import BertForMaskedLM

class BertKGC(BertForMaskedLM):

    @staticmethod
    def add_to_argparse(parser):
        parser.add_argument("--pretrain", type=int, default=0, help="")
        return parser

The model class BertKGC inherits from BertForMaskedLM in huggingface_relformer.py.
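If you want to confirm this at runtime, you can inspect the class hierarchy; for example (the import path of BertKGC below is an assumption, adjust it to wherever the class is defined in your checkout):

import inspect
from models import BertKGC  # hypothetical import path; adjust to your checkout

# The method resolution order shows that BertKGC's parent is the rewritten
# BertForMaskedLM from models.huggingface_relformer, not the transformers one.
print(BertKGC.__mro__)
print(inspect.getsourcefile(BertKGC.__mro__[1]))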

Where is the attention_mask used in the BertSelfAttention module (line 321) of huggingface_relformer.py actually built? I saw that the attention_mask first comes from the code in the screenshot below:
[screenshot of code]

Hi, we generate the attention mask for each center triple, and you can find it in the input of the module.

Can you tell me exactly where in the module's input it is built?

# Collect the entity/relation tokens of the masked head and tail contexts.
masked_head_seq = set()
masked_head_seq_id = set()
masked_tail_seq = set()
masked_tail_seq_id = set()
# Sample at most max_triplet neighbor triples around the (head, relation) and
# (tail, relation) pairs; these neighbors form the contextual sub-graph of the
# center triple.
masked_tail_graph_list = masked_tail_neighbor["\t".join([line[0], line[1]])] \
    if len(masked_tail_neighbor["\t".join([line[0], line[1]])]) < max_triplet \
    else random.sample(masked_tail_neighbor["\t".join([line[0], line[1]])], max_triplet)
masked_head_graph_list = masked_head_neighbor["\t".join([line[2], line[1]])] \
    if len(masked_head_neighbor["\t".join([line[2], line[1]])]) < max_triplet \
    else random.sample(masked_head_neighbor["\t".join([line[2], line[1]])], max_triplet)
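
To illustrate what "generating the attention mask for each center triple" from such sampled neighbors could look like, here is a simplified, hypothetical sketch (not the repo's exact code):

import torch

def build_subgraph_attention_mask(num_center_tokens, neighbor_spans, seq_len):
    # Hypothetical sketch: 1 means a position may attend to another position.
    # Center-triple tokens attend to (and are attended by) everything; tokens
    # inside one sampled neighbor triple only see each other and the center.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.long)
    mask[:num_center_tokens, :] = 1
    mask[:, :num_center_tokens] = 1
    for start, end in neighbor_spans:
        mask[start:end, start:end] = 1
    return mask

# Example: a 3-token center triple plus two sampled neighbor triples.
mask = build_subgraph_attention_mask(3, [(3, 6), (6, 9)], seq_len=9)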