sihohan/FuSAGNet

Question about WADI dataset

Opened this issue · 1 comment

Hello, thanks for sharing your excellent work!

I have some specific questions about the WADI dataset used in your experiments and hope you can answer them:

While working with the WADI dataset, I encountered the following error:
"In 'D:\PythonProjects\FuSAGNet-main\models\FuSAGNet.py', at line 296, within the 'forward' method, an issue arises when attempting to reshape the 'all_embeddings' tensor with the view function - 'embeds = all_embeddings.view(node_num, -1)'. This action triggers a RuntimeError, as the specified shape '[112, -1]' is incompatible with the size of the input, which is 8128.
After a thorough debugging process, I traced this error to lines 108 and 109 of the provided 'main.py':

`elif env_config["dataset"] == "wadi": process_dict = {"P1": 19, "P2": 90, "P3": 15, "P4": 3}`

The root of the problem seems to be the cumulative number of sensors in 'process_dict' across the four processes P1 to P4, which totals 127 ('P1': 19 + 'P2': 90 + 'P3': 15 + 'P4': 3 = 127). Consequently, 'all_embeddings.view(node_num, -1)' fails because the resulting 127 × 64 = 8128 elements are not divisible by 'node_num'.

This issue is further elucidated in the 'FuSAGNet.py' file, around line 285, where the code is as follows:
```python
for j, embedding in enumerate(self.embeddings):
    sensors = torch.arange(embedding.num_embeddings).to(device)
    embedded = embedding(sensors)
    embedded = embedded.unsqueeze(0)
    embedded, _ = self.rnn_embedding_modules[j](embedded)
    embedded = embedded.squeeze()
    embedded_sensors.append(embedded)
    y_process.extend(batch_num * [j for _ in range(embedded.size(0))])

y_process = torch.tensor(y_process).to(device)
all_embeddings = torch.cat(embedded_sensors)  # torch.Size([127, 64])
embeds = all_embeddings.view(node_num, -1)
```

Upon further investigation, I found that 'all_embeddings' has shape 'torch.Size([127, 64])', where 127 is the sum of the 'process_dict' entries ('P1': 19, 'P2': 90, 'P3': 15, 'P4': 3) and 64 is the embedding dimension. However, the feature count (node_num) of the WADI dataset I am using is 112, so 'all_embeddings.view(node_num, -1)' fails because 127 × 64 = 8128 is not divisible by 112.
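For reference, the same mismatch can be reproduced in isolation with a minimal sketch (independent of the model code, just using the shapes above):

```python
import torch

node_num = 112                           # number of sensors in my list.txt
all_embeddings = torch.randn(127, 64)    # 127 = 19 + 90 + 15 + 3 from process_dict

# 127 * 64 = 8128 elements, and 8128 is not divisible by 112, so this raises
# RuntimeError: shape '[112, -1]' is invalid for input of size 8128
embeds = all_embeddings.view(node_num, -1)
```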

Could you tell me what values you assigned to P1, P2, P3, and P4 in 'process_dict' when you used the WADI dataset?

Alternatively, could you provide the 'data/wadi/list.txt' file used in your case? Perhaps the feature count (node_num) of the WADI dataset I am using is incorrect and should not be 112.

I think this is an excellent paper, and I hope to learn more about the experimental details and analysis. I sincerely look forward to your reply and assistance.

Hi @MyAmbitious, sorry for the late reply.

The sensor counts I specified for P1, P2, P3, and P4 in WADI should be correct.
I presume you have a different version of the WADI dataset (it gets updated from time to time), hence the different number of sensors.
If that is the case, try a version of the dataset that has a total of 127 sensors, or adjust process_dict to match your version.
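For reference, a rough sketch for deriving the counts from your local list.txt could look something like the following. This is purely illustrative: it assumes each sensor name in list.txt encodes its process as the token before the first underscore (e.g. a name like 1_AIT_001_PV belonging to P1), so adjust the parsing and the mapping if your sensor names differ.

```python
from collections import Counter

# Hypothetical helper: count sensors per process in your local list.txt.
# Assumes the token before the first underscore identifies the process;
# change the parsing if your list.txt uses a different naming scheme.
with open("data/wadi/list.txt") as f:
    sensors = [line.strip() for line in f if line.strip()]

counts = Counter(name.split("_")[0] for name in sensors)
print(len(sensors), counts)  # the total should equal node_num used in main.py

# Then update the WADI entry in main.py so the values sum to len(sensors), e.g.:
# process_dict = {"P1": counts.get("1", 0), "P2": counts.get("2", 0),
#                 "P3": counts.get("3", 0), "P4": counts.get("4", 0)}
```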

Thanks.