- Our goal is to create a model that can predict the effectiveness of chemical compounds against non-small cell lung cancer using complex chemical structured data. The model will be designed to distinguish between compounds that are effective and those that are not.
- input data consists of 2 feature sets: the first set represents the nodes, and the second set represents the edges between these nodes.
- output is if this chemical compound is effective 1 or not 0
- classification and prediction
-
Our dataset is stored in SDF files, which are chemical files that represent each sample as nodes and the edges that connect these nodes, along with the corresponding output. Therefore, we require a specialized function to extract the relevant information from these files.
-
The data is imbalanced
- Addressing this medical problem has the potential to make significant progress in the field of medicine.
- Trial "9" GNN-FiLM Using RandomOverSample was the highest Test Accuracy = 88.37% with those Hyperparameters
- hidden_dim = 32, num_layers = 6, film_parameter_MLP_hidden_layers = 1