Codebase for the paper "Application of Attention Mechanism-Based Neural Networks in Predicting Chemical Molecular Properties". Includes model training, evaluation, and data processing scripts.
Attention-Based-Chem-Predictor/
│
├── edit-Attention-split.py # Attention-based neural network model
├── edit-BP-split.py # Backpropagation neural network model
├── edit-RBF-split.py # Radial Basis Function neural network model
├── draw.m # MATLAB script for plotting loss curves
├── predictions.xlsx
├── predictions_optimized.xlsx
├── LICENSE
├── README.md
├── README_zh.md
│
├── history/ # training loss history
│ ├── Attention_loss_history.txt
│ ├── BP_loss_history.txt
│ └── RBF_loss_history.txt
│
└── output/ # output plots
├── Attention_loss_plot.png
├── BP_loss_plot.png
└── RBF_loss_plot.png
This project implements and compares three different neural network models for predicting chemical molecular properties:
- Attention-based Neural Network
- Backpropagation Neural Network
- Radial Basis Function Neural Network
Each model is implemented in a separate Python script and uses the same dataset (data.csv
) for training and evaluation.
Before running any scripts, ensure you have downloaded the data.csv
file from the Releases section and placed it in the project's root directory.
To run any of the neural network models:
- Ensure you have the required dependencies installed (pandas, numpy, tensorflow, scikit-learn).
- Run the desired script:
python edit-Attention-split.py
python edit-BP-split.py
python edit-RBF-split.py
Each script will:
- Load and preprocess the data from
data.csv
- Split the data into training and testing sets
- Train the respective neural network model
- Evaluate the model and save predictions to
predictions.xlsx
- Save the loss history to a text file in the
history/
directory
To visualize the training and validation loss:
- Open
draw.m
and set thechoose
variable to select the desired model (1 for Attention, 2 for RBF, 3 for BP). - Run the script.
The script will generate a loss plot and save it as a PNG file in the output/
directory.
The data.csv
file contains the dataset used for training and testing the models. It includes 100 feature columns and 3 target columns (y1, y2, y3).
Due to GitHub's file size limitations, the data.csv
file is not included in the main repository. You can download it from the Releases section of this project. Please download and place it in the root directory of the project before running the scripts.
The training process generates loss history files in the history/
directory. These files contain the training and validation loss for each epoch.
The output/
directory contains PNG images of the loss curves for each model, generated using the MATLAB script.
This project is licensed under the Apache License, Version 2.0. See the LICENSE
file for details.