GA-WeightedEditSimilarity

This repository is the official implementation of "Weighted Edit Distance optimized using Genetic Algorithm for SMILES-based Compound Similarity", published in Pattern Analysis and Applications (SCIE).

Primary language: Python. License: MIT.

Authors: In-Hyuk Choi and Il-Seok Oh

DOI: https://doi.org/10.1007/s10044-023-01141-3

Published: 18 February 2023

Introduction

Edit distance (Levenshtein distance) is built from three operations: insertion, deletion, and substitution. Weighted edit distance gives each operation its own cost. We use a genetic algorithm (GA) to find an optimal set of operation weights for each SMILES dataset.
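The idea above can be sketched in plain Python. This is an illustrative sketch, not code from this repository, and all names are hypothetical. `weighted_edit_distance` is the standard Levenshtein dynamic program with separate insert, delete, and substitute costs. `weighted_similarity` maps the distance into [0, 1] by dividing by the cost of deleting all of one string and inserting all of the other; this is only one possible normalization and may differ from the paper's. `optimize_weights` is a generic real-valued GA (tournament selection, blend crossover, Gaussian mutation, elitism); its operators, bounds, and the `fitness` callable are assumptions, since this README does not describe the authors' fitness function.

```python
import random
from typing import Callable, List, Tuple

Weights = Tuple[float, float, float]  # (w_insert, w_delete, w_substitute)


def weighted_edit_distance(a: str, b: str, w: Weights) -> float:
    """Levenshtein dynamic program with a separate cost per operation."""
    w_ins, w_del, w_sub = w
    # prev holds the previous DP row; initially a[:0] -> b[:j] costs j insertions.
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * w_del] + [0.0] * len(b)  # a[:i] -> "" needs i deletions
        for j in range(1, len(b) + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            curr[j] = min(
                prev[j] + w_del,      # delete a[i-1]
                curr[j - 1] + w_ins,  # insert b[j-1]
                prev[j - 1] + sub,    # match (free) or substitute
            )
        prev = curr
    return prev[-1]


def weighted_similarity(a: str, b: str, w: Weights) -> float:
    """Hypothetical normalization: 1 - d / (delete all of a + insert all of b)."""
    w_ins, w_del, _ = w
    worst = len(a) * w_del + len(b) * w_ins  # always a valid (upper-bound) edit path
    if worst == 0:
        return 1.0  # nothing to edit, e.g. both strings empty
    return 1.0 - weighted_edit_distance(a, b, w) / worst


def optimize_weights(
    fitness: Callable[[Weights], float],
    pop_size: int = 20,
    generations: int = 30,
    bounds: Tuple[float, float] = (0.1, 2.0),
    mutation_sd: float = 0.1,
    seed: int = 0,
) -> Weights:
    """Generic GA over (w_ins, w_del, w_sub) that maximizes `fitness` (pop_size >= 3)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x: float) -> float:
        return min(hi, max(lo, x))

    def tournament(scored: List[Tuple[float, Weights]], k: int = 3) -> Weights:
        # Pick k individuals at random and return the fittest of them.
        return max(rng.sample(scored, k), key=lambda t: t[0])[1]

    pop: List[Weights] = [
        (rng.uniform(lo, hi), rng.uniform(lo, hi), rng.uniform(lo, hi))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]
        elite = max(scored, key=lambda t: t[0])[1]
        children: List[Weights] = [elite]  # elitism: carry the best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            alpha = rng.random()  # blend crossover, then Gaussian mutation
            g = [clip(alpha * x + (1 - alpha) * y + rng.gauss(0.0, mutation_sd))
                 for x, y in zip(p1, p2)]
            children.append((g[0], g[1], g[2]))
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    unit = (1.0, 1.0, 1.0)
    print(weighted_edit_distance("CCO", "CCN", unit))             # 1.0
    print(weighted_edit_distance("CCO", "CCN", (1.0, 1.0, 3.0)))  # 2.0
    # Toy objective (hypothetical): separate a similar pair from a dissimilar one.
    similar, dissimilar = ("CCO", "CCN"), ("CCO", "c1ccccc1")
    best = optimize_weights(
        lambda w: weighted_similarity(*similar, w) - weighted_similarity(*dissimilar, w)
    )
    print("best (w_ins, w_del, w_sub):", best)
```

With unit weights, turning CCO into CCN costs one substitution. Once the substitution weight exceeds w_ins + w_del, the dynamic program prefers a deletion plus an insertion instead, which is why the three weights interact and are worth tuning jointly rather than one at a time.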

Environment

conda create -n GA-WeightedEditSimilarity python=3.7 -y
conda activate GA-WeightedEditSimilarity
conda install numpy scipy scikit-learn matplotlib tqdm -y

Run

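python main.py -d <dataset>

where <dataset> is one of e (Enzyme), ic (Ion channel), gpcr (GPCR), or nr (Nuclear receptor), for example python main.py -d gpcr.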

Dataset

We use four datasets: Enzyme, Ion channel, GPCR, and Nuclear receptor.

Citation

@article{choi2023wes,
  title   = {{Weighted edit distance optimized using genetic algorithm for SMILES-based compound similarity}},
  author  = {In-Hyuk Choi and Il-Seok Oh},
  journal = {Pattern Analysis and Applications},
  year    = {2023},
  doi     = {10.1007/s10044-023-01141-3}
}