The original CSV file is too big to be included.
Chembl.ipynb - gets the SMILES strings.
dirty_Chembl.ipynb - works with preprocessed data (without the API). About 10% of the matches are lost, but the program runs in a few seconds.
get_act.ipynb - gets the activities.
get_act_to_dirty.ipynb - gets the activities for the data from dirty_Chembl.ipynb.
Visualisation.ipynb - plots of the IC50 values.
sorted.csv - only the needed DataFrame columns (on request).
df_train.csv - first 100 fluorinated compounds with their characteristics.
ID.csv - 66300 matching SMILES of fluorinated and hydrogenous compounds.
activity_train.csv - activities of the first 100 compounds and their analogs.
activity_F_66300.csv - activities of all fluorinated compounds that have analogs.
total_IC50.csv - IC50 values for all fluorinated compounds and their hydrogenous analogs.
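
The files above revolve around pairing each fluorinated compound with its hydrogenous analog by SMILES. The notebooks' actual matching method is not described here, so the following is only a minimal sketch of one possible approach, under stated assumptions: every fluorine written outside a bracket atom is dropped (hydrogen is implicit in SMILES), and the result is looked up by exact string comparison in the same list. The function names strip_fluorine and pair_with_analogs are hypothetical. Because no canonicalization is done, equivalent SMILES written differently will not match, and explicit bonds or stereo marks next to F are not handled.

```python
def strip_fluorine(smiles: str) -> str:
    """Remove every bare F atom from a SMILES string (F -> implicit H).

    Assumption: a plain text edit is good enough for illustration.
    Bracket atoms such as [18F] or [Fe] are copied unchanged.
    """
    out = []
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Copy the whole bracket atom verbatim.
            end = smiles.index("]", i) + 1
            out.append(smiles[i:end])
            i = end
        elif ch == "F":
            # Outside brackets, "F" is always a fluorine atom: drop it.
            i += 1
        else:
            out.append(ch)
            i += 1
    text = "".join(out)
    # Dropping "(F)" leaves empty branches "()", which are removed here.
    while "()" in text:
        text = text.replace("()", "")
    return text


def pair_with_analogs(smiles_list):
    """Map each fluorinated SMILES to its fully hydrogenous analog,
    keeping only pairs whose analog is present in the same list."""
    known = set(smiles_list)
    pairs = {}
    for smiles in smiles_list:
        analog = strip_fluorine(smiles)
        if analog != smiles and analog in known:
            pairs[smiles] = analog
    return pairs


if __name__ == "__main__":
    # Hypothetical input, not taken from the project data.
    example = ["Fc1ccccc1", "c1ccccc1", "CC(F)C", "CCC", "CCO"]
    print(pair_with_analogs(example))
    # {'Fc1ccccc1': 'c1ccccc1', 'CC(F)C': 'CCC'}
```

A cheminformatics toolkit that canonicalizes SMILES before comparison would be more robust than this string-based lookup; the sketch only illustrates the idea of the pairing.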