published organic chemistry data

Primary language: Python. License: Creative Commons Attribution 4.0 International (CC-BY-4.0).

ochem-data

This repo compiles published organic chemistry data, including raw data and calculated descriptor sets for molecules and reactions.

File structure:

Each folder contains one reaction (see publication list for details).

Every reaction folder is divided into two sub-folders:

  • mols: a list of molecules, categorized by reaction role
  • rxns: a list of reaction entries, each with multiple reaction components and a yield

Reading data directly from this repo

import pandas as pd

REPO_PATH = 'https://raw.githubusercontent.com/beef-broccoli/ochem-data/main/'
FP = 'deoxyF/paper-dft/train.csv'  # change this  
df = pd.read_csv(REPO_PATH + FP)

# do things with df...

Each subfolder (ohe, mol2vec, mordred...) holds a different descriptor encoding of the same list of molecules or reaction entries.
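When switching between reaction folders and descriptor subfolders, it can help to build the raw-file URL from its components instead of editing one long string. The sketch below is one way to do that; `repo_url` is a hypothetical helper, not something shipped with this repo, and the only path known to exist is the one used in the snippet above. Check any other folder or file name against the repo before using it.

```python
REPO_PATH = 'https://raw.githubusercontent.com/beef-broccoli/ochem-data/main/'


def repo_url(*parts: str) -> str:
    """Join path components onto REPO_PATH (hypothetical helper, not part of the repo)."""
    # Strip stray slashes so 'deoxyF/' and '/paper-dft' still join cleanly.
    cleaned = [p.strip('/') for p in parts if p.strip('/')]
    return REPO_PATH + '/'.join(cleaned)


# The only path confirmed by the README snippet above; other folder and
# file names must be checked against the repo before use.
url = repo_url('deoxyF', 'paper-dft', 'train.csv')
print(url)
```

The resulting string can be passed to `pd.read_csv(url)` exactly as in the snippet above.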

Publications for reaction data:

deoxyF

Vbur

CN

asym_epox

CHaryl-1

kraken

nib

Publications/resources for descriptor sets: