This repository contains python scripts and Jupyter notebooks to download and analyse the data from:
- Materials project (MP)
- Open quantum materials database (OQMD)
- Inorganic crystal structure database (ICSD)
A periodic table combining data from mendeleev
, ase
, pymatgen
and other custom descriptors is provided.
- Requirements
- Materials Project (MP)
- Inorganic crystal structure database (ICSD)
- Open quantum materials database (OQMD)
- Periodic table
- References
The download scripts require
pip install requests
pip install pandas
pip install jupyter
pip install pymatgen
The notebooks require
pip install seaborn
pip install sklearn
pip install umap-learn
pip install mendeleev
pip install ase
Note that the same version of Pandas must be used to save and load the .pkl
binary files, otherwise you will get errors.
Add your your API key by creating a file mp/api_key.json
as
echo '{"api_key":"******************"}' > mp/api_key.json
Download all MP data with
python mp/download.py
A pandas.DataFrame
object will be saved as mp/materials_project.pkl
.
See the example_mp.ipynb
and example_pca.ipynb
Jupyter notebooks for usage examples.
Your ICSD credentials by creating a file icsd/icsd_credentials.json
as
echo '{"loginid":"**********","password":"****************"}' > icsd/icsd_credentials.json
Download all ICSD cif
strings with
python icsd/download.py
A pandas.DataFrame
object will be saved in binary format in the file icsd/icsd_cifs.pkl
. Extract information from the cif
strings it contains with
python icsd/augment.py
which will extract new columns :
- id
- _database_code_ICSD
- _chemical_formula_structural
- _chemical_formula_sum
- _cell_length_a
- _cell_length_b
- _cell_length_c
- _cell_angle_alpha
- _cell_angle_beta
- _cell_angle_gamma
- _cell_volume
in a new pandas.DataFrame
saved in icsd/all_icsd_cifs_augmented.pkl
file. The data is also saved in a .csv
files icsd/icsd_formulas_all.csv
, but without the cif
column. Two additional files are also saved, icsd_formula_structural_integer.csv
and icsd_formula_sum_integer.csv
which contain stochiometric compounds only.
See the example_icsd.ipynb
Jupyter notebooks for usage examples (along with example_mp_vs_icsd.ipynb
if you have MP downloaded).
Download all of OQMD materials (can take up to a few days) with
python oqmd/download.py
A pandas.DataFrame
object will be saved in binary format in the file oqmd/oqmd.pkl
.
Build the periodic table with
python ptable/build.py
A pandas.DataFrame
object will be saved in binary format in the file ptable/ptable.pkl
.
See the example_descriptors.ipynb
Jupyter notebook for usage examples.
- https://materialsproject.org/open describes the various ways to acces the data. The present code used the pymatgen wrapper.
- http://oqmd.org/static/docs/getting_started.html#setting-up-the-database provides a way to access materials:
qmpy-rester
but it is inpython 2.7
. Readingqmpy
's source code helped make the full download script with therequests
python package to access web ressources.
- https://icsd.fiz-karlsruhe.de/api/ is more of an GUI to the API. You can experiment with it and find what is accessible. The HTML queries can then be translated from
curl
to python package to access web ressources.
- https://www.smashingmagazine.com/2018/01/understanding-using-rest-api/
- then get lost and ask yourself how http works: https://medium.com/better-programming/writing-your-own-http-server-introduction-b2f94581268b
- Useful curl to python translation: https://curl.trillworks.com
- data scraping: https://github.com/hegdevinayi/icsd-queryer
- cambridge cristallographic data center: https://www.ccdc.cam.ac.uk/support-and-resources/support/case/?caseid=c344eefa-6b74-e811-91fa-005056975d8a
- AIIDA database importer: https://aiida-core.readthedocs.io/en/stable/import_export/dbimporters/icsd.html. This one is interesting, but it requires an intranet version of the database.
- re3data.org : https://www.re3data.org/repository/r3d100010085
- matminer retreiver for MDF, MPDS, OQMD and MongoDB : https://hackingmaterials.lbl.gov/matminer/matminer.data_retrieval.html#