yoshida-lab/XenonPy

data source and new data

Closed this issue · 2 comments

  1. Could you add Prof. Oguchi's atomic data from his first-principles calculations? I think you have them.
  2. Is it possible to show both the original feature values (with NaN) and the interpolated ones? How about switching between them with a flag?
  3. Can you show the data sources of the features? Some feature names are too vague to tell exactly what the descriptor is, and users can't correct the values if they don't know exactly what they are.

@nim-hrkn Sorry for the late reply.
About 1: Yes, we have electronic density data, but Yoshida-sensei wants to release them later, after discussing it with Oguchi-sensei. For now, we are trying to import DFT data from the OQMD database.
For 2: You can load the original data as follows:

from xenonpy.preprocess.datatools import Loader
load = Loader()  # init with None params
ele = load('elements')
ele.info()
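
To compare the original values with filled ones, something like the following could work. This is only a sketch: it assumes the loaded table behaves like a pandas DataFrame (the `ele.info()` call suggests so), it uses a small stand-in table with placeholder values instead of the real elements data, and it fills NaN with column means purely for illustration, which is not necessarily how XenonPy interpolates. The helper `get_features` and its `filled` flag are hypothetical names, not part of XenonPy.

```python
import numpy as np
import pandas as pd

# Stand-in for the loaded table; placeholder values, NOT real XenonPy data.
raw = pd.DataFrame(
    {"feature_a": [1.0, np.nan, 3.0], "feature_b": [10.0, 20.0, np.nan]},
    index=["X1", "X2", "X3"],
)


def get_features(table: pd.DataFrame, filled: bool = False) -> pd.DataFrame:
    """Hypothetical helper: return the original table, or a filled copy.

    Column-mean filling is an illustrative assumption only.
    """
    if not filled:
        return table
    return table.fillna(table.mean())


# True where the original value is missing.
missing_mask = raw.isna()
# Number of missing values in each feature column.
missing_per_feature = missing_mask.sum()
# Same table with the gaps filled, selected through the flag.
completed = get_features(raw, filled=True)
```

Keeping `missing_mask` next to `completed` makes it possible to tell afterwards which entries were measured and which were filled in.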

For 3: The elements data are summarized from several sources. You can get a brief description of each feature from the Dataset page.

This project is very young and the documentation is still lacking. We are planning to improve the documentation once the current feature development is done.

For 2 and 3:
You should use descriptors that have a proper relationship to, or at least something to do with, the target variable. I feel that it is dangerous to simply use all the descriptors of xenonpy, because it is unlikely that, e.g., the band gap is closely related to features describing lattice properties.
I believe it is important to make clear what the descriptors are (and where they come from) and to give users enough explanation to select them.
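
As a rough illustration of screening descriptors against a target, here is a minimal sketch. It assumes the descriptors and the target are available as a pandas DataFrame and Series. The column names and numbers are made-up placeholders, and `rank_by_correlation` is a hypothetical helper. Absolute Pearson correlation is only one simple heuristic and does not replace knowing what each descriptor physically means.

```python
import pandas as pd

# Placeholder descriptor table and target; NOT real materials data.
descriptors = pd.DataFrame(
    {
        "feat_linear": [1.0, 2.0, 3.0, 4.0, 5.0],
        "feat_inverse": [5.0, 4.0, 3.0, 2.0, 1.0],
        "feat_unrelated": [1.0, -1.0, 0.0, -1.0, 1.0],
    }
)
target = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0], name="target")


def rank_by_correlation(features: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Hypothetical helper: absolute Pearson correlation of each column with y."""
    return features.corrwith(y).abs().sort_values(ascending=False)


ranking = rank_by_correlation(descriptors, target)
# Keep descriptors above an arbitrary threshold; 0.5 is an assumption.
selected = ranking[ranking > 0.5].index.tolist()
```

A low score here only says the linear relationship is weak on this sample, so the documented origin of each descriptor remains the better guide.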