
How to prepare other training datasets


As the author shows, a dataset of small structures can be used to predict large-scale material systems. However, I am confused about what counts as an "appropriate dataset of small structures".

  1. How many small structures should the dataset contain?
  2. The author gave us a Bi36 dataset to predict Bi244. Why did you use Bi36 instead of Bi16 or Bi4?
  3. What does "have close chemical bonding environment" mean? Does it mean the same space group?
mzjb commented

Finding the proper number of structures, and the proper number of atoms per structure, takes several rounds of trial and error. A larger number of atoms in each structure allows more complex chemical environments to be included, but generating such a dataset is of course more time-consuming. "Chemical environments" refers to the atomic positions and types of neighbors, not the space group.
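To make "atomic positions and types of neighbors" concrete, here is a minimal sketch that lists, for each atom, the species and distances of its neighbors within a cutoff radius, and collects the distinct environments found in a periodic structure. It is an illustration only, not the procedure used by the authors: `local_environment` and `distinct_environments` are hypothetical helpers, the cell is assumed to be orthorhombic with Cartesian positions (in Å) wrapped into the cell, and the 3.5 Å cutoff is an arbitrary choice.

```python
import math
from itertools import product


def local_environment(positions, species, cell, center, cutoff):
    """Return sorted (neighbor_species, distance) pairs within `cutoff` of atom `center`.

    Hypothetical helper. Assumes an orthorhombic periodic cell with edge
    lengths cell = (a, b, c) and Cartesian positions wrapped into the cell.
    """
    # Periodic images needed along each axis to cover the cutoff sphere (+1 for safety).
    n_img = [math.ceil(cutoff / length) + 1 for length in cell]
    x0 = positions[center]
    env = []
    for j, (xj, sj) in enumerate(zip(positions, species)):
        for shift in product(*(range(-n, n + 1) for n in n_img)):
            if j == center and shift == (0, 0, 0):
                continue  # skip the central atom itself
            image = [xj[k] + shift[k] * cell[k] for k in range(3)]
            d = math.dist(x0, image)
            if d <= cutoff:
                # Round so that numerically equal environments compare equal.
                env.append((sj, round(d, 3)))
    return sorted(env)


def distinct_environments(positions, species, cell, cutoff):
    """Set of distinct (center_species, environment) fingerprints in a structure."""
    return {
        (species[i], tuple(local_environment(positions, species, cell, i, cutoff)))
        for i in range(len(positions))
    }


# One-atom simple cubic cell vs. a 2x1x1 supercell, with and without a distortion.
simple = distinct_environments([(0.0, 0.0, 0.0)], ["Bi"], (3.0, 3.0, 3.0), 3.5)
supercell = distinct_environments(
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], ["Bi", "Bi"], (6.0, 3.0, 3.0), 3.5)
dimerized = distinct_environments(
    [(0.0, 0.0, 0.0), (3.2, 0.0, 0.0)], ["Bi", "Bi"], (6.0, 3.0, 3.0), 3.5)
print(simple == supercell, dimerized == simple)  # prints: True False
```

The one-atom cell and its supercell contain the same single environment (six neighbors at 3.0 Å), so the larger cell adds nothing new. Shifting the second atom to x = 3.2 Å gives a dimerized environment (neighbors at 2.8, 3.0 and 3.2 Å) that a one-atom cell cannot represent at all, which is one way to see why structures with more atoms can cover more complex chemical environments.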

Does "atomic positions and types of neighbors" mean the same elements and the same ratio of each element?

mzjb commented


It means that the dataset and the structure to be predicted contain similar local atomic structures within the 'nearsightedness' length scale.
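As a rough illustration of this criterion, the sketch below checks whether every local environment of a target structure has a close counterpart somewhere in a dataset. This is a hypothetical coverage check, not part of the authors' workflow. Environments are assumed to be given as `(center_species, [(neighbor_species, distance), ...])`, restricted to neighbors within a cutoff that stands in for the nearsightedness length, and the tolerance `tol = 0.1` (Å) is an arbitrary choice.

```python
from collections import defaultdict


def _by_species(env):
    """Group an environment [(species, distance), ...] into {species: sorted distances}."""
    groups = defaultdict(list)
    for sp, d in env:
        groups[sp].append(d)
    return {sp: sorted(ds) for sp, ds in groups.items()}


def environments_match(env_a, env_b, tol=0.1):
    """True if two neighbor shells have the same neighbor types and counts,
    and corresponding sorted distances agree within `tol`."""
    ga, gb = _by_species(env_a), _by_species(env_b)
    if ga.keys() != gb.keys():
        return False
    for sp in ga:
        if len(ga[sp]) != len(gb[sp]):
            return False
        if any(abs(x - y) > tol for x, y in zip(ga[sp], gb[sp])):
            return False
    return True


def uncovered_environments(target, dataset, tol=0.1):
    """Return the target environments that have no close match in the dataset.

    `target` and `dataset` are lists of (center_species, env) pairs, where env
    lists (neighbor_species, distance) within the assumed nearsightedness cutoff.
    """
    missing = []
    for center, env in target:
        if not any(c == center and environments_match(env, e, tol) for c, e in dataset):
            missing.append((center, env))
    return missing


# Hypothetical environments: ideal, slightly strained, and dimerized.
bulk = ("Bi", [("Bi", 3.0)] * 6)
strained = ("Bi", [("Bi", 3.05)] * 6)
dimer = ("Bi", [("Bi", 2.8)] + [("Bi", 3.0)] * 4 + [("Bi", 3.2)])
print(uncovered_environments([strained, dimer], [bulk]))  # only the dimer environment is reported
```

Requiring equal neighbor counts makes this check strict: an atom that sits just inside the cutoff in one structure and just outside it in the other breaks the match. A smooth descriptor would avoid that edge effect at the cost of simplicity.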