prisms-center/CASMcode

Enumeration to reach my target structure (B_3AC_6) for a non-stoichiometric compound

WalterWhite1611 opened this issue · 17 comments

After exporting this CIF file to a POSCAR,

A 0.5000 0.000000 0.000000 0.000000 Biso 1.000000 A
B 0.7500 0.333333 0.666667 0.000000 Biso 1.000000 B
C 1.0 0.333333 0.000000 0.250000 Biso 1.000000 C

I got 12 atoms in the POSCAR. To account for the partial occupancies when initializing the CASM project, I created a prim JSON file, which begins like this:

{ "basis" : [ { "coordinate" : [ -0.000000000000, -0.000000007502, 0.000000000002 ], "occupants" : [ "A", "Va" ] }, { "coordinate" : [ 0.500000000000, -0.000000007502, 0.000000000002 ], "occupants" : [ "A", "Va" ] }, { "coordinate" : [ -0.000000000000, 0.666666659165, 0.333333333335 ], "occupants" : [ "B", "Va" ] }, { "coordinate" : [ -0.000000000000, 0.333333325832, -0.333333333332 ], "occupants" : [ "B", "Va" ] }, { .....
Then I enumerated supercells with ccasm enum -m ScelEnum -i '{"min":8,"max": 8}'. That is when I realized I have to work with a 2 × 2 × 2 supercell, so I picked SCEL8_2_2_2_0_0_0, whose POSCAR has 96 atoms (A = 16, B = 32, C = 48). My target structure is B_3AC_6, which needs only 80 atoms in the POSCAR (A = 8, B = 24, C = 48), given the partial occupancies of A (vacancy fraction 0.5) and B (vacancy fraction 0.25) and keeping the ratio A:B:C = 1:3:6. So I want to go from 96 atoms to 80 atoms with an exact enumeration.

For the SCEL8_2_2_2_0_0_0 supercell I made this settings.json file:
{ "scelnames": ["SCEL8_2_2_2_0_0_0"], "fill": { "supercells": { "min": 1, "max": 1, "fixed_shape": true } }, "output_configurations": true, "primitive_only": false, "output_configurations_options": { "path": "conventional.enum.json", "properties": ["poscar"], "json": true } }
Then I enumerated all occupations (exhaustive enumeration) with ccasm enum --method ConfigEnumAllOccupations --settings settings.json, but it is too expensive computationally. So I switched to random occupations (stochastic enumeration) with ccasm enum --method ConfigEnumRandomOccupations --settings settings.json, which by default created 100 configurations; I can change that number manually. But I want to pick out exactly the configurations with 80 atoms (A = 8, B = 24, C = 48) from this random enumeration. How should I proceed? I hope I have explained the problem clearly.
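As a sanity check of the numbers above, this Python sketch (standard library only) derives the 80-atom target from the site counts and occupancies, and counts the raw occupation arrangements before CASM applies any symmetry reduction. The per-primitive-cell site counts (A 2, B 4, C 6) come from the 12-atom POSCAR and the 96-atom supercell; treating the C sites as always occupied is an assumption based on the composition axes shown later in this thread.

```python
from math import comb

# Sites per primitive cell (A 2, B 4, C 6 = the 12-atom POSCAR).
PRIM_SITES = {"A": 2, "B": 4, "C": 6}
# Fractional occupancies from the CIF; the rest of each sublattice is Va.
OCCUPANCY = {"A": 0.5, "B": 0.75, "C": 1.0}
VOLUME = 8  # SCEL8_2_2_2_0_0_0 contains 8 primitive cells


def target_counts(prim_sites, occupancy, volume):
    """Return (sites, atoms) per species in a supercell of the given volume."""
    sites = {s: n * volume for s, n in prim_sites.items()}
    atoms = {s: round(sites[s] * occupancy[s]) for s in sites}
    return sites, atoms


def arrangements(sites, atoms):
    """Ways to place the atoms on their own sublattices at fixed
    composition, before any symmetry reduction."""
    total = 1
    for s in sites:
        total *= comb(sites[s], atoms[s])
    return total


sites, atoms = target_counts(PRIM_SITES, OCCUPANCY, VOLUME)
n_vacancies = sum(sites.values()) - sum(atoms.values())
print(sites)        # {'A': 16, 'B': 32, 'C': 48}
print(atoms)        # {'A': 8, 'B': 24, 'C': 48}
print(n_vacancies)  # 16
print(arrangements(sites, atoms))        # 135370521000
# Every A/B site can hold its species or Va, over all compositions:
print(2 ** (sites["A"] + sites["B"]))    # 281474976710656
```

Roughly 1.35 × 10^11 arrangements at the target composition (and 2^48 over all compositions) is why exhaustive enumeration in this supercell is so expensive, even after symmetry reduction.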

Thank You

xivh commented

You can filter your enumeration output by properties (see casm query --help properties and casm query --help operators). In this case, you can use comp_n, which is the number of each species per primitive cell. Try this settings file:

{
  "scelnames": [
    "SCEL8_2_2_2_0_0_0"
  ],
  "fill": {
    "supercells": {
      "min": 1,
      "max": 1,
      "fixed_shape": true
    }
  },
  "filter": "and(eq(comp_n(A), 1), eq(comp_n(B), 3), eq(comp_n(C), 6))",
  "output_configurations": true,
  "primitive_only": false,
  "output_configurations_options": {
    "path": "conventional.enum.json",
    "properties": [
      "poscar", "is_primitive", "comp_n", "atom_frac"
    ],
    "json": true
  }
}

Due to the filter and the random occupations, I don't think you are guaranteed to get 100 unique configurations each time. So the POSCARs inside conventional.enum.json may not be unique. If you want to get the unique POSCARs, select the volume 8 configurations in your supercell with casm select -c ALL --set 're(scelname,"SCEL8_2_2_2_0_0_0")' and then do casm query --write-pos to write POS files into the training_data folder.

After adding the comp_n filter to my settings.json, "filter": "and(eq(comp_n(A), 8), eq(comp_n(B), 24), eq(comp_n(C), 48))", I get this error:
Begin enumeration
Enumerate configurations for: SCEL8_2_2_2_0_0_0
Segmentation fault (core dumped)

With your second suggestion I was able to extract POSCAR files, but not only the ones with my required ratio. By varying the "supercells": { "min": , "max": } range I have already extracted 300+ POSCAR files, and finding my required structure among them is really tedious. Is there any way to get the POSCARs of my configurations into a single text file, so that I can search it for the structure I need?

P.S. My goal is to find the POSCARs of all possible configurations with A = 8, B = 24, C = 48.

Thank you so much, xivh, for all your suggestions.

xivh commented

Can you upload your prim? You should set comp_n as the number of each species per primitive cell, not per supercell (1, 3, 6 rather than 8, 24, 48), but I am not sure why it segfaults.
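Because comp_n counts species per primitive cell, per-supercell counts must be divided by the supercell volume. A small helper, hypothetical and not part of CASM, that builds the filter string this way:

```python
def comp_n_filter(counts, volume):
    """Build a casm filter string from per-supercell atom counts.

    comp_n is per primitive cell, so each count is divided by the
    supercell volume (the number of primitive cells it contains)."""
    terms = [f"eq(comp_n({sp}), {n / volume:g})" for sp, n in counts.items()]
    return "and(" + ", ".join(terms) + ")"


print(comp_n_filter({"A": 8, "B": 24, "C": 48}, volume=8))
# and(eq(comp_n(A), 1), eq(comp_n(B), 3), eq(comp_n(C), 6))
```

The result matches the filter suggested earlier in the thread for the volume-8 supercell.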

To get all the POSCARs into one file, make a selection with the configurations you want:
casm select -c ALL --set selected
casm select --set 'and(eq(comp_n(A), 1), eq(comp_n(B), 3), eq(comp_n(C), 6))'
casm select --set-off 'not(re(scelname,"SCEL8_2_2_2_0_0_0"))'
and then write to a json file:
casm query -k poscar -o poscars.json
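To get one searchable text file, a short post-processing script can read poscars.json and concatenate only the POSCARs with the counts you need. This sketch assumes the file is a JSON list of records with a "configname" key and a "poscar" key holding the POSCAR text, and that each POSCAR is in VASP 5 format (species names on line 6, counts on line 7). Check your file and adjust the key names or line indices if they differ. Vacancies are not written to the POSCAR, so they do not appear in the counts.

```python
import json


def parse_counts(poscar):
    """Species counts from a VASP 5 POSCAR string: names on line 6,
    counts on line 7 (assumed layout)."""
    lines = poscar.strip().splitlines()
    names = lines[5].split()
    counts = [int(x) for x in lines[6].split()]
    return dict(zip(names, counts))


def collect(records, target, name_key="configname", poscar_key="poscar"):
    """Yield (name, poscar) for records whose species counts equal target."""
    for rec in records:
        if parse_counts(rec[poscar_key]) == target:
            yield rec[name_key], rec[poscar_key]


def write_combined(json_path, out_path, target):
    """Concatenate matching POSCARs into one text file; return how many."""
    with open(json_path) as f:
        records = json.load(f)
    n = 0
    with open(out_path, "w") as out:
        for name, poscar in collect(records, target):
            out.write(f"# {name}\n{poscar.rstrip()}\n\n")
            n += 1
    return n
```

For example, write_combined("poscars.json", "poscars_A8_B24_C48.txt", {"A": 8, "B": 24, "C": 48}) writes each matching POSCAR preceded by a "# configname" line, so the file can be searched with any text editor or grep.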

You can also remove the training_data/SCEL* folders and regenerate just the ones you want with casm query --write-pos.

Yes, sure. Here is my prim:
prim.json

I also tried "filter": "and(eq(comp_n(A), 1), eq(comp_n(B), 3), eq(comp_n(C), 6))", but got the same error.

xivh commented

You need to create and select a composition axis with casm composition --calc and casm composition --select.

As you mentioned, it is pretty slow to enumerate all configurations. But if you enumerate randomly, you might not get that many results. You can get better results by enumerating by clusters. Here is the settings file:

{
  "confignames": "SCEL8_2_2_2_0_0_0/0",
  "cluster_specs":
  {
    "method": "periodic_max_length",
    "params":
    {
      "orbit_branch_specs":
      {
        "1": {"max_length": 0},
        "2": {"max_length": 10},
        "3": {"max_length": 10}
      }
    }
  },
  "filter": "and(eq(comp_n(A), 1), eq(comp_n(B), 3))",
  "output_configurations": true,
  "primitive_only": false,
  "output_configurations_options": {
    "path": "conventional.enum.json",
    "properties": [
      "poscar", "is_primitive", "comp_n", "atom_frac"
    ],
    "json": true
  }
}

In confignames you put the starting configuration with no vacancies. The orbit_branch_specs then control which clusters of vacancies are enumerated, up to some max_length: in this case, all single vacancies, pairs of vacancies up to 10 angstroms, and triplets of vacancies up to 10 angstroms. At the end, the same filter is applied to keep only the correct compositions. You can read more about it in casm enum --desc ConfigEnumAllOccupations.
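To make the max_length idea concrete, this sketch enumerates clusters of sites (singles, pairs, triplets) in a periodic cell whose pairwise minimum-image distances all lie within max_length, and turns a chosen cluster into vacancies. It only illustrates the geometry: CASM builds symmetry orbits of clusters from the prim and does not use this code, and all function names here are hypothetical. Note that a cluster of size k introduces at most k vacancies.

```python
import itertools
import math


def min_image_distance(f1, f2, lattice):
    """Shortest distance (Angstrom) between fractional points f1 and f2 in a
    periodic cell. lattice holds the a, b, c row vectors in Angstrom; the 27
    nearest periodic images are checked, which suffices for reasonably
    compact (e.g. Niggli-reduced) cells."""
    d = [(x1 - x2) - round(x1 - x2) for x1, x2 in zip(f1, f2)]
    best = math.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        frac = [d[i] + shift[i] for i in range(3)]
        cart = [sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)]
        best = min(best, math.sqrt(sum(c * c for c in cart)))
    return best


def vacancy_clusters(sites, lattice, size, max_length):
    """Yield every set of `size` site indices whose pairwise distances are
    all <= max_length. No symmetry reduction is done (CASM works with
    symmetry orbits instead)."""
    for combo in itertools.combinations(range(len(sites)), size):
        if all(min_image_distance(sites[i], sites[j], lattice) <= max_length
               for i, j in itertools.combinations(combo, 2)):
            yield combo


def apply_vacancies(occupation, cluster):
    """Return a copy of occupation with the cluster's sites set to Va."""
    occ = list(occupation)
    for i in cluster:
        occ[i] = "Va"
    return occ
```

For single sites the distance check is empty, so every site is returned regardless of max_length, which is consistent with "1": {"max_length": 0} in the settings above.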

I am having trouble with the composition axes. For my B3AC6 system I got:

Possible composition axes:

   KEY  ORIGIN          a               b               GENERAL FORMULA
   ---  ---             ---             ---             ---
     0  A(2)Va(4)C(6)   A(2)B(4)C(6)    Va(6)C(6)       A(2-2b)Va(4-4a+2b)B(4a)C(6)
     1  Va(6)C(6)       Va(2)B(4)C(6)   A(2)Va(4)C(6)   A(2b)Va(6-4a-2b)B(4a)C(6)
     2  Va(2)B(4)C(6)   A(2)B(4)C(6)    Va(6)C(6)       A(2a)Va(2-2a+4b)B(4-4b)C(6)
     3  A(2)B(4)C(6)    Va(2)B(4)C(6)   A(2)Va(4)C(6)   A(2-2a)Va(2a+4b)B(4-4b)C(6)

Now, how do I manually set Va = 0.5 on the A sites (half occupied) and Va = 0.25 on the B sites (0.75 occupied)?

Second, I continued my calculation with composition axes "0" and got:

Parametric composition:
comp(a) = -0.0833333*(comp_n(A) - 2) - 0.0833333*(comp_n(Va) - 4) + 0.166667*comp_n(B)
comp(b) = -0.333333*(comp_n(A) - 2) + 0.166667*(comp_n(Va) - 4) + 0.166667*comp_n(B)

Composition:
comp_n(A) = 2 - 2*comp(b)
comp_n(Va) = 4 - 4*comp(a) + 2*comp(b)
comp_n(B) = 4*comp(a)
comp_n(C) = 6

Parametric chemical potentials:
param_chem_pot(a) = 4*chem_pot(B)
param_chem_pot(b) = -2*chem_pot(A)

Again the same question: how do I set comp(a) = 0.5 and comp(b) = 0.75? Or is this handled automatically once I add the filter tag to settings.json?

Then I enumerated by clusters with this settings file, settings2.json. There I used configuration /1071, because that configuration (which I got when I enumerated with settings1.json) contains A = 8, B = 24, C = 48, whereas "SCEL8_2_2_2_0_0_0/0" contains A = 8, B = 20, C = 48, so starting from /0 I didn't get any POSCARs with A = 8, B = 24. So I chose to run ccasm enum --method ConfigEnumRandomOccupations --settings settings2.json with "SCEL8_2_2_2_0_0_0/1071". Am I on the right path?

Lastly, which configuration contains no vacancies (do you mean the 96-atom one for my structure)? Is there any way to get all possible POSCARs with A = 8, B = 24, C = 48 directly from a cluster enumeration over "scelnames": "SCEL8_2_2_2_0_0_0", as in settings3.json?

Pardon me if any of these are stupid questions. Thank you, xivh.

xivh commented

The composition axes that you set with casm composition are the parametric composition axes. The comp(a) and comp(b) are not the A and B atoms, they are just indices for the parametric composition axes. The relationship between the parametric composition axis and the composition of your A, B, C, and Va is given by this part:

Composition:
comp_n(A) = 2 - 2comp(b)
comp_n(Va) = 4 - 4comp(a) + 2comp(b)
comp_n(B) = 4comp(a)
comp_n(C) = 6

This means that if you want to have 1 A atom per primitive cell, you set comp(b) = 0.5. If you want to have 3 B atoms per primitive cell, then you set comp(a) = 0.75. But for your enumeration, you do not have to use the parametric axes. You can directly set comp_n for each species as I showed above. In comp_n(A), the A is for the A atoms, comp_n(B) is for the B atoms, etc.
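The relations for composition axes "0" can be written as a pair of small functions (a sketch; the function names are hypothetical). Inverting comp_n(B) = 4*comp(a) and comp_n(A) = 2 - 2*comp(b) recovers comp(a) = 0.75 and comp(b) = 0.5 for 1 A and 3 B per primitive cell:

```python
def comp_n_from_param(a, b):
    """Axes "0": comp_n per primitive cell from parametric comp(a), comp(b)."""
    return {
        "A": 2 - 2 * b,
        "Va": 4 - 4 * a + 2 * b,  # vacancies on both the A and B sublattices
        "B": 4 * a,
        "C": 6,
    }


def param_from_comp_n(n_A, n_B):
    """Invert the relations above (comp_n(C) is fixed at 6)."""
    a = n_B / 4
    b = (2 - n_A) / 2
    return a, b


a, b = param_from_comp_n(n_A=1, n_B=3)
print(a, b)                     # 0.75 0.5
print(comp_n_from_param(a, b))  # {'A': 1.0, 'Va': 2.0, 'B': 3.0, 'C': 6}
```

The species and vacancies add up to the 12 sites of the primitive cell, and the same values reproduce comp(a) = 0.75 and comp(b) = 0.5 when substituted into the "Parametric composition" formulas printed by casm composition.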

What I was suggesting was that you start with your structure containing no vacancies, then enumerate clusters of vacancies onto it. Now I realize that in the volume 8 supercell, you want more than 3 vacancies, which means that triplet clusters are not enough. So starting with your desired composition that already has vacancies and enumerating clusters around that composition makes sense - I think that is what you did? This will give you structures that are similar to your starting configuration but with different vacancy orderings.

Another option is to use the fill command to enumerate in smaller supercells and tile them into larger ones, but I am not sure if that is possible in the conda version of CASM (it is in the latest development build).

I am not sure what the most efficient way is to get all possible orderings. Is there a reason that you need all of them? I enumerated just vacancies on the A sites in the volume 8 cell, and there are already around 1800.

{
  "confignames": "SCEL8_2_2_2_0_0_0/0",
  "sublats": [1, 2],
  "filter": "and(eq(comp_n(A), 1), eq(comp_n(B), 3))",
  "output_configurations": true,
  "primitive_only": false,
  "output_configurations_options": {
    "path": "conventional.enum.json",
    "properties": [
      "poscar", "is_primitive", "comp_n", "atom_frac"
    ],
    "json": true
  }
}
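A rough way to see why these counts grow so quickly: each symmetry orbit contains at most |G| arrangements, where |G| is the number of supercell symmetry operations, so the number of distinct configurations is at least N/|G|. This sketch assumes all 12 point operations of P-3c1 survive in the 2 × 2 × 2 supercell, giving |G| = 12 × 8 = 96. When some sites are held fixed (as with sublats), the fixed occupations usually break symmetry, so the actual count can be much larger than this bound.

```python
from math import comb


def orbit_lower_bound(n_arrangements, n_point_ops, n_translations):
    """Each orbit holds at most |G| = point ops x translations arrangements,
    so there are at least ceil(N / |G|) symmetrically distinct ones."""
    group_order = n_point_ops * n_translations
    return -(-n_arrangements // group_order)  # integer ceiling division


# Assumption: all 12 point operations of P-3c1 survive in the 2x2x2 cell.
POINT_OPS, TRANSLATIONS = 12, 8

a_only = comb(16, 8)                  # 8 A atoms on 16 A sites
a_and_b = comb(16, 8) * comb(32, 24)  # plus 24 B atoms on 32 B sites
print(a_only, orbit_lower_bound(a_only, POINT_OPS, TRANSLATIONS))    # 12870 135
print(a_and_b, orbit_lower_bound(a_and_b, POINT_OPS, TRANSLATIONS))  # 135370521000 1410109594
```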

Thank you, xivh. Now I fully understand the role of the composition axes. Yes: first I got one configuration with A = 8, B = 24, C = 48 (80 atoms) from the volume-8 (96-atom) supercell SCEL8_2_2_2_0_0_0 using ccasm enum --method ConfigEnumRandomOccupations, but as I ran the same command again and again, the number of configurations kept increasing, as expected for random occupations (correct me if I'm wrong). Then I started the "cluster enumeration" from the first configuration, SCEL8_2_2_2_0_0_0/0, with ccasm enum --method ConfigEnumAllOccupations, and got 256 configurations (using triplets of vacancies up to 10 Å). But as you mentioned, triplet clusters are not enough, so I used the "sublats": [1, 2] tag as you suggested, and got 3888 configurations. Could you explain the physical significance of this line?

Now, the main issue is that the space group of my starting CIF file was P-3c1 (165), with α = β = 90°, γ = 120°, which must be preserved throughout the enumeration. But in this calculation the space group symmetry is somehow broken, because the angle values changed drastically. Is there any way to specify the space group symmetry during or before the enumeration?

No, I don't need all of the configurations; my aim is actually to discard most of them and pick just the one with minimum energy, which will be the most stable, for further DFT calculations in VASP. Please share any suggestions regarding these issues.

Thank You

Using the prim.symmetrized.json file, I get configurations with α = 120°, β = 90°, γ = 90°, whereas the original CIF file had α = β = 90°, γ = 120°. How can I resolve this? I want α = β = 90°, γ = 120° for all my configurations.

xivh commented

The sublats tag restricts the enumeration to some sites and leaves the rest fixed. I just wanted to show that by enumerating only on those sites you already have a lot of configurations, so the total number of configurations will be very large.

If you want to find the lowest energy structure, then you should start with whatever knowledge you have about the system and enumerate from there. For example, if there is a particular vacancy ordering that is stable, you can enumerate perturbations around that. If it's a particular concentration, then maybe it is better to enumerate all configurations of that composition in small supercells first.

After doing some calculations, you can use the cluster expansion to predict the formation energy of new structures and see if any break your convex hull.
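Once energies are available, the formation energy and the convex hull follow from their definitions. This sketch is simplified to a single parametric composition axis with end members at x = 0 and x = 1, whereas this system has two axes (a and b); CASM computes these quantities itself, so treat this only as an illustration. The lower hull uses Andrew's monotone chain algorithm.

```python
def formation_energies(points, e0, e1):
    """points: (x, E) pairs, x a parametric composition in [0, 1] and E the
    energy per primitive cell. Returns (x, E_f) relative to the end members
    with energies e0 (x = 0) and e1 (x = 1)."""
    return [(x, e - ((1 - x) * e0 + x * e1)) for x, e in points]


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (x, E_f) points by Andrew's monotone chain."""
    lowest = {}
    for x, e in points:  # keep only the lowest energy at each composition
        if x not in lowest or e < lowest[x]:
            lowest[x] = e
    hull = []
    for p in sorted(lowest.items()):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def hull_distance(point, hull):
    """Energy above the hull at the point's composition."""
    x, e = point
    for (x0, e0), (x1, e1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return e - (e0 + (x - x0) / (x1 - x0) * (e1 - e0))
    raise ValueError("composition outside the hull's range")


pts = formation_energies([(0, -5.0), (0.25, -5.0), (0.5, -6.5), (1, -7.0)],
                         -5.0, -7.0)
hull = lower_hull(pts)
print(hull)                         # [(0, 0.0), (0.5, -0.5), (1, 0.0)]
print(hull_distance(pts[1], hull))  # 0.75
```

The energies here are made-up numbers; structures with zero hull distance are the candidate ground states.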

I am not sure why the space group symmetry changed. With regards to the lattice vectors, you can force casm to use a particular orientation of the prim with casm init --force. Once you start making supercells, the angles are not necessarily going to be preserved. If you provide your own prim orientation and make a 2x2x2 supercell I would expect them to be the same though.
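The α = 120°, β = γ = 90° result is consistent with the same hexagonal lattice with its axes labeled in a different order. This plain-Python sketch (not a CASM feature) computes α, β, γ from POSCAR lattice rows and cyclically relabels (a, b, c) → (b, c, a). A cyclic relabeling keeps the handedness and moves the 120° angle back to γ, and the fractional coordinates are permuted to match. Reading the lattice from a POSCAR file is left out; the example lattice values are hypothetical.

```python
import math


def lattice_angles(lattice):
    """(alpha, beta, gamma) in degrees for lattice row vectors a, b, c."""
    a, b, c = lattice

    def angle(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))

    return angle(b, c), angle(a, c), angle(a, b)


def roll_axes(lattice, frac_coords):
    """Relabel (a, b, c) -> (b, c, a). Cyclic, so handedness is kept, and
    fractional coordinates (x, y, z) become (y, z, x)."""
    a, b, c = lattice
    return [b, c, a], [(y, z, x) for x, y, z in frac_coords]


s3 = math.sqrt(3) / 2
# Hypothetical lattice with the unique axis listed first (alpha = 120 deg).
bad = [[0.0, 0.0, 5.0], [3.0, 0.0, 0.0], [-1.5, 3.0 * s3, 0.0]]
print([round(t, 3) for t in lattice_angles(bad)])   # [120.0, 90.0, 90.0]
good, coords = roll_axes(bad, [(0.25, 0.5, 0.0)])
print([round(t, 3) for t in lattice_angles(good)])  # [90.0, 90.0, 120.0]
print(coords)                                       # [(0.5, 0.0, 0.25)]
```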

I get your point. I have already extracted 450 configurations with A = 8, B = 24, C = 48. As I don't have any prior knowledge of the stable configuration, I want to calculate the formation energies and convex hull of these 450 configurations by cluster expansion, so that I can pick the structure with the minimum formation energy for further VASP calculations. Could you tell me the next steps to get the formation energies of those configurations?

Yes, I also tried the --force option. But then the 2 × 2 × 2 supercell had its lattice vectors written in a different orientation (I think according to the Niggli-reduced cell).

xivh commented

You have to run the DFT calculations first. You can use casm-calc --setup to write all the input files.

I get this error:
Traceback (most recent call last):
  File "/home/rajdeep-boral/anaconda3/envs/casm/bin/casm-calc", line 8, in <module>
    sys.exit(main())
  File "/home/rajdeep-boral/anaconda3/envs/casm/lib/python3.9/site-packages/casm/scripts/casm_calc.py", line 130, in main
    raise e
  File "/home/rajdeep-boral/anaconda3/envs/casm/lib/python3.9/site-packages/casm/scripts/casm_calc.py", line 106, in main
    open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/rajdeep-boral/CASM_projects/LiTiCl/training_data/settings/calctype.default/calc.json'

xivh commented

You will need to create the files described in casm format --vasp. It mentions relax.json, but you may need to call that file calc.json.

I recommend running the VASP calculations yourself if possible. Then, you can read the VASP data back into CASM by placing your final runs into the folder calctype.default/run.final, creating a status.json file, and running casm-calc --report and casm update.

status.json for every calctype.default folder:

{
  "status": "complete"
}
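With hundreds of configurations, writing status.json by hand is tedious. This sketch writes it into every calctype.default folder that already has a run.final directory. It assumes the layout training_data/<scelname>/<configid>/calctype.default/run.final discussed in this thread; afterwards you would still run casm-calc --report and casm update as described above.

```python
import json
from pathlib import Path


def mark_complete(training_data, calctype="default"):
    """Write {"status": "complete"} into every calctype.<calctype> folder
    that contains a run.final directory. Returns the folders touched."""
    done = []
    pattern = f"SCEL*/*/calctype.{calctype}"  # assumed directory layout
    for calc_dir in sorted(Path(training_data).glob(pattern)):
        if (calc_dir / "run.final").is_dir():
            with open(calc_dir / "status.json", "w") as f:
                json.dump({"status": "complete"}, f, indent=2)
            done.append(calc_dir)
    return done
```

Calling mark_complete("training_data") from the project root only touches folders where you have already placed a finished run, so unfinished configurations are left alone.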

I got 450 configurations of 80 atoms (A = 8, B = 24, C = 48). Are you telling me to run DFT calculations for these 450 configurations outside of CASM? Doing that many DFT calculations is computationally very expensive. My aim is to pick the most stable structure among these configurations and then do further SCF and DOS calculations. Is there any way to do this in CASM?

xivh commented

That is why we usually start with the primitive cell and enumerate smaller cells instead of a large supercell. You need some DFT energies to train the cluster expansion. Also, if your system does not have a lot of distortions, you will not need as many structures.
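In a cluster expansion the energy is linear in the correlation functions, E ≈ Σ_α J_α φ_α, so the ECIs J_α can be fitted by least squares on the configurations that have DFT energies and then used to predict the rest. This NumPy sketch shows that step on synthetic data, together with a leave-one-out cross-validation score. It is not CASM's fitting code (CASM has its own fitting tools with more options), and it assumes you already have a correlation matrix with one row per calculated configuration.

```python
import numpy as np


def fit_eci(corr, energies):
    """Ordinary least-squares ECIs J minimizing ||corr @ J - energies||."""
    eci, *_ = np.linalg.lstsq(corr, energies, rcond=None)
    return eci


def loo_cv_rms(corr, energies):
    """Leave-one-out cross-validation RMS error, by refitting n times."""
    n = len(energies)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        eci = fit_eci(corr[keep], energies[keep])
        errors.append(corr[i] @ eci - energies[i])
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic stand-in data: 12 "calculated" configurations, 4 clusters.
rng = np.random.default_rng(0)
corr = rng.uniform(-1.0, 1.0, size=(12, 4))
corr[:, 0] = 1.0  # empty-cluster column
true_eci = np.array([-1.0, 0.2, -0.05, 0.01])
energies = corr @ true_eci

eci = fit_eci(corr, energies)
print(np.round(eci, 6))
print(loo_cv_rms(corr, energies))

# Predict energies of uncalculated configurations from their correlations.
new_corr = rng.uniform(-1.0, 1.0, size=(3, 4))
new_corr[:, 0] = 1.0
print(new_corr @ eci)
```

With noise-free synthetic energies the fit recovers the input ECIs and the cross-validation error is essentially zero; with real DFT data, the cross-validation score is what tells you whether you have calculated enough structures.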