The x-ray_scripting_out pipeline processes X-ray Absorption Spectroscopy (XAS) output data generated by the ORCA quantum chemistry software. Its primary purpose is to use the force oscillator strengths and transition density matrices to derive the core-virtual coupling molecular orbitals (MOs), represented as matrices.
The updated output format combines the transition intensities from the density matrices with their force oscillator strengths to build the matrices that describe MOs in the core and virtual spaces. These matrices encapsulate the core-virtual coupling MOs, as exemplified below:
1. Number of transition intensities
2. Transition intensity probability
3. Force oscillator strength
The pipeline is implemented in Shell script and is best suited for a Linux operating system.
To execute the pipeline, you can use either manager.sh or overall.sh.
The input required to run this pipeline is an XAS output file from ORCA, generated using either ROCIS/DFT or PNO-ROCIS/DFT. This input file must include the molecular orbital (MO) Löwdin population and the standard format of the transition intensities and probabilities for each excited state, along with a list of coupling MOs.
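Before running the pipeline, it can help to confirm that the ORCA output actually contains Löwdin populations. The sketch below is only an illustration: the header string it searches for is an assumption about how ORCA labels that section, has_loewdin is a hypothetical helper that is not part of this repository, and my_xas.out is a placeholder file name.

```shell
#!/bin/sh
# ASSUMPTION: the Loewdin per-MO population section is introduced by this header.
# Adjust the string if your ORCA version prints a different one.
LOEWDIN_HEADER="LOEWDIN REDUCED ORBITAL POPULATIONS PER MO"

# Return 0 if the file is readable and contains the assumed header, 1 otherwise.
has_loewdin() {
  [ -r "$1" ] && grep -q "$LOEWDIN_HEADER" "$1"
}

# Usage (my_xas.out is a placeholder):
#   has_loewdin my_xas.out || echo "No Loewdin block found; consider external_MO_file"
```

If the check fails, the populations may need to come from a separate ORCA file instead.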
You can specify a localized group of atoms involved in the coupling MO transitions, allowing for focused analysis of transitions between two sets of atoms, such as two amino acids in a protein.
Clone the x-ray_scripting_out repository using Git:
$ git clone https://github.com/caraortizmah/x-ray_scripting_out.git
The pipeline can be run in two ways: a simpler, more automated approach using helper_man.sh, or a more customizable option with manager.sh.
- manager.sh: This is the primary script. It executes all the pipeline steps in sequential order, as indicated by their step-specific names.
- helper_man.sh: This provides an easier method by reading the required parameters from a separate file named config.info.
Run the following command:
$ ./helper_man.sh
helper_man.sh uses the information in config.info to execute manager.sh.
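Since helper_man.sh depends on both config.info and manager.sh, a quick pre-flight check can catch a missing file early. This is a minimal sketch, assuming the three files sit in the same directory; preflight is a hypothetical helper and not part of the repository.

```shell
#!/bin/sh
# Check that the files helper_man.sh relies on exist in the given directory
# (default: current directory). ASSUMPTION: all three live side by side.
preflight() {
  dir=${1:-.}
  status=0
  for f in helper_man.sh manager.sh config.info; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   preflight . && ./helper_man.sh
```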
Read further about the config.info file
The config.info file is self-explanatory, formatted as a two-column table (NAME and FLAG).
The NAME column describes the parameter, option, or condition, while the FLAG column specifies the values that manager.sh will directly apply to the ORCA outputs.
Please do not alter the file format, such as lines, dashes, or naming conventions. Additionally, do not modify any NAME or FLAG entries.
The following parameters are MANDATORY:
Atom_number_range_A
Atom_number_range_B
core_MO_range
exc_state_range
soc_option
orca_output
The following parameters are OPTIONAL:
spectra_option
external_MO_file
atm_core
wave_f_type
input_path
output_path
- Atom_number_range_A and Atom_number_range_B: Specify the range of sequential atom numbers, as they appear in the coordinates of the XAS ORCA output file (orca_output). Note that the enumeration starts from 0 for the first atom.
  - Atom_number_range_A: Atoms of the core space.
  - Atom_number_range_B: Atoms of the virtual space.
- core_MO_range: Defines the range of core molecular orbitals (MOs) for the target atom, e.g., C. To study specific core MOs, such as 4 and 15, run the pipeline separately for each, setting core_MO_range = 4-4 for one and core_MO_range = 15-15 for the other. If core_MO_range = 4-15 is specified, the program processes the entire sequential range, following the same logic as the atom number range flags (Atom_number_range_A and Atom_number_range_B).
- exc_state_range: Specifies the range of excited states to analyze, based on those computed in orca_output. It follows the same format as core_MO_range, Atom_number_range_A, and Atom_number_range_B.
- soc_option: Accepts 0 or 1, where 0 excludes spin-orbit coupling effects and 1 includes them (e.g., for sulfur L-edge analysis).
- orca_output: Refers to the XAS ORCA output file, compatible with ORCA versions 4 and 5.0.4. Note that ORCA 6.0 introduces a substantially different output format, which will be supported in a future update.
- spectra_option (optional): Accepts 0 or 1. Default is 0 (recommended). Option 1 enables an advanced analysis (beta), intended particularly for soc_option = 1, though 0 is still advised until further testing is conducted.
- external_MO_file (optional): An ORCA file containing Löwdin population data. Ensure that the ORCA input includes the flag !Normalprint to output Löwdin populations. This option keeps the population data separate from the orca_output file. Read more about ORCA input.
- atm_core (optional): Atomic symbol of the target atom, e.g., C, O, N, P, S. Default is C.
- wave_f_type (optional): Specifies the type of core MO, such as s or p. Default is s.
- input_path (optional): Absolute path to the directory containing ORCA output files (inputs for the pipeline).
- output_path (optional): Absolute path to the directory where the pipeline will save results (outputs).
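Because atom enumeration starts from 0, a range read from a tool that counts atoms from 1 has to be shifted before it is used as an atom range flag. The following sketch illustrates that shift; to_zero_based is a hypothetical helper (not part of the pipeline), and it assumes a well-formed start-end input.

```shell
#!/bin/sh
# Convert a 1-based "start-end" atom range into the 0-based form.
# ASSUMPTION: input is two positive integers joined by a dash.
to_zero_based() {
  first=${1%-*}   # text before the dash
  last=${1#*-}    # text after the dash
  if [ "$first" -lt 1 ] || [ "$last" -lt "$first" ]; then
    echo "invalid 1-based range: $1" >&2
    return 1
  fi
  echo "$(( $first - 1 ))-$(( $last - 1 ))"
}

to_zero_based 1-117   # prints 0-116
```

For a molecule of 117 atoms, the full 0-based range is therefore 0-116.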
- The file config.info must retain the name config.info.
- Sequential ranges (Atom_number_range_A, Atom_number_range_B, core_MO_range, and exc_state_range) should be specified with numbers joined by a dash (-) without spaces (e.g., 4-15).
- To analyze the full set of computed excited states, replace the range with the word none (without quotes).
- soc_option defaults to 0. It is recommended to explicitly set all FLAG values, even default ones like 0.
- spectra_option defaults to 0.
- external_MO_file can be left empty, in which case the pipeline assumes that Löwdin populations are included in the orca_output.
- atm_core defaults to C.
- wave_f_type defaults to s.
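The range format described above (two numbers joined by a dash, no spaces, or the word none for excited states) can be checked before the values are handed to the pipeline. This is a sketch only: parse_range is a hypothetical helper, and the requirement that the first number not exceed the second is an assumption rather than documented behavior.

```shell
#!/bin/sh
# Validate a range flag and print "start end", or "none" for the full set.
parse_range() {
  if [ "$1" = "none" ]; then
    echo "none"
    return 0
  fi
  # Accept only digits, one dash, digits (so "4 - 15" or "15" are rejected).
  if printf '%s\n' "$1" | grep -Eq '^[0-9]+-[0-9]+$'; then
    start=${1%-*}
    end=${1#*-}
    # ASSUMPTION: a range must be ascending or a single value such as 4-4.
    if [ "$start" -gt "$end" ]; then
      echo "range start exceeds end: $1" >&2
      return 1
    fi
    echo "$start $end"
  else
    echo "malformed range: $1" >&2
    return 1
  fi
}

parse_range 4-15   # prints 4 15
```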
It is highly recommended to use absolute paths for input_path and output_path.
- If input_path is not provided, the pipeline will attempt to use its current execution location to find the orca_output and external_MO_file (if applicable).
- If output_path is not specified, the pipeline will place the results in its execution location.
The results will be saved in the output_path under a newly created folder named "orca_output_out" (e.g., output_path/orca_output_out/). A reduced version for subsequent analysis will be placed in a new directory: output_path/pop_matrices/orca_output_csv/.
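To make the output layout concrete, the sketch below prints the two result directories for a given output_path and ORCA output name. It is illustrative only: result_dirs is a hypothetical helper, the path and name are placeholders, and the text does not state whether the file extension is kept in the folder name, so the name is used verbatim here.

```shell
#!/bin/sh
# Print the full-results directory and the reduced (csv) directory.
# ASSUMPTION: the orca_output name is inserted as given, without modification.
result_dirs() {
  out=${1%/}    # drop one trailing slash, if present
  name=$2
  echo "$out/${name}_out/"
  echo "$out/pop_matrices/${name}_csv/"
}

# Placeholder values, not taken from the repository:
result_dirs /abs/path/results my_xas
```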
I recommend reading the information about the config.info file first.
To run the pipeline, use the following command:
$ ./manager.sh $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15
Where:
- $1: First number of Atom_number_range_A
- $2: Last number of Atom_number_range_A
- $3: First number of Atom_number_range_B
- $4: Last number of Atom_number_range_B
- $5: First number of core_MO_range
- $6: Last number of core_MO_range
- $7: soc_option
- $8: orca_output
- $9: exc_state_range
- $10: spectra_option
- $11: atm_core
- $12: wave_f_type
- $13: external_MO_file
- $14: input_path
- $15: output_path
Please note that you cannot leave any field empty; otherwise, the subsequent field (parameter) will be interpreted as the missing option for the previous one.
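Since a missing or empty field shifts every later argument, it may be worth verifying the argument list before calling manager.sh. The guard below is a sketch of that idea; check_args is a hypothetical helper and not something manager.sh is known to do itself. It also shows a general shell detail: inside a script, positional parameters past the ninth must be written with braces.

```shell
#!/bin/sh
# Succeed only if exactly 15 non-empty arguments are given.
check_args() {
  if [ "$#" -ne 15 ]; then
    echo "expected 15 arguments, got $#" >&2
    return 1
  fi
  for arg in "$@"; do
    if [ -z "$arg" ]; then
      echo "empty argument found" >&2
      return 1
    fi
  done
  return 0
}

# In shell code, "$10" is read as "${1}0"; the tenth argument is "${10}".
set -- a b c d e f g h i j
echo "$10"     # prints a0
echo "${10}"   # prints j
```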
The provided example of the config.info file serves as a template, where only the second column (the flags) should be modified to suit your analysis.
This example demonstrates an analysis setup for:
- XAS for Sulfur (atm_core = S) at the L-edge (wave_f_type = p) including spin-orbit coupling effects (soc_option = 1).
- Three p core MOs to analyze: core_MO_range = 63-65.
- Excited states limited to the first seven (exc_state_range = 1-7).
- Atoms involved (0-116): the entire molecule. Including all atoms may be unnecessary, since not all of them are sulfur, but screening everything keeps the setup simple even if it is redundant.
For Atom_number_range_A, include only the enumerated atoms representing Sulfur (core MO space). For Atom_number_range_B, include the enumerated atoms of the virtual MO space (it is recommended to include all atoms). The range 0 to 116 covers the entire molecule.
More detailed information about running examples can be found in the example/readme.md file.
This pipeline primarily utilizes Linux text processing tools:
grep
cut
awk
sed
vim
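A quick way to confirm that these tools are available is to look each one up on the PATH. This is a minimal sketch; missing_tools is a hypothetical helper and not part of the pipeline.

```shell
#!/bin/sh
# Print the name of every listed tool that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools grep cut awk sed vim)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing" >&2
fi
```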
Contributions are what make the open-source community such a remarkable space for learning, inspiration, and innovation. Your contributions are highly valued and greatly appreciated!
If you have a suggestion to improve this project, feel free to fork the repository and submit a pull request.
Alternatively, you can open an issue with the tag "enhancement." And do not forget to give the project a star if you find it helpful—thank you for your support!
- Fork the Project
- Create Your Feature Branch: git checkout -b feature/branch
- Commit your Changes: git commit -m 'Add some Feature'
- Push to the Branch: git push origin feature/branch
- Open a Pull Request
caraortizmah
Distributed under the GNU General Public License v3.0
caraortizmah
Carlos A. Ortiz-Mahecha - ortizmahecha[at]proton.me