x-ray_scripting_out

Pipeline script for X-ray computation analysis

Primary language: Shell. License: GNU General Public License v3.0 (GPL-3.0).

X-ray absorption computation analysis

The x-ray_scripting_out pipeline processes X-ray Absorption Spectroscopy (XAS) output data generated by the ORCA quantum chemistry software. Its primary purpose is to combine oscillator strengths and transition density matrices to derive the core-virtual coupling molecular orbitals (MOs), represented as matrices. The output integrates the transition intensities obtained from the transition density matrices with the corresponding oscillator strengths to build matrices that describe the MOs in the core and virtual spaces. These matrices encapsulate the core-virtual coupling MOs in terms of the following quantities:

 1. Number of transition intensities
 2. Transition intensity probability
 3. Oscillator strength

Getting Started

The pipeline is implemented as shell scripts and is best suited for Linux. To execute it, use either helper_man.sh or manager.sh, as described under Run.

Prerequisites

The input required to run this pipeline is an XAS output file from ORCA, generated with either ROCIS/DFT or PNO-ROCIS/DFT. This file must include the Löwdin populations of the molecular orbitals (MOs), the transition intensities and probabilities for each excited state in ORCA's standard format, and the list of coupling MOs. You can restrict the analysis to a localized group of atoms involved in the coupling MO transitions, which allows a focused analysis of transitions between two sets of atoms, such as two amino acids in a protein.

Download

Clone the x-ray_scripting_out repository using Git:

$ git clone https://github.com/caraortizmah/x-ray_scripting_out.git

Run

The pipeline can be run in two ways: a simpler, more automated approach using helper_man.sh, or a more customizable option with manager.sh.

  • manager.sh: The primary script, which executes all pipeline steps in sequential order, as indicated by the step-specific names of the scripts.
  • helper_man.sh: A simpler entry point that reads the required parameters from a separate file, config.info.

Recommended: Automated Method

Run the following command:

$ ./helper_man.sh

helper_man.sh uses the information in config.info to execute manager.sh.
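The exact layout of config.info is not reproduced in this README, so the following is only a sketch of how a wrapper in the spirit of helper_man.sh could look up a FLAG by its NAME with awk. The get_flag function, the whitespace-separated layout, and the sample values are hypothetical and may differ from the real file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: look up the FLAG value for a given NAME in a
# two-column NAME/FLAG file. The layout below is an assumption and may
# differ from the real config.info shipped with the pipeline.

get_flag() {
  # $1: configuration file, $2: parameter NAME
  # Prints the second column of the first row whose first column is $2.
  awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
NAME                  FLAG
--------------------  ------
Atom_number_range_A   0-116
core_MO_range         63-65
soc_option            1
EOF

core_range=$(get_flag "$cfg" core_MO_range)
soc=$(get_flag "$cfg" soc_option)
# Split the dash-joined range into its start and end numbers.
core_start=${core_range%-*}
core_end=${core_range#*-}
echo "core MOs $core_start to $core_end, soc_option=$soc"
rm -f "$cfg"
```

Splitting core_MO_range into two numbers mirrors how manager.sh takes the start and end of each range as separate positional arguments.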

Read further about the config.info file

The config.info file is self-explanatory, formatted as a two-column table (NAME and FLAG). The NAME column describes the parameter, option, or condition, while the FLAG column specifies the values that manager.sh will directly apply to the ORCA outputs.

Please do not alter the file format, such as lines, dashes, or naming conventions. Do not modify the NAME entries either; only the values in the FLAG column should be edited.

General overview of the 'FLAG's

The following parameters are MANDATORY:

  • Atom_number_range_A
  • Atom_number_range_B
  • core_MO_range
  • exc_state_range
  • soc_option
  • orca_output

The following parameters are OPTIONAL:

  • spectra_option
  • external_MO_file
  • atm_core
  • wave_f_type
  • input_path
  • output_path

Description of the 'FLAG's
  • Atom_number_range_A and Atom_number_range_B: Specify ranges of sequential atom numbers, as enumerated in the coordinates of the XAS ORCA output file (orca_output). Note that the enumeration starts from 0 for the first atom.

  • Atom_number_range_A: Atoms of the core space.

  • Atom_number_range_B: Atoms of the virtual space.

  • core_MO_range: Defines the range of core molecular orbitals (MOs) of the target atom, e.g., C. To study non-consecutive core MOs, such as 4 and 15, run the pipeline separately for each, setting core_MO_range = 4-4 for one run and core_MO_range = 15-15 for the other. If core_MO_range = 4-15 is specified, the program processes every MO from 4 to 15, following the same logic as the atom number range flags (Atom_number_range_A and Atom_number_range_B).

  • exc_state_range: Specifies the range of excited states to analyze, based on those computed in orca_output. It follows the same format as core_MO_range, Atom_number_range_B and Atom_number_range_A.

  • soc_option: Accepts 0 or 1, where 0 excludes spin-orbit coupling effects, and 1 includes them (e.g., for sulfur L-edge analysis).

  • orca_output: Refers to the XAS ORCA output file, compatible with ORCA versions 4 and 5.0.4. Note that ORCA 6.0 introduces a substantially different output format, which will be supported in a future update.

  • spectra_option (optional): Accepts 0 or 1. Default is 0 (recommended). Option 1 enables an advanced analysis (beta), intended mainly for soc_option = 1; however, 0 remains the advised setting until this option has been tested further.

  • external_MO_file (optional): A separate ORCA output file containing the Löwdin population data. Ensure that the corresponding ORCA input includes the !Normalprint keyword so that Löwdin populations are printed. Providing this file lets the MO population calculation be kept separate from the calculation that produced orca_output. Read more about ORCA input in the ORCA manual.

  • atm_core (optional): Atomic symbol of the target atom, e.g., C, O, N, P, S. Default is C.

  • wave_f_type (optional): Specifies the type of core MO, such as s or p. Default is s.

  • input_path (optional): Absolute path to the directory containing ORCA output files (inputs for the pipeline).

  • output_path (optional): Absolute path to the directory where the pipeline will save results (outputs).
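To choose the indices for Atom_number_range_A (for example, the sulfur atoms), it can help to list the atoms of orca_output with 0-based indices. The sketch below is not part of the pipeline: it assumes ORCA's usual "CARTESIAN COORDINATES (ANGSTROEM)" block layout (title, dashed line, one atom per line, blank line), and list_atoms and the sample file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print "index symbol"
# for each atom of the first Cartesian-coordinate block in an ORCA
# output, numbering atoms from 0 as the range flags do.
list_atoms() {
  awk '
    /CARTESIAN COORDINATES \(ANGSTROEM\)/ { inblk = 1; getline; next }  # skip dashed line
    inblk && NF == 0 { exit }   # a blank line ends the block
    inblk { print n++, $1 }     # n starts at 0
  ' "$1"
}

# Minimal stand-in for an ORCA output, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
---------------------------------
CARTESIAN COORDINATES (ANGSTROEM)
---------------------------------
  C      0.000000    0.000000    0.000000
  S      1.800000    0.000000    0.000000
  H     -0.600000    0.900000    0.000000

----------------------------
CARTESIAN COORDINATES (A.U.)
EOF

list_atoms "$sample"
s_idx=$(list_atoms "$sample" | awk '$2 == "S" { print $1 }')
n_atoms=$(list_atoms "$sample" | awk 'END { print NR }')
echo "sulfur indices: $s_idx (of $n_atoms atoms)"
rm -f "$sample"
```

The indices printed for the target element can then be turned into a dash-joined range for Atom_number_range_A.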

Assumptions
  • The file config.info must retain the name config.info. :)
  • Sequential ranges (Atom_number_range_A, Atom_number_range_B, core_MO_range, and exc_state_range) should be specified with numbers joined by a dash (-) without spaces (e.g., 4-15).
  • To analyze the full set of computed excited states, set exc_state_range to the word none (without quotes) instead of a range.
  • soc_option defaults to 0. It is recommended to explicitly set all FLAG values, even default ones like 0.
  • spectra_option defaults to 0.
  • external_MO_file can be left empty, in which case the pipeline assumes that Löwdin populations are included in the orca_output.
  • atm_core defaults to C.
  • wave_f_type defaults to s.
  • It is highly recommended to use absolute paths for input_path and output_path.
  • If input_path is not provided, the pipeline will attempt to use its current execution location to find the orca_output and external_MO_file (if applicable).
  • If output_path is not specified, the pipeline places the results in its execution location. The results are saved under output_path in a newly created folder named orca_output_out, where orca_output stands for the name of the ORCA output file (e.g., output_path/orca_output_out/). A reduced version for subsequent analysis is placed in a new directory: output_path/pop_matrices/orca_output_csv/.
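Since the range flags share one format, a wrapper can validate them before calling manager.sh. The following is a minimal sketch, not code from the repository: the function name parse_range is hypothetical, and the check that the start does not exceed the end is an added assumption. Per the notes above, none is meant for exc_state_range.

```shell
#!/usr/bin/env bash
# Sketch: validate a range FLAG and print its two bounds.
# Accepts "start-end" (digits joined by a dash, no spaces) or the
# keyword none; the start <= end check is an extra assumption.
parse_range() {
  local r=$1
  if [[ $r == none ]]; then
    echo "none"
  elif [[ $r =~ ^([0-9]+)-([0-9]+)$ ]]; then
    local start=${BASH_REMATCH[1]} end=${BASH_REMATCH[2]}
    # 10# forces base 10 so values such as 08 are not read as octal.
    if (( 10#$start > 10#$end )); then
      echo "invalid range: $r (start > end)" >&2
      return 1
    fi
    echo "$start $end"
  else
    echo "invalid range: $r" >&2
    return 1
  fi
}

parse_range 4-15           # prints: 4 15
parse_range 15-15          # prints: 15 15
parse_range none           # prints: none
parse_range "4 - 15" || echo "rejected: 4 - 15"
```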

Customizable Method

I recommend reading the information about the config.info file first. To run the pipeline, use the following command:

 $ ./manager.sh $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15

Where:

  • $1: Start of Atom_number_range_A
  • $2: End of Atom_number_range_A
  • $3: Start of Atom_number_range_B
  • $4: End of Atom_number_range_B
  • $5: Start of core_MO_range
  • $6: End of core_MO_range
  • $7: soc_option
  • $8: orca_output
  • $9: exc_state_range
  • $10: spectra_option
  • $11: atm_core
  • $12: wave_f_type
  • $13: external_MO_file
  • $14: input_path
  • $15: output_path

Please note that no field can be left empty: the arguments are positional, so if one is omitted, every subsequent value shifts one position and is interpreted as the preceding parameter.
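The shift described above is ordinary shell behavior, illustrated here with a hypothetical count_args function: an unquoted empty variable produces no argument at all, so every later value moves one slot to the left, while a quoted empty string keeps its slot. Whether manager.sh accepts a quoted empty string for an optional parameter such as external_MO_file is an assumption to verify before relying on it; the paths are placeholders.

```shell
#!/usr/bin/env bash
# Generic bash behavior, not specific to manager.sh: how an empty
# optional value affects the number of positional arguments.
count_args() { echo "$#"; }

ext_mo_file=""   # an optional value left empty

unquoted=$(count_args S p $ext_mo_file /data/in /data/out)   # empty word disappears
quoted=$(count_args S p "$ext_mo_file" /data/in /data/out)   # empty string is kept
echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
```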

Examples:

The provided example of the config.info file serves as a template, where only the second column (the flags) should be modified to suit your analysis.

This example demonstrates an analysis setup for:

  • XAS for Sulfur (atm_core = S) at the L-edge (wave_f_type = p) including spin-orbit coupling effects (soc_option = 1).
  • Three p core MOs to analyze: core_MO_range = 63-65.
  • Excited states limited to the first seven (exc_state_range = 1-7).
  • Atoms involved (0-116): the entire molecule. Including all atoms is more than strictly necessary, since most of them are not sulfur, but screening everything keeps the setup simple, even if it is redundant.

For Atom_number_range_A, it is enough to include the enumerated sulfur atoms (core MO space). For Atom_number_range_B, include the enumerated atoms of the virtual MO space (including all atoms is recommended). In this example, the range 0-116 covers the entire molecule.
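For the customizable method, the same sulfur L-edge setup would translate into the 15 positional arguments listed for manager.sh. The dry run below only assembles and prints the command. The orca_output file name, the two paths, and passing an empty string for external_MO_file are hypothetical placeholders, not values taken from the example/ directory.

```shell
#!/usr/bin/env bash
# Dry run: assemble manager.sh arguments for the sulfur L-edge example.
# File name and paths are hypothetical placeholders.
args=()
args+=(0 116)                   # $1-$2: Atom_number_range_A
args+=(0 116)                   # $3-$4: Atom_number_range_B
args+=(63 65)                   # $5-$6: core_MO_range
args+=(1)                       # $7: soc_option (spin-orbit coupling on)
args+=(sulfur_xas.out)          # $8: orca_output (hypothetical name)
args+=(1-7)                     # $9: exc_state_range
args+=(0)                       # $10: spectra_option
args+=(S)                       # $11: atm_core
args+=(p)                       # $12: wave_f_type
args+=("")                      # $13: external_MO_file (assumption: empty = use orca_output)
args+=(/home/user/xas/inputs)   # $14: input_path (hypothetical)
args+=(/home/user/xas/results)  # $15: output_path (hypothetical)

echo "number of arguments: ${#args[@]}"
# Print the command with shell quoting instead of running it.
printf '%q ' ./manager.sh "${args[@]}"; echo
```

If manager.sh does not accept an empty value for external_MO_file, check example/readme.md for the value it expects.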

More detailed information about running examples can be found in the example/readme.md file.

Final Comments:

This pipeline primarily utilizes Linux text processing tools:

  • grep
  • cut
  • awk
  • sed
  • vim
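A quick way to confirm that these tools are installed before running the pipeline is a command -v loop. This check is a suggestion and is not part of the repository.

```shell
#!/usr/bin/env bash
# Check that the text-processing tools used by the pipeline are on PATH.
missing=()
for tool in grep cut awk sed vim; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing+=("$tool")
  fi
done
echo "${#missing[@]} tool(s) missing"
```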

Contributing

Contributions are what make the open-source community such a remarkable space for learning, inspiration, and innovation. Your contributions are highly valued and greatly appreciated!
If you have a suggestion to improve this project, feel free to fork the repository and submit a pull request. Alternatively, you can open an issue with the tag "enhancement." And do not forget to give the project a star if you find it helpful—thank you for your support!

Steps to Contribute

  1. Fork the Project
  2. Create Your Feature Branch:
    git checkout -b feature/branch  
  3. Commit your Changes:
    git commit -m 'Add some Feature'  
  4. Push to the Branch:
    git push origin feature/branch 
  5. Open a Pull Request

Top contributors:

caraortizmah

License

Distributed under the GNU General Public License v3.0

Contact

caraortizmah

Carlos A. Ortiz-Mahecha - ortizmahecha[at]proton.me