AiSurf: Automated Identification of Surface images

AiSurf is a tool for inspecting and classifying atomically resolved images (such as AFM and STM images) via the Scale-Invariant Feature Transform (SIFT) and clustering algorithms, inspired by the work of Laanait et al.
The main advantage of AiSurf is that it relies on unsupervised machine learning techniques, so it does not require an image database for training, which is a bottleneck for many image classification programs. It runs on ordinary office computers and laptops, with a typical calculation time of 30-60 seconds. No programming skills are required to use this tool: only the instructions in the Usage section need to be followed.
AiSurf extracts primitive lattice vectors, unit cells, and structural distortions from the original image, with no prior assumptions about the lattice and minimal user intervention.

Cite our work

We kindly ask the user to cite AiSurf's related article when using this code for their scientific research.
Marco Corrias et al 2023 Mach. Learn.: Sci. Technol. 4 015015, DOI: 10.1088/2632-2153/acb5e0

Installation

No installation is needed; the user just needs to download this repository and make sure the dependencies listed below are available.

Dependencies

  • NumPy
  • Matplotlib
  • SciPy
  • Scikit-learn (sklearn)
  • Python Imaging Library (PIL)
  • OpenCV
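
Before opening the notebook, it can be useful to confirm that the packages above are importable. The snippet below is a minimal sketch, not part of AiSurf: the import names (sklearn, PIL, cv2 and so on) are the ones these packages are normally imported under, and the helper missing_dependencies is a hypothetical name introduced here for illustration.

```python
from importlib.util import find_spec

# Assumed mapping: package name as listed in this README -> module name
# under which that package is normally imported.
REQUIRED_MODULES = {
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "SciPy": "scipy",
    "Scikit-learn": "sklearn",
    "PIL": "PIL",
    "OpenCV": "cv2",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the listed names of packages whose module cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only looks the modules up and does not import them, so it is quick to run in a fresh environment.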

Usage

General setup

Before starting the lattice recognition process, the image and simulation parameters need to be set. This can be done in the following way:

  • Create a folder where the image, the parameters file and the results will be stored. In this repository, such folders are inside the experiments folder;
  • Specify the path (relative to the notebook) and the image name at the beginning of the IPython notebook lattice_extraction.ipynb. For example, the third cell of the notebook reads:
# Insert path + filename here:
path = "experiments/SrTiO3(001)/"
filename = "small SrTiO3_1244.png"

Parameters file setup

The parameters file, parameters.ini, contains all the parameters needed to run the simulation. It should be placed inside the image folder; if it is not provided, default parameters are used instead, which are defined at the beginning of the IPython notebook. This section describes the meaning of each parameter; suggestions for parameter tuning are given in the notebook, just before each parameter is used. The images in the experiments folder of this repository can also be used as a reference for parameter tuning.
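
The fallback from parameters.ini to default values described above can be sketched with Python's standard configparser module. This is an illustration only: whether the notebook actually uses configparser is an assumption, the function name load_parameters is hypothetical, and only the [SIFT] section is shown, with the defaults listed in this README.

```python
import configparser
from pathlib import Path

# Defaults copied from this README; only the [SIFT] section is shown here.
DEFAULTS = {
    "SIFT": {
        "contrast_threshold": "0.003",
        "sigma": "4",
        "noctavelayers": "8",
    }
}


def load_parameters(folder, defaults=DEFAULTS):
    """Load <folder>/parameters.ini on top of the defaults (hypothetical helper)."""
    config = configparser.ConfigParser()
    config.read_dict(defaults)  # start from the default values
    # ConfigParser.read() silently skips files that do not exist,
    # so the defaults stay in place when parameters.ini is missing.
    config.read(Path(folder) / "parameters.ini")
    return config


if __name__ == "__main__":
    params = load_parameters("experiments/SrTiO3(001)/")
    print(params.getfloat("SIFT", "sigma"))
```

Values present in the file override the defaults one by one, so a parameters.ini that sets only some keys is still valid in this sketch. Note that configparser lowercases option names by default.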

[SIFT]
These are the three main parameters of the SIFT algorithm, explained in detail in the original article by Lowe and in this link.

  • contrast_threshold: the contrast threshold used to filter out weak features. Higher threshold means more discarded features. Default: 0.003;
  • sigma: the sigma of the Gaussian applied to the input image at the first octave. Default: 4;
  • noctavelayers: the number of layers in each octave. The number of octaves is computed automatically from the image resolution. Default: 8.
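
As an illustration of how these three parameters could be passed to a SIFT detector, the sketch below maps them onto the keyword arguments of OpenCV's cv2.SIFT_create (contrastThreshold, sigma, nOctaveLayers). That AiSurf calls OpenCV in exactly this way is an assumption, and the helper names sift_kwargs and detect_keypoints are hypothetical.

```python
def sift_kwargs(contrast_threshold=0.003, sigma=4.0, noctavelayers=8):
    """Translate the README parameter names into cv2.SIFT_create keywords."""
    return {
        "contrastThreshold": float(contrast_threshold),
        "sigma": float(sigma),
        "nOctaveLayers": int(noctavelayers),
    }


def detect_keypoints(image_path, **params):
    """Run SIFT on a grayscale image; returns (keypoints, descriptors)."""
    import cv2  # imported lazily so sift_kwargs works without OpenCV

    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(image_path)
    sift = cv2.SIFT_create(**sift_kwargs(**params))
    return sift.detectAndCompute(image, None)


if __name__ == "__main__":
    print(sift_kwargs())
```

Lowering contrast_threshold keeps more (weaker) features, while a larger sigma smooths the image more strongly before detection.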

[Keypoint filtering]
These are thresholds to filter out keypoints ("kp") that could cause issues in the lattice identification process, in units of the median keypoint size.

  • size_threshold: if kp_size > median*size_threshold or kp_size < median/size_threshold the keypoint is deleted. Default: 2;
  • edge_threshold: all keypoints that are closer than median*edge_threshold to one border of the image are deleted. Default: 1.
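
The two filtering rules above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the notebook's code: keypoints are represented as hypothetical (x, y, size) tuples, the image shape is (height, width) in pixels, and filter_keypoints is a name introduced here.

```python
from statistics import median


def filter_keypoints(keypoints, image_shape, size_threshold=2.0, edge_threshold=1.0):
    """Apply the size and edge filters to a list of (x, y, size) tuples."""
    if not keypoints:
        return []
    height, width = image_shape
    med = median(size for _, _, size in keypoints)
    margin = med * edge_threshold  # minimum allowed distance from any border
    kept = []
    for x, y, size in keypoints:
        # Size filter: keep sizes within [median/threshold, median*threshold].
        size_ok = med / size_threshold <= size <= med * size_threshold
        # Edge filter: keep keypoints at least `margin` away from every border.
        inside = margin <= x <= width - margin and margin <= y <= height - margin
        if size_ok and inside:
            kept.append((x, y, size))
    return kept


if __name__ == "__main__":
    kps = [(50, 50, 10), (60, 40, 10), (30, 70, 10), (50, 50, 30), (2, 50, 10)]
    print(filter_keypoints(kps, image_shape=(100, 100)))
```

In the usage example the median size is 10, so the keypoint of size 30 fails the size filter and the keypoint at x = 2 fails the edge filter. With OpenCV keypoints, the tuples could be built from each keypoint's pt and size attributes.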

[Keypoint Clustering]
For each number of clusters n between the lower and upper bounds, a clustering is computed and evaluated by its silhouette score; the clustering with the highest silhouette score is chosen for further processing.

  • cluster_kp_low and cluster_kp_high: lower and upper bounds of the interval in which the optimal number of clusters in the image is searched, evaluated by calculating the silhouette score. Default: 2 and 12 respectively; they define the variable clustering_span_kp;
  • cluster_choice: number that selects the chosen reference cluster for the second part of the analysis. The value of 1 indicates the first/most populated cluster, so 2 selects the second most populated one and so on. Default: 1.
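
The silhouette-based selection described above can be sketched with scikit-learn. The clustering algorithm used by AiSurf is not stated in this section, so KMeans is used here purely as a stand-in; the function names best_clustering and clusters_by_population, and the generic feature array, are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_clustering(features, low=2, high=12, random_state=0):
    """Try n = low..high clusters and keep the highest silhouette score."""
    features = np.asarray(features, dtype=float)
    best_labels, best_score = None, -1.0
    for n_clusters in range(low, high + 1):
        if n_clusters >= len(features):
            break  # the silhouette score needs fewer clusters than samples
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        if len(set(labels)) < 2:
            continue  # the silhouette score is undefined for a single cluster
        score = silhouette_score(features, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


def clusters_by_population(labels):
    """Return cluster ids sorted from most to least populated."""
    ids, counts = np.unique(labels, return_counts=True)
    return ids[np.argsort(-counts)]
```

With labels from best_clustering, a cluster_choice of 1 would correspond to clusters_by_population(labels)[0], a value of 2 to index 1, and so on.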

[Nearest Neighbours]
Parameters related to the clustering processes used to find the primitive vectors.

  • cluster_kNN_low and cluster_kNN_high: values defining the interval containing the optimal number of clusters for the calculated nearest neighbours (NN) distances, evaluated by the silhouette score. Default: 6 and 24 respectively; they define the variable cluster_span_kNN.
    cluster_kNN_low is also the number of NN considered for each keypoint during the calculations.
  • clustersize_threshold: used to reduce the impact of erroneous NN vectors on the selection of the lattice vectors. In the final distribution, only NN clusters with population ≥ clustersize_threshold*n_max are considered, where n_max is the population of the largest cluster. Default: 0.3.
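
The population cut controlled by the cluster size threshold can be sketched as follows. The input is assumed to be one cluster label per nearest-neighbour vector, and the function name significant_clusters is hypothetical.

```python
from collections import Counter


def significant_clusters(labels, clustersize_threshold=0.3):
    """Return labels of clusters with population >= threshold * n_max."""
    counts = Counter(labels)
    if not counts:
        return []
    n_max = max(counts.values())  # population of the largest cluster
    return sorted(
        label for label, n in counts.items() if n >= clustersize_threshold * n_max
    )


if __name__ == "__main__":
    # Cluster 0 has 10 members, cluster 1 has 4, cluster 2 has 2.
    labels = [0] * 10 + [1] * 4 + [2] * 2
    print(significant_clusters(labels))
```

In this example n_max is 10, so a cluster needs at least 3 members to be kept; cluster 2 is discarded as a likely collection of erroneous vectors.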

[Sublattice lookup]
Once the primitive vectors have been found, we look for the sublattice positions.

  • cluster_SUBL_low and cluster_SUBL_high: values defining the interval containing the optimal number of sublattice positions. Default: 2 and 6 respectively; they define the variable clustering_span_SUBL.
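
One common way to look for sublattice positions is to express displacement vectors in the basis of the primitive vectors and fold them into a single unit cell, so that equivalent positions in different cells coincide. The sketch below shows only this folding step; that AiSurf performs it in this exact form is an assumption, and fold_into_cell is a hypothetical name.

```python
def fold_into_cell(vector, a, b):
    """Return fractional coordinates (u, v) in [0, 1) with vector = u*a + v*b
    up to a lattice translation. All arguments are (x, y) pairs in pixels."""
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b must be linearly independent")
    # Solve the 2x2 linear system with Cramer's rule.
    u = (vector[0] * b[1] - vector[1] * b[0]) / det
    v = (a[0] * vector[1] - a[1] * vector[0]) / det
    # Dropping the integer part removes whole lattice translations.
    return u % 1.0, v % 1.0


if __name__ == "__main__":
    a, b = (10.0, 0.0), (0.0, 10.0)
    print(fold_into_cell((25.0, -5.0), a, b))
```

The folded coordinates could then be clustered, with the number of clusters searched between cluster_SUBL_low and cluster_SUBL_high, to estimate the number of distinct sublattice positions.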

[Deviation plot]
Parameters related to the perfect-lattice-deviations plot.

  • k2: number of nearest neighbours considered for each keypoint. Default: 10;
  • rtol_rel: all vectors that are within the relative_r-tolerance of the lattice vectors are drawn. Default: 4 (pixels);
  • arrow_width: width of the arrows (see the width parameter of matplotlib.pyplot.quiver()). Default: 0.003;
  • c_max_arrow: deviation (in pixels) of a lattice vector with respect to the predicted one. Needed to tune the visualization of bond deviations, purely aesthetic. Default: None.
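
The quantity shown in the deviation plot can be sketched as the distance between each neighbour vector and the closest lattice vector. This is an illustration only: the tolerance is treated here as an absolute distance in pixels, both signs of each lattice vector are taken as candidates, and bond_deviations is a hypothetical name.

```python
import math


def bond_deviations(bonds, lattice_vectors, rtol=4.0):
    """Return (bond, deviation) pairs for bonds within rtol pixels of
    a lattice vector. Bonds and lattice vectors are (x, y) pairs."""
    # Consider each lattice vector in both directions.
    candidates = [v for a in lattice_vectors for v in (a, (-a[0], -a[1]))]
    selected = []
    for bond in bonds:
        deviation = min(
            math.hypot(bond[0] - c[0], bond[1] - c[1]) for c in candidates
        )
        if deviation <= rtol:
            selected.append((bond, deviation))
    return selected


if __name__ == "__main__":
    lattice = [(10.0, 0.0), (0.0, 10.0)]
    bonds = [(10.0, 1.0), (-9.0, 0.0), (0.0, 10.0), (7.0, 7.0)]
    for bond, deviation in bond_deviations(bonds, lattice):
        print(bond, round(deviation, 2))
```

The deviations could then be passed as the colour values of a quiver plot, with the arrow width set by arrow_width and the upper end of the colour scale capped at c_max_arrow.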

Example

SrTiO3 (001) with Sr vacancies, calculated with the default parameters written above:
Keypoint localization after cleaning:
[Figure: clean_kp]
Nearest neighbours distances folded into the unit cell:
[Figure: sublattice_pos]
Arrows connecting Sr atoms, with colours based on their deviation from the primitive vector:
[Figure: deviations]
Final prediction of the cell symmetry:
[Figure: symmetry]