Factor-Analysis

Implementation of factor analysis on the BFI dataset (based on a personality assessment project)

Primary language: Jupyter Notebook

Factor Analysis

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus "error" terms.

Simply put, the factor loading of a variable quantifies the extent to which the variable is related to a given factor.
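The model described above can be sketched on synthetic data: six observed variables are generated as linear combinations of two latent factors plus error, and a factor model is fit to recover the loadings. This is a minimal illustration, assuming NumPy and scikit-learn; the data and the "true" loading matrix are invented for the example.

```python
# Hypothetical illustration: six observed variables driven by two latent factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 1000
factors = rng.normal(size=(n, 2))            # two unobserved latent variables
loadings = np.array([[0.9, 0.0],             # assumed "true" loading matrix:
                     [0.8, 0.1],             # variables 1-3 load on factor 1,
                     [0.7, 0.0],
                     [0.0, 0.9],             # variables 4-6 on factor 2
                     [0.1, 0.8],
                     [0.0, 0.7]])
noise = rng.normal(scale=0.3, size=(n, 6))   # "error" terms
X = factors @ loadings.T + noise             # observed variables

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
print(fa.components_.T)  # estimated loadings: how strongly each variable
                         # relates to each factor
```

The estimated loadings recover the block structure (up to sign and rotation, which are not identified by the model alone).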

A common rationale behind factor analytic methods is that the information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis is commonly used in psychometrics, personality theories, biology, marketing, product management, operations research, finance, and machine learning. It may help to deal with data sets where there are large numbers of observed variables that are thought to reflect a smaller number of underlying/latent variables. It is one of the most commonly used inter-dependency techniques and is used when the relevant set of variables shows a systematic inter-dependence and the objective is to find out the latent factors that create a commonality.

Types of factor analysis

  • Exploratory factor analysis

Exploratory factor analysis (EFA) is used to identify complex interrelationships among items and to group items that are part of unified concepts. The researcher makes no a priori assumptions about relationships among factors.
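One common EFA workflow decides the number of factors from the data itself, e.g. with the Kaiser criterion (retain factors whose correlation-matrix eigenvalue exceeds 1) and then fits a model with that many factors. A small sketch under those assumptions, on synthetic data with two latent factors:

```python
# Hypothetical EFA sketch: pick the number of factors with the Kaiser
# criterion, then fit an exploratory model with that many factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 500
f = rng.normal(size=(n, 2))                  # two latent factors
L = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
              [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
X = f @ L.T + rng.normal(scale=0.6, size=(n, 6))

# Kaiser criterion: retain factors whose eigenvalue of the
# correlation matrix is greater than 1
eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
k = int((eigvals > 1).sum())
efa = FactorAnalysis(n_components=k).fit(X)
print(k)   # number of retained factors
```

On this data the criterion recovers the two underlying factors; on real data it is only a heuristic and is often compared against scree plots or parallel analysis.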

  • Confirmatory factor analysis

Confirmatory factor analysis (CFA) is a more complex approach that tests the hypothesis that the items are associated with specific factors. CFA uses structural equation modeling to test a measurement model whereby loading on the factors allows for evaluation of relationships between observed variables and unobserved variables. Structural equation modeling approaches can accommodate measurement error and are less restrictive than least-squares estimation. Hypothesized models are tested against actual data, and the analysis demonstrates the loadings of observed variables on the latent variables (factors), as well as the correlations between the latent variables.

Types of factor extraction

Principal component analysis (PCA) is a widely used method for factor extraction, which is the first phase of EFA. Factor weights are computed to extract the maximum possible variance, with successive factoring continuing until there is no further meaningful variance left. The factor model must then be rotated for analysis.
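The extraction step can be sketched with plain NumPy: each successive component is the eigenvector of the correlation matrix with the next-largest eigenvalue, so components are pulled out in order of variance explained. The data here is synthetic and the code is an illustration of the idea, not the notebook's implementation.

```python
# PCA-style extraction: eigen-decompose the correlation matrix and
# order components by the variance they account for.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
X[:, 1] += X[:, 0]                      # induce some correlation
R = np.corrcoef(X, rowvar=False)        # correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]       # descending by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()     # variance share of each component
print(np.round(explained, 3))

# unrotated factor weights (loadings): eigenvectors scaled by sqrt(eigenvalue)
weights = eigvecs * np.sqrt(eigvals)
```

Extraction would stop once the remaining eigenvalues are negligible, after which the retained loadings are rotated for interpretation.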

Canonical factor analysis, also called Rao's canonical factoring, is a different method of computing the same model as PCA, which uses the principal axis method. Canonical factor analysis seeks factors which have the highest canonical correlation with the observed variables. Canonical factor analysis is unaffected by arbitrary rescaling of the data.

Common factor analysis, also called principal factor analysis (PFA) or principal axis factoring (PAF), seeks the fewest factors which can account for the common variance (correlation) of a set of variables.
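Principal axis factoring can be sketched as an iteration: place communality estimates on the diagonal of the correlation matrix, eigen-decompose this "reduced" matrix, recompute communalities from the resulting loadings, and repeat until they stabilize. The sketch below assumes NumPy, a single common factor, and communalities initialized from squared multiple correlations; the data is synthetic.

```python
# Minimal principal-axis-factoring (PAF) iteration, one common factor.
import numpy as np

rng = np.random.default_rng(3)
f = rng.normal(size=(400, 1))
X = f @ np.array([[0.9, 0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(400, 4))
R = np.corrcoef(X, rowvar=False)

n_factors = 1
# initial communalities: squared multiple correlation of each variable
h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
for _ in range(100):
    Rr = R.copy()
    np.fill_diagonal(Rr, h2)                 # "reduced" correlation matrix
    vals, vecs = np.linalg.eigh(Rr)
    idx = np.argsort(vals)[::-1][:n_factors]
    loadings = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
    new_h2 = (loadings ** 2).sum(axis=1)     # updated communalities
    if np.max(np.abs(new_h2 - h2)) < 1e-6:   # converged
        break
    h2 = new_h2
print(loadings.ravel())
```

Because only common variance (the communalities) sits on the diagonal, the extracted factor accounts for shared correlation rather than total variance, which is the key difference from PCA extraction.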

Image factoring is based on the correlation matrix of predicted variables rather than actual variables, where each variable is predicted from the others using multiple regression.
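The "predicted variables" in image factoring can be built explicitly: regress each variable on all the others, collect the predicted values, and correlate those predictions. A NumPy-only sketch on synthetic data (an illustration of the construction, not a full image-analysis implementation):

```python
# Image-factoring ingredient: correlation matrix of each variable's
# multiple-regression prediction from the remaining variables.
import numpy as np

rng = np.random.default_rng(5)
f = rng.normal(size=(300, 1))
X = f @ np.array([[0.8, 0.7, 0.6, 0.5]]) + rng.normal(scale=0.6, size=(300, 4))
Xc = X - X.mean(axis=0)                       # center the variables

pred = np.empty_like(Xc)
for j in range(Xc.shape[1]):
    others = np.delete(Xc, j, axis=1)         # all variables except j
    beta, *_ = np.linalg.lstsq(others, Xc[:, j], rcond=None)
    pred[:, j] = others @ beta                # predicted ("image") values

R_image = np.corrcoef(pred, rowvar=False)     # analyzed instead of corr(X)
print(np.round(R_image, 3))
```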

Alpha factoring is based on maximizing the reliability of factors, assuming variables are randomly sampled from a universe of variables. All other methods assume cases to be sampled and variables fixed.

The factor regression model combines the factor model and the regression model; alternatively, it can be viewed as a hybrid factor model whose factors are partially known.

Principal components analysis and factor analysis are common methods used to analyze groups of variables for the purpose of reducing them into subsets represented by latent constructs (Bartholomew, 1984; Grimm & Yarnold, 1995). Even though PCA shares some important characteristics with factor analytic methods such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), the similarities between the two types of methods are superficial. The most important distinction is that PCA is a descriptive method, whereas EFA and CFA are modeling techniques (Unkel & Trendafilov, 2010). Together, PCA, EFA, and CFA are used to analyze multiple variables for the purposes of data reduction, scale construction and improvement, and evaluation of validity and psychometric utility; each method is suited to a different class of problems, and the three differ in important ways.

Principal components analysis (PCA; Goodall, 1954) is a method for explaining the maximum amount of variance among a set of items by creating linear functions of those items, with the goal of identifying the smallest number of such functions needed to explain the total variance observed in the item set's correlation matrix (Grimm & Yarnold, 1995). Put another way, PCA identifies the smallest number of factors or components necessary to explain as much (or all) of the variance as possible. In this context, a factor or component is a set of variables that, when combined in a linear fashion, explains some portion of the observed variance (Mulaik, 1990).
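The PCA-versus-FA distinction can be made concrete with a small sketch (assuming scikit-learn; the data is synthetic): PCA describes shares of the total variance, while factor analysis explicitly models a unique "error" variance for each item, separate from the common factor.

```python
# Descriptive PCA vs. model-based FA on the same synthetic data.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(4)
f = rng.normal(size=(300, 1))
X = f @ np.array([[0.9, 0.8, 0.7]]) + rng.normal(scale=0.5, size=(300, 3))

pca = PCA(n_components=1).fit(X)
fa = FactorAnalysis(n_components=1).fit(X)

# PCA: share of TOTAL variance carried by the first component
print(round(pca.explained_variance_ratio_[0], 3))
# FA: estimated unique ("error") variance of each item, modeled separately
print(np.round(fa.noise_variance_, 3))
```

The PCA component absorbs as much total variance as one linear function can; the factor model instead partitions each item's variance into a common part and a unique part.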