Esa Vilkama, President and Founder at Process Data Insights, LLC

**How to Measure and Calculate Correlations 2**

**Coherence**

In signal processing, coherence is a statistic that can be used to examine the relation between two signals in the frequency domain. It is based on the correlation between the two signals. It is commonly used to estimate the power transfer between the input and output of a linear system. If the signals meet certain statistical criteria, and the system function is linear, it can be used to estimate the causality between the input and output.

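Coherence measures the normalized correlation between two power spectra. A power spectrum tells how the power of a signal is distributed over its frequency components. Coherence takes a value between 0 and 1 at each frequency. If coherence is 1, the two signals are fully coherent. That is, if signal one is the input and signal two is the output, one signal can be characterized fully using the other. This is the ideal characteristic of a linear system. If coherence is less than 1 but greater than zero, the signals are partly coherent and partly noise. If coherence is zero, signals one and two are not related at that frequency. Coherence (more precisely, magnitude-squared coherence) is mathematically given by

$$C_{xy}(f) = \frac{|G_{xy}(f)|^2}{G_{xx}(f)\,G_{yy}(f)}$$

where $G_{xy}(f)$ is the cross-spectral density of signals $x$ and $y$, and $G_{xx}(f)$ and $G_{yy}(f)$ are their power spectral densities.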

Here is an example showing the coherence of two signals (s1 and s2). They show a strong coherence at a frequency of around 10 Hz.
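A comparable result can be reproduced with a minimal sketch using `scipy.signal.coherence`. The sampling rate, signal length, noise level, and segment length below are assumptions chosen for illustration, not the settings behind the original example: both synthetic signals share a 10 Hz sine and carry independent noise.

```python
import numpy as np
from scipy import signal

fs = 1000                      # assumed sampling frequency in Hz
t = np.arange(0, 10, 1 / fs)   # 10 seconds of samples
rng = np.random.default_rng(0)

# Both signals contain the same 10 Hz component (with a phase shift)
# plus independent white noise.
s1 = np.sin(2 * np.pi * 10 * t) + rng.normal(size=t.size)
s2 = np.sin(2 * np.pi * 10 * t + 0.5) + rng.normal(size=t.size)

# Welch-based estimate of magnitude-squared coherence.
# nperseg=1000 gives a frequency resolution of 1 Hz.
f, cxy = signal.coherence(s1, s2, fs=fs, nperseg=1000)

peak_freq = f[np.argmax(cxy)]
print(f"Highest coherence {cxy.max():.2f} at {peak_freq:.0f} Hz")
```

Away from the shared frequency the estimated coherence stays low, because the noise in the two signals is independent. The arrays `f` and `cxy` can be passed directly to matplotlib for plotting.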

Python libraries and additional information for coherence: scipy; matplotlib; https://pythontic.com/visualization/signals/coherence; https://en.wikipedia.org/wiki/Coherence_(signal_processing)

**Principal component analysis (PCA)**

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (PCs). This transformation is defined in such a way that the first principal component has the largest possible variance (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. PCA is sensitive to the relative scaling of the original variables, so usually the original data is normalized before performing the PCA.

PCA is done either 1) by singular value decomposition of the centered data (design) matrix or 2) by calculating the covariance (or correlation) matrix of the original data and then performing eigenvalue decomposition on that matrix.
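The two routes give the same principal components, which can be checked with a short NumPy sketch. The data set here is synthetic and hypothetical (three variables, two of them strongly correlated), and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 variables, two of which
# are driven by the same latent factor.
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

Xc = X - X.mean(axis=0)  # center each column

# Route 1: singular value decomposition of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = S**2 / (Xc.shape[0] - 1)   # variance explained by each PC

# Route 2: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]    # sort by decreasing variance
var_eig = eigvals[order]
eigvecs = eigvecs[:, order]

print(var_svd)
print(var_eig)
```

The rows of `Vt` and the columns of `eigvecs` describe the same directions; they can differ only in sign, which does not affect the interpretation.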

PCA results help us to understand how much each of the original measured variables contributes to each of the PCs. The loadings are the coefficients of the linear combination of the original variables from which the PCs are constructed. Below is a picture of the loadings of PC1 and PC2 for the well-known iris flower data set.
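The iris loadings can be computed with scikit-learn as a minimal sketch. It assumes the data are standardized first and that the loadings are taken directly from `PCA.components_`, which matches the definition given above; some texts additionally scale the loadings by the square root of the explained variance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Standardize so that each variable has zero mean and unit variance.
X = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2).fit(X)

# components_ has shape (n_components, n_features); transpose it so
# that each row corresponds to one original variable.
loadings = pca.components_.T

for name, (pc1, pc2) in zip(iris.feature_names, loadings):
    print(f"{name:20s} PC1={pc1:+.3f}  PC2={pc2:+.3f}")

print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Variables with large absolute loadings on a PC contribute most to that component.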

PCA is used e.g. for dimensionality reduction, data visualization, exploratory data analysis (EDA), making predictive models, and speeding up a machine learning algorithm. PCA reveals the internal structure of the data in a way that best explains the variance in the data. A drawback of applying PCA in typical manufacturing troubleshooting is that the PCs are linear combinations of variables, not real measured variables.

Python libraries for PCA: pca; sklearn; matplotlib

**Mutual information**

Mutual information (MI) between two random variables is a non-negative value that measures the dependency between the variables. More specifically, it quantifies the “amount of information” obtained about one random variable through observing the other random variable. For finite data sets where the distributions are unknown, mutual information can only be estimated. One such estimate, used for example in scikit-learn, relies on nonparametric methods based on entropy estimation from k-nearest-neighbor distances. Mutual information is also known as information gain.

Unlike the correlation coefficient, MI is not limited to real-valued random variables and linear dependence. It is more general and determines how different the joint distribution of the pair (X,Y) is from the product of the marginal distributions of X and Y. MI is the expected value of the pointwise mutual information (PMI). Mutual information can capture any kind of dependency between variables. It can be used to study similarity, cause-effect relationships, and time delays between signals.

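The mutual information of X relative to Y is given, for discrete variables, by

$$I(X;Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{(X,Y)}(x,y) \, \log \frac{p_{(X,Y)}(x,y)}{p_X(x)\,p_Y(y)}$$

where $p_{(X,Y)}$ is the joint probability distribution of X and Y, and $p_X$ and $p_Y$ are the marginal distributions. For continuous variables the sums are replaced by integrals over the corresponding densities.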

For a Python library and additional information, use these links:

sklearn.feature_selection.mutual_info_regression; https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html; https://en.wikipedia.org/wiki/Mutual_information
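As a minimal sketch of how an MI estimate behaves, the example below uses `sklearn.feature_selection.mutual_info_regression` on synthetic data. The data, noise level, and `n_neighbors=3` setting are assumptions for illustration. The dependent pair is a noisy parabola, for which the Pearson correlation is close to zero even though the variables are strongly related.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)

y_dep = x**2 + 0.05 * rng.normal(size=x.size)  # non-linear dependence
y_ind = rng.normal(size=x.size)                # independent of x

# The estimator expects a 2-D feature matrix, hence the reshape.
X = x.reshape(-1, 1)
mi_dep = mutual_info_regression(X, y_dep, n_neighbors=3, random_state=0)[0]
mi_ind = mutual_info_regression(X, y_ind, n_neighbors=3, random_state=0)[0]

pearson = np.corrcoef(x, y_dep)[0, 1]

print(f"MI (parabola):      {mi_dep:.3f} nats")
print(f"MI (independent):   {mi_ind:.3f} nats")
print(f"Pearson (parabola): {pearson:.3f}")
```

The estimate is reported in nats and has no fixed upper bound, so it is most useful for ranking variable pairs rather than as an absolute score.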

**Maximal information coefficient**

The maximal information coefficient (MIC) is a measure of the strength of the linear or non-linear association between two variables X and Y. MIC uses binning as a means to apply mutual information to continuous random variables. Binning has been used for some time as a way of applying mutual information to continuous distributions; what MIC contributes in addition is a methodology for selecting the number of bins and picking a maximum over many possible grids. The rationale is that the bins for both variables should be chosen in such a way that the mutual information between the variables is maximal. MIC belongs to the maximal information-based nonparametric exploration (MINE) class of statistics.

For a Python library and additional information, use these links:

minepy; http://minepy.sourceforge.net/docs/1.0.0/index.html; https://en.wikipedia.org/wiki/Maximal_information_coefficient
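The idea of maximizing a normalized, binned mutual information over many grids can be illustrated with a simplified NumPy sketch. This is not the algorithm implemented in minepy: it only tries equal-frequency grids instead of optimizing the bin boundaries, so it generally underestimates the true MIC. The function names `binned_mi` and `approx_mic` and the grid-size limit `n ** 0.6` are assumptions made for this illustration.

```python
import numpy as np

def binned_mi(x, y, nx, ny):
    """Mutual information (nats) on an nx-by-ny equal-frequency grid."""
    x_edges = np.quantile(x, np.linspace(0, 1, nx + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, ny + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    pxy = counts / counts.sum()               # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y
    nz = pxy > 0                              # skip empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def approx_mic(x, y, alpha=0.6):
    """Maximum normalized binned MI over grids with at most n**alpha cells."""
    max_cells = len(x) ** alpha
    best = 0.0
    for nx in range(2, int(max_cells // 2) + 1):
        for ny in range(2, int(max_cells // nx) + 1):
            # Dividing by log(min(nx, ny)) scales the score to [0, 1].
            score = binned_mi(x, y, nx, ny) / np.log(min(nx, ny))
            best = max(best, score)
    return best

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)
y_sin = np.sin(4 * np.pi * x) + 0.05 * rng.normal(size=x.size)
y_ind = rng.normal(size=x.size)

mic_sin = approx_mic(x, y_sin)
mic_ind = approx_mic(x, y_ind)
print(f"sine: {mic_sin:.2f}, independent: {mic_ind:.2f}")
```

For real analyses, a dedicated implementation such as minepy is preferable, since it searches over the bin boundaries as well as the number of bins.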

**Distance correlation**

Distance correlation, which is derived from distance covariance, is a measure of dependence between two paired random vectors. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson’s correlation, which can only detect linear association between two random variables.

With Pearson and Spearman, a correlation value of zero does not prove independence between any two variables, but a distance correlation of zero does mean that there is no dependence between those two variables.

For Python and R libraries and additional information, use these links:

https://github.com/vnmabus/dcor, https://cran.r-project.org/web/packages/energy/energy.pdf, https://mycarta.wordpress.com/2019/04/10/data-exploration-in-python-distance-correlation-and-variable-clustering/, https://en.wikipedia.org/wiki/Distance_correlation
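The sample distance correlation can be computed from pairwise distance matrices, as in the following NumPy sketch. The function name `distance_correlation` is illustrative; it is a plain O(n²) implementation of the standard sample statistic, not the API of the dcor package.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Pairwise Euclidean distance matrices.
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)

    # Double centering: subtract row and column means, add the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()     # squared sample distance covariance
    dvar_x = (A * A).mean()    # squared sample distance variance of x
    dvar_y = (B * B).mean()    # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

x = np.linspace(-1, 1, 201)
y_parabola = x**2
y_line = 2 * x + 1

pearson_parabola = np.corrcoef(x, y_parabola)[0, 1]
dcor_parabola = distance_correlation(x, y_parabola)
dcor_line = distance_correlation(x, y_line)

print(f"Pearson (parabola): {pearson_parabola:.3f}")
print(f"dCor (parabola):    {dcor_parabola:.3f}")
print(f"dCor (line):        {dcor_line:.3f}")
```

On the symmetric parabola the Pearson correlation is essentially zero while the distance correlation is clearly positive, which illustrates the contrast described above.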

**Comparisons**

Below is a comparison of a few different correlation metrics for some example data sets. Notice that Pearson correlation is most sensitive to linear correlation, while a measure like MIC can identify non-linear relationships.
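A small comparison of this kind can be sketched on synthetic data. The three data sets below (linear, quadratic, sine) and their noise level are assumptions chosen for illustration, and the mutual information estimate from scikit-learn stands in for a non-linear measure; the numbers are not those of the original comparison.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
noise = 0.05 * rng.normal(size=x.size)

# Hypothetical example relationships between x and y.
datasets = {
    "linear": 2 * x + noise,
    "quadratic": x**2 + noise,
    "sine": np.sin(2 * np.pi * x) + noise,
}

results = {}
for name, y in datasets.items():
    r = pearsonr(x, y)[0]        # linear association
    rho = spearmanr(x, y)[0]     # monotonic association
    mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
    results[name] = (r, rho, mi)
    print(f"{name:10s} Pearson={r:+.2f}  Spearman={rho:+.2f}  MI={mi:.2f}")
```

The linear case scores high on every metric, while the quadratic case is flagged only by the mutual information estimate.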

Please contact us at Process Data Insights, LLC, if you are interested in a comprehensive correlation analysis of your manufacturing process.