Saturday, July 27, 2024

Pearson Correlation Heatmap

To obtain the Pearson's correlation heatmap for your dataset, click the "Choose file" button. The "Open" window will appear. Locate the dataset on your computer (.csv format only) and click "Open"; the Pearson's correlation heatmap will then be displayed.
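The page does not show how the tool computes the values behind the heatmap. As a rough offline sketch, assuming a CSV file with a header row and only numeric columns, the matrix of pairwise coefficients could be computed with the Python standard library as follows. The names `pearson_r` and `correlation_matrix` and the inlined sample data are hypothetical and are not part of the tool.

```python
import csv
import io
import math


def pearson_r(x, y):
    """Pearson's r for two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def correlation_matrix(csv_file):
    """Read a CSV with a header row and numeric columns.

    Returns the column names and a square matrix of pairwise r values.
    """
    reader = csv.DictReader(csv_file)
    names = list(reader.fieldnames)
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(float(row[name]))
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical three-column dataset, inlined so the sketch runs as-is.
sample = io.StringIO("a,b,c\n1,2,10\n2,4,8\n3,6,6\n4,8,4\n")
names, matrix = correlation_matrix(sample)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

Each cell of the heatmap corresponds to one entry of `matrix`. A column whose values are all identical has zero variance, so this sketch would raise a `ZeroDivisionError` for it; a real tool would need to handle that case.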

Pearson's correlation heatmap

The Pearson's correlation coefficient, denoted as \(r\), is a measure of the linear relationship between two variables. It quantifies the degree to which two variables are linearly related, providing both the direction and the strength of the relationship.
The range of \(r\) is from -1 to +1, where +1 indicates a perfect positive linear correlation and -1 indicates a perfect negative linear correlation. A value of 0 indicates no linear relationship. A positive correlation (\(r > 0\)) between two variables indicates that they tend to move in the same direction: if the value of one variable increases, the value of the other also tends to increase, and if the value of one decreases, the value of the other also tends to decrease. A negative correlation (\(r < 0\)) between two dataset variables indicates that they tend to move in opposite directions: if the value of one variable increases, the value of the other tends to decrease, and if the value of one decreases, the value of the other tends to increase.
The magnitude of \(r\) is commonly interpreted as follows:
  • \(0.0 \leq |r|< 0.3\) - weak correlation
  • \(0.3 \leq |r| < 0.7\) - moderate correlation
  • \(0.7 \leq |r| \leq 1\) - strong correlation
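The sign and magnitude rules above can be combined into a small helper. This is only a sketch: the function name `describe_correlation` is hypothetical, and the thresholds are the ones listed above, which are a convention and not a universal standard.

```python
def describe_correlation(r):
    """Return (direction, strength) for a Pearson coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    # Direction comes from the sign of r.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    # Strength comes from the magnitude |r|, using the thresholds listed above.
    magnitude = abs(r)
    if magnitude < 0.3:
        strength = "weak"
    elif magnitude < 0.7:
        strength = "moderate"
    else:
        strength = "strong"

    return direction, strength


print(describe_correlation(0.85))   # ('positive', 'strong')
print(describe_correlation(-0.45))  # ('negative', 'moderate')
print(describe_correlation(0.1))    # ('positive', 'weak')
```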
The Pearson's correlation coefficient between two dataset variables can be calculated using the formula: \begin{equation} r = \frac{\sum{(x_i-\overline{x})(y_i-\overline{y})}}{\sqrt{\sum{(x_i - \overline{x})^2}\sum{(y_i-\overline{y})^2}}} \end{equation} where:
  • \(x_i\) and \(y_i\) are the individual dataset samples
  • \(\overline{x}\) and \(\overline{y}\) are the mean values of the variable \(x\) and variable \(y\).

Steps to calculate Pearson's correlation coefficient

The calculation of Pearson's correlation coefficient consists of the following steps:
  1. Compute the means - calculate the mean values of \(x\) and \(y\) (\(\overline{x}\), \(\overline{y}\)).
  2. Compute the deviations - subtract the mean of \(x\) from each \(x_i\) to get the deviations for \(x\). Subtract the mean of \(y\) from each \(y_i\) to get the deviations for \(y\).
  3. Compute the products of deviations - multiply the deviations of \(x\) and \(y\) for each pair of observations.
  4. Sum the products - sum all the products obtained in the previous step.
  5. Compute the sums of squared deviations - square the deviations of \(x\) and sum them. Square the deviations of \(y\) and sum them.
  6. Calculate \(r\) - divide the sum of the products of deviations by the square root of the product of the two sums of squared deviations.
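The six steps map directly onto a few lines of Python. The sketch below keeps every intermediate quantity so that each step can be inspected; the function name `pearson_steps`, the dictionary keys, and the small dataset are hypothetical and chosen only for illustration.

```python
import math


def pearson_steps(x, y):
    """Follow the six steps and return every intermediate quantity."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2 samples)")

    # Step 1: compute the means.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Step 2: compute the deviations from the means.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]

    # Step 3: compute the products of paired deviations.
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]

    # Step 4: sum the products.
    sum_products = sum(products)

    # Step 5: compute the sums of squared deviations.
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)

    # Step 6: divide by the square root of the product of the two sums.
    r = sum_products / math.sqrt(ss_x * ss_y)

    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "dev_x": dev_x, "dev_y": dev_y,
        "products": products, "sum_products": sum_products,
        "ss_x": ss_x, "ss_y": ss_y, "r": r,
    }


# Hypothetical data that is correlated, but not perfectly.
steps = pearson_steps([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(steps["sum_products"], steps["ss_x"], steps["ss_y"])  # 8.0 10.0 10.0
print(steps["r"])  # 0.8
```

If either variable is constant, its sum of squared deviations is zero and step 6 divides by zero; the coefficient is undefined in that case.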

Example: Calculation of Pearson's correlation coefficient step by step.

Two dataset variables are represented as arrays: \begin{eqnarray} x &=& [2,4,6,8,10] \\ y &=& [3,5,7,9,11] \end{eqnarray} Both arrays have the same number of samples, i.e. 5.

Compute the mean value of \(x\) and \(y\)

\begin{equation} \overline{x} = \frac{2+4+6+8+10}{5} = \frac{30}{5} = 6 \end{equation} \begin{equation} \overline{y} = \frac{3+5+7+9+11}{5} = \frac{35}{5} = 7 \end{equation}

Compute the deviations

For \(x\) array the deviations are: \begin{equation} -4,-2,0,2,4 \end{equation} For \(y\) array the deviations are: \begin{equation} -4,-2,0,2,4 \end{equation}

Compute the product of deviations

Multiply the deviations of \(x\) and \(y\) for each pair of dataset samples. \begin{equation} 16,4,0,4,16 \end{equation}

Sum the products

\begin{equation} \sum{(x_i-\overline{x}) (y_i - \overline{y})} = 16+4+0+4+16 = 40 \end{equation}

Compute the sum of squared deviations

\begin{equation} \sum{(x_i - \overline{x})^2} = 16+4+0+4+16 = 40 \end{equation} \begin{equation} \sum{(y_i - \overline{y})^2} = 16+4+0+4+16 = 40 \end{equation}

Calculate \(r\)

\begin{equation} r = \frac{40}{\sqrt{40\cdot 40}} = \frac{40}{40} = 1 \end{equation}
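The result \(r = 1\) is expected here because every sample satisfies \(y_i = x_i + 1\), i.e. the two arrays lie exactly on an increasing straight line. The short check below recomputes the example and, as an additional illustration that is not part of the original example, reverses the order of \(y\) so that the points lie on a decreasing line. The function name `pearson_r` and the array `y_reversed` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed directly from the formula."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = math.sqrt(
        sum((xi - mean_x) ** 2 for xi in x) * sum((yi - mean_y) ** 2 for yi in y)
    )
    return numerator / denominator


x = [2, 4, 6, 8, 10]
y = [3, 5, 7, 9, 11]            # the example data: y = x + 1
y_reversed = [11, 9, 7, 5, 3]   # hypothetical variant: y = 13 - x

r_example = pearson_r(x, y)
r_reversed = pearson_r(x, y_reversed)
print(r_example)   # 1.0
print(r_reversed)  # -1.0
```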

Conclusion

The Pearson's correlation coefficient is a powerful statistical tool for understanding the linear relationship between two variables. It provides a quantifiable measure of both the direction and strength of this relationship, which can be invaluable in data analysis, research, and many applied fields. However, it is essential to be aware of its assumptions and limitations in order to interpret the results correctly.
