In this chapter, we will begin to study a descriptive statistic that represents the cornerstone of modern data analysis: the correlation coefficient. We will examine a fictitious example in which a data analyst uses a correlation coefficient to analyze her results. We will then consider the definition of the Pearson product-moment correlation coefficient, learn how to calculate both its population and sample values, and present further sample data sets to help you understand its meaning. In the process, you will learn that the Pearson product-moment correlation coefficient is generally used with interval or ratio scaled data. Other measures of relationship that are generally used with nominal and ordinal scaled data will be examined in a set of special exercises at the end of the chapter.
Your jamovi objectives for this chapter are to: (a) find the Pearson correlation coefficient, (b) find a Pearson correlation matrix, and (c) create a scatterplot.
To keep the presentation in this section less abstract, we will use a data set created by Professor Louise Farcle, an industrial psychologist (or professor of organization and management) at the University of California at Los Angeles (UCLA). The data and the researcher are fictitious but will facilitate our discussion of the correlation coefficient.
Professor Farcle is interested in the relationship between expected job performance (EJP) and actual job performance (AJP) for employees of a sample of seven companies. These companies were randomly selected from a 1985 list of Fortune 100 companies. To examine the relationship between EJP and AJP, Professor Farcle randomly selected ten employees from each company. Figures 7a(i) through 7a(vii), which represent companies A through G, respectively, present this data.
In the analysis of a relationship using a correlation coefficient, units have two measures; one measure is usually denoted by X and the other, by Y. Therefore, in Figures 7a(i) through 7a(vii), the letter X indicates the measure of expected job performance (EJP), and the letter Y indicates the measure of actual job performance (AJP). We will assume these measures are based on highly regarded, standardized instruments.
In each data set, numbers were appended to X and Y to differentiate the scores of one company from those of another. For example, in Figure 7a(i), we appended 1 to both X and Y to form X1 and Y1 for the first company. In Figure 7a(ii), we appended 2 to both X and Y to form X2 and Y2 for the second company and so forth. The following section defines the Pearson product-moment correlation coefficient and presents it through its population and sample equations. These equations by themselves, however, will not provide you with sufficient understanding of this correlation coefficient. Therefore, a discussion of the seven different values of the Pearson correlation coefficient found by Professor Farcle also appears. These seven values provide you with further insight into the meaning of the correlation coefficient.
The Pearson product-moment correlation coefficient measures the linear relationship between two variables. This correlation coefficient is frequently called the zero-order correlation coefficient.
The term zero-order refers to the fact that the correlation coefficient has no secondary subscripts, so that it is based on only two measures. The term first-order refers to a coefficient with one secondary subscript and is a correlation that concerns three measures. For example, the term \(r_{XY}\) represents a zero-order sample correlation coefficient, and the term \(r_{XY \cdot Z}\) represents a first-order sample correlation coefficient (Z is the secondary subscript).
The lowercase Greek letter ρ (rho) denotes the population Pearson correlation coefficient. For variables X and Y, \(\rho_{XY}\) is found as:
\[ \begin{equation} \rho_{XY} = \frac {\sigma_{XY}} {\sigma_X \sigma_Y} \tag{7-1} \end{equation} \] where
\[ \begin{equation} \sigma_{XY} = \frac {\sum (X-\mu_X) (Y - \mu_Y)} N \tag{7-2} \end{equation} \] \(\sigma_{XY}\) is known as the population covariance, X represents the scores on one variable with population mean \(\mu_X\) and standard deviation \(\sigma_X\), and Y represents the scores on a second variable with population mean \(\mu_Y\) and standard deviation \(\sigma_Y\).
The sample Pearson correlation is denoted by the letter r (for regression) and is found as:
\[ \begin{equation} r_{XY} = \frac {s_{XY}} {s_X s_Y} \tag{7-3} \end{equation} \] where
\[ \begin{equation} s_{XY} = \frac {\sum (X-M_X) (Y-M_Y)} {n-1} \tag{7-4} \end{equation} \] \(s_{XY}\) is known as the sample covariance, X represents the scores on one variable with sample mean \(M_X\) and standard deviation \(s_X\), and Y represents the scores on a second variable with sample mean \(M_Y\) and standard deviation \(s_Y\).
The covariance measures shown in equations 7-2 and 7-4 are perfectly good measures of relationship by themselves. Indeed, many physical science problems use the covariance as the measure of association. The covariance has a problem, however, as a measure of association because the scale of measurement of X and Y affects it. It is, therefore, best to use the covariance when both X and Y are measured on the same scale. The covariance is divided by the standard deviations of X and Y to remove the scale properties of these measures. This results in a correlation coefficient that will not change with linear transformations on the scales of measurement of X and Y. We will consider this property more fully later in this chapter.
The sample correlation and covariance equations, 7-3 and 7-4, are generally used in actual data analyses. Therefore, we will use the sample equations throughout this chapter. (The population equations appear in exercise 3.)
In this section, we will calculate the Pearson correlation coefficient between two variables by hand. The hand calculations illustrated here make use of equations (7-3) and (7-4).
Figures 7a(i) through 7a(vii) illustrate the calculation of the Pearson correlation coefficient by hand for each data set. The procedure is somewhat tedious, but it is not difficult. The following steps use the data from Figure 7a(ii) as an example in explaining the process.
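Before turning to the figures, here is a minimal R sketch (R is the language behind jamovi's syntax mode) of equations (7-3) and (7-4). The ten pairs of scores are made-up values for illustration only, not the data from any of the figures.

# Equations (7-3) and (7-4) applied directly; x and y below are hypothetical scores
x <- c(5, 12, 18, 23, 27, 31, 36, 40, 44, 49)
y <- c(8, 10, 21, 19, 30, 28, 41, 37, 46, 50)
n <- length(x)
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)   # sample covariance, equation (7-4)
r_xy <- s_xy / (sd(x) * sd(y))                         # Pearson correlation, equation (7-3)
r_xy
cor(x, y)   # built-in check; should match r_xy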
A Pearson correlation can have a maximum positive value of 1.00 and a maximum negative value of -1.00 (that is, -1.00 ≤ \(ρ_{XY}\) ≤ 1.00). You should never compute a Pearson correlation coefficient without also plotting a scatterplot. In a scatterplot, dots represent the intersection of the X and Y scores of two variables. A scatterplot is also frequently called a scatter diagram. A Pearson correlation of 1.00 or -1.00 is called a perfect correlation because all of the dots lie on a “perfectly” straight line in the scatterplot.
In the data sets shown in Figures 7a(i) through 7a(vii), we can examine what a value of a Pearson correlation coefficient reflects for a given company and how the data looks in a scatterplot. The following discussion of Figures 7a(i) through 7a(vii) should help you understand what a given Pearson correlation coefficient means. The next section will discuss the correlation’s calculation. (Note that Figures 7a(i) through 7a(vii) are not a single output but a group of outputs.)
# jamovi scatterplot syntax (scatr module): plot Y against X with a linear fit line;
# 'data' refers to the data set currently open in jamovi for this figure
scatr::scat(
    data = data,
    x = 'X',
    y = 'Y',
    line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 275/10 = 27.5 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 275/10 = 27.5 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= 2062.5/9 = 229.167 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 2062.5/9 = 229.167 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 2062.5/9 = 229.167 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {229.167} {(15.138)(15.138)} = 1 \end{align} \]
Figure 7a(i) is an example of a perfect, positive Pearson correlation, that is, \(r_{XY}\) = 1.00. All of the dots in the scatterplot fall on a straight line, which is why the Pearson correlation coefficient is called a measure of linear relationship. The graph contains nothing but a slope and an intercept; there is no scatter.
In this scatterplot, the low scores on expected job performance are found with low scores on actual job performance (for example, 5 with 5 and 10 with 10); and the high scores on expected job performance are found with high scores on actual job performance (for example, 45 with 45 and 50 with 50). In fact, all of the X and Y scores are equal to each other. This direct relationship between the scores yields a line that has a positive slope, that is, rises from left to right. This is the most common example of a perfect Pearson correlation, but we will soon see that the scores do not have to be equal for \(r_{XY}\) to equal 1.
Before continuing, we should discuss how the line shown in Figure 7a(i) can be described more generally. You can write the equation for a straight line as \(Y = b_0 + b_1X\) (see Chapter 8 and Appendix E for further discussion about the equation for a straight line). In this equation, \(b_0\) represents the Y-intercept, and \(b_1\) represents the slope of the line. The words pitch, slant, and steepness are essentially synonyms for the word slope. Given any two points on a line [that is, \((X_1, Y_1)\) and \((X_2, Y_2)\), where \(X_2 > X_1\)], the slope is the ratio of the difference of the Y scores to the difference of the X scores. The equation is:
\[ \begin{equation} b_1 = \frac {Y_2-Y_1} {X_2-X_1} \tag{7-5} \end{equation} \] For example, the slope of the line shown in Figure 7a(i) can be found using the points (5,5) and (10,10). The equation is:
\[ b_1 = \frac {10-5} {10-5} = \frac {5} {5} = 1 \] You will get the same result for the slope if you try any two points that lie on the line (subject to the condition that \(X_1 < X_2\)). For example, try (30,30) and (50,50).
In the latter example, the slope is a positive number (\(Y_2\) is larger than \(Y_1\)), which indicates that the straight line rises as it moves from left to right. The slope will be negative (\(Y_2\) is less than \(Y_1\)) when the straight line falls as it moves from left to right.
Figure 7a(vii) shows a straight line with a negative slope. The slope of the line in Figure 7a(vii) is found using the points (5,50) and (10,45). The equation is:
\[ b_1 = \frac {45-50} {10-5} = \frac {-5} {5} = -1 \] The Y-intercept is the value of Y at the point where the line crosses the y-axis (that is, where X = 0). At this point, \(Y = b_0\). For example, in Figure 7a(i), the Y-intercept is 0.0; and in Figure 7a(vii), the Y-intercept is 55.0. We will return to the calculation of the Y-intercept and the slope in the next chapter, when we consider regression analysis.
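If you want to verify these values yourself, the small R sketch below computes the slope (equation 7-5) and the Y-intercept from any two points; the function name line_from_points is our own invention for this illustration.

line_from_points <- function(p1, p2) {
  # each point is c(X, Y); the slope is rise over run, and the intercept solves Y = b0 + b1*X
  b1 <- (p2[2] - p1[2]) / (p2[1] - p1[1])
  b0 <- p1[2] - b1 * p1[1]
  c(intercept = b0, slope = b1)
}
line_from_points(c(5, 5), c(10, 10))    # Figure 7a(i):   slope  1, intercept  0
line_from_points(c(5, 50), c(10, 45))   # Figure 7a(vii): slope -1, intercept 55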
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 267/10 = 26.7 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 251/10 = 25.1 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= 1291.3/9 = 143.478 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 1750.1/9 = 194.456 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 1576.9/9 = 175.211 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {143.478} {(13.945)(13.237)} = 0.777 \end{align} \]
Figure 7a(ii) presents a high Pearson correlation, \(r_{XY}\) = .78, but one that is less than perfect. In this case, the dots do not fall on a straight line, as they do in Figure 7a(i), but there is still a direct relationship between the success measures for company B. As in Figure 7a(i), low scores on X (expected job performance) tend to be found with low scores on Y (actual job performance). For example, a case with a lower X (e.g., X = 4) has a lower Y as well (e.g., Y = 6), and a case with a higher X (e.g., X = 37) has a higher Y value (e.g., Y = 42). These can also be written as coordinate pairs (X, Y), so that one case might have a lower pair of scores (14, 10) and another case might have a higher pair of scores (45, 48). If you draw a line around these dots, the resulting shape is an ellipse that rises from left to right. Another way to think about it is that cases with scores above the mean on X tend to have scores above the mean on Y, and cases below the mean on X tend to be below the mean on Y.
We informally define an ellipse as the curve formed when a cone is sliced nonparallel and non-perpendicular to its base. Figure 7a(ii) shows an ellipse. The major axis of an ellipse is the line drawn from the two points on the ellipse that are the farthest from each other. The minor axis of an ellipse is the longest line within the ellipse that is perpendicular to the major axis. When an ellipse is drawn around the points of a scatterplot, the slope of the major axis indicates the sign of the correlation, and the length of the minor axis reflects the magnitude of the correlation.
\[ \begin{equation} r_{XY} = \frac {n \sum{XY} - \sum X \sum Y } {\sqrt{ n \sum X^2 - (\sum X)^2} \sqrt{ n \sum Y^2 - (\sum Y)^2 } } \end{equation} \] where, for example, \[ \begin{align} n &= 10 \\ \sum X &= 267.0 \text{ [available from Figure 7a(ii)]} \\ \sum Y &= 251.0 \text{ [available from Figure 7a(ii)]} \\ (\sum X)^2 &= 267.0^2 = 71289.0 \\ (\sum Y)^2 &= 251.0^2 = 63001.0 \\ \sum{XY} &= 7993.0 \\ n \sum X^2 &= 10 * 8879.0 = 88790.0 \\ n \sum Y^2 &= 10 * 7877.0 = 78770.0 \\ \end{align} \] \[ \begin{align} r_{XY} &= \frac {(10 * 7993) - (267 * 251) } {\sqrt{ (10 * 8879) - 71289} \sqrt{ (10*7877) - 63001 } } \\ r_{XY} &= \frac { 79930 - 67017 } { \sqrt{17501} \sqrt{15769} } \\ r_{XY} &= \frac {12913} {16612} \\ r_{XY} &= 0.77733 \end{align} \]
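The same raw-score (computational) formula is easy to check in R. The sketch below wraps it in a small function of our own; the x and y vectors are hypothetical stand-ins, since any paired scores will reproduce cor() exactly.

raw_score_r <- function(x, y) {
  # computational (raw-score) form of the Pearson correlation shown above
  n <- length(x)
  num <- n * sum(x * y) - sum(x) * sum(y)
  den <- sqrt(n * sum(x^2) - sum(x)^2) * sqrt(n * sum(y^2) - sum(y)^2)
  num / den
}
x <- c(4, 14, 18, 22, 26, 30, 34, 37, 39, 43)   # hypothetical paired scores
y <- c(6, 10, 15, 24, 20, 33, 29, 42, 35, 37)
raw_score_r(x, y)
cor(x, y)   # should agree with raw_score_r(x, y)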
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 288/10 = 28.8 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 213/10 = 21.3 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= 667.6/9 = 74.178 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 1903.6/9 = 211.511 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 2048.1/9 = 227.567 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {74.178} {(14.543)(15.085)} = 0.338 \end{align} \]
The points in Figure 7a(iii) illustrate the relationship between expected job performance and actual job performance, denoted by X3 and Y3, in company C. The correlation between these two variables in this company is a small, positive Pearson correlation. When you draw a line around the dots, the ellipse is fatter (that is, wider at its minor axis) than the ellipse for the dots in Figure 7a(ii). This happens because a low, positive Pearson correlation still reflects a direct relationship between the X and Y scores (that is, the major axis still has a positive slope), but this direct relationship is not as consistent as the relationship of a higher (stronger) Pearson correlation. For example, in Figure 7a(iii), the score pairs (8, 5) and (41, 45) indicate a strong, direct relationship, but the pairs (23, 4) and (46, 12) weaken this relationship.
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 284/10 = 28.4 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 284/10 = 28.4 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= 76.4/9 = 8.489 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 1004.4/9 = 111.6 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 2606.4/9 = 289.6 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {8.489} {(10.564)(17.018)} = 0.047 \end{align} \]
Figure 7a(iv) illustrates a Pearson correlation essentially equal to zero for company D. There is practically no relationship between the scores, which are denoted by X4 and Y4. This happens because for similar scores on X4, there are both low and high scores on Y4 [for example, the pairs (28, 2) and (29, 48)]. When you draw a line around scores that have no relationship, like those in Figure 7a(iv), you generally get a circle rather than an ellipse.
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 290/10 = 29 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 235/10 = 23.5 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= -905/9 = -100.556 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 1852/9 = 205.778 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 1898.5/9 = 210.944 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {-100.556} {(14.345)(14.524)} = -0.483 \end{align} \]
What might be described as a medium, negative Pearson correlation is illustrated in Figure 7a(v) for company E. The pattern of dots is similar to the pattern found in Figure 7a(iii), but in the opposite direction. When you draw a line around the points in Figure 7a(v), you draw an ellipse whose major axis descends from left to right. This happens because scores that have a negative Pearson correlation generally have an inverse relationship. Low scores on X are found with high scores on Y [for example, the pairs (8, 37) and (17, 45)], and high scores on X are found with low scores on Y [for example, (50, 5) and (35, 4)]. The existence of some pairs that do not comply with this inverse relationship [for example, (25, 8) and (46, 27)] prevents the Pearson correlation from being a stronger negative one. Another way to think about it is that cases with scores above the mean on X tend to have scores below the mean on Y, and cases below the mean on X tend to be above the mean on Y.
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 257/10 = 25.7 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 251/10 = 25.1 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= -1352.7/9 = -150.3 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 1934.1/9 = 214.9 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 1368.9/9 = 152.1 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {-150.3} {(14.659)(12.333)} = -0.831 \end{align} \]
Figure 7a(vi) shows a strong negative Pearson correlation of \(r_{XY}\) = -.83 for company F. Low scores on X6 (expected job performance) are found with high scores on Y6 (actual job performance) and vice versa. Also, the ellipse that surrounds these points is more slender (that is, has a shorter minor axis) than the ellipse in Figure 7a(v). As is true for all negative Pearson correlations, the major axis of the ellipse that surrounds these dots descends from left to right.
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
where
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 275/10 = 27.5 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 275/10 = 27.5 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= -2062.5/9 = -229.167 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 2062.5/9 = 229.167 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 2062.5/9 = 229.167 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {-229.167} {(15.138)(15.138)} = -1 \end{align} \]
Figure 7a(vii) contains the scores for company G, denoted by X7 and Y7. These scores have a perfect, negative Pearson correlation of -1. As in Figure 7a(i), where there is also a perfect Pearson correlation between the success measures, the dots in the scatterplot of Figure 7a(vii) fall on a straight line. Here, however, the straight line has a negative slope; that is, the line descends from left to right.
Sometimes statistics students think that a positive relationship is somehow stronger or more meaningful than a negative relationship. If you have Pearson correlations of the same magnitude, however, you know as much about a positive relationship as you do about a negative relationship. A positive sign indicates the relationship is direct (low with low; high with high), and a negative sign indicates the relationship is inverse (low with high; high with low). That \(r_{XY}\) = 1 provides as much information as \(r_{XY}\) = -1 should be apparent from a comparison of Figures 7a(i) and 7a(vii), where the Pearson correlations are 1 and -1, respectively. In both cases, given X, we know what Y will be.
Professor Farcle would conclude from her results that there is a strong relationship between expected job success and actual job success for companies A, B, F, and G; a medium relationship in companies C and E; and no relationship in company D. Her results indicate that the relationship found between the two measures of job success was strongly dependent on the company in which the study was conducted. She therefore concluded that she must investigate each company separately to try to account for her unusual findings.
We will consider further aspects of the correlation coefficient and what variables the correlation coefficient is sensitive to after we learn how to calculate it.
DESCRIPTIVES
Descriptives
────────────────────────────────────────────────────────────────────────────
N Mean Sum SD Variance
────────────────────────────────────────────────────────────────────────────
X 10 26.700 267.00 13.945 194.46
Y 10 25.100 251.00 13.237 175.21
XDEV 10 7.0985e-16 7.1054e-15 13.945 194.46
YDEV 10 -1.4209e-15 -1.4211e-14 13.237 175.21
XDEV_SQ 10 175.010 1750.10 166.441 27702.72
YDEV_SQ 10 157.690 1576.90 182.536 33319.32
Cross_Product 10 129.130 1291.30 175.183 30689.24
────────────────────────────────────────────────────────────────────────────
Compare these results to Figure 7a(ii).
\[ \begin{align} \text {n} &= 10 \\ \text {Mean X} = \overline{X} = M_X = \sum X / n &= 26.7 \\ \text {Mean Y} = \overline{Y} = M_Y = \sum Y / n &= 25.1 \\ \text {Sum of Squares X} = SS_X = \sum {(X - M_X)^2} &= 1750.1 \\ \text {Sum of Squares Y} = SS_Y = \sum {(Y - M_Y)^2} &= 1576.9 \\ \text {Variance X} = s_X^2 = SS_X / (n-1) &= 194.45556 \\ \text {Variance Y} = s_Y^2 = SS_Y / (n-1) &= 175.21111 \\ \text {Standard Deviation X} = SD_X = s_X = \hat{\sigma}_X = \sqrt {s_X^2} &= 13.94473 \\ \text {Standard Deviation Y} = SD_Y = s_Y = \hat{\sigma}_Y = \sqrt {s_Y^2} &= 13.23673 \\ \text {Sum of Cross-Products XY} = SCP_{XY} = \sum {(X - M_X)(Y - M_Y)} &= 1291.3 \\ \text {Covariance XY} = s_{XY} = \hat{\sigma}_{XY} = SCP_{XY} / (n-1) &= 143.47778 \\ \text {Correlation XY} = r_{XY} = \hat{\rho}_{XY} = s_{XY} / (s_X s_Y) &= 0.77731 \end{align} \]
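If you want to reproduce the Descriptives output above yourself, the following R sketch builds the same computed columns; it assumes the Figure 7a(ii) scores are already in vectors X and Y.

XDEV <- X - mean(X)                 # deviation scores for X
YDEV <- Y - mean(Y)                 # deviation scores for Y
XDEV_SQ <- XDEV^2                   # squared deviations
YDEV_SQ <- YDEV^2
Cross_Product <- XDEV * YDEV        # cross-products
n <- length(X)
SS_X <- sum(XDEV_SQ)                # sum of squares for X
SS_Y <- sum(YDEV_SQ)                # sum of squares for Y
SCP_XY <- sum(Cross_Product)        # sum of cross-products
s_xy <- SCP_XY / (n - 1)            # covariance
s_xy / (sqrt(SS_X / (n - 1)) * sqrt(SS_Y / (n - 1)))   # r = 0.777 for the Figure 7a(ii) data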
If you were to read a journal article in which the Pearson correlation coefficient was used as the measure of relationship, you would probably encounter Pearson correlations among several variables (i.e., not just two variables X and Y). These Pearson correlations would probably be arranged in a correlation matrix. We can define a matrix as a rectangular array of elements, where element is simply the general term for an entry in the matrix. For example, the Pearson correlation matrix for the data in Figure 7a(ii) is written as below, where the elements are the Pearson correlations between the variables. To identify which variables are being correlated, a Pearson correlation matrix usually has its rows and columns labeled with variable names. If we do this, we have:
CORRELATION MATRIX
Correlation Matrix
─────────────────────────────────────────────
X Y
─────────────────────────────────────────────
X Pearson's r —
df —
p-value —
95% CI Upper —
95% CI Lower —
Spearman's rho —
df —
p-value —
N —
Y Pearson's r 0.77731 —
df 8 —
p-value 0.00814 —
95% CI Upper 0.94462 —
95% CI Lower 0.28924 —
Spearman's rho 0.74392 —
df 8 —
p-value 0.01363 —
N 10 —
─────────────────────────────────────────────
Note. * p < .05, ** p < .01, *** p < .001
The Pearson correlations of X with X and of Y with Y are not shown, because the Pearson correlation between any variable and itself is always 1 [see Figure 7a(i)]. Also, the Pearson correlation between X and Y is 0.7773, which is the same as the Pearson correlation between Y and X. Therefore, the correlation is only reported once, in what is often called the lower triangular half of the matrix.
In the Pearson correlation matrix output (shown in Figure 7c), the rows and columns are labeled with the variable names, and the cell for the correlation of X with Y above the diagonal is left blank. Since the Pearson correlation between Y and X is the same as the Pearson correlation between X and Y, only the latter result is shown, making the printout less cluttered. Figure 7c also includes the covariance and the sums of squares used in calculating correlations, as provided by jamovi.
To introduce you to the idea of a correlation matrix, this example has only two variables: X, expected job performance, and Y, actual job performance.
The correlation matrix is of most value, however, in presenting the relationships between more than two variables. For example, consider the three variables shown in Table 7a: gross national product (GNP), the money supply (MS), and government spending (GOVS).
These three variables play an important role in deciding government fiscal and monetary policy. The arguments are too complex for discussion here, but the Pearson correlations among these variables provide us with a good example of a correlation matrix.
Table 7a shows a labeled correlation matrix for these variables. The Pearson correlation between MS and GNP is .66; between GOVS and GNP it is .44; and between GOVS and MS it is .50. The lower and upper triangular halves of the correlation matrix have been enclosed so that you can see that they contain redundant information. Frequently, only the upper or lower triangular half of this matrix is found in a research report because it is assumed the reader knows of the redundancy between the two halves and that the Pearson correlation between a variable and itself is one.
GNP MS GOVS
GNP 1.00 0.66 0.44
MS 0.66 1.00 0.50
GOVS 0.44 0.50 1.00
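A full correlation matrix like Table 7a is produced in R by applying cor() to a data frame. The values below are hypothetical placeholders, not the actual economic series behind Table 7a.

# cor() on a data frame returns the complete, symmetric correlation matrix
econ <- data.frame(
  GNP  = c(2.1, 2.4, 2.6, 3.0, 3.3, 3.9, 4.2, 4.8),
  MS   = c(0.9, 1.1, 1.0, 1.4, 1.3, 1.8, 1.7, 2.2),
  GOVS = c(0.5, 0.7, 0.6, 0.8, 1.1, 1.0, 1.3, 1.2))
round(cor(econ), 2)   # both triangular halves shown, with a diagonal of 1s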
Note that because deviation scores are simply linear transformations (every score minus the mean), the correlation between the deviation scores for X and Y is exactly the same as the correlation between X and Y. We'll talk more about this in a moment.
CORRELATION MATRIX
Correlation Matrix
─────────────────────────────────────────────
XDEV YDEV
─────────────────────────────────────────────
XDEV Pearson's r —
df —
p-value —
N —
YDEV Pearson's r 0.77731 —
df 8 —
p-value 0.00814 —
N 10 —
─────────────────────────────────────────────
Since the scale of measurement selected for a variable is often arbitrary (for example, an interval scale where Y = a + (b * X) and a and b are arbitrarily chosen), it is valuable to know what effect a transformation of the X or Y scores has on \(r_{XY}\). Fortunately, the size of the Pearson correlation coefficient remains the same as long as the transformation of either or both variables is linear. A linear transformation has the form a + (b * V), where a and b are constants and V (variable) is either X or Y. This means you can add any constant a to X and/or multiply X by any nonzero constant b. You can also do the same thing to Y, using either the same or different constants. In neither case will you change the size (i.e., magnitude, strength) of the Pearson correlation between X and Y. You may or may not change the sign of the correlation coefficient, however, as the following examples illustrate.
In a linear transformation, if both variables are multiplied by positive constants or both variables are multiplied by negative constants, the sign of the correlation coefficient will not change. For example, in Figure 7e(i), the X1 scores from Figure 7a(i) are transformed using the transformation X = -5 + 10 * X1, and the Y1 scores are transformed using the transformation Y = 2 + 5 * Y1. In Figure 7e(i), the Pearson correlation between the newly transformed X and Y scores is still a positive 1.00. This example illustrates two points: (a) when both variables are multiplied by positive constants, the sign of the resulting correlation does not change, and (b) the values of X and Y do not have to be equal for the Pearson correlation between them to be 1.

Sign Change: Multiply One Variable by a Negative Constant
We can change the sign of the Pearson correlation if b is negative for only one of the variables. For example, if we multiply the X2 scores in Figure 7a(ii) by -2 and do not change the sign of the Y2 scores, we will change the Pearson correlation from .78 to -.78. Figure 7e(ii) illustrates this.
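You can confirm both properties with a short R sketch; the x and y scores below are made up for the illustration.

x <- c(3, 8, 15, 21, 24, 30, 35, 41, 44, 50)   # hypothetical scores
y <- c(5, 12, 10, 25, 22, 31, 38, 35, 47, 49)
cor(x, y)                     # original correlation
cor(-5 + 10 * x, 2 + 5 * y)   # same size and sign: both multipliers are positive
cor(-2 * x, y)                # same size, opposite sign: one multiplier is negative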
An example to illustrate that adding a constant to and/or multiplying each score by a constant does not change the size of the correlation [compare to Figure 7a(i)].
scatr::scat(
data = data,
x = '(X*10)-5',
y = '(Y*5)+2',
line = 'linear')
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 2700/10 = 270 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 1395/10 = 139.5 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= 1.03125\times 10^{5}/9 = 1.14583\times 10^{4} \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 2.0625\times 10^{5}/9 = 2.29167\times 10^{4} \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 5.15625\times 10^{4}/9 = 5729.167 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {1.14583\times 10^{4}} {(151.383)(75.691)} = 1 \end{align} \]
scatr::scat(
data = data,
x = 'X*-2',
y = 'Y',
line = 'linear')
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= -534/10 = -53.4 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 251/10 = 25.1 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= -2582.6/9 = -286.956 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 7000.4/9 = 777.822 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 1576.9/9 = 175.211 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {-286.956} {(27.889)(13.237)} = -0.777 \end{align} \]
Note that we changed the sign of the Pearson correlation coefficient but not its size. Since scales are generally chosen arbitrarily, this presentation gives you further insight into why both positive and negative correlations provide the same information about the relationship between two variables. The sign of a correlation coefficient only indicates the direction of the relationship (direct or inverse), not its degree. Depending on the measurement choices made, the direction may be completely arbitrary.
Figure 7f illustrates why a scatterplot is always necessary when using the Pearson correlation coefficient. Figure 7f shows a Pearson correlation that is near 0, but there is obviously a relationship between the variables. The problem is that the Pearson correlation coefficient is a measure of linear relationship, and in Figure 7f, we have an example of a curvilinear relationship. The scatterplot identifies situations like this one, when the Pearson correlation coefficient is an inappropriate measure of relationship, that is, situations when there is a relationship, but it is not linear. Fortunately, measures of curvilinear relationship exist.
We might find data like that shown in Figure 7f if X were to represent the amount of rainfall measured in millimeters and Y were to represent the number of bushels of corn harvested per acre. With a small amount of rain, there is a small yield of corn; but as the amount of rain increases, so does the crop yield. Beyond a maximum at about 30 millimeters, however, the crop yield starts to drop as the acres are exposed to too much rain. Since the dots in Figure 7f have a positive Pearson correlation for X between 2 and 30, but a negative Pearson correlation for X between 30 and 58, the result is a Pearson correlation of about zero.
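A quick simulated version of this rainfall-and-yield story, with made-up numbers, shows how a strong curvilinear pattern can produce a near-zero Pearson correlation.

rain  <- seq(2, 58, by = 4)            # hypothetical rainfall in millimeters
yield <- 50 - 0.06 * (rain - 30)^2     # hypothetical bushels per acre, peaking near 30 mm
cor(rain, yield)                       # near zero despite an obvious pattern
plot(rain, yield)                      # the scatterplot reveals the curve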
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
\[ \begin{align} M_X & = \overline X = MeanX = &\frac {\sum X} n &= 301/10 = 30.1 \\ M_Y & = \overline Y = MeanY = &\frac {\sum Y} n &= 196/10 = 19.6 \\ s_{XY} & = CovarianceXY = &\frac {\sum xy} {n-1} &= -40.6/9 = -4.511 \\ s_X^2 & = VarianceX = &\frac {\sum x^2} {n-1} &= 2706.9/9 = 300.767 \\ s_Y^2 & = VarianceY = &\frac {\sum y^2} {n-1} &= 1196.4/9 = 132.933 \\ r_{XY} & = CorrelationXY = &\frac {s_{XY}} {s_X s_Y} &= \frac {-4.511} {(17.343)(11.53)} = -0.023 \end{align} \]
The size that a Pearson correlation coefficient can attain is directly related to the dispersion of the X and Y variables considered. Given that there is a linear relationship between two variables, the sample Pearson correlation coefficient will estimate its maximum population value when both X and Y are considered over their full ranges. However, if either X or Y, or both, are limited to a partial range, the correlation coefficient will estimate a population value that is less than the maximum value. The assumption of a linear relationship is important here. In the curvilinear relationship shown in Figure 7f, the correlation coefficient would increase if the range of X were limited to values between 2 and 30.
If we consider the relationship between intelligence and achievement, we will find the highest Pearson correlation when we consider subjects with a full range of possible scores on these variables. If, however, we only consider subjects who have IQs between 116 and 124, we will find a lower Pearson correlation. This latter case is an example of a spuriously low Pearson correlation between two variables.
The data in Figure 7j can be used to illustrate a spuriously low correlation. The scattergram for these data is shown in Figure 7k. For the full range of data in Figure 7j, we find a positive Pearson correlation of .59. When only the low scores of the first ten subjects are considered, however, the Pearson correlation is reduced to .05 (see Figure 7a(iv)).
Another example of the reduction in correlation due to restriction in range is found if you study the relationship between the Miller Analogies Test (an instrument often used to help determine entrance into graduate school) and grade-point average. You find the relationship between these two measures is generally low when only Ph.D. students are considered. This low Pearson correlation occurs partially because Ph.D. students usually have only high scores on the Miller Analogies Test. A larger Pearson correlation is found if you include students who were not admitted to graduate school in the study.
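Restriction of range is easy to demonstrate with simulated scores; the sketch below uses made-up values, not real IQ or achievement data, just a linear relationship plus noise.

set.seed(1)
iq <- rnorm(500, mean = 100, sd = 15)            # simulated IQ-like scores
achievement <- 0.6 * iq + rnorm(500, sd = 12)    # linearly related, with noise
cor(iq, achievement)                             # full range: a larger correlation
keep <- iq >= 116 & iq <= 124                    # restricted range, as in the example above
cor(iq[keep], achievement[keep])                 # noticeably smaller correlation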
In this section, we will consider examples of spuriously high correlations to illustrate that a large Pearson correlation coefficient does not necessarily imply causation; that is, if two variables are linearly related, it does not necessarily mean that one caused the other.
By changing the scenario associated with the data in Figure 7j, we can see how data can be used to illustrate a spuriously high Pearson correlation. Let the X scores in Figure 7j represent the salaries, to the nearest thousand dollars, of a random sample of men and women. Let the Y scores represent attitudes toward sports that are dangerous for children.
Here, the Pearson correlation between X and Y is a moderate, positive Pearson correlation of .59. On closer examination, however, we find that the first ten subjects are women and the next ten are men. When we examine the Pearson correlation within each of these groups it is nearly zero. [The Pearson correlation among the women’s scores is shown in Figure 7a(iv), and the men’s scores were made up by taking the women’s scores and adding 30 points to them.]
Figure 7k shows that there is no relationship between the two variables when group composition is considered, but that a spuriously high Pearson correlation is found because the women in this study generally made lower salaries than the men. In this example, a third variable, group composition, caused a spuriously high Pearson correlation.
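The group-composition effect can be mimicked with a tiny made-up data set in which X and Y are unrelated within each group, but the second group scores 30 points higher on both variables.

women_x <- c(10, 20, 30, 40);  women_y <- c(30, 10, 40, 20)   # unrelated within this group
men_x   <- women_x + 30;       men_y   <- women_y + 30        # same pattern, shifted up
cor(women_x, women_y)                                         # exactly 0 within the group
cor(men_x, men_y)                                             # also 0
cor(c(women_x, men_x), c(women_y, men_y))                     # about .64 when the groups are pooled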
A second example of a spuriously high correlation is illustrated using the data from Table 7b. Table 7b contains the number of preachers and the number of alcoholics in a small Midwestern town as recorded every ten years for a 100-year period. For these data, the Pearson correlation between the number of alcoholics and the number of preachers is .99. Does this mean that preachers cause people to become alcoholics or that alcoholics cause people to become preachers? Perhaps, but probably not. The reason for this strong relationship is probably a third variable, population growth, which is highly correlated with both.
Year Preachers Alcoholics
Year 1.00000 0.99742 0.99171
Preachers 0.99742 1.00000 0.99288
Alcoholics 0.99171 0.99288 1.00000
Since a large Pearson correlation does not necessarily imply causation, you must be careful in interpreting the meaning of the correlation. A large correlation between two variables does not necessarily mean that the correlation is spurious, however. A strong Pearson correlation between two variables may indicate that one variable did cause, or partially cause, the other. That is, a causal relationship does indeed require that the variables be correlated. The point is that we cannot be sure of causation simply because of a large correlation coefficient. Correlational analysis can be seen as a first step in identifying causation; experiments can be seen as a second step.
scatr::scat(
data = data,
x = 'X',
y = 'Y',
group = 'Group',
line = "linear")
scatr::scat(
data = data,
x = 'X',
y = 'Y',
line = 'linear')
For example, the strong Pearson correlation found between smoking and cancer led researchers to investigate this relationship more closely. At first, these researchers were not sure if X caused Y or if Y caused X. Perhaps cancer, or some third variable, such as an unknown chemical within the body, caused people to smoke. Further study involving experiments, however, led researchers to conclude that smoking could indeed cause cancer. This is a good example of the use of the Pearson correlation coefficient. The Pearson correlation coefficient by itself did not allow researchers to relate two variables in a causal manner, but it did serve as a signal to spur further research involving experiments that could more clearly establish cause.
There are several measures of relationship based on the Pearson product-moment correlation coefficient that you may find in the research literature of your discipline. These measures of relationship are: the phi coefficient, the point-biserial correlation, and the Spearman rank correlation coefficient. The values of all three of these measures of relationship can be found using the Pearson product-moment correlation equation, equation (7-3). They can also be found using special equations based on the known scales of measurement of the variables. These special equations are all algebraically derivable from the Pearson product-moment equation. The special equations for these three measures of relationship were derived before the widespread use of computers. Therefore, they are useful when you are forced to use hand calculations. These special equations generated special names that were useful in identifying the levels of measurement of the variables considered. Today, the names of these measures of relationship are still useful, although the calculation of these measures can be easily done using the Pearson product-moment equation on a computer.
The phi coefficient is generally used with data where both X and Y are measured on a dichotomous scale (that is, a nominal or ordinal scale). In jamovi, the Phi Coefficient can be obtained through several packages and functions. An example follows.
In a market survey, a researcher was interested in the possible relationship between a person’s marital status and whether that person purchased cable television. Based on responses to a questionnaire, the researcher coded the subjects as 1 if they were married and 2 if they were single, and as 1 if they had purchased cable television and 2 if they had not purchased cable television. The results were as follows:
The frequency count for these data can be arranged in a two-by-two table as follows:
No(0) Yes(1)
Married(1) 2 6
Single(2) 4 3
There were two respondents who were married and had not purchased cable TV, six respondents who were married and had purchased cable TV, three respondents who were single and had not purchased cable TV, and four respondents who were single and had purchased cable TV. We can denote these frequencies by the letters A, B, C, and D:
No(0) Yes(1)
Married(1) A B
Single(2) C D
The equation for finding the phi coefficient is:
\[ \begin{equation} \text {phi} = \phi = \frac {(BC-AD)} {\sqrt {(A+B)(C+D)(A+C)(B+D)}} \tag{7-7} \end{equation} \] We can find the phi coefficient using this equation:
\[ \text {phi} = \phi = \frac {6*4-2*3} {\sqrt {(2+6)(4+3)(2+4)(6+3)}} = \frac {24-6} {\sqrt {8*7*6*9}} = \frac {18} {\sqrt {3024}} = .327 \]
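The phi coefficient can also be checked in base R from the cell counts above, either through equation (7-7) or through its relationship to chi-square, phi = sqrt(chi-square / N).

A <- 2; B <- 6; C <- 4; D <- 3                                   # cell counts from the table above
(B * C - A * D) / sqrt((A + B) * (C + D) * (A + C) * (B + D))    # equation (7-7): .327
tab <- matrix(c(A, B, C, D), nrow = 2, byrow = TRUE)
sqrt(chisq.test(tab, correct = FALSE)$statistic / sum(tab))      # same magnitude via chi-square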
jmv::contTables(
formula = ~ Marital_Status:Purchased_Cable_TV,
data = data,
phiCra = TRUE,
ci = FALSE)
CONTINGENCY TABLES
Contingency Tables
──────────────────────────────────────────────
Marital_Status Yes(1) No(0) Total
──────────────────────────────────────────────
Married(1) 2 6 8
Single(2) 4 3 7
Total 6 9 15
──────────────────────────────────────────────
χ² Tests
─────────────────────────────────
Value df p
─────────────────────────────────
χ² 1.6071 1 0.20489
N 15
─────────────────────────────────
Nominal
──────────────────────────────
Value
──────────────────────────────
Phi-coefficient 0.32733
Cramer's V 0.32733
──────────────────────────────
The point-biserial correlation is generally used with data where X is measured on a dichotomous scale (that is, nominal or interval) and Y is measured on a continuous scale (that is, interval or ratio). In jamovi, the point-biserial correlation is obtained whenever a binary (or dichotomous) nominal or interval variable is correlated with a scale variable using Bivariate Correlations (it is typically recommended to code the binary variable as 0/1, but other coding will work).
Note that the item-total correlation provided by the Reliability Analysis in jamovi is a special form of the point-biserial correlation when the items are coded as binary (e.g., 0/1 for Incorrect/Correct). This special form is called the “Corrected Item-Total Correlation” and reflects that for each item, that item is excluded from the total score (i.e., the total score is calculated using all other items).
The point-biserial correlation is frequently used to measure the relationship between a person’s score on an item from a test and the person’s total score on the test (also more generally called the item-total correlation, which applies to both binary and scale items). The logic is that if the item is “good” it will have a strong positive point-biserial correlation; that is, subjects who score high on an item will also score high on the total test. If this is the case, the item is said to discriminate between subjects who score high and subjects who score low on the test. For example, consider the following data, which was collected on a 40-item test given to 14 subjects.
The equation for the point-biserial correlation is:
\[ \begin{equation} r_{PBIS} = \frac {M_1 - M_0} {s_Y} \sqrt {\frac {n_1 n_0} {n (n-1)}} \tag{7-8} \end{equation} \] Here, M1 is the mean on Y (total test score) for those who scored 1 on Item X, M0 is the mean on Y for those who scored 0 on Item X, sY is the standard deviation of all Y scores, n1 is the number of units that scored 1 on Item X, n0 is the number of units that scored 0 on Item X, and n = n1 + n0.
We can calculate the point-biserial correlation using information from Figure 7p as follows:
\[ r_{PBIS} = \frac {24.3-17} {5.452} \sqrt {\frac {10(4)} {14(13)}} = \frac {7.3} {5.452} \sqrt {0.220} = 1.339(0.469) = .628 \] Output for Reliability and for the Pearson Correlation is provided in Figure 7p. Note that the Corrected Item-Total Correlation for item X is not the same as the Item-Total Correlation (i.e., the point-biserial correlation). This is because item X has been removed from the Total score that is used to calculate the correlation between Item X and Total scores. Items are removed in this way to calculate the Corrected Item-Total Correlation because a small part of the correlation between an item and the total score would otherwise be due to the fact that the item is included in the calculation of the total score. Removing the item in this way helps us see how strongly each item is correlated with all the other items. Most measurement scholars prefer the Corrected Item-Total Correlation, but the uncorrected point-biserial correlation is still very informative about the quality of an item.
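In R, the same value can be obtained either from equation (7-8) or from the ordinary Pearson correlation; the sketch assumes the 0/1 item scores and the total test scores from Figure 7p are in vectors named item and total.

M1 <- mean(total[item == 1]);  M0 <- mean(total[item == 0])   # group means on the total score
n1 <- sum(item == 1);          n0 <- sum(item == 0);  n <- n1 + n0
(M1 - M0) / sd(total) * sqrt((n1 * n0) / (n * (n - 1)))       # equation (7-8)
cor(item, total)                                              # the Pearson r gives the same value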
DESCRIPTIVES
Descriptives
──────────────────────────────────────────────────────────────────────────────────
N Mean Median SD Minimum Maximum
──────────────────────────────────────────────────────────────────────────────────
Score_on_Item 14 0.71429 1.0000 0.46881 0.0000 1.0000
Total_Test_Score 14 22.21429 22.5000 5.45159 12.0000 29.0000
──────────────────────────────────────────────────────────────────────────────────
DESCRIPTIVES
Descriptives
───────────────────────────────────────────────────────────────────────────────────────────────
Score_on_Item N Mean Median SD Minimum Maximum
───────────────────────────────────────────────────────────────────────────────────────────────
Total_Test_Score 0 4 17.000 17.000 4.1633 12.000 22.000
1 10 24.300 25.000 4.4981 17.000 29.000
───────────────────────────────────────────────────────────────────────────────────────────────
CORRELATION MATRIX
Correlation Matrix
────────────────────────────────────────────────────────────────────────
Score_on_Item Total_Test_Score
────────────────────────────────────────────────────────────────────────
Score_on_Item Pearson's r —
df —
p-value —
Total_Test_Score Pearson's r 0.62776 —
df 12 —
p-value 0.01623 —
────────────────────────────────────────────────────────────────────────
RELIABILITY ANALYSIS
Scale Reliability Statistics
───────────────────────────────────────────────────────────────
Mean SD Cronbach's α McDonald's ω
───────────────────────────────────────────────────────────────
scale 0.56000 0.38644 0.84077 0.85021
───────────────────────────────────────────────────────────────
Item Reliability Statistics
─────────────────────────────────────────────────────────────────────────────────────────
Mean SD Item-rest correlation Cronbach's α McDonald's ω
─────────────────────────────────────────────────────────────────────────────────────────
item01 0.50000 0.52705 0.77588 0.76949 0.78605
item02 0.40000 0.51640 0.77174 0.77124 0.80069
item03 0.70000 0.48305 0.62007 0.81514 0.82584
item04 0.40000 0.51640 0.44430 0.86339 0.86627
item05 0.80000 0.42164 0.64550 0.81111 0.83410
─────────────────────────────────────────────────────────────────────────────────────────
The Item-Rest Correlations are essentially point-biserial correlations because these items are marked 0 or 1 (for incorrect and correct). These are often called “corrected” or “adjusted” point-biserial correlations (or item-total correlations) for binary test data.
The Spearman rank correlation, often called Spearman’s rho, is based on data where both X and Y are measured on an ordinal scale. The Spearman rank order correlation coefficient is used with units that have been rank ordered on two variables. For example, consider the following data. Two judges rank ordered ten patients on their ability to deal with test anxiety after they had experienced an intensive two-day coping-skills treatment.
The equation for the Spearman rank correlation coefficient is:
\[ \begin{equation} r_S = 1 - {\frac {6 \sum d^2} {n (n^2-1)}} \tag{7-9} \end{equation} \] Here, d equals the difference between the ranks (for example, d = 8 - 4 for subject A), and n is the number of units ranked. Note that Charles Spearman proposed this rank coefficient and denoted it by the Greek letter rho (ρ). In this book, the Spearman coefficient is denoted by \(r_S\), because rho is reserved for the population Pearson product-moment correlation coefficient.
We can calculate the Spearman rank correlation using information from Figure 7t as follows:
\[ r_S = 1 - {\frac {6 (106)} {10 (99)}} = 1 - {\frac {636} {990}} = 1-.642 = .358 \]
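The same value follows in R, assuming the two judges' ranks are in vectors named rank1 and rank2.

d <- rank1 - rank2                         # differences between the two sets of ranks
n <- length(d)
1 - (6 * sum(d^2)) / (n * (n^2 - 1))       # equation (7-9): .358 for the Figure 7t ranks
cor(rank1, rank2, method = "spearman")     # built-in Spearman check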
DESCRIPTIVES
Descriptives
──────────────────────────────────────────────────────────────────────
N Mean Sum SD Minimum Maximum
──────────────────────────────────────────────────────────────────────
d 10 0.0000 0.0000 3.4319 -8.0000 5.0000
d_sq 10 10.6000 106.0000 20.1726 0.0000 64.0000
──────────────────────────────────────────────────────────────────────
In jamovi, the Spearman correlation can be obtained as part of the Correlations analysis (see Figure 7u). Note that because the data for our judges are already ranks, the Pearson and the Spearman correlations produce the same result. However, when we have scale data that we later rank, the two correlations will generally differ.
CORRELATION MATRIX
Correlation Matrix
───────────────────────────────────────────────────
Judge_1 Judge_2
───────────────────────────────────────────────────
Judge_1 Pearson's r —
df —
p-value —
Spearman's rho —
df —
p-value —
Judge_2 Pearson's r 0.35758 —
df 8 —
p-value 0.31038 —
Spearman's rho 0.35758 —
df 8 —
p-value 0.31280 —
───────────────────────────────────────────────────
This chapter explained how to describe the linear relationship between two variables using a descriptive statistic called the Pearson product-moment correlation coefficient. It also explained how to calculate this correlation. Another measure of relationship, the covariance, is denoted by \(s_{XY}\). You have learned how to find the covariance and the Pearson correlation coefficient both by hand and using jamovi. You also found that when examining several variables, researchers frequently report the Pearson correlations among these variables in a matrix called the correlation matrix.
Several examples illustrated that the size of the Pearson correlation coefficient is unaffected when either the X variable or the Y variable is transformed using a linear transformation, but that the sign of the Pearson correlation coefficient may be changed by changing the signs of the scores on only one variable. Also, the size of a Pearson correlation coefficient is directly related to the range of the X and Y scores considered, and restricting the range of the X and/or Y values reduces the size of the Pearson correlation coefficient.
Examples illustrated that spuriously high Pearson correlations may be due to the existence of one or more other variables. In all of these examples, scatterplots are necessary both as visual aids for interpretation and as a means of identifying nonlinear relationships, for which the Pearson correlation coefficient is an inappropriate measure of relationship.
In considering causality, a high Pearson correlation does not necessarily imply that one variable caused another. Examples illustrated both the problems of interpreting the Pearson correlation coefficient and its value as a spur to further investigation and possible experimentation. Some of the difficulties in inferring cause are due to the possibility of a relationship that depends on one or more third variables. There is also the difficulty of deciding whether X caused Y or Y caused X.
Chapter 7 Appendix A Study Guide for Correlation
SECTION 1: Bivariate Correlation (Descriptive)

Analyses to Run
• Use a CATEGORICAL variable W (Note that we CANNOT include NOMINAL/ORDINAL VARIABLES with MORE THAN 2 GROUPS/LEVELS here for W)
• Use a SCALE variable Y
• Use a SCALE variable X (an ORDINAL variable with a large RANGE and large VARIATION may be okay but is NOT really desirable – an ORDINAL predictor would require very careful interpretation)
• Run correlations with all THREE variables (W, X, and Y)
• Run descriptive statistics for both Y and X together
• Run a scatterplot for X and Y
  o After creating the graph, add the “Fit Line” (regression line)
• Run a matrix scatterplot with all THREE variables (W, X, and Y)
Using the output, respond to the following DESCRIPTIVE items
Explain the difference between a positive correlation and a negative correlation.
When is the correlation coefficient ZERO?
What is Covariance?
Show or explain how Covariance is calculated from Sum of Cross Products (use N-1 where necessary)
What is Correlation?
Show or explain how the Pearson correlation is calculated from Covariance and SD (use N-1 where necessary)
What is the difference between Spearman Correlation and Pearson Correlation?
Report and interpret the Pearson’s r for all pairs of variables you have chosen.
Specifically, what are the degrees of freedom used for the correlation between Y and X in this analysis?
Generally, what are the degrees of freedom used for a correlation analysis? That is, what formula is used to calculate degrees of freedom in Pearson correlation? Explain why.
Skip any of the next four items if it is not possible to answer them (e.g., if there are no negative correlations). Do NOT include any correlations of a variable with itself (that always have r = 1.0).
Which pair of variables has the strongest positive relationship (if any)?
Which pair of variables has the strongest negative relationship (if any)?
Which pair of variables has the weakest positive relationship (if any)?
Which pair of variables has the weakest negative relationship (if any)?
Report and interpret how much variation is shared by the two variables with the strongest correlation
Report and interpret the Spearman’s rho for all pairs of variables you have chosen.
Skip any of the next four items if it is not possible to answer them (e.g., if there are no negative correlations). Do NOT include any correlations of a variable with itself (that always have r = 1.0).
Using SCATTERPLOT outputs, respond to the following items
Using ALL the output in this section above, respond to the following item