Pairs of correlated scores are frequently encountered in research projects. In this chapter, we will consider statistical tests of hypotheses concerned with the population means of such score pairs and with their population Pearson product-moment correlation. In considering the statistical tests of these hypotheses, we will find that we must assume that the score pairs follow what is referred to as a bivariate normal distribution in the population of interest. Therefore, before we consider these hypotheses we will discuss the bivariate normal distribution.
In this chapter we will consider three test statistics. We will use the t statistic to consider hypotheses concerning the means of dependent measures. This t statistic has several names: paired t test, paired-samples t test, and dependent t test are all used interchangeably by scholars and researchers for the t test used with paired data. The z statistic and t statistic are the statistics most commonly used to test the statistical significance of a correlation. We will also use a t statistic when we test the statistical significance of a regression coefficient. Our discussion will conclude with a test statistic called the chi-square statistic, which we will use to test the hypothesis that the population variance of a single group of scores is equal to a constant.
In considering each test statistic, we will continue the saga of Mary Barth and her United Nations delegates, but we will add a second column of data to our data set. The dependent variable will again be intelligence as measured by the World Adult Intelligence Scale (WAIS). We will consider only two data sets for all of our statistical tests, but, as in chapter 12, each time we consider a new situation or test statistic we will change the scenario. This will again free us to emphasize the test statistics and the hypothesis test process, and not the data per se, as we consider new situations. In chapter 12, each research problem was illustrated using all four of the possible research hypotheses. In this chapter space will permit an example of only one research hypothesis, but the others will be presented as exercises.
In the next two sections, we will consider statistical tests that are used when two measurements have been taken. We will label these measurements as X1 and X2. Here the subscripts 1 and 2 will be used to identify where a measurement took place, that is, in group 1 or in group 2, or the point in time that a measurement occurred, that is, first (time 1) or second (time 2). In the situations that we considered in chapter 12, we considered a single measurement and assumed that it was normally distributed in the population of interest. In the next two sections, we will make the assumption that the two measurements (dependent variables) follow what is called a bivariate normal distribution.
A normal frequency distribution is a two-dimensional figure determined by scores on the x-axis and by the score frequencies (heights) on the y-axis. A bivariate normal frequency distribution is a three-dimensional figure determined by pairs of scores, which locate points in the (x, y) coordinate plane, and by the frequencies of these points, which are measured on a third axis labeled the z-axis. A bivariate normal distribution is illustrated in Figure 13a(i). If you consider Figure 13a(i), you can see why the bivariate normal distribution is sometimes compared to a Mexican sombrero (a type of hat).
An aerial view of a bivariate normal distribution is displayed in the contour map of Figure 13a(ii). In Figure 13a(ii), the elliptical contours are similar to those found on a topographic map, that is, they are an indication of the heights of the score points below them. Therefore, the center ellipse would show the highest frequency (density) of scores, with the frequency decreasing as one moves out from the center. Researchers frequently use scatterplots such as those shown in chapter 7 to check on the assumption of bivariate normality. If the assumption of bivariate normality is reasonable, a researcher would expect an elliptical pattern of points which is darkest at the center (showing high frequency) with decreasing darkness as one moves out from the center.
The following are some points that you should be aware of in dealing with a bivariate normal distribution both in general and in this chapter.
In this section, we will discuss data wherein the scores are paired and therefore presumed to be correlated. Here, we will consider the conditions under which you would use Student’s t test to test the null hypothesis that the difference between the population means of these paired scores is equal to a constant. We will refer to this t test as the dependent t test because it is used with scores that are correlated, or dependent on one another. The usual value chosen for the constant in the null hypothesis which is tested using the dependent t test is zero: \(c_0 = 0\). When this is the case, you are testing a null hypothesis about a mean difference, and the null hypothesis can be written in several equivalent ways (recall that 1 and 2 here stand for group or time, e.g., group 1 or 2, time 1 or 2):
\[ \begin{align} H_0&: \mu_1 = \mu_2 \\ H_0&: \mu_1 - \mu_2 = 0 \\ H_0&: \mu_D = 0 \end{align} \]
where \(\mu_D = \mu_1 - \mu_2\) (i.e., D stands for difference).
The first situation in which the dependent t test is used is found when the same unit is measured at two different times with the same or a commensurate instrument. Commensurate instruments are instruments that have the same measurement scale. For example, different forms of the Miller Analogies Test may be described as commensurate instruments. Because the instruments have the same scale, we would expect them to also have similar means (otherwise, comparing means would make very little sense).
This situation represents the most elementary form of what statisticians refer to as a repeated measurements design. In the example that we will consider shortly, we will again focus on the intelligence of the delegates to the United Nations; however, here the intelligence of the delegates will be measured twice: at the start of the study and then six months later.
The second situation in which the dependent t test is used is found when units are matched on a variable and then are randomly assigned to one of two treatments. This situation represents the most elementary form of what statisticians refer to as a randomized block design. In the example considered here, 32 delegations to the U.N. were randomly selected from the population of delegations, and two delegates were randomly sampled from each delegation and placed either into a group that was given lessons on how to take intelligence tests or a group that was given no lessons. Here, the delegates were matched (or blocked) on delegation, that is, they were from the same delegations.
In Figure 13b(i), four subjects are illustrated twice, once at time 1 and once at time 2. This figure is meant to illustrate the repeated measures design, where the same subject is measured twice, once at time 1 and then again at time 2. In Figure 13b(ii), eight different subjects are illustrated, but the subjects are matched based on similar appearance. This figure is meant to illustrate a randomized block design, where subjects are matched on one or more characteristics. Note, however, that even though the subjects in Figure 13b(ii) are matched on similar appearance, there are other characteristics on which the subjects are not matched.
The state of affairs that must exist before you can consider the use of Student’s dependent t test statistic is:
You have a random sample of paired measurements which were derived from either a random sample of a single group of units measured at two different times (Situation 1), or a random sample of units that were matched on one or more variables and then randomly placed into one of two treatments (Situation 2).
You are interested in the difference between the means of your two measurements.
The population variance of the mean differences is unknown. (Note that if the variance of the mean differences is known, the z statistic described in chapters 11 and 12 would be used here. In practice, however, the population variance of the mean differences is rarely known, so we will not consider the z statistic.)
Given the preceding state of affairs, there are two equations which are commonly used to calculate the t statistic.
\[ \begin{equation} t = \frac{M_D - c_0}{s_D/\sqrt{n}} \tag{13-1} \end{equation} \]
Using Equation 13-1, the difference of each pair of scores is first found; then the mean, \(M_D\), and standard deviation, \(s_D\), of the n difference scores are found.
\[ \begin{equation} t = \frac{(M_1 - M_2) - c_0}{\sqrt{(s_1^2/n) + (s_2^2/n) - (2r_{12} s_1 s_2/n)}} \tag{13-2} \end{equation} \] Equation 13-2 is based on the sample means, \(M_1\) and \(M_2\), variances, \(s_1^2\) and \(s_2^2\), standard deviations, \(s_1\) and \(s_2\), and the correlation between the measures, \(r_{12}\). Both equations yield the same result.
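Because equations 13-1 and 13-2 are algebraically equivalent, their agreement is easy to verify numerically. The following is a minimal R sketch using made-up scores (the vectors x1 and x2 are hypothetical, not data from this chapter); it computes the t statistic both ways and checks the result against R’s built-in paired t test.

x1 <- c(100, 94, 88, 102, 97, 91)   # hypothetical time-1 scores
x2 <- c(101, 92, 90, 104, 96, 94)   # hypothetical time-2 scores
n  <- length(x1)
c0 <- 0
# Equation 13-1: based on the difference scores
D  <- x1 - x2
t1 <- (mean(D) - c0) / (sd(D) / sqrt(n))
# Equation 13-2: based on the summary statistics of the two measures
r12 <- cor(x1, x2)
t2  <- (mean(x1) - mean(x2) - c0) /
  sqrt(var(x1)/n + var(x2)/n - 2*r12*sd(x1)*sd(x2)/n)
all.equal(t1, t2)                                               # TRUE
all.equal(t1, unname(t.test(x1, x2, paired = TRUE)$statistic))  # TRUE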
When the null hypothesis is true, the sampling distribution of the t statistic follows a t distribution with v = n – 1 degrees of freedom, where n is the number of pairs of measurements.
The two equations are based on the following points:
A population of score differences has a mean of \(\mu_D\) (equal to \(c_0\) when the null hypothesis is true) and a variance of \(\sigma_D^2\).
The underlying sampling distribution consists of sample mean differences, that is, \(M_D\), where \(M_D\) is the difference between two sample means of paired data (i.e., \(M_D = M_1 - M_2\)).
Under the Central Limit Theorem, the mean of the sampling distribution of mean differences is \(c_0\) (i.e., the mean of the original difference score population), and the variance is \(\sigma_D^2/n\) (i.e., the variance of the original difference score population divided by the number of pairs in a sample).
The variance of the sampling distribution is called the variance of the mean differences, and is denoted by:
\[ \begin{equation} \sigma_{M_1-M_2}^2 = \frac{\sigma_D^2}{n} \tag{13-3} \end{equation} \]
\[ \begin{equation} \sigma_{M_1-M_2} = \frac{\sigma_D}{\sqrt{n}} \tag{13-4} \end{equation} \]
\[ \begin{equation} s_D = \sqrt{(s_1^2) + (s_2^2) - (2r_{12} s_1 s_2)} \tag{13-5} \end{equation} \]
and since \(M_D = M_1 - M_2\), the numerators and denominators of equations 13-1 and 13-2 are the same.
The paired t test statistic will be valid when:
a. the pairs of measurements are independent of one another, and
b. the pairs of measurements follow a bivariate normal distribution in the population of interest.
The importance of the preceding assumptions is as follows:
a. The assumption that the pairs are independent of one another is extremely important, because if it is violated, the level of significance (i.e., the probability of rejecting a true null hypothesis) can increase dramatically (e.g., from .05 to .40).
b. Given a small sample size, a violation of the assumption of bivariate normality does not seriously affect the level of significance for the two-tailed test. However, it may be a problem for one-tailed tests (see Srivastava, 1959). Given large samples (n > 25), the assumption of normality in the population can generally be ignored.
In this situation, Mary Barth has decided to see if there is a difference between the mean intelligence scores found at the beginning of her study and those found six months later. Since the World Adult Intelligence Scale has been found to yield reliable results in the past, and since an adult’s intelligence level is not known to change over a short period of time, Mary predicted that there would be no change in the mean IQ level of the U.N. delegates. Let us now consider the elements of hypothesis testing in this situation. Even though Mary may have thought there would be no difference, she needed to set the null hypothesis that the means are equal. Her hope was that the difference would not be statistically significant, which would cause her to fail to reject the null hypothesis of equal means. However, it is important to note that failing to reject the null hypothesis does not mean that it is true—it only means that there was no evidence to reject it. Therefore, this analysis proceeds in exactly the same way as if Mary had predicted that there would be a difference—just with different outcomes desired.
Is there a change in the mean intelligence level of delegates to the United Nations when measured on the World Adult Intelligence Scale from one time period to another?
There will be no change in the mean intelligence level of the delegates to the United Nations when measured on the World Adult Intelligence Scale from one time period to another.
\[ \begin{align} H_0&: \mu_D = 0 \\ H_A&: \mu_D \ne 0 \end{align} \]
Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in a previous chapter.
The probability of rejecting a true null hypothesis (the probability of making a Type I error) was set at .05.
Mary Barth decided that she would like the power of her test to be at least .85; that is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.
In a confirmatory study, a modification of equation (11-2) is used to determine the a priori effect size. We will denote the effect size for the dependent t test as \(d_d\); the subscript “d” is for dependent. The effect size for the dependent t test is:
\[ \begin{equation} d_d = \large{\frac{|\mu_D - c_0|} {\sigma_D}} \tag{13-6} \end{equation} \]
The relationship between Cohen’s two-group effect size d and \(d_d\) is:
\[ \begin{equation} d = d_d\sqrt{2} \quad \text{or} \quad d_d = d/\sqrt{2} \tag{13-7} \end{equation} \]
In an exploratory study, Cohen’s (1988) two-group effect sizes: small, \(d = .20\) (\(d_d = .14\)), medium, \(d = .50\) (\(d_d = .35\)), or large, \(d = .80\) (\(d_d = .57\)), may be used with Table 13a to select sample sizes.
In this confirmatory study, Mary Barth had decided that if the means differed by 3.0 points or more, she would like to have a high probability of detecting a difference this large or larger. Also, past evidence had indicated that a reasonable estimate of the population difference standard deviation was approximately 5.0. Substituting these values into equation (13-6) Mary found that the dependent a priori effect size was:
\[ \begin{equation} d_d = \frac{3} {5} = .60 \end{equation} \]
Sample size is found as a modification of equation (11-3) as:
\[ \begin{equation} n = \left[\frac{\sigma_D(z_{(1-\alpha_P)} + z_{(1-\beta)})} {|\mu_D - c_0|}\right]^2 \tag{13-8} \end{equation} \]
where \(z_{(1-\alpha_P)}\) is the z score associated with the level of significance (with \(\alpha_P = \alpha/2\) for a two-tailed test, so here \(z_{(.975)} = 1.96\)), and \(z_{(1-\beta)}\) is the z score associated with the desired power (here \(z_{(.85)} = 1.04\)).
Substituting these values into equation (13-8), we have:
\[ n = \left[\frac{5(1.96 + 1.04)} {3}\right]^2 = 25 \]
Therefore, 25 subjects would yield power of .85. Mary decided to use 32 subjects, however, because this number could be easily sampled, and 32 subjects would assure her of power greater than .85.
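If you prefer to let the computer do the arithmetic, equation (13-8) takes only a few lines of base R; the sketch below (our addition, not jamovi output) reproduces Mary’s sample size.

sigma_D <- 5            # estimated standard deviation of the differences
effect  <- 3            # minimum mean difference Mary wants to detect
z_alpha <- qnorm(.975)  # 1.96 for a two-tailed test with alpha = .05
z_power <- qnorm(.85)   # 1.04 for power of .85
ceiling((sigma_D * (z_alpha + z_power) / effect)^2)  # 25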
In an exploratory study, sample size can be found using Tables 13a1-4, given Cohen’s a priori effect size, \(d\), the level of significance, \(\alpha\), and a two-tailed test. See exercise 13.3.
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 365 | 94 | 44 | 26 | 18 | 14 | 11 | 9 | 7 | 6 |
0.5 | 667 | 170 | 78 | 45 | 30 | 22 | 17 | 14 | 10 | 8 |
0.6 | 804 | 204 | 93 | 54 | 36 | 26 | 20 | 16 | 12 | 9 |
0.65 | 881 | 223 | 101 | 59 | 39 | 28 | 22 | 18 | 13 | 10 |
0.7 | 965 | 244 | 111 | 64 | 42 | 31 | 23 | 19 | 13 | 11 |
0.75 | 1060 | 268 | 121 | 70 | 46 | 33 | 25 | 20 | 14 | 11 |
0.8 | 1172 | 296 | 134 | 77 | 51 | 36 | 28 | 22 | 16 | 12 |
0.85 | 1309 | 330 | 149 | 85 | 56 | 40 | 31 | 24 | 17 | 13 |
0.9 | 1492 | 376 | 169 | 97 | 63 | 45 | 34 | 27 | 19 | 14 |
0.95 | 1785 | 449 | 202 | 115 | 75 | 53 | 40 | 32 | 22 | 16 |
0.99 | 2407 | 605 | 271 | 154 | 100 | 71 | 53 | 41 | 28 | 21 |
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 276 | 71 | 34 | 20 | 14 | 11 | 9 | 7 | 6 | 5 |
0.5 | 544 | 139 | 63 | 37 | 25 | 18 | 14 | 12 | 9 | 7 |
0.6 | 669 | 170 | 77 | 45 | 30 | 22 | 17 | 14 | 10 | 8 |
0.65 | 739 | 187 | 85 | 49 | 33 | 24 | 18 | 15 | 11 | 8 |
0.7 | 816 | 206 | 94 | 54 | 36 | 26 | 20 | 16 | 11 | 9 |
0.75 | 904 | 228 | 103 | 60 | 39 | 28 | 22 | 17 | 12 | 10 |
0.8 | 1007 | 254 | 115 | 66 | 43 | 31 | 24 | 19 | 13 | 10 |
0.85 | 1134 | 286 | 129 | 74 | 48 | 35 | 26 | 21 | 15 | 11 |
0.9 | 1305 | 329 | 148 | 85 | 55 | 39 | 30 | 24 | 16 | 12 |
0.95 | 1580 | 397 | 178 | 102 | 66 | 47 | 35 | 28 | 19 | 14 |
0.99 | 2168 | 544 | 244 | 139 | 90 | 63 | 47 | 37 | 25 | 18 |
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 167 | 44 | 21 | 13 | 9 | 7 | 6 | 5 | 4 | 4 |
0.5 | 387 | 98 | 45 | 26 | 18 | 13 | 10 | 9 | 6 | 5 |
0.6 | 492 | 125 | 57 | 33 | 22 | 16 | 13 | 10 | 7 | 6 |
0.65 | 552 | 140 | 64 | 37 | 24 | 18 | 14 | 11 | 8 | 6 |
0.7 | 620 | 157 | 71 | 41 | 27 | 20 | 15 | 12 | 9 | 7 |
0.75 | 696 | 176 | 80 | 46 | 30 | 22 | 17 | 13 | 10 | 7 |
0.8 | 787 | 199 | 90 | 52 | 34 | 24 | 19 | 15 | 10 | 8 |
0.85 | 900 | 227 | 102 | 59 | 38 | 27 | 21 | 17 | 12 | 9 |
0.9 | 1053 | 265 | 119 | 68 | 44 | 32 | 24 | 19 | 13 | 10 |
0.95 | 1302 | 327 | 147 | 84 | 54 | 39 | 29 | 23 | 16 | 12 |
0.99 | 1840 | 462 | 207 | 117 | 76 | 54 | 40 | 31 | 21 | 15 |
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 93 | 25 | 12 | 8 | 6 | 5 | 4 | 3 | 3 | 3 |
0.5 | 272 | 69 | 32 | 19 | 13 | 9 | 8 | 6 | 5 | 4 |
0.6 | 362 | 92 | 42 | 24 | 16 | 12 | 9 | 8 | 6 | 5 |
0.65 | 414 | 105 | 48 | 28 | 18 | 13 | 10 | 8 | 6 | 5 |
0.7 | 472 | 119 | 54 | 31 | 21 | 15 | 12 | 9 | 7 | 5 |
0.75 | 540 | 136 | 62 | 36 | 23 | 17 | 13 | 10 | 7 | 6 |
0.8 | 620 | 156 | 71 | 41 | 27 | 19 | 15 | 12 | 8 | 6 |
0.85 | 721 | 182 | 82 | 47 | 31 | 22 | 17 | 13 | 9 | 7 |
0.9 | 858 | 216 | 97 | 55 | 36 | 26 | 19 | 15 | 11 | 8 |
0.95 | 1084 | 272 | 122 | 70 | 45 | 32 | 24 | 19 | 13 | 10 |
0.99 | 1579 | 396 | 177 | 100 | 65 | 46 | 34 | 27 | 18 | 13 |
Using the JAMOVI module distrACTION, the critical values of the t statistic for the two-tailed test with alpha = .05 and degrees of freedom = 31 are found. Note that sometimes we use tables that do not include the exact degrees of freedom we desire. In such cases, we recommend a conservative approach, that is, using df = 30 (or even df = 20) when df = 31 is not available in the table. We call it conservative because a smaller df yields a larger critical value, making it harder, other things being equal, to obtain statistical significance.
\[ \begin{align} t_{(0.025, 31)} &= -2.0395 \\ \text{and} \\ t_{(0.975, 31)} &= +2.0395 \end{align} \]
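For readers working outside jamovi, the same critical values can be obtained with base R’s qt() function:

alpha <- .05
qt(c(alpha/2, 1 - alpha/2), df = 31)  # -2.0395  2.0395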
Box 13.1
At this point the a priori parameters
of hypothesis testing have been
established (i.e., your "bet" is made).
The two measurements (pretest and posttest scores) of the 32 randomly selected United Nations delegates are shown in Table 13b, along with their difference scores.
Most importantly, we need to check for outliers on the difference scores (see Figure 13c). However, outliers on the individual variables may also be informative and are usually worth investigating at a univariate level.
jmv::descriptives(
data = data,
vars = Difference,
n = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
box = TRUE,
extreme = TRUE)
DESCRIPTIVES
EXTREME VALUES
Extreme values of Difference
──────────────────────────────────────────
Row number Value
──────────────────────────────────────────
Highest 1 4 5.0000
2 11 5.0000
3 1 4.0000
4 2 4.0000
5 19 4.0000
Lowest 1 6 -12.0000
2 12 -8.0000
3 3 -6.0000
4 10 -6.0000
5 16 -5.0000
──────────────────────────────────────────
We need to check the assumption that the difference scores are normally distributed (see Figure 13d1). This will also be provided as output from the paired t test in JAMOVI. We should also check that there is a non-zero correlation between the two measures (see Figure 13d2). The correlation between X1 and X2 needed for equation 13-2 was found to be r = 0.975 using Bivariate Correlation, as described in chapter 7. We can still run the paired-samples t test if the correlation is low, but we are using the dependent t test, in part, because we expect there to be a correlation (a dependency) between the measures.
jmv::descriptives(
data = data,
vars = Difference,
hist = TRUE,
boxLabelOutliers = FALSE,
qq = TRUE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
─────────────────────────────────────
Difference
─────────────────────────────────────
N 32
Missing 0
Mean -1.2500
Skewness -0.57441
Std. error skewness 0.41446
Kurtosis 0.74812
Std. error kurtosis 0.80937
Shapiro-Wilk W 0.95702
Shapiro-Wilk p 0.22725
─────────────────────────────────────
jmv::corrMatrix(
data = data,
vars = vars(IQ_Posttest, IQ_Pretest),
n = TRUE)
CORRELATION MATRIX
Correlation Matrix
───────────────────────────────────────────────────────────
IQ_Posttest IQ_Pretest
───────────────────────────────────────────────────────────
IQ_Posttest Pearson's r —
df —
p-value —
N —
IQ_Pretest Pearson's r 0.97541 —
df 30 —
p-value < .00001 —
N 32 —
───────────────────────────────────────────────────────────
In Figure 13e, the test statistic is found to be t = -1.8396, or t = -1.84 after rounding, using either equation 13-1 or equation 13-2 with the descriptive statistics shown in Figure 13f. Later in this chapter, you will be shown how to arrive at the dependent t test directly.
jmv::ttestPS(
data = data,
pairs = list(
list(
i1="IQ_Pretest",
i2="IQ_Posttest")),
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
PAIRED SAMPLES T-TEST
Paired Samples T-Test
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
statistic df p Mean difference SE difference Lower Upper Effect Size
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ_Pretest IQ_Posttest Student's t -1.8396 31.000 0.07541 -1.2500 0.67948 -2.6358 0.13581 Cohen's d -0.32521
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ <sub>Measure 1 - Measure 2</sub> ≠ 0
Normality Test (Shapiro-Wilk)
────────────────────────────────────────────────────────
W p
────────────────────────────────────────────────────────
IQ_Pretest - IQ_Posttest 0.95702 0.22725
────────────────────────────────────────────────────────
Note. A low p-value suggests a violation of the
assumption of normality
Descriptives
─────────────────────────────────────────────────────────────
N Mean Median SD SE
─────────────────────────────────────────────────────────────
IQ_Pretest 32 92.031 93.500 16.532 2.9226
IQ_Posttest 32 93.281 97.000 14.902 2.6344
─────────────────────────────────────────────────────────────
jmv::descriptives(
data = data,
vars = vars(IQ_Pretest, IQ_Posttest, Difference),
hist = TRUE,
boxLabelOutliers = FALSE,
qq = TRUE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
se = TRUE,
ci = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────────────────────────────────────────
IQ_Pretest IQ_Posttest Difference
──────────────────────────────────────────────────────────────────────
N 32 32 32
Missing 0 0 0
Mean 92.031 93.281 -1.2500
Std. error mean 2.9226 2.6344 0.67948
95% CI mean lower bound 86.071 87.908 -2.6358
95% CI mean upper bound 97.992 98.654 0.13581
Skewness -0.20762 -0.42112 -0.57441
Std. error skewness 0.41446 0.41446 0.41446
Kurtosis -0.67077 -0.43464 0.74812
Std. error kurtosis 0.80937 0.80937 0.80937
Shapiro-Wilk W 0.97668 0.96954 0.95702
Shapiro-Wilk p 0.69899 0.48691 0.22725
──────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a
t-distribution with N - 1 degrees of freedom
The decision was made to fail to reject the null hypothesis because the absolute value of the test statistic was less than the absolute value of the critical values; |-1.83964| < 2.0395. Our t statistic was negative, so we compare its absolute value (i.e., 1.83964) to 2.0395. Alternatively, we could compare -1.83964 to the lower critical value of -2.0395. The two-tailed p value of .07541 was larger than the \(\alpha = .05\) level of significance (and therefore not statistically significant). Importantly, Mary cannot conclude that the difference in the population is exactly zero (i.e., that there is no difference between the two time points in the population), but she has no evidence to reject the null hypothesis and therefore must not reject it.
The two-tailed \(100(1 - \alpha)\%\) confidence interval may be found using the equation:
\[ \begin{equation} M_D - t_{(1-\alpha/2,\,v)} \frac{s_D}{\sqrt{n}} < \mu_D < M_D + t_{(1-\alpha/2,\,v)} \frac{s_D}{\sqrt{n}} \tag{13-9} \end{equation} \]
Here, the \(100(1 - .05)\%\) or 95% confidence interval can be found using \(M_D = -1.25\), \(s_D = 3.8437\), \(t_{(.025,30)} = -2.042\), \(t_{(.975,30)} = 2.042\), \(n = 32\), and \(v = n - 1 = 31\) (but df = 30 was used here because we were using a table that did not include 31 degrees of freedom) to be:
\[ \begin{equation} -1.25 - 2.042(3.8437/\sqrt{32}) < \mu_D < -1.25 + 2.042(3.8437/\sqrt{32}) \end{equation} \]
or \(-2.6375 < \mu_D < 0.1375\), or approximately \((-2.64, 0.14)\). (Using the exact \(v = 31\) critical value of 2.0395, as the output does, gives the slightly narrower interval \((-2.6358, 0.13581)\).)
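As a check, the interval can also be computed in base R. The sketch below uses the summary values reported in the output and the exact v = 31 critical value.

M_D <- -1.25; s_D <- 3.8437; n <- 32
t_crit <- qt(.975, df = n - 1)           # 2.0395
M_D + c(-1, 1) * t_crit * s_D / sqrt(n)  # -2.6358  0.1358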
Based on the confidence interval, Mary was relatively confident that the true population mean difference fell between -2.64 and 0.14, and therefore, that the difference between the intelligence means from one test to another was not statistically significantly different from zero. Remember that when a two-tailed test indicates that the null hypothesis should not be rejected, the constant from the null hypothesis, here zero, will be between the limits of the confidence interval.
We have calculated the paired mean difference to be 1.25 (as an absolute value) and the standard deviation of the difference scores to be 3.8437. Therefore the standardized mean difference for the sample is \(d = 1.25 / 3.8437 = 0.325\), which is a relatively small mean difference (about one-third of a standard deviation). More importantly, because we could not reject the hypothesis that the mean difference is 0, we cannot conclude that the population standardized effect size (i.e., Cohen’s d) is anything other than 0.
In this situation, Mary Barth felt that the intelligence level at the United Nations could be improved if the delegates were taught how to take intelligence tests. Therefore, she planned to randomly select n delegations, and then to randomly select two delegates from each of these delegations. She would then randomly place one delegate from each delegation into a treatment where the delegates were taught how to take intelligence tests, and the other delegate from each delegation into a no-treatment control group. Note that this is not a strong matching design—it is for illustrative purposes only.
Will United Nations delegates who are taught how to take intelligence tests score higher on the World Adult Intelligence Scale than delegates who have not received such training?
United Nations delegates who are taught how to take intelligence tests will score higher on the World Adult Intelligence Scale than delegates who do not receive such training.
Here group 1 will be the no-training group, and group 2 will be the training group. Therefore, since the research hypothesis predicts that \(\mu_1\) will be less than \(\mu_2\), Mary expected the mean difference \(\mu_D = \mu_1 - \mu_2\) to be negative. Then, the null and alternative hypotheses are:
\[ \begin{align} H_0&: \mu_D = 0 \\ H_A&: \mu_D < 0 \end{align} \]
Mary decided that she wanted to detect a mean difference of one-half of a standard deviation. Using equation (13-6), her a priori effect size was found to be \(d_d = 0.50\). Using equation (13-8) with a one-tailed test, Mary found that she needed 29 subjects to have power of .85. Therefore, her actual sample size of 32 delegates, shown in Table 13c, gave her power greater than .85.
The critical value for the one-tailed test with \(\alpha = .05\) and \(v = 31\) is \(t = -1.6955\). The test statistic, shown in Figure 13f, was calculated to be \(t = -4.5751\) using either equation 13-1 or 13-2. This means Mary can reject the null hypothesis: a difference existed between those who received training and those who did not (one-tailed \(p = .00004\)). On the basis of her 95% confidence interval, Mary concluded that the true population difference, \(\mu_1 - \mu_2\), fell below 0.
jmv::ttestPS(
data = data,
pairs = list(
list(
i1="IQ",
i2="Match_IQ")),
hypothesis = "twoGreater",
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
PAIRED SAMPLES T-TEST
Paired Samples T-Test
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
statistic df p Mean difference SE difference Lower Upper Effect Size
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ Match_IQ Student's t -4.5751 31.000 0.00004 -7.7812 1.7008 -Inf -4.8976 Cohen's d -0.80878
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ <sub>Measure 1 - Measure 2</sub> < 0
Normality Test (Shapiro-Wilk)
─────────────────────────────────────────────
W p
─────────────────────────────────────────────
IQ - Match_IQ 0.95653 0.22029
─────────────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
───────────────────────────────────────────────────────────
N Mean Median SD SE
───────────────────────────────────────────────────────────
IQ 32 92.031 93.500 16.532 2.9226
Match_IQ 32 99.812 102.000 18.252 3.2265
───────────────────────────────────────────────────────────
In considering the test of the null hypothesis that the difference between two related means is equal to a constant (that is, \(\mu_1 – \mu_2 = \mu_D = c_0\)), we will consider two cases: Case 1, where the constant, \(c_0\), in the null hypothesis is equal to zero, and Case 2, where the constant, \(c_0\), in the null hypothesis is not equal to zero. In the following presentations, you are shown how to deal with both of these cases.
We will consider this case first, since it is the one that is usually of interest to most researchers. Enter the data in Table 13c in a dataset. There are actually two related ways to complete this analysis. The first is to perform the paired-samples t test (see Figure 13g). The second is to perform a one-sample t test on the difference scores using zero as the test value (see Figure 13h).
jmv::ttestOneS(
data = data,
vars = Difference,
testValue = 0,
hypothesis = "lt",
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
ONE SAMPLE T-TEST
One Sample T-Test
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Lower Upper Effect Size
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Difference Student's t -4.5751 31.000 0.00004 -7.7812 -Inf -4.8976 Cohen's d -0.80878
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ < 0
Normality Test (Shapiro-Wilk)
────────────────────────────────────
W p
────────────────────────────────────
Difference 0.95653 0.22029
────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
──────────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────────
Difference 32 -7.7812 -9.0000 9.6210 1.7008
──────────────────────────────────────────────────────────────
Since we have not previously analyzed data that would meet this case, we will reconsider the data in Table 13c for the step-by-step illustration. Note that this is for instructional convenience—it would be inappropriate to analyze the same data with two different null hypotheses in a research project. Here, the statistical hypotheses will be:
\[ \begin{align} H_0&: \mu_D = -10 \\ H_A&: \mu_D \ne -10 \end{align} \]
The researcher expected the U.N. delegates with training to have a mean IQ score (\(\mu_2\)) that is 10 points higher than the mean IQ score (\(\mu_1\)) of the U.N. delegates with no training. Note that instead of having a negative mean difference, you could make treatment 1 “with training” and treatment 2 “without training” so that the mean difference would be positive. The order of the treatments affects the sign of the resultant t statistic and critical value (if you don’t use absolute values), but the order is arbitrary and therefore has no effect on the conclusions. To perform this test we can calculate difference scores and then use the non-zero (non-nil) null hypothesis value as the test value in the one-sample t test (see Figure 13i1). The other way is to perform the paired t test and use the confidence interval for the mean difference for our decision making (see Figure 13i2). Both figures below show that we fail to reject the hypothesis that the difference is equal to -10 in the population (ignore the p value in Figure 13i2).
jmv::ttestOneS(
data = data,
vars = Difference,
testValue = -10,
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
ONE SAMPLE T-TEST
One Sample T-Test
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Lower Upper Effect Size
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Difference Student's t 1.3046 31.000 0.20165 2.2188 -1.2500 5.6875 Cohen's d 0.23062
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ ≠ -10
Normality Test (Shapiro-Wilk)
────────────────────────────────────
W p
────────────────────────────────────
Difference 0.95653 0.22029
────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
──────────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────────
Difference 32 -7.7812 -9.0000 9.6210 1.7008
──────────────────────────────────────────────────────────────
jmv::ttestPS(
data = data,
pairs = list(
list(
i1="IQ",
i2="Match_IQ")),
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
PAIRED SAMPLES T-TEST
Paired Samples T-Test
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
statistic df p Mean difference SE difference Lower Upper Effect Size
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ Match_IQ Student's t -4.5751 31.000 0.00007 -7.7812 1.7008 -11.250 -4.3125 Cohen's d -0.80878
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ <sub>Measure 1 - Measure 2</sub> ≠ 0
Normality Test (Shapiro-Wilk)
─────────────────────────────────────────────
W p
─────────────────────────────────────────────
IQ - Match_IQ 0.95653 0.22029
─────────────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
───────────────────────────────────────────────────────────
N Mean Median SD SE
───────────────────────────────────────────────────────────
IQ 32 92.031 93.500 16.532 2.9226
Match_IQ 32 99.812 102.000 18.252 3.2265
───────────────────────────────────────────────────────────
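For comparison, both of the approaches above have short base R equivalents (assuming the same data frame, data, used in the jmv calls):

t.test(data$Difference, mu = -10)              # one-sample t on the difference scores
t.test(data$IQ, data$Match_IQ, paired = TRUE)  # paired t; use its CI for the mean difference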
In this section we will discuss the conditions under which you would use either the z statistic or Student’s t test to test the null hypothesis that the population Pearson product-moment correlation is equal to a constant. In this regard we will consider two situations. Situation #1 will be the case where the constant in the null hypothesis is not equal to zero (i.e., \(c_0 \ne 0\)) and situation #2 will be the case where this constant is equal to zero (i.e., \(c_0 = 0\)).
These two situations are considered separately in this chapter because the sampling distribution of the correlation coefficient is negatively skewed when \(\rho > 0\) and positively skewed when \(\rho < 0\), but it is symmetric when \(\rho = 0\), in which case a statistic based on r follows a t distribution. Therefore, in situation #1, where the correlation in the null hypothesis is not equal to zero, we will consider the use of a transformation to normalize the sampling distribution of the correlation coefficient. In situation #2, where the correlation is equal to zero, no such transformation will be necessary. However, for both situations you will have the same state of affairs, assumptions, and considerations of violations of these assumptions; therefore, these common elements are considered next.
The state of affairs that must exist before you can consider the following situations is: You are interested in the value of a population Pearson product-moment correlation coefficient.
The statistical tests, which will be described for each situation, will be valid when:
a. the pairs of measurements are independent of one another, and
b. the pairs of measurements follow a bivariate normal distribution in the population of interest.
The importance of the preceding assumptions is as follows:
When the population Pearson product-moment correlation coefficient is not equal to zero the sampling distribution of the correlation coefficient is skewed. For example, if the population correlation is greater than zero (e.g., .80), then most of the sample values will be close to this positive correlation with only a very few found as being negative. This will lead to a negatively skewed sampling distribution of the correlations such as that shown in Figure 13j. When the correlation is negative, the opposite situation occurs. That is, when the population correlation is negative (e.g., -.80), most of the sample correlations will be negative and there will be very few positive correlations. This will lead to a positively skewed sampling distribution of the correlations such as that shown in Figure 13k.
Since there is a different sampling distribution for every correlation coefficient and every sample size, a large number of tables containing critical values would be necessary to test hypotheses concerning nonzero correlations. Fortunately, R.A. Fisher (1921) introduced a transformation which, when applied to each sample correlation in a sampling distribution, causes the resulting sampling distribution of transformed correlations to be approximately normally distributed with standard deviation:
\[ \begin{equation} s_{z_F} = \frac{1} {\sqrt{n-3}} \tag{13-10} \end{equation} \]
That is, the standard error of the transformed sample correlations is simply a function of n, the number of pairs of units.
Fisher’s z-transformation is:
\[ \begin{equation} z_F = \frac{1}{2} \ln \left(\frac{1+|r|} {1-|r|} \right) \tag{13-11} \end{equation} \]
where \(z_F\) is the resultant transformed correlation (F for Fisher), r is a sample correlation coefficient, and ln is the natural log to the base e (i.e., \(\log_e\)). The ln function is available in jamovi’s compute-variable menu, as log() in R, and on many calculators. However, you may easily transform a given correlation by using Table 13d. For example, a sample correlation of .450 is found in Table 13d as Fisher’s \(z_F\) of .485.
[Table 13d: Fisher’s \(z_F\) transformation of the correlation coefficient r.]
Since \(z_F\) follows a normal distribution with standard deviation equal to \(1⁄\sqrt{n-3}\), critical values for \(z_F\) can be found using the standard normal distribution, that is, z scores from table A.1(b). Therefore, the test statistic is written as:
\[ \begin{equation} z = \frac{z_{F_1} - z_{F_0}}{s_{z_F}} \tag{13-12} \end{equation} \]
Here, \(z_{F_1}\) is Fisher’s \(z_F\) for the sample correlation coefficient, \(z_{F_0}\) is Fisher’s \(z_F\) for the constant correlation in the null hypothesis, and \(s_{z_F}\) is the standard deviation, defined in Equation (13-10), of the transformed correlation. This test statistic is illustrated in the following example.
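The transformation and test statistic are also simple to compute in base R, where atanh() is Fisher’s \(z_F\) transformation. The helper function below is our own illustrative sketch (not a jamovi or jmv function), shown with the sample values from the example that follows.

fisher_z_test <- function(r, rho0, n) {
  zF1 <- atanh(r)            # Fisher's z_F for the sample correlation (eq. 13-11)
  zF0 <- atanh(rho0)         # Fisher's z_F for the null-hypothesis constant
  se  <- 1 / sqrt(n - 3)     # standard error of z_F (eq. 13-10)
  z   <- (zF1 - zF0) / se    # test statistic (eq. 13-12)
  p   <- 2 * pnorm(-abs(z))  # two-tailed p value
  c(z = z, p = p)
}
fisher_z_test(r = .97541, rho0 = .95, n = 32)  # z of about 1.95, p of about .05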
In this example we will consider the situation where Mary Barth would like to find out if the correlation between the intelligence scores of the U.N. delegates, based on test-retest scores, is significantly different from .95. Since this correlation is a measure of the reliability of the intelligence test, Mary hopes to find that the correlation is greater than or equal to .95. In this example we will use the data given in Table 13b. (Remember that when we previously considered the data in Table 13b our focus was on the mean difference, here we have a different scenario with a focus on the Pearson product-moment correlation coefficient.)
The research problem that we will consider in this situation is: Is the reliability (test-retest) of the World Adult Intelligence Scale equal to .95 when used with the delegates to the United Nations (i.e., is \(\rho = .95\))?
The reliability (test-retest) of the World Adult Intelligence Scale is equal to .95 when used with the delegates to the United Nations (i.e., \(\rho = .95\)).
\[ \begin{align} H_0&: \rho = .95 \\ H_A&: \rho \ne .95 \end{align} \]
Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in chapter 11.
The probability of rejecting a true null hypothesis (the probability of making a Type I error) was set at .05.
Mary Barth decided she would like her power to be at least .85; that is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.
For the null hypothesis that a correlation is equal to a nonzero constant the effect size is measured as the absolute difference between Fisher’s transformation of the correlation that is important to detect, denoted by \(z_{F_A}\), and Fisher’s transformation of the correlation in the null hypothesis, \(z_{F_0}\). That is, the effect size, \(d_r\) (“r” for the correlation coefficient), is found as:
\[ \begin{equation} d_r = | z_{F_A} - z_{F_0}| \tag{13-13} \end{equation} \]
In considering the reliability of her World Adult Intelligence Scale, Mary Barth decided that the minimum reliability that would be acceptable to her was .85. That is, if the reliability of her World Adult Intelligence Scale was not at least as large as .85, then she would like to reject the null hypothesis. In this example, correlations larger than .95 were not of concern to Mary, so she based her a priori effect size on r = .85. (Remember that the minimally acceptable reliability, here .85, is decided on based on past research, theory, and the genius of the researcher.)
In this example Table 13d was used to transform the population correlation from .95 to 1.832 and the minimally acceptable population correlation from .85 to 1.256 (i.e., \(z_{F_0}\) = 1.832, and \(z_{F_A}\) = 1.256). Therefore, the a priori effect size was found as:
\[ \begin{equation} d_r = |1.832 - 1.256| = 0.576 \end{equation} \]
NOTE: In considering a two-tailed test where a researcher is interested in detecting specific correlational values both above and below the population value, two different a priori effect sizes may be found. If this occurs you should select the smallest a priori effect size as the one that you would like to detect. This is because if you can detect a small effect size with high power, you will also be able to detect the larger effect size – see exercise 13.13.
In an exploratory analysis Cohen’s “small” \(d_r = .14\), “medium” \(d_r = .42\), and “large” \(d_r = .71\) effect sizes may be chosen. (See exercise 13.12.)
Given the preceding two-tailed alternative hypothesis, level of significance, power and a priori effect size, sample size is found as:
\[ \begin{equation} n = \left[ \frac{z_{(1-\alpha_P)} + z_{(1-\beta)}}{|z_{F_A} - z_{F_0}|}\right]^2 + 3 \tag{13-14} \end{equation} \]
Here, \(z_{(1 - \alpha_P)}\) and \(z_{(1 - \beta)}\) are defined as for equation (13-8). Therefore, Mary found her sample size as:
\[ \begin{equation} n = \left[ \frac{1.96 + 1.04}{0.576}\right]^2 + 3 = 30.13 \approx 31 \end{equation} \]
Based on this result, Mary sampled 32 subjects so that the power of her statistical test would be slightly more than .85.
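Equation (13-14) can likewise be evaluated in a couple of lines of base R, as a check on the hand calculation above:

d_r <- abs(atanh(.85) - atanh(.95))                # a priori effect size, eq. (13-13)
ceiling(((qnorm(.975) + qnorm(.85)) / d_r)^2 + 3)  # 31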
In an exploratory analysis sample size would be found by substituting Cohen’s small, medium, and large effect sizes into equation (13-14). Since this process is straightforward, no sample size tables are provided for this case.
The critical z values for the two-tailed test with \(\alpha = .05\) are found from the tables as \(z_{(.025)}\) = -1.96 and \(z_{(.975)}\) = +1.96.
Box 13.2
At this point the a priori parameters
of hypothesis testing have been
established (i.e., your "bet" is made).
The measurements of the 32 randomly selected pairs of United Nations delegates are shown in Table 13b. Checks for outliers and assumptions, and computations of descriptive statistics are considered in exercise 13.2.
In Figure 13l, the test statistic is found to be 1.94 using equation (13-12).
The decision was made to fail to reject the null hypothesis because the absolute value of the test statistic was less than the absolute value of the critical values (1.94 < 1.96). Using a probability calculator (e.g., JAMOVI’s distrACTION module), we find the two-tailed p-value to be .0524 and therefore larger than the .05 level of significance (and therefore, not statistically significant).
The two-tailed 100(1 – α)% = 95% confidence interval may be found using the following two steps:
\[ \begin{equation} z_{F_1} - z_{(1-\alpha/2)} \left(1/\sqrt{n-3}\right) < z_{F_P} < z_{F_1} + z_{(1-\alpha/2)} \left(1/\sqrt{n-3}\right) \tag{13-15} \end{equation} \]
Here we have that the sample correlation of .97541 (see Figure 13l) has a Fisher’s \(z_F\), labeled \(z_{F_1}\), of 2.1931, \((1 – \alpha/2)\) = 1 – .05/2 = .975, \(z_{(.975)}\) = 1.96, and n = 32. Therefore, the 95% confidence interval is:
\[ 2.1931 - 1.96(0.1857) < z_{F_P} < 2.1931 + 1.96(0.1857) \] or
\(1.8291 < z_{F_P} < 2.5571\) or (1.829, 2.557).
The limits of the interval are then transformed back into correlations using the inverse of Fisher’s transformation:
\[ \begin{equation} |r| = \frac{e^{2z}-1} {e^{2z}+1} \tag{13-16} \end{equation} \]
where z is the value of Fisher’s \(z_F\) which is to be transformed to r. The exponential function \(e^x\) is available on many calculators, either directly or as the inverse function of \(\ln\). On some calculators the function \(y^x\) can be used, where \(y = e = 2.718281828\) and \(x = 2z\).
For our example the limits of the 95% confidence interval may be transformed back into correlational values using equation (13-16) as:
\[ |r| = \frac{e^{2(1.8291)}-1} {e^{2(1.8291)}+1} = \frac{38.7915-1} {38.7915+1} = .9497 \]
\[ |r| = \frac{e^{2(2.5571)}-1} {e^{2(2.5571)}+1} = \frac{166.3676-1} {166.3676+1} = .9881 \]
Therefore, the 95% confidence interval for the population correlation coefficient is \(.9497 < \rho < .9881\) or (.9497, .9881), which contains the hypothesized value of .95.
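In base R, the back-transformation in equation (13-16) is simply tanh(), so the hand calculations can be checked in one line:

tanh(c(1.8291, 2.5571))  # 0.9497 0.9881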
It should be noted that many statistical programs will now provide “bootstrapped” confidence intervals, which do not assume normality like these confidence intervals. Bootstrapping randomly draws many (e.g., 10,000) samples of the same size as your original sample from the cases in your sample, but always with replacement. That is, each case in your sample can be chosen multiple times, which allows for a very large number of possible bootstrapped samples to be drawn. Then, the samples drawn are used to create an empirical sampling distribution from which the 95% confidence interval is determined. This approach allows the sample you have to represent the population, including the shape of the distribution.
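As a rough sketch of this idea, a percentile bootstrap confidence interval for the test-retest correlation can be built with a few lines of base R (assuming the same data frame, data, used in the jmv calls; this illustration is our addition, not output used in the chapter).

set.seed(1)  # for reproducibility
boot_r <- replicate(10000, {
  idx <- sample(nrow(data), replace = TRUE)  # resample cases with replacement
  cor(data$IQ_Pretest[idx], data$IQ_Posttest[idx])
})
quantile(boot_r, c(.025, .975))  # empirical 95% confidence limits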
The confidence limits found using equation (13-16) can be checked by comparing them with those found using Table 13d. Using interpolation, Table 13d yields a reasonably accurate confidence interval of \(.945 < \rho < .990\).
Mary was confident that the true population correlation fell between .9497 and .9881, and therefore, that her World Adult Intelligence Scale was sufficiently reliable to meet her needs.
In correlation, the effect size reported is generally the correlation coefficient itself. However, some researchers prefer to report the \(r^2\) value because it has a more absolute interpretation (i.e., the proportion of shared variation). Here, the effect would be reported as \(r = .975\) or \(r^2 = .951\).
To illustrate the situation where you will test the null hypothesis that the Pearson product-moment correlation is equal to zero, the data in Table 13c will be reconsidered, but the scenario will change from an interest in the mean difference to an interest in the correlation between the intelligence scores. That is, our interest will focus on the correlation of the intelligence scores between the paired delegates from the same delegation. The method of sampling is the same as that described for Situation 2 of the dependent t test, but no treatments will be considered to have been administered.
Mary Barth has randomly selected thirty-two delegations, and then randomly selected two delegates from each delegation. Mary expected that the correlation between the paired delegates’ scores would be positive, since the delegates came from the same delegation; since past research in this area was unavailable, however, she was unsure of what the correlation would be, and so she decided to use a two-tailed alternative hypothesis. (Although a one-tailed test might be more reasonable for this example, the two-tailed test is illustrated here because it is the one most commonly used in the research literature.)
Is there a correlation between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations?
The correlation between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations will differ from zero.
\[ \begin{align} H_0&: \rho = 0 \\ H_A&: \rho \ne 0 \end{align} \]
The validity and reliability of the measures of the dependent and independent variables were discussed in the previous section. The level of significance was set at .05 and power was set at .85.
In exploratory studies, Cohen (1977) recommends that a priori effect sizes be considered in terms of the correlation coefficient. He indicates that a small effect size is considered as \(\rho = \pm .10\); a medium effect size as \(\rho = \pm .30\), and a large effect size as \(\rho = \pm .50\). In confirmatory studies, values of \(\rho\) would be determined based on past research, theory, or the genius of the researcher.
In Mary’s study, she felt that a large effect size was possible; therefore, she set \(\rho = \pm .50\). This meant that if the correlation differed from zero by as much as .50 or more, she would like to be able to detect this difference with high probability.
In table C.3, for \(\alpha_2 = .05\), \(\text{power} = .85\), and \(\rho = .50\), Mary found she needed 32 pairs of subjects. Mary’s value of \(\rho = .50\) was found in table C.3; however, equation (13-14) may be used for values of \(\rho\) which are not in this table.
jmv::corrMatrix(
data = data,
vars = vars(IQ_Pretest, IQ_Posttest),
flag = TRUE,
n = TRUE,
ci = TRUE,
plots = TRUE,
plotDens = TRUE,
plotStats = TRUE)
CORRELATION MATRIX
Correlation Matrix
────────────────────────────────────────────────────────────
IQ_Pretest IQ_Posttest
────────────────────────────────────────────────────────────
IQ_Pretest Pearson's r —
df —
p-value —
95% CI Upper —
95% CI Lower —
N —
IQ_Posttest Pearson's r 0.97541 —
df 30 —
p-value < .00001 —
95% CI Upper 0.98805 —
95% CI Lower 0.94974 —
N 32 —
────────────────────────────────────────────────────────────
Note. * p < .05, ** p < .01, *** p < .001
Note that the p value reported is for the null hypothesis that \(\rho = 0\), not \(\rho = .95\) as we wish to test. However, we need the correlation in order to perform the test using Fisher’s \(z_F\) as described above, so we ignore the p value provided.
Given Situation 2, the equation used to calculate the t statistic is:
\[ \begin{equation} t = \frac{r_{12}} {\sqrt{(1 - r_{12}^2)/(n-2)}} \tag{13-17} \end{equation} \]
When the null hypothesis is true, the sampling distribution of this statistic follows a t distribution with n-2 degrees of freedom, where n is the number of pairs of measurements.
In table A2, the critical values of the t statistic for the two-tailed test with \(\alpha = .05\) and \(\text{df} = 30\) are \(t_{(.025,30)} = -2.042\) and \(t_{(.975,30)} = +2.042\).
Box 13.3
At this point the a priori parameters
of hypothesis testing have been
established (i.e., your "bet" is made).
The measurements of the 32 randomly selected pairs of United Nations delegates are shown in Table 13c. (Note that the differences, shown in column 6, are not necessary for this problem.) Checks for outliers and of assumptions, and computations of descriptive statistics are considered in exercise 13.2.
In Figure 13m, the statistical significance of the test statistic is shown to be p < .001, but the default output does not provide the t statistic itself. The t statistic can be found to be 8.895, or 8.90, using equation (13-17).
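As a quick check, both the equation (13-17) calculation and the t statistic itself are available in base R (again assuming the Table 13c scores are in the data frame data):

r <- cor(data$IQ, data$Match_IQ)
n <- nrow(data)
r / sqrt((1 - r^2) / (n - 2))     # 8.895, as given by equation (13-17)
cor.test(data$IQ, data$Match_IQ)  # reports the same t, with df = 30, and its p value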
jmv::corrMatrix(
data = data,
vars = vars(IQ, Match_IQ),
flag = TRUE,
n = TRUE,
ci = TRUE,
plots = TRUE,
plotDens = TRUE,
plotStats = TRUE)
CORRELATION MATRIX
Correlation Matrix
────────────────────────────────────────────────────
IQ Match_IQ
────────────────────────────────────────────────────
IQ Pearson's r —
df —
p-value —
95% CI Upper —
95% CI Lower —
N —
Match_IQ Pearson's r 0.85152 —
df 30 —
p-value < .00001 —
95% CI Upper 0.92543 —
95% CI Lower 0.71517 —
N 32 —
────────────────────────────────────────────────────
Note. * p < .05, ** p < .01, *** p < .001
The decision was made to reject the null hypothesis because the absolute value of the test statistic was greater than the absolute value of the critical values (8.90 > 2.042). This same decision was reached using output (Figure 13m), where the two-tailed p-value is less than .001 and therefore was less than the .05 level of significance.
Here, since the sample correlation is usually not going to be exactly equal to zero, the confidence interval is established using Fisher’s \(z_F\) transformation just as before. Therefore, the 100(1 – α)% confidence interval is found using the following two steps. (Even if r = 0, the following steps will yield the correct confidence interval.)
Here Fisher’s \(z_F\) for the sample correlation of .85152 is \(z_{F_1} = 1.2617\), and \(s_{z_F} = 1/\sqrt{32-3} = 0.1857\), so equation (13-15) gives:
\[ 1.2617 - 1.96(0.1857) < z_{F_P} < 1.2617 + 1.96(0.1857) \]
or 0.8977 < \(z_{F_P}\) < 1.6257 or (0.8977, 1.6257).
\[ |r| = \frac{e^{2(0.8977)}-1} {e^{2(0.8977)}+1} = \frac{6.0219-1} {6.0219+1} = .7152 \] \[ |r| = \frac{e^{2(1.6257)}-1} {e^{2(1.6257)}+1} = \frac{25.8782-1} {25.8782+1} = .9256 \]
Therefore, the 95% confidence interval for the population correlation coefficient is: \(.7152 < \rho < .9256\) or (.7152, .9256).
The latter confidence limits can be checked by comparing them with those found in Table 13d. Our \(z_F\) value of .8977 is close to the tabled value of .897, which would yield an r of .715; using equation (13-16) we found .7152. Also, our \(z_F\) value of 1.6257 falls between the tabled \(z_F\) values of 1.623, with an r of .925, and 1.658, with an r of .930; our calculated value of r = .9256 falls between these two values. You can see that by using only Table 13d (not equation 13-16), a reasonably accurate confidence interval of \(.715 < \rho < .925\) would be found (we would choose .925 because it is the smaller, more conservative choice).
Mary was confident that the true population correlation fell between .7152 and .9256, and therefore, that a statistically significant nonzero correlation existed between the intelligence scores of delegates from the same delegations.
The correlation effect size that should be reported is r = .852, or \(r^2\) = .726.
To illustrate the situation where you will test the null hypothesis that the population regression slope is equal to zero, the data in Table 13c will be reconsidered once more, but the scenario will change to an interest in predicting one delegate's intelligence score from that of the paired delegate from the same delegation. The method of sampling is the same as that described for Situation 2 of the dependent t test, but no treatments will be considered to have been administered.
Mary Barth has randomly selected thirty-two delegations, and then randomly selected two delegates from each delegation. Mary expected that the relationship between the paired delegates’ scores would be positive, since the delegates came from the same delegation; since past research in this area was unavailable, however, she was unsure of what the relationship would be, and so she decided to use a two-tailed alternative hypothesis. (Although a one-tailed test might be more reasonable for this example, the two-tailed test is illustrated here because it is the one most commonly used in the research literature.)
Is there a predictive relationship between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations?
The regression slope between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations will differ from zero.
\[ \begin{align} H_0&: \beta = 0 \\ H_A&: \beta \ne 0 \end{align} \]
Note that the Greek letter β used here differs from the standardized regression coefficient, Beta, provided in the output of some statistical programs. The Greek β in these hypotheses represents the population regression coefficient.
The validity and reliability of the measures of the dependent and independent variables were discussed in the previous section. The level of significance was set at .05 and power was set at .85.
Effect size in bivariate regression (i.e., one predictor and one dependent variable) is probably best thought of in terms of correlation. Bivariate regression and bivariate correlation are essentially the same analyses with a slightly different focus. Recall that the standardized regression slope, Beta, is equal to the correlation between the variables. Therefore, for a bivariate regression, effect size can be determined in the same way as it was for correlation (this will not be true for multiple regression, but is appropriate when there is just one predictor).
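The equivalence of the standardized slope and the correlation is easy to verify; here is a short R sketch with simulated data (the data and variable names are illustrative only, not Mary's data):

set.seed(1)
x <- rnorm(32)                       # simulated predictor
y <- 0.9 * x + rnorm(32, sd = 0.5)   # simulated criterion
cor(x, y)                            # Pearson correlation
coef(lm(scale(y) ~ scale(x)))[2]     # standardized slope (Beta); equals cor(x, y)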
In Mary’s study, she felt that a large effect size was possible; therefore, she set \(\rho = \pm .50\). This meant that if the correlation differed from zero by as much as .50 or more, she would like to be able to detect this difference with high probability.
There are other ways to determine sample size when there are multiple predictors, but with just one predictor we can use the same approach as we did with correlation. In table C.3, for \(\alpha_2 = .05\), \(\text{power} = .85\), and \(\rho = .50\), Mary found she needed 32 pairs of subjects. Mary's value of \(\rho = .50\) was found in table C.3; however, equation (13-15) may be used for values of ρ that are not in the table (a computational sketch follows the excerpt below).
Table C.3 (excerpt). Number of pairs needed to detect a nonzero population correlation ρ (columns) with a given power (rows), for two-tailed tests.

α₂ = .01

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 662 | 165 | 72 | 40 | 25 | 17 | 12 | 9 |
| 0.6 | 798 | 198 | 87 | 48 | 30 | 20 | 14 | 10 |
| 0.65 | 874 | 216 | 95 | 52 | 32 | 21 | 15 | 11 |
| 0.7 | 958 | 237 | 103 | 57 | 35 | 23 | 16 | 11 |
| 0.75 | 1052 | 260 | 113 | 62 | 38 | 25 | 17 | 12 |
| 0.8 | 1163 | 287 | 125 | 68 | 42 | 27 | 19 | 13 |
| 0.85 | 1299 | 320 | 139 | 76 | 46 | 30 | 21 | 14 |
| 0.9 | 1481 | 365 | 158 | 86 | 52 | 34 | 23 | 16 |
| 0.95 | 1772 | 436 | 189 | 102 | 62 | 40 | 27 | 18 |
| 0.99 | 2390 | 588 | 254 | 137 | 83 | 53 | 35 | 23 |

α₂ = .02

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 540 | 135 | 59 | 33 | 21 | 14 | 10 | 8 |
| 0.6 | 664 | 165 | 72 | 40 | 25 | 17 | 12 | 9 |
| 0.65 | 733 | 182 | 80 | 44 | 27 | 18 | 13 | 9 |
| 0.7 | 810 | 201 | 88 | 48 | 30 | 20 | 14 | 10 |
| 0.75 | 897 | 222 | 97 | 53 | 33 | 22 | 15 | 11 |
| 0.8 | 1000 | 247 | 108 | 59 | 36 | 24 | 16 | 11 |
| 0.85 | 1126 | 278 | 121 | 66 | 40 | 26 | 18 | 12 |
| 0.9 | 1296 | 319 | 139 | 75 | 46 | 30 | 20 | 14 |
| 0.95 | 1569 | 386 | 167 | 91 | 55 | 36 | 24 | 16 |
| 0.99 | 2153 | 529 | 229 | 123 | 75 | 48 | 32 | 21 |

α₂ = .05

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 384 | 96 | 43 | 24 | 16 | 11 | 8 | 6 |
| 0.6 | 489 | 122 | 54 | 30 | 19 | 13 | 9 | 7 |
| 0.65 | 549 | 136 | 60 | 33 | 21 | 14 | 10 | 8 |
| 0.7 | 616 | 153 | 67 | 37 | 23 | 16 | 11 | 8 |
| 0.75 | 692 | 171 | 75 | 41 | 26 | 17 | 12 | 9 |
| 0.8 | 782 | 194 | 85 | 46 | 29 | 19 | 13 | 9 |
| 0.85 | 894 | 221 | 96 | 53 | 32 | 21 | 15 | 10 |
| 0.9 | 1046 | 258 | 112 | 61 | 38 | 25 | 17 | 12 |
| 0.95 | 1293 | 319 | 138 | 75 | 46 | 30 | 20 | 14 |
| 0.99 | 1828 | 450 | 194 | 105 | 64 | 41 | 27 | 18 |

α₂ = .10

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 271 | 68 | 31 | 18 | 12 | 8 | 6 | 5 |
| 0.6 | 360 | 90 | 40 | 23 | 15 | 10 | 8 | 6 |
| 0.65 | 412 | 103 | 46 | 26 | 16 | 11 | 8 | 6 |
| 0.7 | 470 | 117 | 52 | 29 | 18 | 12 | 9 | 7 |
| 0.75 | 537 | 133 | 59 | 33 | 20 | 14 | 10 | 7 |
| 0.8 | 617 | 153 | 67 | 37 | 23 | 16 | 11 | 8 |
| 0.85 | 717 | 177 | 78 | 43 | 26 | 18 | 12 | 9 |
| 0.9 | 853 | 211 | 92 | 50 | 31 | 21 | 14 | 10 |
| 0.95 | 1077 | 266 | 115 | 63 | 38 | 25 | 17 | 12 |
| 0.99 | 1569 | 386 | 167 | 90 | 55 | 35 | 24 | 16 |

Note. Panel significance levels (α₂) are inferred from the tabled entries; for example, the α₂ = .05 panel gives the 32 pairs (power = .85, ρ = .50) cited in the text.
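Equation (13-15) is not reproduced here, but assuming it is the usual large-sample Fisher-z sample-size formula (which reproduces the tabled entries to within a pair or so), a minimal R sketch:

# Approximate pairs needed to detect a nonzero correlation rho (two-tailed).
# Assumes the usual Fisher-z formula; tabled values may differ by a pair
# because of rounding conventions.
n_pairs <- function(rho, alpha = .05, power = .85) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling((z_sum / atanh(rho))^2) + 3
}
n_pairs(0.50)                            # 33; within one pair of the tabled 32
n_pairs(0.50, alpha = .01, power = .99)  # 83, matching the tabled value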
In bivariate regression, the equation used to calculate the t statistic is:
\[ \begin{equation} t = \frac{b - c_0} {s_b} \tag{13-18} \end{equation} \] where b is the regression coefficient that we are testing against some value, \(c_0\), which is almost always zero (so the t statistic usually simplifies to \(t = b/s_b\)). The value \(s_b\) is the standard error of the regression coefficient. We will obtain \(s_b\) from output, but it can be calculated as a function of the standard error of the estimate.
When the null hypothesis is true in bivariate regression, the sampling distribution of this statistic follows a t distribution with n – 2 degrees of freedom, where n is the number of pairs of measurements. You may recall that n – 2 is the degrees of freedom for the residual in the regression model.
In table A2, the critical values of the t statistic for the two-tailed test with alpha = .05 and degrees of freedom = 30 are \(t_{(.025,30)} = -2.042\) and \(t_{(.975,30)} = +2.042\).
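These tabled critical values can be verified with R's t quantile function:

qt(c(.025, .975), df = 30)   # -2.042, +2.042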
Box 13.4
At this point the a priori parameters of hypothesis testing have been established (i.e., your "bet" is made).
The measurements of the 32 randomly selected pairs of United Nations delegates are shown in Table 13c. (Note that the differences are not necessary for this problem.) Checks for outliers and assumptions, along with computations of descriptive statistics, are considered in the exercises.
In Figure 13n, the test statistic is found to be 8.895, or 8.90, using equation (13-18) with statistics from the regression output. (Equation (13-17) yields the same value, because with one predictor the test of the slope is equivalent to the test of the correlation.)
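A minimal R sketch of this computation, with the coefficient estimate and its standard error copied from the output below (Figure 13n):

b_1 <- 0.94007                  # slope estimate for IQ (Figure 13n)
s_b <- 0.10568                  # standard error of the slope
t_stat <- b_1 / s_b             # equation (13-18) with c0 = 0
t_stat                          # 8.895
2 * pt(-abs(t_stat), df = 30)   # two-tailed p-value: < .00001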
jmv::linReg(
    data = data,
    dep = Match_IQ,        # dependent (criterion) variable
    covs = IQ,             # continuous predictor
    blocks = list(
        list(
            "IQ")),        # one block containing the single predictor
    refLevels = list(),
    r2Adj = TRUE,          # adjusted R-squared
    rmse = TRUE,           # root mean square error
    modelTest = TRUE,      # overall model F test
    anova = TRUE,          # omnibus ANOVA table
    ci = TRUE,             # confidence intervals for coefficients
    stdEst = TRUE)         # standardized estimates (Beta)
LINEAR REGRESSION
Model Fit Measures
────────────────────────────────────────────────────────────────────────────────────────────
Model R R² Adjusted R² RMSE F df1 df2 p
────────────────────────────────────────────────────────────────────────────────────────────
1 0.85152 0.72508 0.71592 9.4191 79.124 1 30 < .00001
────────────────────────────────────────────────────────────────────────────────────────────
Note. Models estimated using sample size of N=32
MODEL SPECIFIC RESULTS
MODEL 1
Omnibus ANOVA Test
──────────────────────────────────────────────────────────────────────────
Sum of Squares df Mean Square F p
──────────────────────────────────────────────────────────────────────────
IQ 7487.8 1 7487.837 79.124 < .00001
Residuals 2839.0 30 94.635
──────────────────────────────────────────────────────────────────────────
Note. Type 3 sum of squares
Model Coefficients - Match_IQ
────────────────────────────────────────────────────────────────────────────────────────────────────
Predictor Estimate SE Lower Upper t p Stand. Estimate
────────────────────────────────────────────────────────────────────────────────────────────────────
Intercept 13.29664 9.87704 -6.87497 33.4683 1.3462 0.18832
IQ 0.94007 0.10568 0.72424 1.1559 8.8951 < .00001 0.85152
────────────────────────────────────────────────────────────────────────────────────────────────────
The decision was made to reject the null hypothesis because the absolute value of the test statistic was greater than the absolute value of the critical values (8.90 > 2.042). This same decision was reached using the output (Figure 13n), where the two-tailed p-value was less than .001 and therefore less than the .05 level of significance.
The 100(1 – α)% confidence interval is found from equation (13-19); a confidence interval for any regression coefficient can be calculated this way.
\[ \begin{equation} b - t_{(1-\alpha/2,\,n-2)}(s_b) < \beta < b + t_{(1-\alpha/2,\,n-2)}(s_b) \tag{13-19} \end{equation} \]
where b is the sample regression coefficient (i.e., estimate of the population coefficient) and \(s_b\) is the standard error of the sample coefficient estimate.
We can find the regression coefficients from Figure 13n above:
\[ \begin{align} b_0 &= 13.297 \\ b_1 &= 0.940 \end{align} \]
where \(b_0\) is the y-intercept and \(b_1\) is the slope for the predictor IQ. Recall that the slope tells us how much the Match_IQ variable tends to change due to its relationship with the IQ variable. Because regression coefficients can be larger than 1.0, we always include the leading 0 before the decimal.
We can also find the standard errors for the regression coefficients from Figure 13n above:
\[ \begin{align} s_{b_0} &= 9.877 \\ s_{b_1} &= 0.106 \end{align} \]
We substitute these values into equation (13-19). We will only perform this calculation for the slope, because it is rarely of interest whether the y-intercept equals zero, and therefore we rarely interpret the significance of the y-intercept.
\[ 0.940 - 2.042(0.106) < β < 0.940+2.042(0.106) \]
or (as shown by the output) 0.724 < β < 1.156 or (0.724, 1.156).
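The same interval can be reproduced in R from the values in Figure 13n (a sketch of equation (13-19), not jamovi's internal computation):

b_1 <- 0.94007                             # slope estimate
s_b <- 0.10568                             # standard error of the slope
b_1 + c(-1, 1) * qt(.975, df = 30) * s_b   # 0.724, 1.156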
Mary was confident that the true population slope fell between 0.724 and 1.156, and therefore, that a statistically significant nonzero predictive relationship existed between the intelligence scores of delegates from the same delegations (specifically, using one delegate's score to predict the paired delegate's score).
Generally, in regression, the standardized regression coefficient, Beta, is reported as the effect size. Therefore, here, the effect reported would be Beta = .852.
We began this chapter with a discussion of the bivariate normal distribution. This distribution was discussed first because in this chapter, situations where the data consisted of related pairs of scores were to be presented. In these situations it would be assumed that the related pairs of scores were sampled from a bivariate normal distribution. Therefore, our discussion of the bivariate normal distribution included a presentation of how you might check on this assumption by viewing the density pattern of points in a scatterplot, and by checking to see that the marginal distributions of the measurements were normally distributed.
Following our discussion of the bivariate normal distribution, we considered what is referred to as the dependent t test. This test is used to test the hypothesis that the difference between the means of related scores is equal to a constant. We found that the dependent t test is used in two situations. The first situation occurs when a group of units is randomly sampled and each unit is measured twice, either with the same or with commensurate instruments. The design used in this situation is referred to as a repeated measures design. The second situation occurs when a random sample of n pairs of units is drawn and measured. The design used in this situation is referred to as a randomized blocks design.
Our discussion of the dependent t test was followed by a presentation of two situations where we considered the null hypothesis that the population Pearson product-moment correlation coefficient was equal to a constant. We first considered the situation where this constant was not equal to zero. In this situation, we found that the sampling distribution of the correlation coefficient was positively skewed when the population correlation was negative, and negatively skewed when the population correlation was positive. Therefore, in this situation, Fisher’s z-transformation was used to transform the correlational values to scores that follow a normal distribution. This transformation enabled us to use the z statistic and critical values from the standard normal distribution to test the null hypothesis. We then considered the second situation, where the constant in the null hypothesis is zero. In this situation we found that the t statistic may be used to test the null hypothesis.
Statistical Procedures
Analyses to Run
• Use a SCALE variable Y
• Use a SCALE variable X
• COMPUTE a DIFFERENCE SCORE as Y – X (e.g., DIFF = Y – X)
• Run descriptive statistics for the DIFFERENCE SCORE
• Run a paired-samples t test with X and Y as the paired variables
• Run a one-sample t test using the computed DIFFERENCE SCORE as the variable (Test Value = 0)
• Run a nonparametric test for related samples for the paired variables X and Y (e.g., Wilcoxon Signed-Ranks test)
• Run an error bar plot for the separate variables X and Y
An R sketch of most of these steps follows this list.
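This is a hedged sketch only; X and Y are placeholders for your own scale variables, and the simulated data are for illustration:

set.seed(42)
X <- rnorm(30, mean = 100, sd = 15)    # hypothetical scale variable X
Y <- X + rnorm(30, mean = 2, sd = 5)   # hypothetical scale variable Y
DIFF <- Y - X                          # computed difference score
mean(DIFF); sd(DIFF); var(DIFF)        # descriptives for the difference score
t.test(Y, X, paired = TRUE)            # paired-samples t test
t.test(DIFF, mu = 0)                   # equivalent one-sample t test (Test Value = 0)
wilcox.test(Y, X, paired = TRUE)       # Wilcoxon signed-rank test for related samples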
Using the output, respond to the following items
Using the PAIRED-SAMPLES T TEST output, respond to the following items
Provide the most appropriate research question for this analysis
Provide the statistical null hypothesis using both words and appropriate symbols.
What is the paired mean difference between X and Y?
Which mean was higher in the sample, X or Y?
What was the standard deviation for the paired mean difference between X and Y?
What is the variance of the paired difference scores?
What was the standard error for the paired mean difference between X and Y?
What are the degrees of freedom and the critical value for this paired-samples t test?
Show or explain how the t statistic for paired mean differences between X and Y is calculated.
Assuming that these data represent a random sample from some population, on which variable (Y or X) would cases be expected to score higher in the population? That is, using a two-tailed level of significance of α = .05, was there a statistically significant difference between the means of X and Y? Use a confidence interval for the paired mean difference as evidence to explain your answer
Assuming that these data represent a random sample from some population, on which variable (Y or X) would cases be expected to score higher in the population? That is, using a two-tailed level of significance of α = .05, was there a statistically significant difference between the means of X and Y? Use the calculated t statistic compared to a t critical value as evidence to explain your answer (include both the degrees of freedom and the critical value used for this test)
Assuming that these data represent a random sample from some population, on which variable (Y or X) would cases be expected to score higher in the population? That is, using a two-tailed level of significance of α = .05, was there a statistically significant difference between the means of X and Y? Use a Sig. or p value to explain your answer
Explain what the specific p (Sig.) value you obtained in your results means as a probability (i.e., do not talk about whether it is statistically significant, but rather what probability the p represents).
Would this paired t test be statistically significant as a one-tailed (i.e., directional) test? Explain and provide evidence.
Is there a statistically significant positive correlation between the two variables in the paired t test?
Does it matter?
Which mean would you estimate to be higher in the population, X or Y?
Show or explain how the standardized paired mean difference effect size (i.e., Cohen’s d) between X and Y is calculated. That is, calculate Cohen’s d for this paired-samples t test.
Using ALL the output in this section above, respond to the following item
Using the NONPARAMETRIC PAIRED-SAMPLES WILCOXON SIGNED-RANK TEST output, respond to the following items
How many cases in the sample had higher X scores than Y scores? Provide specific evidence.
How many cases in the sample had lower X scores than Y scores? Provide specific evidence.
How many cases in the sample had the same X and Y scores? Provide specific evidence.
Approximately what is the two-tailed probability of obtaining the resulting z statistic (as an absolute value) if the null hypothesis is true?
What do you conclude based on these results?
Interpret the results for the Wilcoxon Signed-Rank test in an APA-style report to answer the research question and to describe in detail the mean difference between Y and X. Whether statistically significant or not, refer to descriptive statistics (e.g., mean ranks, but also report means, standard deviations, mean differences, effect sizes, and/or confidence intervals), graphs, inferential statistics, degrees of freedom, and statistical significance to describe the size and direction of the difference between variables/measures. Be sure to discuss assumptions and outliers and their potential impact.
Analyses to Run
• Use the same analyses you ran in Chapter 7: Bivariate Correlation (Descriptive)
Using the output, respond to the following INFERENTIAL items
Using ALL the output in this section above (as well as Section 11 as needed), respond to the following item
Skip any of the next eight items if it is not possible to answer them (e.g., if there are no negative correlations). Do NOT include any correlations of a variable with itself (that always have r = 1.0).
Is the strongest positive Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the strongest negative Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the weakest positive Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the weakest negative Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the strongest positive Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
Is the strongest negative Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
Is the weakest positive Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
Is the weakest negative Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
• Use the same analyses you ran in Chapter 8: Bivariate (one-predictor) Linear Regression (Descriptive)
Using the output, respond to the following INFERENTIAL items
Provide the most appropriate research question for the analysis
What is the Statistical Null Hypothesis for the regression model (i.e., correlation and/or shared variance)?
Is the regression model statistically significant?
Is the amount of explained variation considered statistically significantly different from ZERO?
Based on your statistical significance decision in the previous items, what type of error might you have made? Why?
Can we conclude that there is a relationship between Y and X in the population?
Can we conclude that the value of X causes the change in Y?
Can we conclude that a case’s Y score in the population can be predicted by using the value of the case’s X score?
Show or explain how the F statistic is calculated
Show or explain how all 3 degrees of freedom in the ANOVA table are calculated
Show or explain how to test the statistical significance of the F statistic as compared to the F critical value
Report and interpret the p value associated with the F statistic
Calculate or report and interpret the effect size for this analysis.
Is the assumption of homoscedasticity met for this regression? Show and describe your evidence.
Is the assumption of normally distributed residuals met for this regression? Show/describe your evidence.
Is the assumption of linearity met for this regression? Show and describe your evidence.
What is the Statistical Null Hypothesis for the regression slope?
Show or explain how the t statistic is calculated for the slope.
Is the slope statistically significant?
Show or explain how to test the statistical significance of the t statistic as compared to the t critical value (report the degrees of freedom and recall that the df used for the critical t value is based on the Residual df from the ANOVA table)
Report and interpret the p value associated with the t statistic for the slope
Show how to use the Confidence Interval for the regression coefficient to decide if the slope is statistically significantly different from 0
Using ALL the output in this section above (as well as Section 12 as needed), respond to the following item
Interpret the results for the bivariate regression in an APA-style report to answer the research question and to describe in detail the predictive relationship between Y and X. Whether statistically significant or not, use descriptive statistics (e.g., Pearson correlation, shared variation, R²), inferential statistics, degrees of freedom, assumptions, outliers, and statistical significance to describe the size and direction of the relationship.
Respond to the following items about null hypothesis testing
FOR PAIRED T TESTS: If the true paired mean difference (i.e., reality) between the Y and X scores is exactly equal to ZERO in the population, then what type of error might you have made when reaching your decision in the previous item about the Null Hypothesis of no difference in means? Or was there no error? Explain.
FOR PAIRED T TESTS: If the true paired mean difference (i.e., reality) between the Y and X scores is larger than ZERO in the population, then what type of error might you have made when reaching your decision in the previous item about the Null Hypothesis of no difference in means? Or was there no error? Explain.
FOR PAIRED T TESTS: Use the following options to respond to the four items below:
Discuss why each of the following is or is not possible in Null Hypothesis Significance Testing:
Explain the Research Design and Statistical Analysis terms below BRIEFLY but SUFFICIENTLY and IN YOUR OWN WORDS (don’t just give another name for them). Some may require finding additional readings. If you use resources, paraphrase in your own words AND provide a citation of the resource you used (including page numbers).
Please cite as:
Barcikowski, R. S., & Brooks, G. P. (2025). The Stat-Pro book:
A guide for data analysts (revised edition) [Unpublished manuscript].
Department of Educational Studies, Ohio University.
https://people.ohio.edu/brooksg/Rmarkdown/
This is a revision of an unpublished textbook by Barcikowski (1987).
This revision updates some text and uses R and JAMOVI as the primary
tools for examples. The textbook has been used as the primary textbook
in Ohio University EDRE 7200: Educational Statistics courses for
most semesters 1987-1991 and again 2018-2025.