Pairs of correlated scores are frequently encountered in research projects. In this chapter, we will consider statistical tests of hypotheses concerned with the population means of such score pairs and with their population Pearson product-moment correlation. In considering the statistical tests of these hypotheses, we will find that we must assume that the score pairs follow what is referred to as a bivariate normal distribution in the population of interest. Therefore, before we consider these hypotheses we will discuss the bivariate normal distribution.
In this chapter we will consider three test statistics. We will use the t statistic to consider hypotheses concerning the means of dependent measures. This t statistic has several names: paired t test, paired-samples t test, and dependent t test are all used interchangeably by scholars and researchers for the t test used with paired data. The z statistic and t statistic are the statistics most commonly used to test the statistical significance of a correlation. We will also use a t statistic when we test the statistical significance of a regression coefficient. Our discussion will conclude with a test statistic called the chi-square statistic, which we will use to test the hypothesis that the population variance of a single group of scores is equal to a constant.
In considering each test statistic, we will continue the saga of Mary Barth and her United Nations delegates, but we will add a second column of data to our data set. The dependent variable will again be intelligence as measured by the World Adult Intelligence Scale (WAIS). We will consider only two data sets for all of our statistical tests, but, as in chapter 12, each time we consider a new situation or test statistic we will change the scenario. This will again free us to emphasize the test statistics and the hypothesis test process, and not the data per se, as we consider new situations. In chapter 12, each research problem was illustrated using all four of the possible research hypotheses. In this chapter space will permit an example of only one research hypothesis, but the others will be presented as exercises.
In the next two sections, we will consider statistical tests that are used when two measurements have been taken. We will label these measurements as X1 and X2. Here the subscripts 1 and 2 will be used to identify where a measurement took place, that is, in group 1 or in group 2, or the point in time that a measurement occurred, that is, first (time 1) or second (time 2). In the situations that we considered in chapter 12, we considered a single measurement and assumed that it was normally distributed in the population of interest. In the next two sections, we will make the assumption that the two measurements (dependent variables) follow what is called a bivariate normal distribution.
A normal frequency distribution is a two-dimensional figure determined by scores on the x-axis and by the score frequencies (heights) on the y-axis. A bivariate normal frequency distribution is a three-dimensional figure determined by pairs of scores, which locate points in the (x, y) coordinate plane, and by the frequencies of these points, which are measured on a third axis labeled the z-axis. A bivariate normal distribution is illustrated in Figure 13a(i). If you consider Figure 13a(i), you can see why the bivariate normal distribution is sometimes compared to a Mexican sombrero (a type of hat).
An aerial view of a bivariate normal distribution is displayed in the contour map of Figure 13a(ii). In Figure 13a(ii), the elliptical contours are similar to those found on a topographic map, that is, they are an indication of the heights of the score points below them. Therefore, the center ellipse would show the highest frequency (density) of scores, with the frequency decreasing as one moves out from the center. Researchers frequently use scatterplots such as those shown in chapter 7 to check on the assumption of bivariate normality. If the assumption of bivariate normality is reasonable, a researcher would expect an elliptical pattern of points which is darkest at the center (showing high frequency) with decreasing darkness as one moves out from the center.
The following are some points that you should be aware of in dealing with a bivariate normal distribution both in general and in this chapter.
In this section, we will discuss data wherein the scores are paired and therefore presumed to be correlated. Here, we will consider the conditions under which you would use Student’s t test to test the null hypothesis that the difference between the population means of these paired scores is equal to a constant. We will refer to this t test as the dependent t test because it is used with scores that are correlated, or dependent on one another. The usual value chosen for the constant in the null hypothesis which is tested using the dependent t test is zero: \(c_0 = 0\). When this is the case, you are testing a null hypothesis about a mean difference, and the null hypothesis can be written in several equivalent ways (recall that 1 and 2 here stand for group or time, e.g., group 1 or 2, time 1 or 2):
\[ \begin{align} H_0&: \mu_1 = \mu_2 \\ H_0&: \mu_1 - \mu_2 = 0 \\ H_0&: \mu_D = 0 \end{align} \]
where \(\mu_D = \mu_1 - \mu_2\) (i.e., D stands for difference).
The first situation in which the dependent t test is used is found when the same unit is measured at two different times with the same or a commensurate instrument. Commensurate instruments are instruments that have the same measurement scale. For example, different forms of the Miller Analogies Test may be described as commensurate instruments. Because the instruments have the same scale, we would expect them to also have similar means (otherwise, comparing means would make very little sense).
This situation represents the most elementary form of what statisticians refer to as a repeated measurements design. In the example that we will consider shortly, we will again focus on the intelligence of the delegates to the United Nations; however, here the intelligence of the delegates will be measured twice: at the start of the study and then six months later.
The second situation in which the dependent t test is used is found when units are matched on a variable and then are randomly assigned to one of two treatments. This situation represents the most elementary form of what statisticians refer to as a randomized block design. In the example considered here, 32 delegations to the U.N. were randomly selected from the population of delegations, and two delegates were randomly sampled from each delegation and placed either into a group that was given lessons on how to take intelligence tests or a group that was given no lessons. Here, the delegates were matched (or blocked) on delegation, that is, they were from the same delegations.
In Figure 13b(i), four subjects are illustrated twice, once at time 1 and once at time 2. This figure is meant to illustrate the repeated measures design, where the same subject is measured twice, once at time 1 and then again at time 2. In Figure 13b(ii), eight different subjects are illustrated, but the subjects are matched based on similar appearance. This figure is meant to illustrate a randomized block design, where subjects are matched on one or more characteristics. Note, however, that even though the subjects in Figure 13b(ii) are matched on similar appearance, there are other characteristics on which the subjects are not matched.
The state of affairs that must exist before you can consider the use of Student’s dependent t test statistic is:
You have a random sample of paired measurements which were derived from either a random sample of a single group of units measured at two different times (Situation 1), or a random sample of units that were matched on one or more variables and then randomly placed into one of two treatments (Situation 2).
You are interested in the difference between the means of your two measurements.
The population variance of the mean differences is unknown. (Note that if the variance of the mean differences is known, the z statistic described in chapters 11 and 12 would be used here. In practice, however, the population variance of the mean differences is rarely known, so we will not consider the z statistic.)
Given the preceding state of affairs, there are two equations which are commonly used to calculate the t statistic.
\[ \begin{equation} t = \frac{M_D - c_0}{s_D/\sqrt{n}} \tag{13-1} \end{equation} \]
Using Equation 13-1, the difference of each pair of scores is first found; then the mean, \(M_D\), and standard deviation, \(s_D\), of the n difference scores are found.
\[ \begin{equation} t = \frac{(M_1 - M_2) - c_0}{\sqrt{(s_1^2/n) + (s_2^2/n) - (2r_{12} s_1 s_2/n)}} \tag{13-2} \end{equation} \] Equation 13-2 is based on the sample means, \(M_1\) and \(M_2\), variances, \(s_1^2\) and \(s_2^2\), standard deviations, \(s_1\) and \(s_2\), and the correlation between the measures, \(r_{12}\). Both equations yield the same result.
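Because equations 13-1 and 13-2 are algebraically equivalent, their agreement is easy to verify numerically. The following is a minimal R sketch using made-up scores (the vectors x1 and x2 are hypothetical, not data from this chapter); it computes the t statistic both ways and checks the result against R’s built-in paired t test.

x1 <- c(100, 94, 88, 102, 97, 91)   # hypothetical time-1 scores
x2 <- c(101, 92, 90, 104, 96, 94)   # hypothetical time-2 scores
n  <- length(x1)
c0 <- 0
# Equation 13-1: based on the difference scores
D  <- x1 - x2
t1 <- (mean(D) - c0) / (sd(D) / sqrt(n))
# Equation 13-2: based on the summary statistics of the two measures
r12 <- cor(x1, x2)
t2  <- (mean(x1) - mean(x2) - c0) /
  sqrt(var(x1)/n + var(x2)/n - 2*r12*sd(x1)*sd(x2)/n)
all.equal(t1, t2)                                               # TRUE
all.equal(t1, unname(t.test(x1, x2, paired = TRUE)$statistic))  # TRUE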
When the null hypothesis is true, the sampling distribution of the t statistic follows a t distribution with v = n – 1 degrees of freedom, where n is the number of pairs of measurements.
The two equations are based on the following points:
A population of score differences has a mean of \(\mu_D\) (equal to \(c_0\) when the null hypothesis is true) and a variance of \(\sigma_D^2\).
The underlying sampling distribution consists of sample mean differences, that is, \(M_D\), where \(M_D\) is the difference between two sample means of paired data (i.e., \(M_D = M_1 - M_2\)).
Under the Central Limit Theorem, the mean of the sampling distribution of mean differences is \(c_0\) (i.e., the mean of the original difference score population), and the variance is \(\sigma_D^2/n\) (i.e., the variance of the original difference score population divided by the number of pairs in a sample).
The variance of the sampling distribution is called the variance of the mean differences, and is denoted by:
\[ \begin{equation} \sigma_{M_1-M_2}^2 = \frac{\sigma_D^2}{n} \tag{13-3} \end{equation} \]
\[ \begin{equation} \sigma_{M_1-M_2} = \frac{\sigma_D}{\sqrt{n}} \tag{13-4} \end{equation} \]
\[ \begin{equation} s_D = \sqrt{(s_1^2) + (s_2^2) - (2r_{12} s_1 s_2)} \tag{13-5} \end{equation} \]
and since \(M_D = M_1 - M_2\), the numerators and denominators of equations 13-1 and 13-2 are the same.
The paired t test statistic will be valid when:
a. the pairs of measurements are independent of one another, and
b. the pairs of measurements follow a bivariate normal distribution in the population of interest.
The importance of the preceding assumptions is as follows:
a. The assumption that the pairs are independent of one another is extremely important, because if it is violated, the level of significance (i.e., the probability of rejecting a true null hypothesis) can increase dramatically (e.g., from .05 to .40).
b. Given a small sample size, a violation of the assumption of bivariate normality does not seriously affect the level of significance for the two-tailed test. However, it may be a problem for one-tailed tests (see Srivastava, 1959). Given large samples (n > 25), the assumption of normality in the population can generally be ignored.
In this situation, Mary Barth has decided to see if there is a difference between the mean intelligence scores found at the beginning of her study and those found six months later. Since the World Adult Intelligence Scale has been found to yield reliable results in the past, and since an adult’s intelligence level is not known to change over a short period of time, Mary predicted that there would be no change in the mean IQ level of the U.N. delegates. Let us now consider the elements of hypothesis testing in this situation. Even though Mary may have thought there would be no difference, she needed to set the null hypothesis that the means are equal. Her hope was that the difference would not be statistically significant, which would cause her to fail to reject the null hypothesis of equal means. However, it is important to note that failing to reject the null hypothesis does not mean that it is true—it only means that there was no evidence to reject it. Therefore, this analysis proceeds in exactly the same way as if Mary had predicted that there would be a difference—just with different outcomes desired.
Is there a change in the mean intelligence level of delegates to the United Nations when measured on the World Adult Intelligence Scale from one time period to another?
There will be no change in the mean intelligence level of the delegates to the United Nations when measured on the World Adult Intelligence Scale from one time period to another.
\[ \begin{align} H_0&: \mu_D = 0 \\ H_A&: \mu_D \ne 0 \end{align} \]
Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in a previous chapter.
The probability of rejecting a true null hypothesis (the probability of making a Type I error) was set at .05.
Mary Barth decided that she would like the power of her test to be at least .85; that is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.
In a confirmatory study, a modification of equation (11-2) is used to determine the a priori effect size. We will denote the effect size for the dependent t test as \(d_d\); the subscript “d” is for dependent. The effect size for the dependent t test is:
\[ \begin{equation} d_d = \large{\frac{|\mu_D - c_0|} {\sigma_D}} \tag{13-6} \end{equation} \]
The relationship between Cohen’s two-group effect size d and \(d_d\) is:
\[ \begin{equation} d = d_d\sqrt{2} \quad \text{or} \quad d_d = d/\sqrt{2} \tag{13-7} \end{equation} \]
In an exploratory study, Cohen’s (1988) two-group effect sizes: small, \(d = .20\) (\(d_d = .14\)), medium, \(d = .50\) (\(d_d = .35\)), or large, \(d = .80\) (\(d_d = .57\)), may be used with Table 13a to select sample sizes.
In this confirmatory study, Mary Barth had decided that if the means differed by 3.0 points or more, she would like to have a high probability of detecting a difference this large or larger. Also, past evidence had indicated that a reasonable estimate of the population difference standard deviation was approximately 5.0. Substituting these values into equation (13-6) Mary found that the dependent a priori effect size was:
\[ \begin{equation} d_d = \frac{3} {5} = .60 \end{equation} \]
Sample size is found as a modification of equation (11-3) as:
\[ \begin{equation} n = \left[\frac{\sigma_D(z_{(1-\alpha_P)} + z_{(1-\beta)})} {|\mu_D - c_0|}\right]^2 \tag{13-8} \end{equation} \]
where \(z_{(1-\alpha_P)}\) is the z score associated with the level of significance (with \(\alpha_P = \alpha/2\) for a two-tailed test, so here \(z_{(.975)} = 1.96\)), and \(z_{(1-\beta)}\) is the z score associated with the desired power (here \(z_{(.85)} = 1.04\)).
Substituting these values into equation (13-8), we have:
\[ n = \left[\frac{5(1.96 + 1.04)} {3}\right]^2 = 25 \]
Therefore, 25 subjects would yield power of .85. Mary decided to use 32 subjects, however, because this number could be easily sampled, and 32 subjects would assure her of power greater than .85.
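If you prefer to let the computer do the arithmetic, equation (13-8) takes only a few lines of base R; the sketch below (our addition, not jamovi output) reproduces Mary’s sample size.

sigma_D <- 5            # estimated standard deviation of the differences
effect  <- 3            # minimum mean difference Mary wants to detect
z_alpha <- qnorm(.975)  # 1.96 for a two-tailed test with alpha = .05
z_power <- qnorm(.85)   # 1.04 for power of .85
ceiling((sigma_D * (z_alpha + z_power) / effect)^2)  # 25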
In an exploratory study, sample size can be found using Tables 13a1-4, given Cohen’s a priori effect size, \(d\), the level of significance, \(\alpha\), and a two-tailed test. See exercise 13.3.
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 365 | 94 | 44 | 26 | 18 | 14 | 11 | 9 | 7 | 6 |
0.5 | 667 | 170 | 78 | 45 | 30 | 22 | 17 | 14 | 10 | 8 |
0.6 | 804 | 204 | 93 | 54 | 36 | 26 | 20 | 16 | 12 | 9 |
0.65 | 881 | 223 | 101 | 59 | 39 | 28 | 22 | 18 | 13 | 10 |
0.7 | 965 | 244 | 111 | 64 | 42 | 31 | 23 | 19 | 13 | 11 |
0.75 | 1060 | 268 | 121 | 70 | 46 | 33 | 25 | 20 | 14 | 11 |
0.8 | 1172 | 296 | 134 | 77 | 51 | 36 | 28 | 22 | 16 | 12 |
0.85 | 1309 | 330 | 149 | 85 | 56 | 40 | 31 | 24 | 17 | 13 |
0.9 | 1492 | 376 | 169 | 97 | 63 | 45 | 34 | 27 | 19 | 14 |
0.95 | 1785 | 449 | 202 | 115 | 75 | 53 | 40 | 32 | 22 | 16 |
0.99 | 2407 | 605 | 271 | 154 | 100 | 71 | 53 | 41 | 28 | 21 |
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 276 | 71 | 34 | 20 | 14 | 11 | 9 | 7 | 6 | 5 |
0.5 | 544 | 139 | 63 | 37 | 25 | 18 | 14 | 12 | 9 | 7 |
0.6 | 669 | 170 | 77 | 45 | 30 | 22 | 17 | 14 | 10 | 8 |
0.65 | 739 | 187 | 85 | 49 | 33 | 24 | 18 | 15 | 11 | 8 |
0.7 | 816 | 206 | 94 | 54 | 36 | 26 | 20 | 16 | 11 | 9 |
0.75 | 904 | 228 | 103 | 60 | 39 | 28 | 22 | 17 | 12 | 10 |
0.8 | 1007 | 254 | 115 | 66 | 43 | 31 | 24 | 19 | 13 | 10 |
0.85 | 1134 | 286 | 129 | 74 | 48 | 35 | 26 | 21 | 15 | 11 |
0.9 | 1305 | 329 | 148 | 85 | 55 | 39 | 30 | 24 | 16 | 12 |
0.95 | 1580 | 397 | 178 | 102 | 66 | 47 | 35 | 28 | 19 | 14 |
0.99 | 2168 | 544 | 244 | 139 | 90 | 63 | 47 | 37 | 25 | 18 |
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 167 | 44 | 21 | 13 | 9 | 7 | 6 | 5 | 4 | 4 |
0.5 | 387 | 98 | 45 | 26 | 18 | 13 | 10 | 9 | 6 | 5 |
0.6 | 492 | 125 | 57 | 33 | 22 | 16 | 13 | 10 | 7 | 6 |
0.65 | 552 | 140 | 64 | 37 | 24 | 18 | 14 | 11 | 8 | 6 |
0.7 | 620 | 157 | 71 | 41 | 27 | 20 | 15 | 12 | 9 | 7 |
0.75 | 696 | 176 | 80 | 46 | 30 | 22 | 17 | 13 | 10 | 7 |
0.8 | 787 | 199 | 90 | 52 | 34 | 24 | 19 | 15 | 10 | 8 |
0.85 | 900 | 227 | 102 | 59 | 38 | 27 | 21 | 17 | 12 | 9 |
0.9 | 1053 | 265 | 119 | 68 | 44 | 32 | 24 | 19 | 13 | 10 |
0.95 | 1302 | 327 | 147 | 84 | 54 | 39 | 29 | 23 | 16 | 12 |
0.99 | 1840 | 462 | 207 | 117 | 76 | 54 | 40 | 31 | 21 | 15 |
Power | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 93 | 25 | 12 | 8 | 6 | 5 | 4 | 3 | 3 | 3 |
0.5 | 272 | 69 | 32 | 19 | 13 | 9 | 8 | 6 | 5 | 4 |
0.6 | 362 | 92 | 42 | 24 | 16 | 12 | 9 | 8 | 6 | 5 |
0.65 | 414 | 105 | 48 | 28 | 18 | 13 | 10 | 8 | 6 | 5 |
0.7 | 472 | 119 | 54 | 31 | 21 | 15 | 12 | 9 | 7 | 5 |
0.75 | 540 | 136 | 62 | 36 | 23 | 17 | 13 | 10 | 7 | 6 |
0.8 | 620 | 156 | 71 | 41 | 27 | 19 | 15 | 12 | 8 | 6 |
0.85 | 721 | 182 | 82 | 47 | 31 | 22 | 17 | 13 | 9 | 7 |
0.9 | 858 | 216 | 97 | 55 | 36 | 26 | 19 | 15 | 11 | 8 |
0.95 | 1084 | 272 | 122 | 70 | 45 | 32 | 24 | 19 | 13 | 10 |
0.99 | 1579 | 396 | 177 | 100 | 65 | 46 | 34 | 27 | 18 | 13 |
Using the JAMOVI module distrACTION, the critical values of the t statistic for the two-tailed test with alpha = .05 and degrees of freedom = 31 are found. Note that sometimes we use tables that do not include the exact degrees of freedom we desire. In such cases, we recommend a conservative approach, that is, using df = 30 (or even df = 20) when df = 31 is not available in the table. We call it conservative because a smaller df yields a larger critical value, making it harder, other things being equal, to obtain statistical significance.
\[ \begin{align} t_{(0.025, 31)} &= -2.0395 \\ \text{and} \\ t_{(0.975, 31)} &= +2.0395 \end{align} \]
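For readers working outside jamovi, the same critical values can be obtained with base R’s qt() function:

alpha <- .05
qt(c(alpha/2, 1 - alpha/2), df = 31)  # -2.0395  2.0395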
Box 13.1
At this point the a priori parameters
of hypothesis testing have been
established (i.e., your "bet" is made).
The two measurements (pretest and posttest scores) of the 32 randomly selected United Nations delegates are shown in Table 13b, along with their difference scores.
Most importantly, we need to check for outliers on the difference scores (see Figure 13c). However, outliers on the individual variables may also be informative and are usually worth investigating at a univariate level.
jmv::descriptives(
data = data,
vars = Difference,
n = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
box = TRUE,
extreme = TRUE)
DESCRIPTIVES
EXTREME VALUES
Extreme values of Difference
──────────────────────────────────────────
Row number Value
──────────────────────────────────────────
Highest 1 4 5.0000
2 11 5.0000
3 1 4.0000
4 2 4.0000
5 19 4.0000
Lowest 1 6 -12.0000
2 12 -8.0000
3 3 -6.0000
4 10 -6.0000
5 16 -5.0000
──────────────────────────────────────────
We need to check the assumption that the difference scores are normally distributed (see Figure 13d1). This will also be provided as output from the paired t test in JAMOVI. We should also check that there is a non-zero correlation between the two measures (see Figure 13d2). The correlation between X1 and X2 needed for equation 13-2 was found to be r = 0.975 using Bivariate Correlation, as described in chapter 7. We can still run the paired-samples t test if the correlation is low, but we are using the dependent t test, in part, because we expect there to be a correlation (a dependency) between the measures.
jmv::descriptives(
data = data,
vars = Difference,
hist = TRUE,
boxLabelOutliers = FALSE,
qq = TRUE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
─────────────────────────────────────
Difference
─────────────────────────────────────
N 32
Missing 0
Mean -1.2500
Skewness -0.57441
Std. error skewness 0.41446
Kurtosis 0.74812
Std. error kurtosis 0.80937
Shapiro-Wilk W 0.95702
Shapiro-Wilk p 0.22725
─────────────────────────────────────
jmv::corrMatrix(
data = data,
vars = vars(IQ_Posttest, IQ_Pretest),
n = TRUE)
CORRELATION MATRIX
Correlation Matrix
───────────────────────────────────────────────────────────
IQ_Posttest IQ_Pretest
───────────────────────────────────────────────────────────
IQ_Posttest Pearson's r —
df —
p-value —
N —
IQ_Pretest Pearson's r 0.97541 —
df 30 —
p-value < .00001 —
N 32 —
───────────────────────────────────────────────────────────
In Figure 13e, the test statistic is found to be t = -1.8396, or t = -1.84 after rounding, using either equation 13-1 or equation 13-2 with the descriptive statistics shown in Figure 13f. Later in this chapter, you will be shown how to arrive at the dependent t test directly.
jmv::ttestPS(
data = data,
pairs = list(
list(
i1="IQ_Pretest",
i2="IQ_Posttest")),
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
PAIRED SAMPLES T-TEST
Paired Samples T-Test
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
statistic df p Mean difference SE difference Lower Upper Effect Size
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ_Pretest IQ_Posttest Student's t -1.8396 31.000 0.07541 -1.2500 0.67948 -2.6358 0.13581 Cohen's d -0.32521
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ <sub>Measure 1 - Measure 2</sub> ≠ 0
Normality Test (Shapiro-Wilk)
────────────────────────────────────────────────────────
W p
────────────────────────────────────────────────────────
IQ_Pretest - IQ_Posttest 0.95702 0.22725
────────────────────────────────────────────────────────
Note. A low p-value suggests a violation of the
assumption of normality
Descriptives
─────────────────────────────────────────────────────────────
N Mean Median SD SE
─────────────────────────────────────────────────────────────
IQ_Pretest 32 92.031 93.500 16.532 2.9226
IQ_Posttest 32 93.281 97.000 14.902 2.6344
─────────────────────────────────────────────────────────────
jmv::descriptives(
data = data,
vars = vars(IQ_Pretest, IQ_Posttest, Difference),
hist = TRUE,
boxLabelOutliers = FALSE,
qq = TRUE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
se = TRUE,
ci = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────────────────────────────────────────
IQ_Pretest IQ_Posttest Difference
──────────────────────────────────────────────────────────────────────
N 32 32 32
Missing 0 0 0
Mean 92.031 93.281 -1.2500
Std. error mean 2.9226 2.6344 0.67948
95% CI mean lower bound 86.071 87.908 -2.6358
95% CI mean upper bound 97.992 98.654 0.13581
Skewness -0.20762 -0.42112 -0.57441
Std. error skewness 0.41446 0.41446 0.41446
Kurtosis -0.67077 -0.43464 0.74812
Std. error kurtosis 0.80937 0.80937 0.80937
Shapiro-Wilk W 0.97668 0.96954 0.95702
Shapiro-Wilk p 0.69899 0.48691 0.22725
──────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a
t-distribution with N - 1 degrees of freedom
The decision was made to fail to reject the null hypothesis because the absolute value of the test statistic was less than the absolute value of the critical values; |-1.83964| < 2.0395. Our t statistic was negative, so we compare its absolute value (i.e., 1.83964) to 2.0395. Alternatively, we could compare -1.83964 to the lower critical value of -2.0395. The two-tailed p value of .07541 was larger than the \(\alpha = .05\) level of significance (and therefore not statistically significant). Importantly, Mary cannot conclude that the difference in the population is exactly zero (i.e., that there is no difference between the two time points in the population), but she has no evidence to reject the null hypothesis and therefore must not reject it.
The two-tailed \(100(1 - \alpha)\%\) confidence interval may be found using the equation:
\[ \begin{equation} M_D - t_{(1-\alpha/2,\,v)} \frac{s_D}{\sqrt{n}} < \mu_D < M_D + t_{(1-\alpha/2,\,v)} \frac{s_D}{\sqrt{n}} \tag{13-9} \end{equation} \]
Here, the \(100(1 - .05)\%\) or 95% confidence interval can be found using \(M_D = -1.25\), \(s_D = 3.8437\), \(t_{(.025,30)} = -2.042\), \(t_{(.975,30)} = 2.042\), \(n = 32\), and \(v = n - 1 = 31\) (but df = 30 was used here because we were using a table that did not include 31 degrees of freedom) to be:
\[ \begin{equation} -1.25 - 2.042(3.8437/\sqrt{32}) < \mu_D < -1.25 + 2.042(3.8437/\sqrt{32}) \end{equation} \]
or \(-2.6375 < \mu_D < 0.1375\), or approximately \((-2.64, 0.14)\). (Using the exact \(v = 31\) critical value of 2.0395, as the output does, gives the slightly narrower interval \((-2.6358, 0.13581)\).)
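As a check, the interval can also be computed in base R. The sketch below uses the summary values reported in the output and the exact v = 31 critical value.

M_D <- -1.25; s_D <- 3.8437; n <- 32
t_crit <- qt(.975, df = n - 1)           # 2.0395
M_D + c(-1, 1) * t_crit * s_D / sqrt(n)  # -2.6358  0.1358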
Based on the confidence interval, Mary was relatively confident that the true population mean difference fell between -2.64 and 0.14, and therefore, that the difference between the intelligence means from one test to another was not statistically significantly different from zero. Remember that when a two-tailed test indicates that the null hypothesis should not be rejected, the constant from the null hypothesis, here zero, will be between the limits of the confidence interval.
We have calculated the paired mean difference to be 1.25 (as an absolute value) and the standard deviation of the difference scores to be 3.8437. Therefore the standardized mean difference for the sample is \(d = 1.25 / 3.8437 = 0.325\), which is a relatively small mean difference (about one-third of a standard deviation). More importantly, because we could not reject the hypothesis that the mean difference is 0, we cannot conclude that the population standardized effect size (i.e., Cohen’s d) is anything other than 0.
In this situation, Mary Barth felt that the intelligence level at the United Nations could be improved if the delegates were taught how to take intelligence tests. Therefore, she planned to randomly select n delegations, and then to randomly select two delegates from each of these delegations. She would then randomly place one delegate from each delegation into a treatment where the delegates were taught how to take intelligence tests, and the other delegate from each delegation into a no-treatment control group. Note that this is not a strong matching design—it is for illustrative purposes only.
Will United Nations delegates who are taught how to take intelligence tests score higher on the World Adult Intelligence Scale than delegates who have not received such training?
United Nations delegates who are taught how to take intelligence tests will score higher on the World Adult Intelligence Scale than delegates who do not receive such training.
Here group 1 will be the no-training group, and group 2 will be the training group. Therefore, since the research hypothesis predicts that \(\mu_1\) will be less than \(\mu_2\), Mary expected the mean difference \(\mu_D = \mu_1 - \mu_2\) to be negative. Then, the null and alternative hypotheses are:
\[ \begin{align} H_0&: \mu_D = 0 \\ H_A&: \mu_D < 0 \end{align} \]
Mary decided that she wanted to detect a mean difference of one-half of a standard deviation. Using equation (13-6), her a priori effect size was found to be \(d_d = 0.50\). Using equation (13-8) with a one-tailed test, Mary found that she needed 29 subjects to have power of .85. Therefore, her actual sample size of 32 delegates, shown in Table 13c, gave her power greater than .85.
The critical value for the one-tailed test with \(\alpha = .05\) and \(v = 31\) is \(t = -1.6955\). The test statistic, shown in Figure 13f, was calculated to be \(t = -4.5751\) using either equation 13-1 or 13-2. This means Mary can reject the null hypothesis: a difference existed between those who received training and those who did not (one-tailed \(p = .00004\)). On the basis of her 95% confidence interval, Mary concluded that the true population difference, \(\mu_1 - \mu_2\), fell below 0.
jmv::ttestPS(
data = data,
pairs = list(
list(
i1="IQ",
i2="Match_IQ")),
hypothesis = "twoGreater",
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
PAIRED SAMPLES T-TEST
Paired Samples T-Test
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
statistic df p Mean difference SE difference Lower Upper Effect Size
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ Match_IQ Student's t -4.5751 31.000 0.00004 -7.7812 1.7008 -Inf -4.8976 Cohen's d -0.80878
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ <sub>Measure 1 - Measure 2</sub> < 0
Normality Test (Shapiro-Wilk)
─────────────────────────────────────────────
W p
─────────────────────────────────────────────
IQ - Match_IQ 0.95653 0.22029
─────────────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
───────────────────────────────────────────────────────────
N Mean Median SD SE
───────────────────────────────────────────────────────────
IQ 32 92.031 93.500 16.532 2.9226
Match_IQ 32 99.812 102.000 18.252 3.2265
───────────────────────────────────────────────────────────
In considering the test of the null hypothesis that the difference between two related means is equal to a constant (that is, \(\mu_1 – \mu_2 = \mu_D = c_0\)), we will consider two cases: Case 1, where the constant, \(c_0\), in the null hypothesis is equal to zero, and Case 2, where the constant, \(c_0\), in the null hypothesis is not equal to zero. In the following presentations, you are shown how to deal with both of these cases.
We will consider this case first, since it is the one that is usually of interest to most researchers. Enter the data in Table 13c in a dataset. There are actually two related ways to complete this analysis. The first is to perform the paired-samples t test (see Figure 13g). The second is to perform a one-sample t test on the difference scores using zero as the test value (see Figure 13h).
jmv::ttestOneS(
data = data,
vars = Difference,
testValue = 0,
hypothesis = "lt",
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
ONE SAMPLE T-TEST
One Sample T-Test
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Lower Upper Effect Size
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Difference Student's t -4.5751 31.000 0.00004 -7.7812 -Inf -4.8976 Cohen's d -0.80878
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ < 0
Normality Test (Shapiro-Wilk)
────────────────────────────────────
W p
────────────────────────────────────
Difference 0.95653 0.22029
────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
──────────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────────
Difference 32 -7.7812 -9.0000 9.6210 1.7008
──────────────────────────────────────────────────────────────
Since we have not previously analyzed data that would meet this case, we will reconsider the data in Table 13c for the step-by-step illustration. Note that this is for instructional convenience—it would be inappropriate to analyze the same data with two different null hypotheses in a research project. Here, the statistical hypotheses will be:
\[ \begin{align} H_0&: \mu_D = -10 \\ H_A&: \mu_D \ne -10 \end{align} \]
The researcher expected the U.N. delegates with training to have a mean IQ score (\(\mu_2\)) that is 10 points higher than the mean IQ score (\(\mu_1\)) of the U.N. delegates with no training. Note that instead of having a negative mean difference, you could make treatment 1 “with training” and treatment 2 “without training” so that the mean difference would be positive. The order of the treatments affects the sign of the resultant t statistic and critical value (if you don’t use absolute values), but the order is arbitrary and therefore has no effect on the conclusions. To perform this test we can calculate difference scores and then use the non-zero (non-nil) null hypothesis value as the test value in the one-sample t test (see Figure 13i1). The other way is to perform the paired t test and use the confidence interval for the mean difference for our decision making (see Figure 13i2). Both figures below show that we fail to reject the hypothesis that the difference is equal to -10 in the population (ignore the p value in Figure 13i2).
jmv::ttestOneS(
data = data,
vars = Difference,
testValue = -10,
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
ONE SAMPLE T-TEST
One Sample T-Test
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Lower Upper Effect Size
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Difference Student's t 1.3046 31.000 0.20165 2.2188 -1.2500 5.6875 Cohen's d 0.23062
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ ≠ -10
Normality Test (Shapiro-Wilk)
────────────────────────────────────
W p
────────────────────────────────────
Difference 0.95653 0.22029
────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
──────────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────────
Difference 32 -7.7812 -9.0000 9.6210 1.7008
──────────────────────────────────────────────────────────────
jmv::ttestPS(
data = data,
pairs = list(
list(
i1="IQ",
i2="Match_IQ")),
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = TRUE,
desc = TRUE)
PAIRED SAMPLES T-TEST
Paired Samples T-Test
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
statistic df p Mean difference SE difference Lower Upper Effect Size
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ Match_IQ Student's t -4.5751 31.000 0.00007 -7.7812 1.7008 -11.250 -4.3125 Cohen's d -0.80878
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ <sub>Measure 1 - Measure 2</sub> ≠ 0
Normality Test (Shapiro-Wilk)
─────────────────────────────────────────────
W p
─────────────────────────────────────────────
IQ - Match_IQ 0.95653 0.22029
─────────────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
───────────────────────────────────────────────────────────
N Mean Median SD SE
───────────────────────────────────────────────────────────
IQ 32 92.031 93.500 16.532 2.9226
Match_IQ 32 99.812 102.000 18.252 3.2265
───────────────────────────────────────────────────────────
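For comparison, both of the approaches above have short base R equivalents (assuming the same data frame, data, used in the jmv calls):

t.test(data$Difference, mu = -10)              # one-sample t on the difference scores
t.test(data$IQ, data$Match_IQ, paired = TRUE)  # paired t; use its CI for the mean difference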
In this section we will discuss the conditions under which you would use either the z statistic or Student’s t test to test the null hypothesis that the population Pearson product-moment correlation is equal to a constant. In this regard we will consider two situations. Situation #1 will be the case where the constant in the null hypothesis is not equal to zero (i.e., \(c_0 \ne 0\)) and situation #2 will be the case where this constant is equal to zero (i.e., \(c_0 = 0\)).
These two situations are considered separately in this chapter because the sampling distribution of the correlation coefficient is negatively skewed when \(\rho > 0\) and positively skewed when \(\rho < 0\), but it is symmetric when \(\rho = 0\), in which case a statistic based on r follows a t distribution. Therefore, in situation #1, where the correlation in the null hypothesis is not equal to zero, we will consider the use of a transformation to normalize the sampling distribution of the correlation coefficient. In situation #2, where the correlation is equal to zero, no such transformation will be necessary. However, for both situations you will have the same state of affairs, assumptions, and considerations of violations of these assumptions; therefore, these common elements are considered next.
The state of affairs that must exist before you can consider the following situations is: You are interested in the value of a population Pearson product-moment correlation coefficient.
The statistical tests, which will be described for each situation, will be valid when:
a. the pairs of measurements are independent of one another, and
b. the pairs of measurements follow a bivariate normal distribution in the population of interest.
The importance of the preceding assumptions is as follows:
When the population Pearson product-moment correlation coefficient is not equal to zero the sampling distribution of the correlation coefficient is skewed. For example, if the population correlation is greater than zero (e.g., .80), then most of the sample values will be close to this positive correlation with only a very few found as being negative. This will lead to a negatively skewed sampling distribution of the correlations such as that shown in Figure 13j. When the correlation is negative, the opposite situation occurs. That is, when the population correlation is negative (e.g., -.80), most of the sample correlations will be negative and there will be very few positive correlations. This will lead to a positively skewed sampling distribution of the correlations such as that shown in Figure 13k.
Since there is a different sampling distribution for every correlation coefficient and every sample size, a large number of tables containing critical values would be necessary to test hypotheses concerning nonzero correlations. Fortunately, R.A. Fisher (1921) introduced a transformation which, when applied to each sample correlation in a sampling distribution, causes the resulting sampling distribution of transformed correlations to be approximately normally distributed with standard deviation:
\[ \begin{equation} s_{z_F} = \frac{1} {\sqrt{n-3}} \tag{13-10} \end{equation} \]
That is, the standard error of the transformed sample correlations is simply a function of n, the number of pairs of units.
Fisher’s z-transformation is:
\[ \begin{equation} z_F = \frac{1}{2} \ln \left(\frac{1+|r|} {1-|r|} \right) \tag{13-11} \end{equation} \]
where \(z_F\) is the resultant transformed correlation (F for Fisher), r is a sample correlation coefficient, and ln is the natural log to the base e (i.e., \(\log_e\)). The ln function is available in jamovi’s compute-variable menu, as log() in R, and on many calculators. However, you may easily transform a given correlation by using Table 13d. For example, a sample correlation of .450 is found in Table 13d as Fisher’s \(z_F\) of .485.
[Table 13d: Fisher’s \(z_F\) transformation of the correlation coefficient r.]
Since \(z_F\) follows a normal distribution with standard deviation equal to \(1⁄\sqrt{n-3}\), critical values for \(z_F\) can be found using the standard normal distribution, that is, z scores from table A.1(b). Therefore, the test statistic is written as:
\[ \begin{equation} z = \frac{z_{F_1} - z_{F_0}}{s_{z_F}} \tag{13-12} \end{equation} \]
Here, \(z_{F_1}\) is Fisher’s \(z_F\) for the sample correlation coefficient, \(z_{F_0}\) is Fisher’s \(z_F\) for the constant correlation in the null hypothesis, and \(s_{z_F}\) is the standard deviation, defined in Equation (13-10), of the transformed correlation. This test statistic is illustrated in the following example.
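The transformation and test statistic are also simple to compute in base R, where atanh() is Fisher’s \(z_F\) transformation. The helper function below is our own illustrative sketch (not a jamovi or jmv function), shown with the sample values from the example that follows.

fisher_z_test <- function(r, rho0, n) {
  zF1 <- atanh(r)            # Fisher's z_F for the sample correlation (eq. 13-11)
  zF0 <- atanh(rho0)         # Fisher's z_F for the null-hypothesis constant
  se  <- 1 / sqrt(n - 3)     # standard error of z_F (eq. 13-10)
  z   <- (zF1 - zF0) / se    # test statistic (eq. 13-12)
  p   <- 2 * pnorm(-abs(z))  # two-tailed p value
  c(z = z, p = p)
}
fisher_z_test(r = .97541, rho0 = .95, n = 32)  # z of about 1.95, p of about .05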
In this example we will consider the situation where Mary Barth would like to find out if the correlation between the intelligence scores of the U.N. delegates, based on test-retest scores, is significantly different from .95. Since this correlation is a measure of the reliability of the intelligence test, Mary hopes to find that the correlation is greater than or equal to .95. In this example we will use the data given in Table 13b. (Remember that when we previously considered the data in Table 13b our focus was on the mean difference, here we have a different scenario with a focus on the Pearson product-moment correlation coefficient.)
The research problem that we will consider in this situation is: Is the reliability (test-retest) of the World Adult Intelligence Scale equal to .95 when used with the delegates to the United Nations (i.e., is \(\rho = .95\))?
The reliability (test-retest) of the World Adult Intelligence Scale is equal to .95 when used with the delegates to the United Nations (i.e., \(\rho = .95\)).
\[ \begin{align} H_0&: \rho = .95 \\ H_A&: \rho \ne .95 \end{align} \]
Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in chapter 11.
The probability of rejecting a true null hypothesis (the probability of making a Type I error) was set at .05.
Mary Barth decided she would like her power to be at least .85; that is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.
For the null hypothesis that a correlation is equal to a nonzero constant the effect size is measured as the absolute difference between Fisher’s transformation of the correlation that is important to detect, denoted by \(z_{F_A}\), and Fisher’s transformation of the correlation in the null hypothesis, \(z_{F_0}\). That is, the effect size, \(d_r\) (“r” for the correlation coefficient), is found as:
\[ \begin{equation} d_r = | z_{F_A} - z_{F_0}| \tag{13-13} \end{equation} \]
In considering the reliability of her World Adult Intelligence Scale, Mary Barth decided that the minimum reliability that would be acceptable to her was .85. That is, if the reliability of her World Adult Intelligence Scale was not at least as large as .85, then she would like to reject the null hypothesis. In this example, correlations larger than .95 were not of concern to Mary, so she based her a priori effect size on r = .85. (Remember that the minimally acceptable reliability, here .85, is decided on based on past research, theory, and the genius of the researcher.)
In this example Table 13d was used to transform the population correlation from .95 to 1.832 and the minimally acceptable population correlation from .85 to 1.256 (i.e., \(z_{F_0}\) = 1.832, and \(z_{F_A}\) = 1.256). Therefore, the a priori effect size was found as:
\[ \begin{equation} d_r = |1.832 - 1.256| = 0.576 \end{equation} \]
NOTE: In considering a two-tailed test where a researcher is interested in detecting specific correlational values both above and below the population value, two different a priori effect sizes may be found. If this occurs you should select the smallest a priori effect size as the one that you would like to detect. This is because if you can detect a small effect size with high power, you will also be able to detect the larger effect size – see exercise 13.13.
In an exploratory analysis Cohen’s “small” \(d_r = .14\), “medium” \(d_r = .42\), and “large” \(d_r = .71\) effect sizes may be chosen. (See exercise 13.12.)
Given the preceding two-tailed alternative hypothesis, level of significance, power and a priori effect size, sample size is found as:
\[ \begin{equation} n = \left[ \frac{z_{(1-\alpha_P)} + z_{(1-\beta)}}{|z_{F_A} - z_{F_0}|}\right]^2 + 3 \tag{13-14} \end{equation} \]
Here, \(z_{(1 - \alpha_P)}\) and \(z_{(1 - \beta)}\) are defined as for equation (13-8). Therefore, Mary found her sample size as:
\[ \begin{equation} n = \left[ \frac{1.96 + 1.04}{0.576}\right]^2 + 3 = 30.13 \approx 31 \end{equation} \]
Based on this result, Mary sampled 32 subjects so that the power of her statistical test would be slightly more than .85.
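Equation (13-14) can likewise be evaluated in a couple of lines of base R, as a check on the hand calculation above:

d_r <- abs(atanh(.85) - atanh(.95))                # a priori effect size, eq. (13-13)
ceiling(((qnorm(.975) + qnorm(.85)) / d_r)^2 + 3)  # 31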
In an exploratory analysis sample size would be found by substituting Cohen’s small, medium, and large effect sizes into equation (13-14). Since this process is straightforward, no sample size tables are provided for this case.
The critical z values for the two-tailed test with \(\alpha = .05\) are found from the tables as \(z_{(.025)}\) = -1.96 and \(z_{(.975)}\) = +1.96.
Box 13.2
At this point the a priori parameters
of hypothesis testing have been
established (i.e., your "bet" is made).
The measurements of the 32 randomly selected pairs of United Nations delegates are shown in Table 13b. Checks for outliers and assumptions, and computations of descriptive statistics are considered in exercise 13.2.
In Figure 13l, the test statistic is found to be 1.94 using equation (13-12).
The decision was made to fail to reject the null hypothesis because the absolute value of the test statistic was less than the absolute value of the critical values (1.94 < 1.96). Using a probability calculator (e.g., JAMOVI’s distrACTION module), we find the two-tailed p-value to be .0524 and therefore larger than the .05 level of significance (and therefore, not statistically significant).
The two-tailed 100(1 – α)% = 95% confidence interval may be found using the following two steps:
\[ \begin{equation} z_{F_1} - z_{(1-\alpha/2)} \left(1/\sqrt{n-3}\right) < z_{F_P} < z_{F_1} + z_{(1-\alpha/2)} \left(1/\sqrt{n-3}\right) \tag{13-15} \end{equation} \]
Here we have that the sample correlation of .97541 (see Figure 13l) has a Fisher’s \(z_F\), labeled \(z_{F_1}\), of 2.1931, \((1 – \alpha/2)\) = 1 – .05/2 = .975, \(z_{(.975)}\) = 1.96, and n = 32. Therefore, the 95% confidence interval is:
\[ 2.1931 - 1.96(0.1857) < z_{F_P} < 2.1931 + 1.96(0.1857) \] or
\(1.8291 < z_{F_P} < 2.5571\) or (1.829, 2.557).
The limits of the interval are then transformed back into correlations using the inverse of Fisher’s transformation:
\[ \begin{equation} |r| = \frac{e^{2z}-1} {e^{2z}+1} \tag{13-16} \end{equation} \]
where z is the value of Fisher’s \(z_F\) which is to be transformed to r. The exponential function \(e^x\) is available on many calculators, either directly or as the inverse function of \(\ln\). On some calculators the function \(y^x\) can be used, where \(y = e = 2.718281828\) and \(x = 2z\).
For our example the limits of the 95% confidence interval may be transformed back into correlational values using equation (13-16) as:
\[ |r| = \frac{e^{2(1.8291)}-1} {e^{2(1.8291)}+1} = \frac{38.7915-1} {38.7915+1} = .9497 \]
\[ |r| = \frac{e^{2(2.5571)}-1} {e^{2(2.5571)}+1} = \frac{166.3676-1} {166.3676+1} = .9881 \]
Therefore, the 95% confidence interval for the population correlation coefficient is \(.9497 < \rho < .9881\) or (.9497, .9881), which contains the hypothesized value of .95.
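In base R, the back-transformation in equation (13-16) is simply tanh(), so the hand calculations can be checked in one line:

tanh(c(1.8291, 2.5571))  # 0.9497 0.9881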
It should be noted that many statistical programs will now provide “bootstrapped” confidence intervals, which do not assume normality like these confidence intervals. Bootstrapping randomly draws many (e.g., 10,000) samples of the same size as your original sample from the cases in your sample, but always with replacement. That is, each case in your sample can be chosen multiple times, which allows for a very large number of possible bootstrapped samples to be drawn. Then, the samples drawn are used to create an empirical sampling distribution from which the 95% confidence interval is determined. This approach allows the sample you have to represent the population, including the shape of the distribution.
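As a rough sketch of this idea, a percentile bootstrap confidence interval for the test-retest correlation can be built with a few lines of base R (assuming the same data frame, data, used in the jmv calls; this illustration is our addition, not output used in the chapter).

set.seed(1)  # for reproducibility
boot_r <- replicate(10000, {
  idx <- sample(nrow(data), replace = TRUE)  # resample cases with replacement
  cor(data$IQ_Pretest[idx], data$IQ_Posttest[idx])
})
quantile(boot_r, c(.025, .975))  # empirical 95% confidence limits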
The confidence limits found using equation (13-16) can be checked by comparing them with those found using Table 13d. Using interpolation, Table 13d yields a reasonably accurate confidence interval of \(.945 < \rho < .990\).
Mary was confident that the true population correlation fell between .9497 and .9881, and therefore, that her World Adult Intelligence Scale was sufficiently reliable to meet her needs.
In correlation, the effect size reported is generally the correlation coefficient itself. However, some researchers prefer to report the \(r^2\) value because it has a more absolute interpretation (i.e., the proportion of shared variation). Here, the effect would be reported as \(r = .975\) or \(r^2 = .951\).
To illustrate the situation where you will test the null hypothesis that the Pearson product-moment correlation is equal to zero, the data in Table 13c will be reconsidered, but the scenario will change from an interest in the mean difference to an interest in the correlation between the intelligence scores. That is, our interest will focus on the correlation of the intelligence scores between the paired delegates from the same delegation. The method of sampling is the same as that described for Situation 2 of the dependent t test, but no treatments will be considered to have been administered.
Mary Barth has randomly selected thirty-two delegations, and then randomly selected two delegates from each delegation. Mary expected that the correlation between the paired delegates’ scores would be positive, since the delegates came from the same delegation; since past research in this area was unavailable, however, she was unsure of what the correlation would be, and so she decided to use a two-tailed alternative hypothesis. (Although a one-tailed test might be more reasonable for this example, the two-tailed test is illustrated here because it is the one most commonly used in the research literature.)
Is there a correlation between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations?
The correlation between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations will differ from zero.
\[ \begin{align} H_0&: \rho = 0 \\ H_A&: \rho \ne 0 \end{align} \]
The validity and reliability of the measures of the dependent and independent variables were discussed in the previous section. The level of significance was set at .05 and power was set at .85.
In exploratory studies, Cohen (1977) recommends that a priori effect sizes be considered in terms of the correlation coefficient. He indicates that a small effect size is considered as \(\rho = \pm .10\); a medium effect size as \(\rho = \pm .30\), and a large effect size as \(\rho = \pm .50\). In confirmatory studies, values of \(\rho\) would be determined based on past research, theory, or the genius of the researcher.
In Mary’s study, she felt that a large effect size was possible; therefore, she set \(\rho = \pm .50\). This meant that if the correlation differed from zero by as much as .50 or more, she would like to be able to detect this difference with high probability.
In table C.3, for \(\alpha_2 = .05\), \(\text{power} = .85\), and \(\rho = .50\), Mary found she needed 32 pairs of subjects. Mary’s value of \(\rho = .50\) was found in table C.3; however, equation (13-14) may be used for values of \(\rho\) which are not in this table.
jmv::corrMatrix(
data = data,
vars = vars(IQ_Pretest, IQ_Posttest),
flag = TRUE,
n = TRUE,
ci = TRUE,
plots = TRUE,
plotDens = TRUE,
plotStats = TRUE)
CORRELATION MATRIX
Correlation Matrix
────────────────────────────────────────────────────────────
IQ_Pretest IQ_Posttest
────────────────────────────────────────────────────────────
IQ_Pretest Pearson's r —
df —
p-value —
95% CI Upper —
95% CI Lower —
N —
IQ_Posttest Pearson's r 0.97541 —
df 30 —
p-value < .00001 —
95% CI Upper 0.98805 —
95% CI Lower 0.94974 —
N 32 —
────────────────────────────────────────────────────────────
Note. * p < .05, ** p < .01, *** p < .001
Note that the p value reported is for the null hypothesis that \(\rho = 0\), not \(\rho = .95\) as we wish to test. However, we need the correlation in order to perform the test using Fisher’s \(z_F\) as described above, so we ignore the p value provided.
Given Situation 2, the equation used to calculate the t statistic is:
\[ \begin{equation} t = \frac{r_{12}} {\sqrt{(1 - r_{12}^2)/(n-2)}} \tag{13-17} \end{equation} \]
When the null hypothesis is true, the sampling distribution of this statistic follows a t distribution with n-2 degrees of freedom, where n is the number of pairs of measurements.
In table A2, the critical values of the t statistic for the two-tailed test with \(\alpha = .05\) and \(\text{df} = 30\) are \(t_{(.025,30)} = -2.042\) and \(t_{(.975,30)} = +2.042\).
Box 13.3
At this point the a priori parameters
of hypothesis testing have been
established (i.e., your "bet" is made).
The measurements of the 32 randomly selected pairs of United Nations delegates are shown in Table 13c. (Note that the differences, shown in column 6, are not necessary for this problem.) Checks for outliers and of assumptions, and computations of descriptive statistics are considered in exercise 13.2.
In Figure 13m, the statistical significance of the test statistic is shown to be p < .001, but the default output does not provide the t statistic itself. The t statistic can be found to be 8.895, or 8.90, using equation (13-17).
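As a quick check, both the equation (13-17) calculation and the t statistic itself are available in base R (again assuming the Table 13c scores are in the data frame data):

r <- cor(data$IQ, data$Match_IQ)
n <- nrow(data)
r / sqrt((1 - r^2) / (n - 2))     # 8.895, as given by equation (13-17)
cor.test(data$IQ, data$Match_IQ)  # reports the same t, with df = 30, and its p value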
jmv::corrMatrix(
data = data,
vars = vars(IQ, Match_IQ),
flag = TRUE,
n = TRUE,
ci = TRUE,
plots = TRUE,
plotDens = TRUE,
plotStats = TRUE)
CORRELATION MATRIX
Correlation Matrix
────────────────────────────────────────────────────
IQ Match_IQ
────────────────────────────────────────────────────
IQ Pearson's r —
df —
p-value —
95% CI Upper —
95% CI Lower —
N —
Match_IQ Pearson's r 0.85152 —
df 30 —
p-value < .00001 —
95% CI Upper 0.92543 —
95% CI Lower 0.71517 —
N 32 —
────────────────────────────────────────────────────
Note. * p < .05, ** p < .01, *** p < .001
The decision was made to reject the null hypothesis because the absolute value of the test statistic was greater than the absolute value of the critical values (8.90 > 2.042). This same decision was reached using output (Figure 13m), where the two-tailed p-value is less than .001 and therefore was less than the .05 level of significance.
Here, since the sample correlation is usually not going to be exactly equal to zero, the confidence interval is established using Fisher’s \(z_F\) transformation just as before. Therefore, the 100(1 – α)% confidence interval is found using the following two steps. (Even if r = 0, the following steps will yield the correct confidence interval.)
Here Fisher’s \(z_F\) for the sample correlation of .85152 is \(z_{F_1} = 1.2617\), and \(s_{z_F} = 1/\sqrt{32-3} = 0.1857\), so equation (13-15) gives:
\[ 1.2617 - 1.96(0.1857) < z_{F_P} < 1.2617 + 1.96(0.1857) \]
or 0.8977 < \(z_{F_P}\) < 1.6257 or (0.8977, 1.6257).
\[ |r| = \frac{e^{2(0.8977)}-1} {e^{2(0.8977)}+1} = \frac{6.0219-1} {6.0219+1} = .7152 \] \[ |r| = \frac{e^{2(1.6257)}-1} {e^{2(1.6257)}+1} = \frac{25.8782-1} {25.8782+1} = .9256 \]
Therefore, the 95% confidence interval for the population correlation coefficient is: \(.7152 < \rho < .9256\) or (.7152, .9256).
The latter confidence limits can be checked by comparing them with those found in Table 13d. Our \(z_F\) value of .8977 is close to the tabled value of .897, which would yield an r of .715; using equation (13-16) we found .7152. Also, our \(z_F\) value of 1.6257 falls between the tabled \(z_F\) values of 1.623, with an r of .925, and 1.658, with an r of .930; our calculated value of r = .9256 falls between these two values. You can see that by using only Table 13d (not equation 13-16), a reasonably accurate confidence interval of \(.715 < \rho < .925\) would be found (we would choose .925 because it is the smaller, more conservative choice).
Mary was confident that the true population correlation fell between .7152 and .9256, and therefore, that a statistically significant nonzero correlation existed between the intelligence scores of delegates from the same delegations.
The correlation effect size that should be reported is r = .852, or \(r^2\) = .726.
To illustrate the situation where you will test the null hypothesis that the population regression slope is equal to zero, the data in Table 13c will be reconsidered once more, but the scenario will change to an interest in predicting one delegate's intelligence score from that of the paired delegate from the same delegation. The method of sampling is the same as that described for Situation 2 of the dependent t test, but no treatments will be considered to have been administered.
Mary Barth has randomly selected thirty-two delegations, and then randomly selected two delegates from each delegation. Mary expected that the relationship between the paired delegates’ scores would be positive, since the delegates came from the same delegation; since past research in this area was unavailable, however, she was unsure of what the relationship would be, and so she decided to use a two-tailed alternative hypothesis. (Although a one-tailed test might be more reasonable for this example, the two-tailed test is illustrated here because it is the one most commonly used in the research literature.)
Is there a predictive relationship between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations?
The regression slope between the World Adult Intelligence Scale scores of delegates from the same delegation to the United Nations will differ from zero.
\[ \begin{align} H_0&: \beta = 0 \\ H_A&: \beta \ne 0 \end{align} \]
Note that the Greek letter β used here differs from the standardized regression coefficient, Beta, provided in the output of some statistical programs. The Greek β in these hypotheses represents the population regression coefficient.
The validity and reliability of the measures of the dependent and independent variables were discussed in the previous section. The level of significance was set at .05 and power was set at .85.
Effect size in bivariate regression (i.e., one predictor and one dependent variable) is probably best thought of in terms of correlation. Bivariate regression and bivariate correlation are essentially the same analyses with a slightly different focus. Recall that the standardized regression slope, Beta, is equal to the correlation between the variables. Therefore, for a bivariate regression, effect size can be determined in the same way as it was for correlation (this will not be true for multiple regression, but is appropriate when there is just one predictor).
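The equivalence of the standardized slope and the correlation is easy to verify; here is a short R sketch with simulated data (the data and variable names are illustrative only, not Mary's data):

set.seed(1)
x <- rnorm(32)                       # simulated predictor
y <- 0.9 * x + rnorm(32, sd = 0.5)   # simulated criterion
cor(x, y)                            # Pearson correlation
coef(lm(scale(y) ~ scale(x)))[2]     # standardized slope (Beta); equals cor(x, y)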
In Mary’s study, she felt that a large effect size was possible; therefore, she set \(\rho = \pm .50\). This meant that if the correlation differed from zero by as much as .50 or more, she would like to be able to detect this difference with high probability.
There are other ways to determine sample size when there are multiple predictors, but with just one predictor we can use the same approach as we did with correlation. In table C.3, for \(\alpha_2 = .05\), \(\text{power} = .85\), and \(\rho = .50\), Mary found she needed 32 pairs of subjects. Mary's value of \(\rho = .50\) was found in table C.3; however, equation (13-15) may be used for values of ρ that are not in the table (a computational sketch follows the excerpt below).
Table C.3 (excerpt). Number of pairs needed to detect a nonzero population correlation ρ (columns) with a given power (rows), for two-tailed tests.

α₂ = .01

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 662 | 165 | 72 | 40 | 25 | 17 | 12 | 9 |
| 0.6 | 798 | 198 | 87 | 48 | 30 | 20 | 14 | 10 |
| 0.65 | 874 | 216 | 95 | 52 | 32 | 21 | 15 | 11 |
| 0.7 | 958 | 237 | 103 | 57 | 35 | 23 | 16 | 11 |
| 0.75 | 1052 | 260 | 113 | 62 | 38 | 25 | 17 | 12 |
| 0.8 | 1163 | 287 | 125 | 68 | 42 | 27 | 19 | 13 |
| 0.85 | 1299 | 320 | 139 | 76 | 46 | 30 | 21 | 14 |
| 0.9 | 1481 | 365 | 158 | 86 | 52 | 34 | 23 | 16 |
| 0.95 | 1772 | 436 | 189 | 102 | 62 | 40 | 27 | 18 |
| 0.99 | 2390 | 588 | 254 | 137 | 83 | 53 | 35 | 23 |

α₂ = .02

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 540 | 135 | 59 | 33 | 21 | 14 | 10 | 8 |
| 0.6 | 664 | 165 | 72 | 40 | 25 | 17 | 12 | 9 |
| 0.65 | 733 | 182 | 80 | 44 | 27 | 18 | 13 | 9 |
| 0.7 | 810 | 201 | 88 | 48 | 30 | 20 | 14 | 10 |
| 0.75 | 897 | 222 | 97 | 53 | 33 | 22 | 15 | 11 |
| 0.8 | 1000 | 247 | 108 | 59 | 36 | 24 | 16 | 11 |
| 0.85 | 1126 | 278 | 121 | 66 | 40 | 26 | 18 | 12 |
| 0.9 | 1296 | 319 | 139 | 75 | 46 | 30 | 20 | 14 |
| 0.95 | 1569 | 386 | 167 | 91 | 55 | 36 | 24 | 16 |
| 0.99 | 2153 | 529 | 229 | 123 | 75 | 48 | 32 | 21 |

α₂ = .05

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 384 | 96 | 43 | 24 | 16 | 11 | 8 | 6 |
| 0.6 | 489 | 122 | 54 | 30 | 19 | 13 | 9 | 7 |
| 0.65 | 549 | 136 | 60 | 33 | 21 | 14 | 10 | 8 |
| 0.7 | 616 | 153 | 67 | 37 | 23 | 16 | 11 | 8 |
| 0.75 | 692 | 171 | 75 | 41 | 26 | 17 | 12 | 9 |
| 0.8 | 782 | 194 | 85 | 46 | 29 | 19 | 13 | 9 |
| 0.85 | 894 | 221 | 96 | 53 | 32 | 21 | 15 | 10 |
| 0.9 | 1046 | 258 | 112 | 61 | 38 | 25 | 17 | 12 |
| 0.95 | 1293 | 319 | 138 | 75 | 46 | 30 | 20 | 14 |
| 0.99 | 1828 | 450 | 194 | 105 | 64 | 41 | 27 | 18 |

α₂ = .10

| Power | ρ = 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 0.5 | 271 | 68 | 31 | 18 | 12 | 8 | 6 | 5 |
| 0.6 | 360 | 90 | 40 | 23 | 15 | 10 | 8 | 6 |
| 0.65 | 412 | 103 | 46 | 26 | 16 | 11 | 8 | 6 |
| 0.7 | 470 | 117 | 52 | 29 | 18 | 12 | 9 | 7 |
| 0.75 | 537 | 133 | 59 | 33 | 20 | 14 | 10 | 7 |
| 0.8 | 617 | 153 | 67 | 37 | 23 | 16 | 11 | 8 |
| 0.85 | 717 | 177 | 78 | 43 | 26 | 18 | 12 | 9 |
| 0.9 | 853 | 211 | 92 | 50 | 31 | 21 | 14 | 10 |
| 0.95 | 1077 | 266 | 115 | 63 | 38 | 25 | 17 | 12 |
| 0.99 | 1569 | 386 | 167 | 90 | 55 | 35 | 24 | 16 |

Note. Panel significance levels (α₂) are inferred from the tabled entries; for example, the α₂ = .05 panel gives the 32 pairs (power = .85, ρ = .50) cited in the text.
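Equation (13-15) is not reproduced here, but assuming it is the usual large-sample Fisher-z sample-size formula (which reproduces the tabled entries to within a pair or so), a minimal R sketch:

# Approximate pairs needed to detect a nonzero correlation rho (two-tailed).
# Assumes the usual Fisher-z formula; tabled values may differ by a pair
# because of rounding conventions.
n_pairs <- function(rho, alpha = .05, power = .85) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling((z_sum / atanh(rho))^2) + 3
}
n_pairs(0.50)                            # 33; within one pair of the tabled 32
n_pairs(0.50, alpha = .01, power = .99)  # 83, matching the tabled value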
In bivariate regression, the equation used to calculate the t statistic is:
\[ \begin{equation} t = \frac{b - c_0} {s_b} \tag{13-18} \end{equation} \] where b is the regression coefficient that we are testing against some value, \(c_0\), which is almost always zero (so the t statistic usually simplifies to \(t = b/s_b\)). The value \(s_b\) is the standard error of the regression coefficient. We will obtain \(s_b\) from output, but it can be calculated as a function of the standard error of the estimate.
When the null hypothesis is true in bivariate regression, the sampling distribution of this statistic follows a t distribution with n – 2 degrees of freedom, where n is the number of pairs of measurements. You may recall that n – 2 is the degrees of freedom for the residual in the regression model.
In table A2, the critical values of the t statistic for the two-tailed test with alpha = .05 and degrees of freedom = 30 are \(t_{(.025,30)} = -2.042\) and \(t_{(.975,30)} = +2.042\).
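These tabled critical values can be verified with R's t quantile function:

qt(c(.025, .975), df = 30)   # -2.042, +2.042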
Box 13.4
At this point the a priori parameters of hypothesis testing have been established (i.e., your "bet" is made).
The measurements of the 32 randomly selected pairs of United Nations delegates are shown in Table 13c. (Note that the differences are not necessary for this problem.) Checks for outliers and assumptions, along with computations of descriptive statistics, are considered in the exercises.
In Figure 13n, the test statistic is found to be 8.895, or 8.90, using equation (13-18) with statistics from the regression output. (Equation (13-17) yields the same value, because with one predictor the test of the slope is equivalent to the test of the correlation.)
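A minimal R sketch of this computation, with the coefficient estimate and its standard error copied from the output below (Figure 13n):

b_1 <- 0.94007                  # slope estimate for IQ (Figure 13n)
s_b <- 0.10568                  # standard error of the slope
t_stat <- b_1 / s_b             # equation (13-18) with c0 = 0
t_stat                          # 8.895
2 * pt(-abs(t_stat), df = 30)   # two-tailed p-value: < .00001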
jmv::linReg(
    data = data,
    dep = Match_IQ,        # dependent (criterion) variable
    covs = IQ,             # continuous predictor
    blocks = list(
        list(
            "IQ")),        # one block containing the single predictor
    refLevels = list(),
    r2Adj = TRUE,          # adjusted R-squared
    rmse = TRUE,           # root mean square error
    modelTest = TRUE,      # overall model F test
    anova = TRUE,          # omnibus ANOVA table
    ci = TRUE,             # confidence intervals for coefficients
    stdEst = TRUE)         # standardized estimates (Beta)
LINEAR REGRESSION
Model Fit Measures
────────────────────────────────────────────────────────────────────────────────────────────
Model R R² Adjusted R² RMSE F df1 df2 p
────────────────────────────────────────────────────────────────────────────────────────────
1 0.85152 0.72508 0.71592 9.4191 79.124 1 30 < .00001
────────────────────────────────────────────────────────────────────────────────────────────
Note. Models estimated using sample size of N=32
MODEL SPECIFIC RESULTS
MODEL 1
Omnibus ANOVA Test
──────────────────────────────────────────────────────────────────────────
Sum of Squares df Mean Square F p
──────────────────────────────────────────────────────────────────────────
IQ 7487.8 1 7487.837 79.124 < .00001
Residuals 2839.0 30 94.635
──────────────────────────────────────────────────────────────────────────
Note. Type 3 sum of squares
Model Coefficients - Match_IQ
────────────────────────────────────────────────────────────────────────────────────────────────────
Predictor Estimate SE Lower Upper t p Stand. Estimate
────────────────────────────────────────────────────────────────────────────────────────────────────
Intercept 13.29664 9.87704 -6.87497 33.4683 1.3462 0.18832
IQ 0.94007 0.10568 0.72424 1.1559 8.8951 < .00001 0.85152
────────────────────────────────────────────────────────────────────────────────────────────────────
The decision was made to reject the null hypothesis because the absolute value of the test statistic was greater than the absolute value of the critical values (8.90 > 2.042). This same decision was reached using the output (Figure 13n), where the two-tailed p-value was less than .001 and therefore less than the .05 level of significance.
The 100(1 – α)% confidence interval is found from equation (13-19); a confidence interval for any regression coefficient can be calculated this way.
\[ \begin{equation} b - t_{(1-\alpha/2,\,n-2)}(s_b) < \beta < b + t_{(1-\alpha/2,\,n-2)}(s_b) \tag{13-19} \end{equation} \]
where b is the sample regression coefficient (i.e., estimate of the population coefficient) and \(s_b\) is the standard error of the sample coefficient estimate.
We can find the regression coefficients from Figure 13n above:
\[ \begin{align} b_0 &= 13.297 \\ b_1 &= 0.940 \end{align} \]
where \(b_0\) is the y-intercept and \(b_1\) is the slope for the predictor IQ. Recall that the slope tells us how much the Match_IQ variable tends to change due to its relationship with the IQ variable. Because regression coefficients can be larger than 1.0, we always include the leading 0 before the decimal.
We can also find the standard errors for the regression coefficients from Figure 13n above:
\[ \begin{align} s_{b_0} &= 9.877 \\ s_{b_1} &= 0.106 \end{align} \]
We substitute these values into equation (13-19). We will only perform this calculation for the slope, because it is rarely of interest whether the y-intercept equals zero, and therefore we rarely interpret the significance of the y-intercept.
\[ 0.940 - 2.042(0.106) < β < 0.940+2.042(0.106) \]
or (as shown by the output) 0.724 < β < 1.156 or (0.724, 1.156).
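The same interval can be reproduced in R from the values in Figure 13n (a sketch of equation (13-19), not jamovi's internal computation):

b_1 <- 0.94007                             # slope estimate
s_b <- 0.10568                             # standard error of the slope
b_1 + c(-1, 1) * qt(.975, df = 30) * s_b   # 0.724, 1.156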
Mary was confident that the true population slope fell between 0.724 and 1.156, and therefore, that a statistically significant nonzero predictive relationship existed between the intelligence scores of delegates from the same delegations (specifically, using one delegate's score to predict the paired delegate's score).
Generally, in regression, the standardized regression coefficient, Beta, is reported as the effect size. Therefore, here, the effect reported would be Beta = .852.
We began this chapter with a discussion of the bivariate normal distribution. This distribution was discussed first because in this chapter, situations where the data consisted of related pairs of scores were to be presented. In these situations it would be assumed that the related pairs of scores were sampled from a bivariate normal distribution. Therefore, our discussion of the bivariate normal distribution included a presentation of how you might check on this assumption by viewing the density pattern of points in a scatterplot, and by checking to see that the marginal distributions of the measurements were normally distributed.
Following our discussion of the bivariate normal distribution, we considered what is referred to as the dependent t test. This test is used to test the hypothesis that the difference between the means of related scores is equal to a constant. We found that the dependent t test is used in two situations. The first situation occurs when a group of units is randomly sampled and each unit is measured twice, either with the same or with commensurate instruments. The design used in this situation is referred to as a repeated measures design. The second situation occurs when a random sample of n pairs of units is drawn and measured. The design used in this situation is referred to as a randomized blocks design.
Our discussion of the dependent t test was followed by a presentation of two situations where we considered the null hypothesis that the population Pearson product-moment correlation coefficient was equal to a constant. We first considered the situation where this constant was not equal to zero. In this situation, we found that the sampling distribution of the correlation coefficient was positively skewed when the population correlation was negative, and negatively skewed when the population correlation was positive. Therefore, in this situation, Fisher’s z-transformation was used to transform the correlational values to scores that follow a normal distribution. This transformation enabled us to use the z statistic and critical values from the standard normal distribution to test the null hypothesis. We then considered the second situation, where the constant in the null hypothesis is zero. In this situation we found that the t statistic may be used to test the null hypothesis.
Statistical Procedures
Analyses to Run
• Use a SCALE variable Y
• Use a SCALE variable X
• COMPUTE a DIFFERENCE SCORE as Y – X (e.g., DIFF = Y – X)
• Run descriptive statistics for the DIFFERENCE SCORE
• Run a paired-samples t test with X and Y as the paired variables
• Run a one-sample t test using the computed DIFFERENCE SCORE as the variable (Test Value = 0)
• Run a nonparametric test for related samples for the paired variables X and Y (e.g., Wilcoxon Signed-Ranks test)
• Run an error bar plot for the separate variables X and Y
An R sketch of most of these steps follows this list.
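This is a hedged sketch only; X and Y are placeholders for your own scale variables, and the simulated data are for illustration:

set.seed(42)
X <- rnorm(30, mean = 100, sd = 15)    # hypothetical scale variable X
Y <- X + rnorm(30, mean = 2, sd = 5)   # hypothetical scale variable Y
DIFF <- Y - X                          # computed difference score
mean(DIFF); sd(DIFF); var(DIFF)        # descriptives for the difference score
t.test(Y, X, paired = TRUE)            # paired-samples t test
t.test(DIFF, mu = 0)                   # equivalent one-sample t test (Test Value = 0)
wilcox.test(Y, X, paired = TRUE)       # Wilcoxon signed-rank test for related samples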
Using the output, respond to the following items
Using the PAIRED-SAMPLES T TEST output, respond to the following items
Provide the most appropriate research question for this analysis
Provide the statistical null hypothesis using both words and appropriate symbols.
What is the paired mean difference between X and Y?
Which mean was higher in the sample, X or Y?
What was the standard deviation for the paired mean difference between X and Y?
What is the variance of the paired difference scores?
What was the standard error for the paired mean difference between X and Y?
What are the degrees of freedom and the critical value for this paired-samples t test?
Show or explain how the t statistic for paired mean differences between X and Y is calculated.
Assuming that these data represent a random sample from some population, on which variable (Y or X) would cases be expected to score higher in the population? That is, using a two-tailed level of significance of α = .05, was there a statistically significant difference between the means of X and Y? Use a confidence interval for the paired mean difference as evidence to explain your answer
Assuming that these data represent a random sample from some population, on which variable (Y or X) would cases be expected to score higher in the population? That is, using a two-tailed level of significance of α = .05, was there a statistically significant difference between the means of X and Y? Use the calculated t statistic compared to a t critical value as evidence to explain your answer (include both the degrees of freedom and the critical value used for this test)
Assuming that these data represent a random sample from some population, on which variable (Y or X) would cases be expected to score higher in the population? That is, using a two-tailed level of significance of α = .05, was there a statistically significant difference between the means of X and Y? Use a Sig. or p value to explain your answer
Explain what the specific p (Sig.) value you obtained in your results means as a probability (i.e., do not talk about whether it is statistically significant, but rather what probability the p represents).
Would this paired t test be statistically significant as a one-tailed (i.e., directional) test? Explain and provide evidence.
Is there a statistically significant positive correlation between the two variables in the paired t test?
Does it matter?
Which mean would you estimate to be higher in the population, X or Y?
Show or explain how the standardized paired mean difference effect size (i.e., Cohen’s d) between X and Y is calculated. That is, calculate Cohen’s d for this paired-samples t test.
Using ALL the output in this section above, respond to the following item
Using the NONPARAMETRIC PAIRED-SAMPLES WILCOXON SIGNED-RANK TEST output, respond to the following items
How many cases in the sample had higher X scores than Y scores? Provide specific evidence.
How many cases in the sample had lower X scores than Y scores? Provide specific evidence.
How many cases in the sample had the same X and Y scores? Provide specific evidence.
Approximately what is the two-tailed probability of obtaining the resulting z statistic (as an absolute value) if the null hypothesis is true?
What do you conclude based on these results?
Interpret the results for the Wilcoxon Signed-Rank test in an APA-style report to answer the research question and to describe in detail the mean difference between Y and X. Whether statistically significant or not, refer to descriptive statistics (e.g., mean ranks, but also report means, standard deviations, mean differences, effect sizes, and/or confidence intervals), graphs, inferential statistics, degrees of freedom, and statistical significance to describe the size and direction of the difference between variables/measures. Be sure to discuss assumptions and outliers and their potential impact.
Analyses to Run
• Use the same analyses you ran in Chapter 7: Bivariate Correlation (Descriptive)
Using the output, respond to the following INFERENTIAL items
Using ALL the output in this section above (as well as Section 11 as needed), respond to the following item
Skip any of the next eight items if it is not possible to answer them (e.g., if there are no negative correlations). Do NOT include any correlations of a variable with itself (that always have r = 1.0).
Is the strongest positive Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the strongest negative Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the weakest positive Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the weakest negative Pearson correlation statistically significantly different from ZERO? As evidence, report r, df, and p.
Is the strongest positive Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
Is the strongest negative Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
Is the weakest positive Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
Is the weakest negative Spearman correlation statistically significantly different from ZERO? As evidence, report rho, df, and p.
• Use the same analyses you ran in Chapter 8: Bivariate (one-predictor) Linear Regression (Descriptive)
Using the output, respond to the following INFERENTIAL items
Provide the most appropriate research question for the analysis
What is the Statistical Null Hypothesis for the regression model (i.e., correlation and/or shared variance)?
Is the regression model statistically significant?
Is the amount of explained variation considered statistically significantly different from ZERO?
Based on your statistical significance decision in the previous items, what type of error might you have made? Why?
Can we conclude that there is a relationship between Y and X in the population?
Can we conclude that the value of X causes the change in Y?
Can we conclude that a case’s Y score in the population can be predicted by using the value of the case’s X score?
Show or explain how the F statistic is calculated
Show or explain how all 3 degrees of freedom in the ANOVA table are calculated
Show or explain how to test the statistical significance of the F statistic as compared to the F critical value
Report and interpret the p value associated with the F statistic
Calculate or report and interpret the effect size for this analysis.
Is the assumption of homoscedasticity met for this regression? Show and describe your evidence.
Is the assumption of normally distributed residuals met for this regression? Show/describe your evidence.
Is the assumption of linearity met for this regression? Show and describe your evidence.
What is the Statistical Null Hypothesis for the regression slope?
Show or explain how the t statistic is calculated for the slope.
Is the slope statistically significant?
Show or explain how to test the statistical significance of the t statistic as compared to the t critical value (report the degrees of freedom and recall that the df used for the critical t value is based on the Residual df from the ANOVA table)
Report and interpret the p value associated with the t statistic for the slope
Show how to use the Confidence Interval for the regression coefficient to decide if the slope is statistically significantly different from 0
Using ALL the output in this section above (as well as Section 12 as needed), respond to the following item
Interpret the results for the bivariate regression in an APA-style report to answer the research question and to describe in detail the predictive relationship between Y and X. Whether statistically significant or not, use descriptive statistics (e.g., Pearson correlation, shared variation, R²), inferential statistics, degrees of freedom, assumptions, outliers, and statistical significance to describe the size and direction of the relationship.
Respond to the following items about null hypothesis testing
FOR PAIRED T TESTS: If the true paired mean difference (i.e., reality) between the Y and X scores is exactly equal to ZERO in the population, then what type of error might you have made when reaching your decision in the previous item about the Null Hypothesis of no difference in means? Or was there no error? Explain.
FOR PAIRED T TESTS: If the true paired mean difference (i.e., reality) between the Y and X scores is larger than ZERO in the population, then what type of error might you have made when reaching your decision in the previous item about the Null Hypothesis of no difference in means? Or was there no error? Explain.
FOR PAIRED T TESTS: Use the following options to respond to the four items below:
Discuss why each of the following is or is not possible in Null Hypothesis Significance Testing:
Explain the Research Design and Statistical Analysis terms below BRIEFLY but SUFFICIENTLY and IN YOUR OWN WORDS (don’t just give another name for them). Some may require finding additional readings. If you use resources, paraphrase in your own words AND provide a citation of the resource you used (including page numbers).
Please cite as:
Barcikowski, R. S., & Brooks, G. P. (2025). The Stat-Pro book:
A guide for data analysts (revised edition) [Unpublished manuscript].
Department of Educational Studies, Ohio University.
https://people.ohio.edu/brooksg/Rmarkdown/
This is a revision of an unpublished textbook by Barcikowski (1987).
This revision updates some text and uses R and JAMOVI as the primary
tools for examples. The textbook has been used as the primary textbook
in Ohio University EDRE 7200: Educational Statistics courses for
most semesters 1987-1991 and again 2018-2025.