INTRODUCTION

In chapters 14 and 15 we will study an analysis known as one-way fixed-effects analysis of variance. The term analysis of variance is denoted by the acronym ANOVA (or AOV). The purpose of this analysis is to determine whether there are differences among population treatment groups and, if so, which treatment groups differ. In chapter 4 we began our consideration of this design, in its simplest two-group form, when we examined the mean differences between a group of women who were taking herbal supplement pills and a group of women who were not. We will consider these same two groups of women again in this chapter, but we will add a third treatment group because the study of three treatment groups will allow us to more easily generalize to any number of groups.

We will begin this chapter by considering the Fisher’s F statistic, which is used in analysis of variance to help determine if there are significant differences among the treatment groups. Then, using a small data set, we will study the calculations used in a one-way ANOVA in order to help you gain insight into why and how the analysis of variance works. You will find that the calculations for an ANOVA are not difficult, but they are tedious and time-consuming. Fortunately, R will easily perform these calculations, and the step-by-step procedure for using R in an ANOVA is given in this chapter. We will conclude this chapter by considering the meaning of the term fixed effects, and by examining the use of the t statistic in a one-way design with two treatment levels.

Your R objectives for this chapter will be to perform a one-way fixed-effects analysis of variance in order to compare the means of units from different treatment populations. You will also learn to use an independent t test when the data are from two groups and are assumed to have equal variances or unequal variances.

THE SAMPLING DISTRIBUTION OF THE F STATISTIC

Introduction: The Fisher’s F Statistic and the χ2 Statistic

In this chapter we will make use of a statistic which was first studied by R. A. Fisher (1924) and is generally referred to as Fisher’s F statistic. Like the \(\chi^2\) statistic, the F distribution is dependent upon its degrees of freedom. The Fisher’s F statistic is related to the \(\chi^2\) statistic in that the Fisher’s F statistic may be written as the ratio of two independent \(\chi^2\) statistics, each divided by its degrees of freedom. That is, the Fisher’s F statistic is defined by the following ratio (and indeed, is sometimes referred to as the “F ratio”):

\[ \begin{equation} F = \frac{\chi_{(v_1)}^2 / v_1} {\chi_{(v_2)}^2 / v_2} \tag{14-1} \end{equation} \]

where \(\chi_{(v_1)}^2\) is a chi-square statistic with \(v_1\) degrees of freedom and \(\chi_{(v_2)}^2\) is a chi-square statistic with \(v_2\) degrees of freedom. Since the Fisher’s F statistic can be defined as the ratio of two independent chi-square statistics, the F distribution is dependent upon two degrees of freedom, \(v_1\) and \(v_2\). This can also be seen when you consider the probability density function for the Fisher’s F statistic, which has as unknowns \(v_1\), \(v_2\), and x. Once you have the data, x, then only \(v_1\) and \(v_2\) remain as unknowns. In Figure 14a, three F distributions are displayed, one with 2 and 10 degrees of freedom, one with 3 and 10 degrees of freedom, and one with 4 and 30 degrees of freedom.
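This relationship can be illustrated with a short simulation in base R (a sketch, not part of the chapter's figures): values formed as the ratio of two independent chi-square variables, each divided by its degrees of freedom, follow an F distribution.

```r
# Sketch: simulate F values as the ratio of two independent chi-square
# variables, each divided by its degrees of freedom
set.seed(1)
v1 <- 2
v2 <- 10
f_sim <- (rchisq(100000, df = v1) / v1) / (rchisq(100000, df = v2) / v2)

# The simulated values track the theoretical F(v1, v2) distribution;
# for example, the mean of an F distribution is v2 / (v2 - 2) when v2 > 2
mean(f_sim)             # close to 10 / 8 = 1.25
quantile(f_sim, 0.95)   # close to qf(0.95, v1, v2)
```

Comparing the simulated 95th percentile with `qf(0.95, v1, v2)` shows the two degrees of freedom fully determine the distribution.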

Note that the \(\chi^2\) itself can be thought of as the sum of squares for normally distributed variables. That is, if we set

\[ \begin{equation} Y = \sum_{i=1}^{k} z_i^2 = \sum_{i=1}^{k} \left( \frac{ X_i-\mu} {\sigma} \right)^2 \tag{14-2} \end{equation} \]

where \(z_i\) is any standard normal variable and k is the total number of these standard normal variables, then Y is distributed as a \(\chi^2\) with k degrees of freedom.

Critical Values: A Table of F Values

In appendix A, table A5 contains the upper percentile critical values of the F distribution. Each page of table A5 consists of the F values at a given percentile from F distributions with different pairs of degrees of freedom. For example, the F value at the 90th percentile in the F distribution having \(v_1 = 2\) and \(v_2 = 10\) degrees of freedom is found on the first page of table A5 as 2.92, and the F value at the 95th percentile in this same F distribution is found on the second page of table A5 as 4.10. We will use the notation \(F_{(1 - \alpha, v_1, v_2)}\) to denote the tabled F values. Therefore, the tabled values for these two examples would be written as: \(F_{(.90, 2, 10)} = 2.92\) and \(F_{(.95, 2, 10)} = 4.10\), for \(\alpha = .10\) and \(\alpha = .05\), respectively.
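If table A5 is not at hand, these same percentiles can be obtained in R with the qf function (a quick sketch):

```r
# Upper percentile critical values of the F distribution with
# v1 = 2 and v2 = 10 degrees of freedom
qf(0.90, df1 = 2, df2 = 10)   # approximately 2.92 (90th percentile)
qf(0.95, df1 = 2, df2 = 10)   # approximately 4.10 (95th percentile)
```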

ONE-WAY ANALYSIS OF VARIANCE (ANOVA): A VIEW OF ITS FOUNDATIONS

The General ANOVA Paradigm.

In this section we will discuss a statistical analysis where the Fisher’s F statistic is used to discern differences among the population means of two or more groups. A general paradigm for this so-called one-way design is illustrated in Figure 14b(i) and the specific design that we will consider in the examples for this chapter is shown in Figure 14b(ii). The term one-way is used in the name of this design and in its analysis to indicate that there is one independent variable. Therefore, if there were two independent variables, we would be considering a two-way design.

The Example for this Chapter.

In Figure 14b(ii), you can see that in this chapter we will be considering an example similar to the pill consumption study that we discussed in chapter 4. Here, however, one additional treatment level has been added so that we will consider three treatments which represent the effects of different types of herbal supplement pill consumption. In the first treatment women are asked to take a herbal supplement pill made by the Gamma Nutrition Company, in the second treatment women are asked to take a herbal supplement pill made by the Delta Nutrition Company, and in the third treatment women are asked to take no herbal supplement pills. Note that researchers sometimes use true control treatments (i.e., no treatment at all) but sometimes use comparison treatments (e.g., a placebo treatment or a traditional treatment). When traits are used to define groups instead of treatments, there is often a reference group identified to which all other groups are compared.

These women were initially sampled at random from a population of women in a large Midwestern city and then randomly placed into one of the three treatments. Since this is a fabricated study, all of the women in the random sample agreed to participate in the study, and all of the women routinely took their pills over a one year period. At the end of one year the cholesterol levels of these women were measured, and these measurements yielded the data for the examples of this chapter.

The Original Population Versus the Treatment Populations

In what follows the analysis of variance (ANOVA) will be presented so as to give you an understanding of how and why it works. During this presentation we will consider the cholesterol experiment where only nine units have been randomly sampled from a population and then randomly placed into one of the three treatments.

During the presentation it will be important to distinguish between the original population from which the units were sampled and the potential treatment populations. When there are no treatment effects, the units in the original population and treatment populations will not differ. That is, since the treatments have not changed the units that were randomly placed into them, the units in the treatments will not differ from the units in the original population. However, if the treatments change their units (i.e., there are treatment effects), then the units in the treatments are considered to represent the population of units which has received that treatment (even though such a population does not typically actually exist). During the following presentation we will emphasize these important logical points and illustrate the computations involved. Note that sometimes we are interested in comparing different populations on traits that have not been manipulated by the researcher (e.g., college degree versus high school diploma). In such cases, the samples do represent existing populations, not a hypothetical population that has been treated.

Figure 14a Graphs of three different F distributions

Figure 14b(i) General paradigm for a one-way analysis of variance (ANOVA) design

Figure 14b(ii) Specific data for a one-way analysis of variance (ANOVA) model considered in this chapter

Treatment_1 Treatment_2 Treatment_3
Gamma Brand Delta Brand None
Y11 = 179 Y12 = 250 Y13 = 221
Y21 = 204 Y22 = 220 Y23 = 246
Y31 = 184 Y32 = 235 Y33 = 241
Treatment (Marginal) Means M1 = 189 M2 = 235 M3 = 236

Note: A common notation used in ANOVA cell designs is \(Y_{ij}\), where i stands for the individual case within treatment j, so \(Y_{32}\) (or Y32 in the figure above) is the third case in the second treatment (i.e., Delta).

The Null And Alternate Statistical Hypotheses

General case.

In performing an analysis of variance (ANOVA) on data like those shown in Figures 14b(i) and 14b(ii), the null hypothesis states that there are no differences among the population means of the treatments, that is, all of the treatment population means are equal, and the alternate hypothesis states that at least one of the treatment population means differs from the others. For the general case of J treatment levels the statistical null hypothesis can be written as:

\[ \begin{align} H_0&: \mu_1 = \mu_2 = \text{ ... } = \mu_J \\ H_0&: \mu_j = \mu_k &&\text{ for all } j \text{ and } k \text{ where } j \ne k \\ H_0&: \mu_j - \mu_k = 0 &&\text{ for all } j \text{ and } k \text{ where } j \ne k \end{align} \]

The second and third forms of the statistical null hypothesis are often used when the number of treatment groups is larger. The statistical alternative hypothesis can be written as:

\[ \begin{align} H_A&: \mu_j \ne \mu_k &&\text{ for some/any } j \text{ and } k \text{ where } j \ne k \\ H_A&: \mu_j - \mu_k \ne 0 &&\text{ for some/any } j \text{ and } k \text{ where } j \ne k \end{align} \] Here the alternate hypothesis is a way of denoting that at least one of the treatment population means differs from the others. Some authors prefer to say “some” and some prefer “any” for the alternative hypothesis.

Specific case: J = 3.

The statistical hypotheses for the example data shown in Figure 14b(ii) can be written as:

\[ \begin{align} H_0&: \mu_1 = \mu_2 = \mu_3 \\ H_A&: \mu_j \ne \mu_k &&\text{ for any } j \ne k \text{ and } j, k = 1, 2, 3 \end{align} \]

Note that an alternative hypothesis that you cannot use is \(H_A: \mu_1 \ne \mu_2 \ne \mu_3\). This incorrect form implies inequalities that are not required: it requires that \(\mu_1\) and \(\mu_2\) be unequal and that \(\mu_2\) and \(\mu_3\) be unequal, but it says nothing about \(\mu_1\) and \(\mu_3\). The correct general alternative hypothesis does not require equality or inequality of specific means. However, such specific forms may be appropriate for confirmatory hypotheses where the researcher makes specific predictions about the means (but confirmatory hypotheses typically also require some indication of which means are larger than which other means).

Values Of The Fisher’s F Ratio

In this section we will see that when the null hypothesis is true, the Fisher’s F statistic used to test it consists of the ratio of two independent estimates of the original population variance, that is,

\[ \begin{equation} F = \frac{s_{e_1}^2} {s_{e_2}^2} \tag{14-3} \end{equation} \]

The subscript e stands for error; the subscripts 1 and 2 indicate that the error variances are estimated from two different sources. We will discuss these estimates further in what follows.

From equation (14-3), you can see that if \(s_{e_1}^2\) and \(s_{e_2}^2\) are equal, the estimate of the Fisher’s F statistic will equal 1.00. However, since the estimates of the variance provided in the numerator and the denominator of this ratio come from independent sample sources, they will usually differ somewhat. That is, in practice, when the null hypothesis is true, it is not unusual to find a Fisher’s F ratio which is only close to 1.00 (e.g., values such as .80, .90, 1.10, and 1.20 are common when the null hypothesis is true).

When the null hypothesis is false, that is, when at least one population treatment mean differs from the others, the Fisher’s F ratio may be written as:

\[ \begin{equation} F = \frac{s_{e_1}^2 + \text{a function of squared treatment effects}} {s_{e_2}^2} \tag{14-4} \end{equation} \]

Here, since the numerator contains a function of the squared treatment effects in addition to the error variance estimate, it will tend to be larger than the denominator. Therefore, you can see that when the null hypothesis is false, the Fisher’s F ratio will be greater than one. For example, given a .05 level of significance and more than six units per treatment, Fisher’s F ratios that are larger than 5.00 will lead to rejection of the null hypothesis. (How the Fisher’s F value of 5.00 is found is explained in the following sections.)

Note that when the null hypothesis is true, the treatment effects are zero and therefore the squared treatment effects are also zero. Therefore, the formulas for Fisher’s F are actually equivalent when the null hypothesis is true.

The Fisher’s F Statistic In ANOVA: No Treatment Effects

We will begin our consideration of the development of the Fisher’s F ratio as a test statistic in ANOVA by first introducing the following set of cholesterol scores. In this example, the scores are from populations with no differences among or within the units. “No differences among” implies that there are no group mean differences and “no differences within” implies that all units within each treatment group have exactly the same score.

Treatment_1 Treatment_2 Treatment_3
Gamma Brand Delta Brand None
Y11 = 220 Y12 = 220 Y13 = 220
Y21 = 220 Y22 = 220 Y23 = 220
Y31 = 220 Y32 = 220 Y33 = 220
Treatment (Marginal) Means M1 = 220 M2 = 220 M3 = 220

These scores represent a sample of cholesterol scores from a population (the original population) of women in a large city. The women were randomly sampled from the population and then randomly placed into the treatments. These data were taken from a population of women who all have the same cholesterol levels. Of course, this is a hypothetical population because we would expect women’s cholesterol scores to differ. But what would cause these differences? You may partially answer this question by considering some of the factors that would cause cholesterol scores to differ. For example, women have different eating habits and different inherited characteristics, such as bone structure and body fat. Note that including men in the sample would add even more potential sources of individual differences and would therefore increase the variability of scores. Limiting the population in this way was discussed in an earlier chapter as one method for reducing variance to increase power.

In an experiment where we are interested in the effects of different treatments on women’s cholesterol levels, the initial cholesterol differences among the women’s scores are considered to be error. We can represent this error by randomly adding and subtracting numbers, which primarily reflect the combined impact of environmental and inherited characteristics, to our set of constant cholesterol scores. That is, we create scores from a population with differences among and within the units caused by errors. When we do this the result might look as follows:

Treatment_1 Treatment_2 Treatment_3
Gamma Brand Delta Brand None
Y11=220-20=200 Y12=220+15=235 Y13=220- 5=215
Y21=220+ 5=225 Y22=220-15=205 Y23=220+20=240
Y31=220-15=205 Y32=220+ 0=220 Y33=220+15=235
M1 = 210 M2 = 220 M3 = 230

That is, in an analysis where we are interested in the effects of different treatments, we consider the differences that occur among the units in our original population, for whatever reason, to represent error. Then, the variance of the scores in the original population may be referred to as error variance.

We may consider the latter set of scores to represent the scores when there are no treatment effects, that is, where the only differences between the treatment means are those that are due to sampling fluctuations. When this is the case, we can estimate the original population error variance in two ways. Both of these ways make use of the numerator and denominator of our equation for a sample variance (see equation 5-2); that is, the numerator of the variance equation is \(\sum (X-M)^2\) and the denominator of the variance equation is \((n-1)\).

We will now consider two estimates of the original population error variance.

An Estimate of the Original Population variance: The Mean Square Between Groups (MSB).

The first estimate of the original population variance that is usually obtained during an analysis of variance is found from the variance of the treatment means. This is also called Mean Square Among (MSA) groups. In what follows this variance estimate will be found using elementary algebra and then calculated using the preceding cholesterol scores.

Given no treatment effects, the variance of the sampling distribution for each treatment mean is known to be (see chapter 10 and the discussion of the Central Limit Theorem):

\[ \begin{equation} s_M^2 = \frac{s_{e_1}^2} {n_j} \tag{14-5} \end{equation} \]

so that the first estimate of the error variance is found as

\[ \begin{equation} s_{e_1}^2 = n_j (s_M^2 ) \tag{14-6} \end{equation} \]

Therefore, by using the variance equation (5-2) on the J treatment means in an ANOVA we have:

\[ \begin{equation} s_M^2 = \frac{\sum(M_j - M)^2} {J-1} \tag{14-7} \end{equation} \]

where M, here, is the mean of the J treatment means (\(M_j\)), which is often called the unweighted Grand Mean. Note that there are two types of Grand Means. First, the Weighted Grand Mean is calculated as

\[ \text{Weighted Grand Mean} = \sum_{j=1}^K \frac{(n_j)(M_j)} {N} \]

and is called the Weighted Grand Mean because each cell or group mean is “weighted” by the number of units in its cell. Therefore, the Grand Mean is weighted more heavily by the larger cells or groups. The Weighted Grand Mean is equivalent to simply adding up the scores for all of the units and dividing by the number of units. Second, the Unweighted Grand Mean is calculated as

\[ \text{Unweighted Grand Mean} = \sum_{j=1}^K \frac{M_j}{K} \]

and is called “unweighted” because the number of units in each cell is not considered in the calculation of the Grand Mean. Therefore, each cell mean is weighted the same in the calculation of the Grand Mean (and therefore, no cell is weighted differently). Essentially, you simply add up the cell means and divide by the number of cells.

When there are an equal number of units per cell or per group, the Grand Mean is calculated exactly the same using both formulas. However, when the cells have unequal numbers of units, the calculations differ. When there are an unequal number of units per cell, the Grand Mean M in equation 14-7 is found as the mean of all of the units in the experiment, which is the weighted Grand Mean.
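The distinction can be sketched in R; the group means below are taken from the chapter's example, but the unequal cell sizes are hypothetical, chosen only to make the two formulas disagree:

```r
M <- c(189, 235, 236)   # cell (group) means from the chapter's example
n <- c(2, 3, 4)         # hypothetical unequal numbers of units per cell
N <- sum(n)             # total number of units

weighted_gm   <- sum(n * M) / N   # each mean weighted by its cell size
unweighted_gm <- mean(M)          # each cell mean weighted equally

# With equal cell sizes the two formulas give the same Grand Mean
sum(c(3, 3, 3) * M) / 9 == mean(M)
```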

If we substitute (14-7) into (14-6) we find that our first estimate of the original population variance is:

\[ \begin{equation} s_{e_1}^2 = \frac{\sum n_j(M_j-M)^2} {J-1} \tag{14-8} \end{equation} \]

Here, \(s_{e_1}^2\) is called the mean square between groups (MSB) or the mean square among groups (MSA); the numerator in equation (14-8) is called the sum of squares between groups (SSB) or the sum of squares among groups (SSA), and the denominator of equation (14-8) represents the degrees of freedom (v1) associated with the mean square among groups.

For the cholesterol scores we have that:

\[ \begin{align} \sum n_j (M_j-M)^2 &= 3(210-220)^2 + 3(220-220)^2 + 3(230-220)^2 \\ &= 3(-10)^2 + 3(0)^2 + 3(10)^2 \\ &= 300 + 0 + 300 \\ SSB &= 600 \end{align} \]

and that:

\[ v_1 = df_B = J-1 = 2 \]

where \(df_B\) stands for degrees of freedom between. Therefore, the sum of squares among groups is 600 and the degrees of freedom are 2, so that the among groups estimate of the original population variance is:

\[ MSB = s_{e_1}^2 = \frac{SSB}{v_1} = \frac{SSB}{df_B} = \frac{600}{2} = 300 \]

That is, the mean square among groups provides us with an estimate of the original population variance (when the null hypothesis is true), which in this case is estimated to be 300.
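The MSB computation just shown can be reproduced in base R (a sketch mirroring the hand calculation):

```r
# No-treatment-effect cholesterol scores, one vector per treatment
scores <- list(gamma = c(200, 225, 205),
               delta = c(235, 205, 220),
               none  = c(215, 240, 235))

M_j <- sapply(scores, mean)     # treatment means: 210, 220, 230
n_j <- sapply(scores, length)   # 3 units per treatment
GM  <- mean(unlist(scores))     # grand mean: 220 (equal n, so the
                                # weighted and unweighted forms agree)
J   <- length(scores)

SSB <- sum(n_j * (M_j - GM)^2)  # sum of squares between: 600
MSB <- SSB / (J - 1)            # mean square between: 600 / 2 = 300
```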

Let us now consider another way to estimate the population error variance.

An Estimate of the Original Population Variance: The Mean Square Within (MSW).

The second way to estimate the original population variance basically involves pooling the population variance estimates found from each of the treatments. The rationale is that a pooled estimate will usually be closer to the true population variance than will any of the single estimates. This pooled estimate of the original population variance is arrived at through the following steps.

Step 1. Find the numerator sum of squares for each treatment and then add these sums of squares together. For example, for the latter no treatment effect data we have:

\[ \begin{align} \text{Treatment #1: } \sum (X-M_1)^2 &= (200-210)^2 + (225-210)^2 + (205-210)^2 \\ &= (-10)^2 + (15)^2 + (-5)^2 = 100 + 225 + 25 = 350 \\ \text{Treatment #2: } \sum (X-M_2)^2 &= (235-220)^2 + (205-220)^2 + (220-220)^2 \\ &= (15)^2 + (-15)^2 + (0)^2 = 225 + 225 + 0 = 450 \\ \text{Treatment #3: } \sum (X-M_3)^2 &= (215-230)^2 + (240-230)^2 + (235-230)^2 \\ &= (-15)^2 + (10)^2 + (5)^2 = 225 + 100 + 25 = 350 \end{align} \]

The sum of these sums of squares is referred to as the within groups sum of squares (SSW) and is written as:

\[ SSW = \sum_{j=1}^J \sum_{i=1}^{n_j} (X_{ij} - M_j)^2 \]

where \(X_{ij}\) is the score for unit i in treatment j, and \(M_j\) is the mean of treatment j. In this example the within groups sum of squares is:

\[ SSW = \sum_{j=1}^J \sum_{i=1}^{n_j} (X_{ij} - M_j)^2 = 350 + 450 + 350 = 1150 \]

Step 2. Subtract one from the number of units in each treatment and add the resultant numbers. For our example we have:

\[ \begin{align} v_2 = df_W = \sum_{j=1}^J (n_j - 1) &= (3-1) + (3-1) + (3-1) \\ &= 2 + 2 + 2 \\ &= 6 \end{align} \]

This sum represents the degrees of freedom (\(v_2\) or \(df_W\)) associated with the within group variation. Note that if there are an equal number of units per treatment, \(v_2 = J(n - 1)\). In this example we have equal n’s, so that \(v_2 = 3(3 - 1) = 6\).

Step 3. The second estimate of the variance of the population, \(s_{e_2}^2\), is found by dividing the within groups sum of squares (found in step 1) by its associated degrees of freedom (found in step 2). This estimate of the population variance is often referred to as the mean square within (MSW), as the error variance (sometimes called mean square error, MSE; in regression it is called mean square residual), or as the within groups variance. For our example, this error variance is found as:

\[ MSW = s_{e_2}^2 = \frac{SSW}{v_2} = \frac{SSW}{df_W} = \frac{1150}{6} = 191.667 \]

That is, our estimate of the variance of the original population of women is 191.667.
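The three steps above can be sketched in base R:

```r
# No-treatment-effect cholesterol scores, one vector per treatment
scores <- list(gamma = c(200, 225, 205),
               delta = c(235, 205, 220),
               none  = c(215, 240, 235))

# Step 1: within-treatment sums of squares, then pool them
ss_j <- sapply(scores, function(x) sum((x - mean(x))^2))  # 350, 450, 350
SSW  <- sum(ss_j)                                         # 1150

# Step 2: within-groups degrees of freedom
df_W <- sum(sapply(scores, length) - 1)                   # 6

# Step 3: pooled estimate of the population error variance
MSW <- SSW / df_W                                         # 191.667
```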

The Sum of Squares Total: A Means Of Checking Our Calculations.

The sum of squares total (SST) is calculated as:

\[ \begin{equation} SST = \sum_{j=1}^J \sum_{i=1}^{n_j} (X_{ij}-M)^2 \tag{14-9} \end{equation} \]

The sum of squares total is the sum of all the squared deviation scores in the design, that is, each observation (X) minus the mean of all of the observations (M, the Grand Mean), squared, and summed. The sum of squares total (SST) is of interest because it can be shown that it is equal to the sum of squares between groups (SSB) plus the sum of squares within groups (SSW):

\[ \begin{equation} SST = SSB + SSW \tag{14-10} \end{equation} \]

Therefore, SST provides us with a means of checking our calculations of SSB and SSW. For our cholesterol scores we may find SST as:

\[ \begin{align} SST &= \sum(X-M)^2 \\ SST &= (200-220)^2 + (225-220)^2 + (205-220)^2 \\ &+ (235-220)^2 + (205-220)^2 + (220-220)^2 \\ &+ (215-220)^2 + (240-220)^2 + (235-220)^2 \\ SST &= (-20)^2 + (5)^2 + (-15)^2 + (15)^2 \\ &+ (-15)^2 + (0)^2 + (-5)^2 + (20)^2 + (15)^2 \\ SST &= 400+25+225+225+225+0+25+400+225 \\ SST &= 1750 \end{align} \]

Then if we use equation (14-10), we can check our calculations of SSB and SSW as:

\[ \begin{align} SST &= SSB + SSW \\ 1750 &= 600 + 1150 \end{align} \]

Therefore, it appears as though we have calculated SSB, SSW, and SST without error.
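The same check can be sketched in base R:

```r
# All nine no-treatment-effect scores pooled together
scores <- c(200, 225, 205, 235, 205, 220, 215, 240, 235)

SST <- sum((scores - mean(scores))^2)   # 1750

# Partition check: SST should equal SSB + SSW (600 + 1150 from the text)
SST == 600 + 1150
```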

Let us now consider how the two estimates of the population error variance are used to form the Fisher’s F statistic.

The Fisher’s F Statistic.

In chapter 13, we saw that the sampling distribution of a variance is related to the chi-square statistic. It can be shown that the sum of squares between groups, divided by the original population variance, is distributed as a chi-square statistic with \(v_1 = (J - 1)\) degrees of freedom, and that the sum of squares within groups, divided by the original population variance, is distributed as a chi-square statistic with \(v_2 = \sum(n_j - 1)\) degrees of freedom. The ratio of these two chi-square statistics, each divided by its degrees of freedom, may be written as:

\[ \begin{equation} F = \frac{\chi_{v_1}^2 / v_1} {\chi_{v_2}^2 / v_2} \tag{14-11} \end{equation} \]

which, as we noted at the beginning of this chapter, is distributed as a Fisher’s F statistic with \(v_1\) and \(v_2\) degrees of freedom. For the no treatment effect cholesterol scores we have:

\[ \begin{align} F &= \frac{\text{Mean Square Between Groups}} { \text{Mean Square Within Groups}} \\ F &= \frac{SSB/v_1} {SSW/v_2} = \frac{SSB/df_B} {SSW/df_W} = \frac{MSB}{MSW} \\ F &= \frac{600/2}{1150/6} = \frac{300}{191.667} \\ F &= 1.5652 \end{align} \]

Given a .05 level of significance, the critical value found in table A5 is \(F_{(.95, 2, 6)} = 5.14\), so we fail to reject the null hypothesis that the treatment population means are all equal. The results of this analysis are shown in Figure 14c (the p-value found is .2838).

Here we found that the differences among the treatment means could be attributed to chance. We did this by examining the ratio of two separate estimates of the population error variance. If a difference among the means were present, the numerator variance estimate (MSB) would have been large enough, as compared to the denominator variance estimate (MSW), to yield an F-statistic greater than 5.14 (see equation 14-4).
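The chapter's figures use the jmv package; the same Fisher's F test can also be obtained in base R with oneway.test, where var.equal = TRUE gives the classical one-way ANOVA (a sketch):

```r
cholesterol <- c(200, 225, 205, 235, 205, 220, 215, 240, 235)
treatment   <- factor(rep(c("Gamma", "Delta", "None"), each = 3))

# var.equal = TRUE requests the classical (Fisher's) one-way ANOVA;
# var.equal = FALSE (the default) would give Welch's test instead
oneway.test(cholesterol ~ treatment, var.equal = TRUE)
# F = 1.565 on 2 and 6 df, p = 0.2838, matching Figure 14c
```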

Let us now consider an ANOVA where there are treatment effects.

The Fisher’s F Statistic In ANOVA: Treatment Effects.

To this point in our discussion we have had no treatment effects, so each observation has been made up of the original population mean and an error, that is, \(X_{ij} = \mu + e_{ij}\). In this section we will see what happens in the analysis of variance when there are treatment effects. Here each observation will be written as the sum of the original population mean plus a treatment effect plus an error, that is,

\[ X_{ij} = \mu + a_j + e_{ij} \]

where \(a_j\) is a treatment effect. If we add treatment effects to the preceding cholesterol measures we might find measures like the following (see chart below). This chart represents scores from a population with differences within treatments caused by errors and differences among treatments caused by both errors and treatment effects (recall that the original population mean was 220).

Treatment_1 Treatment_2 Treatment_3
Gamma Brand Delta Brand None
Treatment Effect = -21 Treatment Effect = +15 Treatment Effect = +6
Y11=220-21-20 = 179 Y12=220+15+15 = 250 Y13=220+6- 5 = 221
Y21=220-21+ 5 = 204 Y22=220+15-15 = 220 Y23=220+6+20 = 246
Y31=220-21-15 = 184 Y32=220+15+ 0 = 235 Y33=220+6+15 = 241
M1 = 189 M2 = 235 M3 = 236

A mathematical consequence of this additive model is that the treatment effects must sum to zero across the treatments. This restriction is caused by choosing to write each observation in terms of three unknowns. However, a further discussion of this restriction is beyond the scope of this book. The more mathematically inclined reader may choose to consult Scheffe (1959).

Here, we have added the constant treatment effect, \(a_1 = -21\), \(a_2 = +15\), or \(a_3 = +6\), to each observation in a given treatment. The effect that these treatment effects will have on the analysis of variance is to cause the mean square among groups to estimate the sum of the error variance plus a function of the squared treatment effects, but they will have no effect upon the mean square within groups. We can see this when we repeat the process of finding the mean squares needed for the Fisher’s F ratio.
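The construction of these scores can be sketched in R, building each observation as \(X_{ij} = \mu + a_j + e_{ij}\):

```r
mu <- 220                       # original population mean
a  <- c(-21, 15, 6)             # treatment effects, one per treatment
e  <- rbind(c(-20,  15,  -5),   # errors; column j holds treatment j's errors
            c(  5, -15,  20),
            c(-15,   0,  15))

# Each column (treatment) gets its treatment effect added as a constant
X <- mu + matrix(a, nrow = 3, ncol = 3, byrow = TRUE) + e

colMeans(X)   # treatment means: 189, 235, 236
sum(a)        # the treatment effects sum to zero
```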

Mean Square Between (or Among) Groups.

For the latter treatment effect data we have that the sum of squares among groups is found as:

\[ \begin{align} \sum n_j (M_j - M)^2 &= 3(189-220)^2 + 3(235-220)^2 + 3(236-220)^2 \\ &= 3(-31)^2 + 3(15)^2 + 3(16)^2 \\ &= 2883 + 675 + 768 \\ SSB &= 4326 \end{align} \]

and that: J – 1 = 2. Therefore the mean square between groups is: 4326/2 = 2163, which is an estimate of:

\[ s_{e_1}^2 + \text{(a function of the squared treatment effects)} \]

You can see that there has been a substantial increase over the value of 300 found when there were no treatment effects.

The Mean Square Within Groups.

For the latter treatment effect data, the mean square within groups does not change when there are treatment effects. This can be seen through the following calculation of the sum of square within groups:

\[ \begin{align} \text{Treatment #1: } \sum (X-M_1)^2 &= (179-189)^2 + (204-189)^2 + (184-189)^2 \\ &= (-10)^2 + (15)^2 + (-5)^2 = 100 + 225 + 25 = 350 \\ \text{Treatment #2: } \sum (X-M_2)^2 &= (250-235)^2 + (220-235)^2 + (235-235)^2 \\ &= (15)^2 + (-15)^2 + (0)^2 = 225 + 225 + 0 = 450 \\ \text{Treatment #3: } \sum (X-M_3)^2 &= (221-236)^2 + (246-236)^2 + (241-236)^2 \\ &= (-15)^2 + (10)^2 + (5)^2 = 225 + 100 + 25 = 350 \end{align} \]

Here the within groups sum of squares is:

\[ SSW = \sum_{j=1}^J \sum_{i=1}^{n_j} (X_{ij} - M_j)^2 = 350 + 450 + 350 = 1150 \]

Also, the degrees of freedom remain the same as for the no treatment data, that is, \(\sum (n_j - 1) = 6\). Therefore, the mean square within groups also remains the same as for the no treatment data, that is, \(1150/6 = 191.667\).

The Sum of Squares Total.

The sum of squares total is found, using equation (14-9), as:

\[ \begin{align} SST &= \sum(X-M)^2 \\ &= (179-220)^2 + (204-220)^2 + (184-220)^2 + (250-220)^2 \\ &+ (220-220)^2 + (235-220)^2 + (221-220)^2 + (246-220)^2 \\ &+(241-220)^2 \\ SST &= (-41)^2 + (-16)^2 + (-36)^2 + (30)^2 + (0)^2 \\ &+ (15)^2 + (1)^2 + (26)^2 + (21)^2 \\ SST &= 1681 + 256 + 1296 + 900 + 0 + 225 + 1 + 676 + 441 \\ SST &= 5476 \end{align} \]

Then, using equation (14-10), we can check our calculations of SSB and SSW. Here our calculations are consistent since SST = SSB + SSW, that is, 5476 = 4326 + 1150.

The Fisher’s F Statistic.

The Fisher’s F statistic found for the cholesterol data where there are treatment effects is:

\[ \begin{align} F &= \frac {\text{Mean Square Between Groups}} {\text{Mean Square Within Groups}} = \frac{MSB}{MSW} \\ F &= \frac{4326/2}{1150/6} = \frac{2163}{191.667} \\ F &= 11.2852 \end{align} \]

Here the critical value found in table A5 is \(F_{(.95,2,6)} = 5.14\), so that we would reject the null hypothesis that the treatment population means are all equal. The results of this analysis are shown in Figure 14d. It is instructive to compare the ANOVA output with no treatment effects shown in Figure 14c, with the ANOVA output shown in Figure 14d where there are treatment effects.
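As a base R cross-check of these hand calculations (the chapter's figures themselves use jmv), aov reproduces the sums of squares and the Fisher's F value:

```r
cholesterol <- c(179, 204, 184, 250, 220, 235, 221, 246, 241)
treatment   <- factor(rep(c("Gamma", "Delta", "None"), each = 3))

fit <- aov(cholesterol ~ treatment)
summary(fit)
# treatment: Sum Sq = 4326 on 2 df; Residuals: Sum Sq = 1150 on 6 df;
# F = 11.29, leading to rejection at the .05 level
```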

Figure 14c An analysis of variance showing no treatment effects, including descriptive statistics

  jmv::anovaOneW(
    formula = Cholesterol ~ Treatment,
    data = data,
    fishers = TRUE,
    desc = TRUE,
    descPlot = TRUE,
    norm = TRUE,
    qq = TRUE,
    eqv = TRUE)

 ONE-WAY ANOVA

 One-Way ANOVA                                                  
 ────────────────────────────────────────────────────────────── 
                              F        df1    df2      p        
 ────────────────────────────────────────────────────────────── 
   Cholesterol    Welch's     1.469      2    3.987    0.3327   
                  Fisher's    1.565      2        6    0.2838   
 ────────────────────────────────────────────────────────────── 


 Group Descriptives                                           
 ──────────────────────────────────────────────────────────── 
                  Treatment    N    Mean     SD       SE      
 ──────────────────────────────────────────────────────────── 
   Cholesterol    Gamma        3    210.0    13.23    7.638   
                  Delta        3    220.0    15.00    8.660   
                  None         3    230.0    13.23    7.638   
 ──────────────────────────────────────────────────────────── 


 ASSUMPTION CHECKS

 Normality Test (Shapiro-Wilk)       
 ─────────────────────────────────── 
                  W         p        
 ─────────────────────────────────── 
   Cholesterol    0.9074    0.2979   
 ─────────────────────────────────── 
   Note. A low p-value suggests
   a violation of the assumption
   of normality


 Homogeneity of Variances Test (Levene's)             
 ──────────────────────────────────────────────────── 
                  F            df1    df2    p        
 ──────────────────────────────────────────────────── 
   Cholesterol    8.520e-32      2      6    1.0000   
 ──────────────────────────────────────────────────── 

Figure 14d An analysis of variance with a treatment effect, including descriptive statistics

  jmv::anovaOneW(
    formula = Cholesterol ~ Treatment,
    data = data,
    welchs = TRUE,
    fishers = TRUE,
    miss="perAnalysis",
    desc = TRUE,
    descPlot = TRUE,
    norm = TRUE,
    qq = TRUE,
    eqv = TRUE)

 ONE-WAY ANOVA

 One-Way ANOVA                                                  
 ────────────────────────────────────────────────────────────── 
                              F        df1    df2      p        
 ────────────────────────────────────────────────────────────── 
   Cholesterol    Welch's     10.19      2    3.987    0.0271   
                  Fisher's    11.29      2        6    0.0093   
 ────────────────────────────────────────────────────────────── 


 Group Descriptives                                           
 ──────────────────────────────────────────────────────────── 
                  Treatment    N    Mean     SD       SE      
 ──────────────────────────────────────────────────────────── 
   Cholesterol    Gamma        3    189.0    13.23    7.638   
                  Delta        3    235.0    15.00    8.660   
                  None         3    236.0    13.23    7.638   
 ──────────────────────────────────────────────────────────── 


 ASSUMPTION CHECKS

 Normality Test (Shapiro-Wilk)       
 ─────────────────────────────────── 
                  W         p        
 ─────────────────────────────────── 
   Cholesterol    0.9074    0.2979   
 ─────────────────────────────────── 
   Note. A low p-value suggests
   a violation of the assumption
   of normality


 Homogeneity of Variances Test (Levene's)             
 ──────────────────────────────────────────────────── 
                  F            df1    df2    p        
 ──────────────────────────────────────────────────── 
   Cholesterol    8.520e-32      2      6    1.0000   
 ──────────────────────────────────────────────────── 

As you can see from Figures 14c and 14d, JAMOVI’s One-way ANOVA output provides the basic statistics that are of interest in a one-way analysis of variance. Steps are provided in the SUMMARY section.

Unequal Number Of Observations.

The JAMOVI One-way ANOVA procedure will handle both equal and unequal sample sizes in your levels. Therefore, nothing special is required for the case where there is an unequal number of scores per treatment, but we will provide output for both so you can see the differences. Figure 14f shows unequal sample sizes (called unbalanced) so you can compare the subtleties to Figure 14d (the results have changed, of course, because different data are analyzed). However, with unequal variances you should consider using the Welch's F test instead of the pooled-variances Fisher's F in the ANOVA table. This is because a violation of the homogeneity of variance assumption has a potentially severe impact on Type I error rates when sample sizes are unequal.

Figure 14e Data for Unbalanced (unequal sample sizes) analysis

Treatment_1              Treatment_2              Treatment_3
Gamma Brand              Delta Brand              None
Treatment Effect = -21   Treatment Effect = +15   Treatment Effect = +6
Y11 = 220-21-20 = 179    Y12 = 220+15+15 = 250    Y13 = 220+6-5  = 221
Y21 = 220-21+5  = 204    Y22 = 220+15-15 = 220
Y31 = 220-21-15 = 184                             Y33 = 220+6+15 = 241
M1 = 189                 M2 = 235                 M3 = 231

Figure 14f One-way ANOVA with unequal sample sizes

  jmv::anovaOneW(
    formula = Cholesterol ~ Treatment,
    data = data,
    welchs = TRUE,
    fishers = TRUE,
    miss="perAnalysis",
    desc = TRUE,
    descPlot = TRUE,
    norm = TRUE,
    qq = TRUE,
    eqv = TRUE)

 ONE-WAY ANOVA

 One-Way ANOVA                                                  
 ────────────────────────────────────────────────────────────── 
                              F        df1    df2      p        
 ────────────────────────────────────────────────────────────── 
   Cholesterol    Welch's     6.406      2    1.858    0.1467   
                  Fisher's    9.375      2        6    0.0142   
 ────────────────────────────────────────────────────────────── 


 Group Descriptives                                            
 ───────────────────────────────────────────────────────────── 
                  Treatment    N    Mean     SD       SE       
 ───────────────────────────────────────────────────────────── 
   Cholesterol    Gamma        5    190.0    12.94     5.788   
                  Delta        2    235.0    21.21    15.000   
                  None         2    231.0    14.14    10.000   
 ───────────────────────────────────────────────────────────── 


 ASSUMPTION CHECKS

 Normality Test (Shapiro-Wilk)       
 ─────────────────────────────────── 
                  W         p        
 ─────────────────────────────────── 
   Cholesterol    0.8526    0.0796   
 ─────────────────────────────────── 
   Note. A low p-value suggests
   a violation of the assumption
   of normality


 Homogeneity of Variances Test (Levene's)         
 ──────────────────────────────────────────────── 
                  F        df1    df2    p        
 ──────────────────────────────────────────────── 
   Cholesterol    2.016      2      6    0.2140   
 ──────────────────────────────────────────────── 

Fixed Effects Versus Random Effects

In this book, the focus for a one-way design is on what is known as a fixed-effects analysis of variance. The term fixed-effects ANOVA indicates that an analysis of variance will be performed on treatments that have been deliberately selected. This type of ANOVA may be contrasted with what is known as a random-effects ANOVA where the treatment levels have been sampled at random from a population of possible treatments. For example, in our cholesterol experiment we were interested in the effects on cholesterol level of taking the Gamma Company’s herbal supplement pill, the Delta Company’s herbal supplement pill, and of taking no herbal supplement pill.

These three treatments were deliberately selected for study. The cholesterol experiment would become a one-way random-effects ANOVA if all three treatment levels represented Nutrition companies that had been selected at random from the population of companies that made herbal supplement pills. In this case, the research problem would have been: Do all Nutrition companies produce herbal supplement pills that have the same effect on women’s cholesterol levels?

THE INDEPENDENT t TEST

The t statistic

When there are only two groups in a one-way design, the null hypothesis and two-tailed alternative hypothesis are:

\[ H_0: \mu_1=\mu_2 \qquad H_A: \mu_1 \ne \mu_2 \]

These hypotheses may be written in terms of mean differences as:

\[ H_0: \mu_1-\mu_2=0 \qquad H_A: \mu_1-\mu_2 \ne 0 \]

Many researchers will not use the Fisher’s F statistic to test this null hypothesis, but will instead choose to use the t statistic (but using ANOVA with the Fisher’s F statistic is perfectly legitimate with two or more groups). When a t statistic is used in a two group analysis of variance situation, the statistical test is referred to as the independent t test. The test statistic for the independent t test is written as:

\[ \begin{equation} t = \frac{M_1 - M_2}{s_{M_1 - M_2}} \tag{14-12} \end{equation} \]

where \(M_1\) is the mean of the first treatment group, \(M_2\) is the mean of the second treatment group, and \(s_{M_1-M_2}\) is the standard deviation of the sampling distribution of the mean differences (i.e., the standard error of the mean differences). This t statistic follows a t distribution with \(n_1 + n_2 - 2\) degrees of freedom, where \(n_1\) is the number of units in the first treatment and \(n_2\) is the number of units in the second treatment.

You may remember from chapter 13 that we found the standard deviation of the mean differences, sD, for the dependent t test by taking the difference between pairs of scores and then finding the standard deviation of the differences (sD). We were able to do this because each unit had been paired, either with itself or with another unit, so that the scores were correlated, that is, the scores were considered to be dependent. In using the independent t test, however, the units in each treatment are independent of one another so that there are no natural pairings, and therefore, no way to find a correlation. In this case, the standard error of the mean differences is found as:

\[ \begin{equation} s_{M_1-M_2} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{14-13} \end{equation} \]

Note that Equation 14-13 can also be written as:

\[ \begin{equation} s_{M_1-M_2} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \sqrt{\frac{n_1+n_2} {n_1 n_2}} \end{equation} \]

where \(s_1^2\) is the variance of the scores in the first treatment group, \(n_1\) is the number of subjects in the first treatment group, \(s_2^2\) is the variance of the second treatment group, and \(n_2\) is the number of subjects in the second treatment group.

This form of the independent t test is often called the "pooled t test," in reference to the pooling of the two group variances in the left part of the equation. The pooled standard deviation (pooled SD) is:

\[ \text{pooled SD} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} \]

Recall that the standard error is a function of the standard deviation and the sample size; for example, the standard error of the mean is

\[ s_M = \frac{s_X}{\sqrt{n}} = s_X \sqrt{\frac{1}{n}} \]

and the variance of the sampling distribution of the mean is simply \(s_M^2 = s_X^2 / n\), where \(s_X^2\) is the variance of the X variable and n is the sample size.

If we use similar logic to convert the pooled SD to the standard error of the mean differences, we simply divide the pooled SD by the relevant function of the sample sizes, much as we did above. You can see that the standard error of the mean differences is simply the pooled SD multiplied by a function of \(1/n\) that incorporates both group sizes.
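This two-step relationship, pooled SD first, then the sample-size factor, can be sketched directly. The following illustrative Python functions (names are ours, not from any package) implement equation (14-13), using the example values \(s_1^2 = 175\), \(s_2^2 = 225\), and \(n_1 = n_2 = 3\):

```python
import math

# Pooled SD (the left-hand radical of equation 14-13) and the standard
# error of the mean differences (pooled SD times the sample-size factor).
def pooled_sd(var1, n1, var2, n2):
    return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

def se_mean_diff(var1, n1, var2, n2):
    # pooled SD multiplied by sqrt(1/n1 + 1/n2)
    return pooled_sd(var1, n1, var2, n2) * math.sqrt(1 / n1 + 1 / n2)

sd_p = pooled_sd(175, 3, 225, 3)    # sqrt(200), about 14.142
se = se_mean_diff(175, 3, 225, 3)   # about 11.547
```

These values reappear in the worked example below.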

One-Way Two Group ANOVA: An Example

Let us use the first two groups from the preceding discussion of the fixed-effects cholesterol experiment to demonstrate the calculations for the independent t test. In this case, our data from the first two groups of Figure 14b(ii) are:

Treatment_1              Treatment_2
Gamma Brand              Delta Brand
Treatment Effect = -21   Treatment Effect = +15
Y11 = 220-21-20 = 179    Y12 = 220+15+15 = 250
Y21 = 220-21+5  = 204    Y22 = 220+15-15 = 220
Y31 = 220-21-15 = 184    Y32 = 220+15+0  = 235
M1 = 189                 M2 = 235

Here, the variances in each group are found as:

\[ \begin{align} s_1^2 &= \frac{\sum (X_{i1} - M_1)^2}{n_1-1} = \frac{350}{2} = 175 \\ s_2^2 &= \frac{\sum (X_{i2} - M_2)^2}{n_2-1} = \frac{450}{2} = 225 \end{align} \]

So that the standard error of the mean differences is estimated to be:

\[ \begin{align} s_{M_1-M_2} &= \sqrt{\frac{(2)175+(2)225} {3+3-2}} \sqrt{\frac{1}{3}+\frac{1}{3}} \\ &= \sqrt{\frac{(2)175+(2)225}{3+3-2}} \sqrt{\frac{3+3}{3*3}} \\ &= 14.1421 * 0.8165 \\ &= 11.5470 \end{align} \]

Therefore, the t statistic is found using equation (14-12) as:

\[ t = \frac{189-235}{11.55} = \frac{-46}{11.55} = -3.98 \]

From table A.2, we find that the critical t-values for the two-tailed t test with \(n_1 + n_2 – 2 = 4\) degrees of freedom at the two-tailed .05 level of significance are \(t_{(.025,4)} = -2.776\) and \(t_{(.975,4)} = 2.776\). Also, the p-value is found as .0164. We may conclude that the null hypothesis is false because the t statistic is less than the lower critical t-value, or the p-value is less than the level of significance. Therefore, we may conclude that women in Treatment population 2 have higher cholesterol scores than the women in Treatment population 1.
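The hand calculation above can be mirrored in a short sketch. This illustrative Python fragment (not the jamovi analysis) computes the pooled-variance t from the raw Gamma and Delta scores:

```python
# Pooled-variance independent t for the Gamma and Delta groups.
g1 = [179, 204, 184]    # Gamma, M1 = 189
g2 = [250, 220, 235]    # Delta, M2 = 235

def mean(x):
    return sum(x) / len(x)

def var(x):
    # unbiased sample variance, dividing by n - 1
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

n1, n2 = len(g1), len(g2)
pooled = ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
se = (pooled * (1 / n1 + 1 / n2)) ** 0.5    # about 11.547
t = (mean(g1) - mean(g2)) / se              # about -3.98
reject = abs(t) > 2.776                     # exceeds the tabled critical value
```

The result, t of about -3.98 against critical values of ±2.776, reproduces the decision to reject the null hypothesis.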

Independent t test: Confidence Intervals

The two-tailed confidence interval for the independent t test is found as:

\[ \begin{equation} (M_1-M_2) - t_{(1-\alpha/2,\,n_1+n_2-2)} (s_{M_1-M_2}) < \mu_1 - \mu_2 < (M_1-M_2) + t_{(1-\alpha/2,\,n_1+n_2-2)} (s_{M_1-M_2}) \tag{14-14} \end{equation} \]

The one-tailed confidence intervals are found in appendix K.

The 95% two-tailed confidence interval for the preceding example data set would be:

\[ -46 - 2.776(11.55) < \mu_1-\mu_2 < -46+2.776(11.55) \]

which is approximately \(-78.1 < \mu_1 - \mu_2 < -13.9\), or alternatively (-78.1, -13.9). Note that because the direction of subtraction is essentially arbitrary, we could equivalently have calculated the confidence interval for \(\mu_2 - \mu_1\), and the confidence interval values would have had exactly the same magnitude with opposite signs: approximately \(13.9 < \mu_2 - \mu_1 < 78.1\), or alternatively (13.9, 78.1). Note that the smallest number (i.e., the one closest to \(-\infty\)) is always the first number reported in a confidence interval.
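As a quick sketch of equation (14-14), the interval endpoints can be computed from the example values (Python used only for illustration):

```python
# Endpoints of the 95% CI for mu1 - mu2, equation (14-14), with
# M1 - M2 = -46, SE = 11.547, and t(.975, 4) = 2.776.
diff, se, t_crit = -46, 11.547, 2.776
lower = diff - t_crit * se
upper = diff + t_crit * se
interval = (round(lower, 1), round(upper, 1))   # about (-78.1, -13.9)
```

The small differences from the jamovi interval in Figure 14i come from using the rounded critical value 2.776.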

Actual Effect Size for two-group mean comparisons

Cohen’s d is the standardized mean difference effect size to be reported for the independent t test. It is not easy to calculate by hand using the pooled standard deviation, but a reasonable estimate can be obtained from the output using the average variances. There are also variations of Cohen’s d called Glass’s delta and Hedges’ g (which also allows a confidence interval to be created). However, Cohen’s d remains the most frequently reported standardized mean difference effect size; Glass’s delta and Hedges’ g are more common in meta-analysis.

If we calculate Cohen’s d using the pooled estimate of the population standard deviation, we would calculate:

\[ \begin{align} d &= \frac{|M_1-M_2|}{\text{Pooled SD}} \\ &= \frac{|M_1-M_2|} {\sqrt{\frac{(n_1-1) s_1^2 + (n_2-1) s_2^2} {n_1 + n_2 - 2}}} \\ &= \frac{|189-235|}{\sqrt{\frac{2*175 + 2*225}{4}}} \\ &= \frac{46}{14.142} = 3.253 \end{align} \]

Note that this formula for the pooled standard deviation is, not surprisingly, the left-hand part of Equation 14-13. This was described in more detail above (and shown in Equation 14-13a). Another way to calculate the pooled SD is as:

\[ \begin{align} & SE / \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \\ & \text{or } \\ & SE / \sqrt{\frac{n_1+n_2} {n_1 n_2}} \\ & \text{so } \\ & 11.547 / \sqrt{\frac{3+3} {3*3}} = 11.547 /\sqrt{0.6667} = 14.142 \end{align} \]

If we use the square root of the average variance (which is a better choice than simply averaging the standard deviations), we calculate the following.

\[ \begin{align} d &= \frac{|M_1-M_2|} {\sqrt{\text{Average Variance}}} \\ &= \frac{|M_1-M_2|} {\sqrt{\frac{s_1^2 + s_2^2}{2}}} \\ &= \frac{|189-235|}{\sqrt{\frac{175+225}{2}}} \\ &= \frac{46}{\sqrt{200}} = \frac{46}{14.142} = 3.253 \end{align} \]

Note that both formulas produce the same results in this case because the sample sizes are equal for both groups (\(n_1 = n_2 = 3\)). Also note that if you use the robust, separate variances t test (called variously the Welch’s t test, the Welch-Satterthwaite t test, or the “Equal Variances Not Assumed” t test), then this second formula with the square root of the average variances is the more appropriate standardized effect size to report.
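Both versions of Cohen's d can be checked numerically. A brief illustrative sketch, using the example variances and means:

```python
import math

# Cohen's d via the pooled SD and via the average variance; the two
# agree here because the group sizes are equal (n1 = n2 = 3).
diff = abs(189 - 235)                                   # |M1 - M2| = 46
d_pooled = diff / math.sqrt((2 * 175 + 2 * 225) / 4)    # pooled SD = sqrt(200)
d_avg = diff / math.sqrt((175 + 225) / 2)               # average-variance SD
```

With unequal group sizes the two denominators would differ, and the choice between them would follow the guidance above.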

Power: Independent Versus Dependent t tests

When the dependent t test is used where the independent t test should have been, or vice versa, the result is a loss of power. The reason is that when you use the dependent t test where you should have used the independent t test, you lose degrees of freedom. That is, the dependent t test has \((n - 1)\) degrees of freedom, but the independent t test has \((n_1 + n_2 - 2)\) degrees of freedom. This loss of degrees of freedom forces you to use larger (more extreme) critical values than necessary, and therefore you lose power. For example, given 10 units in each treatment, the two-tailed critical values for the dependent t test are \(t_{(.025,9)} = -2.262\) and \(t_{(.975,9)} = 2.262\), but the critical values for the independent t test are \(t_{(.025,18)} = -2.101\) and \(t_{(.975,18)} = 2.101\).

On the other hand, when you use the independent t test and you should have used the dependent t test, your calculated t statistic will be smaller than it should be, resulting in a loss of power. This is because the standard error of the mean differences for the dependent t test is smaller than the standard error of the mean differences for the independent t test, and this reduction in the standard deviation compensates for the loss of degrees of freedom. For example, if the previous example consisted of dependent scores, rearranged in pairs from highest to lowest, the data would look like:

Treatment_1    Treatment_2    Difference
Gamma Brand    Delta Brand
179            220            -41
204            250            -46
184            235            -51
M1 = 189       M2 = 235       MD = -46

For these data, the unpaired t-value is -3.98, but the paired t-value is -15.93. Since the mean difference did not change, you can see that the larger dependent t-value is due to a decrease in the standard error of the mean differences. The standard error of the mean differences for the independent t test was found in the previous section using equation (14-13) to be:

\[ s_{M_1 - M_2} = 11.547 \]

but the standard error for the dependent t test is found as (with some rounding error due to only 3 decimal places):

\[ \begin{align} s_{M_1 - M_2} &= \sqrt{\frac{s_1^2}{n} + \frac{s_2^2}{n} - \frac{2r_{12} s_1 s_2}{n}} \\ &= \sqrt{\frac{175}{3} + \frac{225}{3} - \frac{2(.945)(13.229)(15.00)}{3}} \\ &= 2.88 \end{align} \]

The decrease in the standard error of the dependent t test is caused by the subtraction of the term

\[ \begin{align} \frac{2r_{12} s_1 s_2}{n} = \frac{2(.945)(13.229)(15.00)}{3} = 125.01 \end{align} \]

which is due to the correlation between the variables. Because the standard error is in the denominator of the t statistic, this results in the t statistic being larger (i.e., dividing by a smaller value in the denominator results in a larger answer). Therefore, you can see the importance of the assumption that the variables are correlated for the dependent t test.
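The dependent t can also be reached through the difference scores themselves, which is often the easier hand calculation. An illustrative Python sketch using the paired data above, reproducing the smaller standard error within rounding:

```python
import math

# Dependent t from the difference scores of the paired data above.
x = [179, 204, 184]     # Treatment_1
y = [220, 250, 235]     # Treatment_2
n = len(x)
diffs = [a - b for a, b in zip(x, y)]     # -41, -46, -51
mean_d = sum(diffs) / n                   # -46
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))   # 5.0
se_dep = sd_d / math.sqrt(n)              # about 2.89
t_dep = mean_d / se_dep                   # about -15.93
```

The standard error of about 2.89 agrees with the correlation-based formula above, apart from its rounding to three decimal places.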

The t statistic Versus the Fisher’s F Statistic

When there are only two groups in a one-way analysis of variance the Fisher’s F statistic has:

\[ \begin{align} v_1 &= df_B = J-1 = 2-1 = 1 \\ &\quad\text{and} \\ v_2 &= df_W = \sum (n_j-1) = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2 \end{align} \]

degrees of freedom. Now, since the t test and the Fisher’s F test both test the same hypothesis, you might guess that there is a relationship between the two test statistics. Your guess would be correct, for it can be shown that the relationship between the Student’s t and Fisher’s F statistics is:

\[ \begin{equation} t_{(1-\alpha/2,\,v_2)}^2 = F_{(1-\alpha,1,v_2)} \tag{14-15} \end{equation} \]

or

\[ \begin{equation} \sqrt{F_{(1-\alpha,1,v_2)}} = t_{(1-\alpha/2,\,v_2)} \tag{14-16} \end{equation} \]

where \(v_2 = n_1 + n_2 – 2\). That is, given two groups, the two tailed independent Student’s t test will yield the same results as the overall Fisher’s F test.

By the way, the within degrees of freedom for one-way ANOVA and the independent Student’s t test are often written as \((N – K)\), where N is the total sample size for all groups combined (e.g., \(n_1 + n_2\), in a two group scenario) and K is the number of groups (e.g., 2 in an independent Student’s t test). This can be seen as follows:

\[ v_2 = df_W = \sum_{j=1}^K (n_j-1) = (n_1-1) + \dots + (n_K-1) = N-K \]

so there will be an (n) for each group and a (-1) for each group. Adding together the n values for each group results in total N, and adding together the (-1) for each group results in a subtraction of the number of groups, K. The subtraction results in \(N – K\). Another way to think about it, here, in the two-group situation is that \(df_W = (n_1 – 1) + (n_2 – 1) = n_1 + n_2 – 2 = N – 2\).
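The t-F relationship of equations (14-15) and (14-16) can be verified with the two-group example values worked above (calculated t of about -3.984, critical t of 2.776 at 4 degrees of freedom). A minimal sketch:

```python
# Check of t^2 = F for the two-group example: the calculated values and
# the critical values both obey the relationship.
t_calc = -3.984
F_calc = t_calc ** 2        # about 15.87

t_crit = 2.776              # t(.975, 4)
F_crit = t_crit ** 2        # about 7.71, i.e., F(.95, 1, 4)
```

Squaring the calculated t of -3.984 gives about 15.87, the Fisher's F that a two-group one-way ANOVA reports for the same data.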

One-Tailed Student’s t and Fisher’s F Tests.

It is frequently noted that the Fisher’s F test is a “one-tailed two-tailed test.” This is because we generally use the Fisher’s F statistic to test whether the numerator variance is larger than the denominator variance. Indeed, the test statistic is often written something like this:

\[ F = \frac{\text{larger } s^2}{\text{smaller } s^2} \]

Recall that the logic of the Fisher’s F test for ANOVA described above is that we want the variance between groups (i.e., the explained variance, the “good stuff”) to be larger than the variance within groups (i.e., the unexplained variance, the “bad stuff”). Due to this approach, we are only concerned when the Fisher’s F statistic is larger than 1.0 and therefore only care about the upper (or right) tail. From this perspective, any confirmatory analyses require that we have predicted which group(s) will be larger than the other(s) and after a statistically significant omnibus test we simply examine the means to see whether our data support such a conclusion.

However, another way to think about this is that both the Student’s t test and the Fisher’s F test may be used to test a one-tailed alternative hypothesis, by finding critical values at the \(\alpha\) or \((1 - \alpha)\) percentile for the Student’s t statistic and at the \((1 - 2\alpha)\) percentile for the Fisher’s F statistic. For example, if the alternative hypothesis for the independent Student’s t test had been:

\[ H_A: \mu_1 < \mu_2 \]

The critical t statistic at the one-tailed .05 level of significance would have been \(t_{(.05,4)} = -2.132\), and the critical F statistic would have been \(F_{(.90,1,4)} = 4.54\). Here, we have:

\[ \begin{align} t_{(1-\alpha,v_2)}^2 &= F_{(1-2\alpha,1,v_2 )} \\ (-2.132)^2 &= 4.545 \text{ (within rounding error)} \end{align} \]

Here, since the calculated t-value is less than the critical value, that is, -3.98 < -2.132, we would reject the null hypothesis using either statistic. If we had calculated the square root of Fisher’s F instead, the sign of the square root of the Fisher’s F statistic would have been found by considering the mean difference, because the sign of the Student’s t statistic is determined by its numerator, see equations (13-2) and (14-12). For example, here the sign of the mean differences was negative \((M_1 – M_2 = -46)\), and therefore the Student’s t statistic was negative. That is, we would have chosen -2.13 as the \(\sqrt{4.54}\) rather than +2.13.

Problem: Too Many t Tests

Given knowledge of the independent Student’s t test, there may be a temptation to use it exclusively in analysis of variance situations, that is, to use it when there are more than two treatment groups. For example, you might conduct three Student’s t tests at the .05 level of significance, one for each treatment pair (i.e., 1 vs 2, 1 vs 3, and 2 vs 3). In the next chapter we shall find that the number of pairs of means that can be compared is \(J(J - 1)/2\), where J is the total number of groups (note that many scholars use K for the total number of groups, in which case this formula would look like \(K[K - 1]/2\)). Here, \(J = 3\) and \(3(3 - 1)/2 = 3\), meaning that there are three possible pairs of groups to compare. The problem with this approach is that the probability of falsely rejecting one or more of the three null hypotheses is greater than .05.

The actual probability of making one or more errors cannot be calculated because these three tests are not independent of one another. For example, if you know that \(M_1 > M_2\) and that \(M_2 > M_3\), then you also know that \(M_1 > M_3\). If the tests were independent of one another, however, and if the same level of significance, \(\alpha\), is used for each Student’s t test, the probability of making at least one error is called the Family-wise Error Rate and is calculated as:

\[ \begin{equation} FWER = 1 - (1-\alpha)^c \tag{14-17} \end{equation} \]

where c is the number of Student’s t tests conducted. For example, with three Student’s t tests and \(\alpha = .05\), equation (14-17) is:

\[ FWER = 1 - (1-.05)^3 = 1-.8574 \approx .14 \]

Given six treatment groups, there are 15 possible Student’s t tests (\((6*5)/2\)), so that equation (14-17) yields:

\[ FWER = 1 - (1-.05)^{15} = 1 - .4632 \approx .54 \]

That is, with three independent Student’s t tests the probability of making at least one error is .14, but with 15 Student’s t tests the probability of making at least one error is .54. You can see that the probability of making one or more Type I errors becomes intolerably high as the number of Student’s t tests increases. Fortunately, the overall (omnibus) Fisher’s F test controls the probability of making at least one error at your predetermined level of significance, e.g., .01 or .05.
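Equation (14-17) is simple to compute for any number of tests. An illustrative sketch:

```python
# Family-wise error rate of equation (14-17) for c independent tests
# conducted at significance level alpha.
def fwer(alpha, c):
    return 1 - (1 - alpha) ** c

three_tests = fwer(.05, 3)      # about .14
fifteen_tests = fwer(.05, 15)   # about .54
```

Plotting `fwer(.05, c)` for increasing c makes the problem vivid: the rate climbs quickly toward 1 as more pairwise tests are conducted.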

Independent Student’s t Test

JAMOVI will compute both the pooled-variance Student’s t test (i.e., when we assume equal variances in the population) and the separate-variance Welch’s t test (i.e., when we assume unequal variances in the population). Figure 14i illustrates the independent-samples t test output for the first two treatments from Figure 14b(ii). This output contains both the equal-variances and the unequal-variances t tests. For reference, we have also included the One-way ANOVA results for this two-group scenario as Figure 14j. Comparing the two sets of output, you will see that:

  1. the p value for ANOVA Fisher’s F test is the same as the p value for the “Equal Variances Assumed” Student’s t test
  2. the p value for the Welch’s F “Robust Test of Equality of Means” is the same as the p value for the “Equal Variances Not Assumed” Welch’s t test
  3. \(t^2\) = F
  4. One-way ANOVA produces a more complete table of descriptive statistics
  5. One-way ANOVA makes it easier to calculate \(R^2\) if you are interested in that statistic, but there are formulas to convert t into r (and recall that if you run this analysis using GLM Univariate, you will get \(R^2\) as part of the output by default)

In chapter 15 we will discuss the assumptions for an ANOVA. These same assumptions hold for the independent t test since the independent t test is a simple case of an ANOVA. One of the assumptions states that the variances across treatment groups must be approximately equal. As we shall see in chapter 15, when you have unequal numbers of units in your treatments and you have no reason to believe that your variances are approximately equal, you might run a higher risk than you have specified of making a type I error.

For example, if you had \(n_1 = 10\) and \(n_2 = 50\) with \(s_1^2 = 500\) and \(s_2^2 = 100\), you might specify a level of significance of .05, but because you have violated the homogeneity of variance assumption, the actual level of significance might be .30.

If you find that you have an unequal number of units, and you expect that a larger variance will be found in the smaller group, you may decide not to pool the treatment variances, which is what is implicitly done in calculating the estimate of the standard error of the mean difference in equation (14-13). Instead, the standard error for the Student’s t statistic of equation (14-12) is found using

\[ \begin{equation} s_{M_1-M_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{14-18} \end{equation} \]

When this standard error is used, however, the resulting statistic does not follow a t distribution with the degrees of freedom we calculated above. Therefore, we must calculate the degrees of freedom for the tabled critical t value using the following formula:

\[ \begin{equation} v = \frac{( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} )^2} {{\frac {( \frac{s_1^2}{n_1} ) ^2}{(n_1-1)}} + {\frac {( \frac{s_2^2}{n_2} ) ^2}{(n_2-1)}}} \tag{14-19} \end{equation} \]

The degrees of freedom found using equation (14-19) may not be a whole number, in which case use the nearest whole number (or, more conservatively, the next smaller whole number). This degrees of freedom, v, will fall in the range \(\min(n_1 - 1, n_2 - 1) \le v \le n_1 + n_2 - 2\). This version of the t test is usually attributed to Welch and the degrees of freedom to Satterthwaite, and it is called either the Welch’s t test or the Welch-Satterthwaite t test.
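Equations (14-18) and (14-19) can be sketched for the example data; note that the resulting degrees of freedom of about 3.938 match the Welch's t line in Figure 14i:

```python
# Welch's separate-variances SE (equation 14-18) and Satterthwaite
# df (equation 14-19) for s1^2 = 175, s2^2 = 225, n1 = n2 = 3.
var1, n1 = 175, 3
var2, n2 = 225, 3
q1, q2 = var1 / n1, var2 / n2
se = (q1 + q2) ** 0.5                                             # about 11.547
df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))   # about 3.938
```

With equal sample sizes, the Welch SE equals the pooled SE here; the two tests differ only in their degrees of freedom.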

Figure 14i Independent-samples t test output

jmv::ttestIS(
    formula = Cholesterol ~ Treatment,
    data = data,
    vars = Cholesterol,
    students=T,
    welchs = TRUE,
    mann = TRUE,
    hypothesis="different",
    norm = TRUE,
    qq = TRUE,
    eqv = TRUE,
    meanDiff = TRUE,
    ci = TRUE,
    ciWidth=95,
    effectSize = TRUE,
    desc = TRUE,
    plots = TRUE,
    miss="perAnalysis")

 INDEPENDENT SAMPLES T-TEST

 Independent Samples T-Test                                                                                                                                            
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
                                    Statistic    df       p         Mean difference    SE difference    Lower     Upper                                  Effect Size   
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   Cholesterol    Student's t          -3.984    4.000    0.0164             -46.00            11.55    -78.06    -13.94    Cohen's d                         -3.253   
                  Welch's t            -3.984    3.938    0.0169             -46.00            11.55    -78.26    -13.74    Cohen's d                         -3.253   
                  Mann-Whitney U        0.000             0.1000             -46.00                     -71.00    -16.00    Rank biserial correlation          1.000   
 ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
    Note. Hₐ μ Gamma ≠ μ Delta


 ASSUMPTIONS

 Normality Test (Shapiro-Wilk)       
 ─────────────────────────────────── 
                  W         p        
 ─────────────────────────────────── 
   Cholesterol    0.8943    0.3415   
 ─────────────────────────────────── 
   Note. A low p-value suggests
   a violation of the assumption
   of normality


 Homogeneity of Variances Test (Levene's)            
 ─────────────────────────────────────────────────── 
                  F            df    df2    p        
 ─────────────────────────────────────────────────── 
   Cholesterol    2.524e-31     1      4    1.0000   
 ─────────────────────────────────────────────────── 
   Note. A low p-value suggests a violation of
   the assumption of equal variances


 Group Descriptives                                                 
 ────────────────────────────────────────────────────────────────── 
                  Group    N    Mean     Median    SD       SE      
 ────────────────────────────────────────────────────────────────── 
   Cholesterol    Gamma    3    189.0     184.0    13.23    7.638   
                  Delta    3    235.0     235.0    15.00    8.660   
 ────────────────────────────────────────────────────────────────── 

Figure 14j One-way ANOVA output for two groups (for comparison with Figure 14i)

  jmv::anovaOneW(
    formula = Cholesterol ~ Treatment,
    data = data,
    welchs = TRUE,
    fishers = TRUE,
    miss = "perAnalysis",
    desc = TRUE,
    descPlot = TRUE,
    norm = TRUE,
    qq = TRUE,
    eqv = TRUE)

 ONE-WAY ANOVA

 One-Way ANOVA                                                  
 ────────────────────────────────────────────────────────────── 
                              F        df1    df2      p        
 ────────────────────────────────────────────────────────────── 
   Cholesterol    Welch's     15.87      1    3.938    0.0169   
                  Fisher's    15.87      1        4    0.0164   
 ────────────────────────────────────────────────────────────── 


 Group Descriptives                                           
 ──────────────────────────────────────────────────────────── 
                  Treatment    N    Mean     SD       SE      
 ──────────────────────────────────────────────────────────── 
   Cholesterol    Gamma        3    189.0    13.23    7.638   
                  Delta        3    235.0    15.00    8.660   
 ──────────────────────────────────────────────────────────── 


 ASSUMPTION CHECKS

 Normality Test (Shapiro-Wilk)       
 ─────────────────────────────────── 
                  W         p        
 ─────────────────────────────────── 
   Cholesterol    0.8943    0.3415   
 ─────────────────────────────────── 
   Note. A low p-value suggests
   a violation of the assumption
   of normality


 Homogeneity of Variances Test (Levene's)             
 ──────────────────────────────────────────────────── 
                  F            df1    df2    p        
 ──────────────────────────────────────────────────── 
   Cholesterol    2.524e-31      1      4    1.0000   
 ──────────────────────────────────────────────────── 

SUMMARY

The purpose of this chapter was to introduce you to the workings of a one-way fixed-effects analysis of variance (ANOVA). In doing this, we considered the Fisher’s F statistic, which is used to determine whether there are any differences among the J (J ≥ 2) population treatment groups under study in a one-way design. We found that when the null hypothesis is true, the Fisher’s F statistic is the ratio of two independent estimates of the variance in the original population, the error variance. These two estimates of the error variance were referred to as the mean square between, MSB, and the mean square within, MSW.
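The F ratio can be reconstructed by hand from the group summary statistics alone. As a sketch for checking the arithmetic, the code below uses only the means, SDs, and group sizes reported in the two-group cholesterol output earlier in this chapter (means 189 and 235, SDs 13.23 and 15.00, n = 3 per group):

```r
# Reconstruct MSB, MSW, and Fisher's F from the two-group summary statistics
n <- 3                     # units per group (equal n's)
m <- c(189, 235)           # group means (Gamma, Delta)
s <- c(13.23, 15.00)       # group standard deviations
J <- length(m)             # number of treatment groups

grand <- mean(m)                          # grand mean (equal n's)
MSB <- n * sum((m - grand)^2) / (J - 1)   # mean square between
MSW <- mean(s^2)                          # mean square within (pooled error variance)
F_ratio <- MSB / MSW
round(F_ratio, 2)                         # ~15.87, matching the jmv::anovaOneW output
```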

We found that when the treatments had an effect on the units, the mean square between included a term made up of squared treatment effects, but the mean square within still estimated the error variance. Therefore, when no treatment effects are present in a one-way ANOVA, the Fisher’s F ratio will be near one, but given treatment effects, the Fisher’s F ratio will be larger than one. Here, we used the critical F values given in table A.5 to determine how much greater than one an F ratio had to be before we would consider it to reflect significant treatment effects.

In this chapter, we found that R will yield an analysis of variance output for both fixed-effects and random-effects designs. A fixed-effects design was found to be a design wherein the researcher deliberately selected specific treatments for study, while a random-effects design was found to be a design wherein the treatment levels were a random sample of treatments from a population of treatments. In this chapter, our focus was on the fixed-effects design because it is the one that is primarily used by data analysts.

We concluded this chapter with a discussion of the independent Student’s t test. Here, we found that the independent Student’s t test may be used in place of the Fisher’s F test, but only when there are just two groups in a one-way fixed-effects ANOVA. Given two treatment groups in a one-way design, we found that either a Student’s t test or a Fisher’s F test could be used to test the null hypothesis of no population treatment mean differences, and that R can output both of these test statistics.
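The two-group equivalence can be checked by hand: with J = 2, the square of the Student’s t statistic equals the Fisher’s F statistic. A sketch using the summary statistics from the two-group cholesterol output (means 189 and 235, SDs 13.23 and 15.00, n = 3 per group):

```r
# With two groups, t^2 = F: check using the two-group summary statistics
m <- c(189, 235); s <- c(13.23, 15.00); n <- 3
sp2 <- mean(s^2)                  # pooled variance (equal n's)
se <- sqrt(sp2 * (1/n + 1/n))     # SE of the mean difference, ~11.55
t_stat <- (m[1] - m[2]) / se      # ~ -3.984, the Student's t in the output
round(t_stat^2, 2)                # ~15.87, the Fisher's F in the output
```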

The dependent Student’s t test, considered in chapter 13, was compared with the independent Student’s t test. The point was made and illustrated that a loss of power will generally result from the incorrect use of one of these Student’s t tests in place of the other. We also indicated that in a one-way design with more than two treatments it is inappropriate to use several Student’s t tests in place of the overall Fisher’s F test because the risk of making a Type I error increases, often dramatically.

We ended our discussion of the independent t test by examining one of its assumptions: that the variances across treatment groups are equal. We found that when this assumption is violated, and there are unequal n’s with the smaller group having the larger variance, a separate variance t test should be performed.
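The separate variance (Welch’s) t test can likewise be sketched from summary statistics; the Welch-Satterthwaite formula below reproduces the fractional degrees of freedom (3.938) reported in this chapter’s output:

```r
# Separate variance (Welch's) t test from the two-group summary statistics
m <- c(189, 235); s <- c(13.23, 15.00); n <- c(3, 3)
v <- s^2 / n                            # per-group variance of each group mean
se <- sqrt(sum(v))                      # SE of the mean difference, ~11.55
t_w <- (m[1] - m[2]) / se               # ~ -3.984
df_w <- sum(v)^2 / sum(v^2 / (n - 1))   # Welch-Satterthwaite df, ~3.94
round(c(t = t_w, df = df_w), 3)
```

With equal n’s the Welch t equals the Student t; only the degrees of freedom shrink, which is why the two rows of the output report slightly different p values.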

In chapter 15, we will continue our discussion of the one-way fixed-effects ANOVA with a focus on the steps necessary to carry out an exploratory ANOVA. In the process of discussing these steps, we will be able to examine the mathematical assumptions for an ANOVA (and for the independent Student’s t test), what happens when these assumptions are violated, how to select an appropriate sample size, and what to do following a significant overall Fisher’s F test.

Procedures

  • Test the hypothesis that a fixed set of treatment means are all equal, given that the data from each treatment are in a single column
  • Test the hypothesis that all treatments came from the same population
  • Test the hypothesis that a mean difference equals zero, given that the assumption of equal variances is tenable
  • Test the hypothesis that a mean difference equals zero, given unequal n’s and the potential that a larger variance will occur in the treatment with the smaller n

Chapter 14 Appendix A Study Guide for Multiple Group Comparisons

Independent-samples Student’s t test: Analyses to Run

  • Use a 2-group CATEGORICAL variable W
  • Use a SCALE variable Y
  • Run descriptive statistics with W as the grouping variable and Y as the dependent variable
  • Run an independent samples Student’s t test with W as the grouping variable (because it is a 2-group variable or you could use just 2 groups of some variable that has more than 2 groups) and Y as the dependent variable
  • Run a nonparametric test for comparing group means (e.g., Mann-Whitney U test or Wilcoxon Rank Sum test) with W as the grouping variable (because it is a 2-group variable or you could use just 2 groups of some variable that has more than 2 groups) and Y as the dependent variable
  • Run one-way ANOVA with W as a 2-group grouping variable and Y as the dependent variable
  • Run a linear regression with Y as the dependent variable and W as the independent variable (this will only work with W as a 2-group variable)
  • Run an error bar plot with Y as the dependent variable and W as the independent variable
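The analyses in this list can be run with jamovi or the jmv package, as shown earlier in the chapter. As a minimal base-R sketch of the same battery of tests (the values of W and Y below are made-up placeholders, not data from this chapter):

```r
# Base-R sketch of the study guide's two-group analyses
# (W and Y hold made-up placeholder values)
Y <- c(12, 15, 14, 20, 22, 19)                      # dependent (scale) variable
W <- factor(c("g1", "g1", "g1", "g2", "g2", "g2"))  # 2-group grouping variable

tapply(Y, W, mean)                 # descriptive statistics by group
t.test(Y ~ W, var.equal = TRUE)    # independent Student's t (equal variances)
t.test(Y ~ W)                      # Welch's separate variance t
wilcox.test(Y ~ W)                 # Mann-Whitney U / Wilcoxon rank-sum
summary(aov(Y ~ W))                # one-way ANOVA with a 2-group factor
summary(lm(Y ~ W))$r.squared       # bivariate regression: r^2 for W and Y
```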

Using the descriptive statistics output, respond to the following items for the Y Dependent Variable

  1. Provide the most appropriate research question for this analysis, both as a difference question and as a relationship question
  2. Provide the statistical null hypothesis using both appropriate symbols and words
  3. Is the assumption of normality tenable for both groups in the independent Student’s t test analysis (that is, is the assumption met, defensible, or reasonable)?
  4. Regardless of whether the assumption is met for both groups, is there a greater concern about skewness, kurtosis, neither, or both? Report and interpret the statistical and/or graphical evidence for your decision (t statistics and/or confidence intervals are good to use here).
  5. Report which (if any) W level/group shows the most extreme values. Explain and describe the outliers if they exist.

Using the homogeneity of variance test output, respond to the following items for the Y Dependent Variable

  1. Using appropriate symbols or words or both, provide the statistical null hypothesis for the homogeneity of variance test.
  2. Is the assumption of homoscedasticity (i.e., equality of variances) tenable (i.e., reasonable, defensible, met) for the independent Student’s t test analysis? Use p (Sig.) values as evidence to explain your answer.

Using the INDEPENDENT-SAMPLES T TEST output, respond to the following items for the Y Dependent Variable

  1. Show or explain how the standard error of the mean for the W=1 level/group is calculated.
  2. Show or explain how to calculate the 95% confidence interval for the mean for W=1 level/group (report the t critical value and df for this confidence interval – note that the df here is based on group size not sample size and therefore is not the same as in the df in the “Independent Samples Test” table).
  3. Show or explain how to calculate the Mean Difference between W groups/levels
  4. Show or explain how to calculate the “Equal Variances Assumed” degrees of freedom (df)
  5. Show or explain how to calculate the “Equal Variances Assumed” 95% Confidence Interval for the Mean Difference
  6. Show or explain how to calculate the “Equal Variances Assumed” t statistic (you do not need to calculate the pooled standard deviation or pooled standard error)
  7. Is there a statistically significant difference in mean Y scores between the two groups? Use a confidence interval for the mean difference as evidence to explain your answer
  8. Is there a statistically significant difference in mean Y scores between the two groups? Use the calculated t statistic compared to a t critical value as evidence to explain your answer (include both the degrees of freedom and the critical value used for this test)
  9. Is there a statistically significant difference in mean Y scores between the two groups? Use a p value to explain your answer
  10. If the true mean difference (i.e., reality) between the group means on Y is exactly equal to ZERO in the population, then what type of error might you have made when reaching your decision in the previous item about the Null Hypothesis of no difference in means? Or was there no error? Explain.
  11. If the true mean difference (i.e., reality) between the group means on Y is larger than ZERO in the population, then did you make an error or not when reaching your decision in the previous item about the Null Hypothesis of no difference in means? Explain.
  12. Based on the decision you reached regarding the Null Hypotheses in the analyses, what type of error might you have made in the analyses?
  13. Which level/group mean was higher in the sample for Y?
  14. Which level/group mean would you estimate to be higher in the population for Y, if any?
  15. Can we confidently conclude that one group scored higher than the other on the variable Y?
  16. Show or explain how the standardized mean difference effect size (i.e., Cohen’s d) is calculated for the mean comparison of the Y variable between W groups. Use either pooled SD or average SD (calculated as the square root of the average variances) and be sure to indicate what standard deviation you used in the Cohen’s d formula. Interpret the statistic.
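As a sketch of the Cohen’s d calculation asked for in the last item, using the pooled SD and this chapter’s two-group summary statistics (with equal n’s, the pooled variance is simply the average of the two group variances):

```r
# Cohen's d for the two-group cholesterol comparison, using the pooled SD
m <- c(189, 235); s <- c(13.23, 15.00)   # group means and SDs (equal n's)
sd_pooled <- sqrt(mean(s^2))             # ~14.14
d <- (m[1] - m[2]) / sd_pooled           # standardized mean difference
round(d, 3)                              # ~ -3.253, matching the output
```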

Using the BIVARIATE REGRESSION output, respond to the following items for the Y Dependent Variable

  1. Show or explain how the magnitude of the relationship between W and Y (i.e., r2) is calculated for the mean comparison of the Y variable between W groups. You may run this as a regression to obtain the answer if you prefer.
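For a two-group comparison, r² can also be recovered directly from the t statistic as r² = t² / (t² + df); a hand-check sketch using this chapter’s output values:

```r
# r^2 from the two-group t statistic: r^2 = t^2 / (t^2 + df)
t_stat <- -3.984   # Student's t from the output
df <- 4            # equal variances df, n1 + n2 - 2
r2 <- t_stat^2 / (t_stat^2 + df)
round(r2, 3)       # ~0.799: W accounts for about 80% of the variance in Y
```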

Using ALL the output in this section above, respond to the following item for the Y Dependent Variable

  1. Interpret the results for the independent t test in an APA-style report to answer the research question and to describe in detail the relationship between W and Y. Whether statistically significant or not, refer to descriptive statistics (e.g., means, standard deviations, mean differences, effect sizes, and/or confidence intervals), graphs, inferential statistics, degrees of freedom, and statistical significance to describe the size and direction of the difference between groups. Be sure to discuss assumptions, outliers, statistical significance, and the answer to the research question.

Using the NONPARAMETRIC INDEPENDENT-SAMPLES MANN-WHITNEY U (or the equivalent WILCOXON RANK-SUM) TEST output, respond to the following items for the Y Dependent Variable only

  1. What is the average rank for the W=1 level/group?
  2. Is there a statistically significant difference in the mean ranks between the two levels/groups? Provide the evidence from your output that supports your answer.
  3. Approximately what is the two-tailed probability of obtaining the resulting z statistic (as an absolute value) if the null hypothesis is true?
  4. Interpret the results for the Mann-Whitney-Wilcoxon test in an APA-style report to answer the research question and to describe in detail the relationship between W and Y. Whether statistically significant or not, refer to descriptive statistics (e.g., mean ranks, but also report means, standard deviations, mean differences, effect sizes, and/or confidence intervals), graphs, inferential statistics, degrees of freedom, and statistical significance to describe the size and direction of the difference between groups. Be sure to discuss assumptions, outliers, statistical significance, and the answer to the research question.