INTRODUCTION

In this chapter we will consider two test statistics which are used with single groups of units. Please note that although the test statistics we present use descriptive statistics (i.e., means and variances) that are commonly found with interval and ratio data, many statisticians (e.g., Nunnally, 1967, pp. 24-25) argue strongly that they should not be limited to such scales of measurement (i.e., they can be used with ordinal data). Nunnally’s argument is convincing, we generally agree with it, and it is not unusual to find parametric statistics used with ordinal data. We warn, however, that it is critically important that you understand the meaning of your results in terms of their scale of measurement. For example, beyond the careful interpretations required when using these statistics with ordinal data, there are times when even interval data require careful interpretation because of the lack of a meaningful zero in the scale.

We will begin by briefly reviewing the elements of hypothesis testing for the z statistic. We will then consider a new statistic, called Student’s t statistic, and its probability distribution. An assumption that is made in the use of both the z statistic and the t statistic is that the dependent variable follows a normal distribution in the population of interest. We will conclude this chapter by considering transformations that are useful when this assumption is violated. (The section on transformations is an optional section, which can be included in classes where time permits.)

In the presentation of the z and t test statistics different alternative hypotheses and different conditions (referred to as the “state of affairs”) will be considered, but only one data set will be used. That is, the same data set will be used for all of the presentations considered here, but each time the data will be considered with a different story, i.e., the scenario will change. For example, the same data set will be used for the t statistic with a given state of affairs and four different alternative hypotheses. This is done in order to emphasize the importance of the alternative hypothesis in determining how a test statistic is interpreted, and of the state of affairs in selecting a test statistic.

z STATISTIC TEST OF THE NULL HYPOTHESIS THAT THE POPULATION MEAN EQUALS A CONSTANT (\(H_0: \mu = c_0\))

The example presented in this section is based on the study discussed in chapter 11 concerning the mean intelligence level of the delegates to the United Nations. Our discussion will focus on the four possible research hypotheses that were open to Mary Barth in that study. Before we consider these cases, let us consider the state of affairs that must be present before you can use the test statistic to be described, the test statistic itself, the assumptions required for the test statistic to be valid, what it means to violate these assumptions, and the research problem that is common to all of the cases.

State of Affairs

The state of affairs that must exist before you can consider the test statistic presented below is as follows:

  1. You are able to select a random sample of a single group of units.
  2. You are interested in the mean of the population from which you will randomly draw your sample of units.
  3. The variance in the population is known. (This condition is emphasized because it differentiates this state of affairs from the one presented in the next section regarding the t distribution.)

Test Statistic in the Sampling Distribution

The test statistic which will be used for the four cases considered in this section is:

\[ \begin{equation} z = \frac {M_X - c_0} {\sigma / \sqrt{n}} \tag{12-1} \end{equation} \] where

  • \(M_X\) is the sample mean (also denoted as \(\overline{X}\)),
  • \(c_0\) is some constant representing the Hypothesized Mean (\(\mu_{Hypothesized}\) or \(\mu_H\)), which is most commonly some known population mean,
  • \(\sigma\) is the known population standard deviation (recall that the z statistic requires that we know the population standard deviation)
  • \(n\) is the sample size

Recall that \(\sigma / \sqrt{n} = \sigma_X / \sqrt{n} = \sigma_{M_X} = \sigma_{\overline{X}}\), which is called the standard error of the mean. If we happen to know the standard error of the mean rather than the standard deviation and sample size, we could use the standard error directly in formula (12-1) instead of the standard deviation and \(\sqrt{n}\).

Assumptions

Mathematicians and mathematical statisticians who study the distributions and the probabilities of the statistics we use have warned us that certain statistics can only be used confidently when specific mathematical assumptions have been met. That is, statistical tests are only valid when their assumptions have been met. A valid statistical test is one whose level of significance and power are as specified by the researcher, that is, alpha and power have not changed because of a violation of one or more assumptions. The preceding statistical test will be valid when:

  1. The units are independent of one another. That is, the score that one unit receives does not affect the score that another unit receives.
  2. The dependent variable has a normal distribution in the population of units.

Violation of the Assumptions

The importance of the preceding assumptions is as follows:

  1. The assumption that the units are independent of one another is extremely important because if it is violated the level of significance (i.e., the probability of rejecting a true null hypothesis) can increase dramatically, e.g., from .05 to .40.
  2. Given a small sample size, a violation of the assumption of normality does not seriously affect the level of significance; for example, it may increase from .050 to .051 or decrease from .050 to .049. With large samples, the assumption of normality in the population can generally be ignored. Here a reasonably large sample (based on a review of the literature on violations of the normality assumption) is one that is greater than or equal to twenty-five units.

A robust statistical test is one that is valid even under violations of one or more of its assumptions (or it has no assumptions). Here statisticians would say that the z test is robust to violations of its normality assumption.

Case A: An Exploratory Analysis where the Research Hypothesis agrees with \(H_A: \mu \ne c_0\)

Research Problem:

The research problem that we will consider in each of the following cases is: Is the mean on the Wechsler Adult Intelligence Scale of the delegates to the United Nations equal to 100?

Research Hypothesis:

The mean on the Wechsler Adult Intelligence Scale of the delegates to the United Nations differs from 100.

Statistical Hypotheses: \[ \begin{align} H_0&: \mu = \mu_{Hypothesized} = \mu_H \\ H_0&: \mu = \mu_{Population} \\ H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] Note that both \(\mu\) and \(\mu_{Hypothesized}\) are means. \(\mu\) is the mean for the target population from which we are sampling (to get \(M_X\) or \(\overline{X}\)), whereas \(\mu_H\) is some known—or otherwise hypothesized—population mean. Therefore, we are testing whether our target population mean equals some known mean—or some otherwise hypothesized or important value. We essentially want to know whether OUR population mean equals some important value in order to decide whether OUR population is the same as the hypothesized population (because we assume the other distributional characteristics are the same: variance, skewness, kurtosis).

Mary set her level of significance at 0.05 and power at 0.85. She decided to establish an a priori effect size of 8 points or more. That is, she would like to detect a U.N. delegate population mean on the Wechsler Adult Intelligence Scale that differed from the population mean of all adults by 8 points or more. From equation (11-2), her a priori effect size was d = 0.53. She calculated her sample size, using equation (11-3), as follows:

\[ n = \left[ \frac {15(1.96+1.04)} {8} \right] ^2 = 31.64 \approx 32 \] Here, \(\sigma = 15\), \(\alpha/2 = .05/2 = .025\), \(z_{(.975)} = 1.96\), \(1 - \beta = .85\), \(z_{(.85)} = 1.04\), and \(|\mu_A - \mu_0| = 8\).
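If you prefer to check this calculation in R rather than by hand, the same numbers can be reproduced with base R’s qnorm() function (a minimal sketch; the object names are ours, not part of Mary’s analysis):

  # Sample size from equation (11-3), using sigma = 15, two-tailed alpha = .05,
  # power = .85, and a minimum difference of 8 points.
  sigma   <- 15
  diff    <- 8                    # |mu_A - mu_0|
  z_alpha <- qnorm(1 - .05/2)     # 1.9600 (the tabled 1.96)
  z_power <- qnorm(.85)           # 1.0364 (the tabled 1.04)
  n <- ((sigma * (z_alpha + z_power)) / diff)^2
  ceiling(n)                      # about 31.6, rounded up to 32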

Critical Values

In Table 9.1(b) the critical values of the z statistic for the two tailed test with \(\alpha = .05\) are -1.96 and +1.96 (i.e., \(z_{(.025)} = -1.96\) and \(z_{(.975)} = +1.96\)).

At this point the a priori parameters of hypothesis testing have been established (i.e., your “bet” is made).

Randomly Select and Measure the Sample Units.

The measurements of the 32 randomly selected United Nations delegates are shown in Table 12.1. Note that, in this case, we know that the population \(\mu = 100\) and population \(\sigma = 15\) based on how the Wechsler scores are scaled.

Calculate the Test Statistic.

Using the Descriptive Statistics results shown in Figure 12b (below), the test statistic is found to be -3.01. Note that for the z statistic we use the population \(\sigma_X = 15\), not the sample standard deviation or standard error shown in Figure 12b. Also, we use the known population mean \(\mu = 100\) for both our statistical null hypothesis and the statistic calculation.

\[ \begin{align} z &= \frac {M_X - \mu_{Hypothesized}} {\sigma_{M_X}} \\ z &= \frac {\overline{X} - \mu_{Population}} {\sigma_X/\sqrt{n}} \\ z &= \frac {92.031 - 100} {15/\sqrt{32}} \\ z &= \frac {-7.969} {15/5.6569} \\ z &= \frac {-7.969} {2.6517} \\ z &= -3.0052 \\ z &\approx -3.01 \end{align} \]
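The same arithmetic can be verified with a few lines of R (a minimal check, using the sample mean from Figure 12b and the known population standard deviation; the variable names are ours):

  M_X   <- 92.031   # sample mean from Figure 12b
  mu_H  <- 100      # hypothesized (known) population mean
  sigma <- 15       # known population standard deviation
  n     <- 32
  z <- (M_X - mu_H) / (sigma / sqrt(n))
  z                 # -3.005, which rounds to -3.01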

Make A Decision About the Null Hypothesis.

For this two tailed scenario, the statistical hypotheses were: \[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] The decision was made to reject the null hypothesis because the absolute value of the test statistic (\(|-3.01| = 3.01\)) was greater than the absolute value of the critical value (i.e., 3.01 > 1.96). We could also note that -3.01 is less than -1.96 (\(-3.01 < -1.96\)), but many people find this approach less intuitive (although some prefer it). The same decision is reached when the p value is considered. In this case, using Table 9.1(a), we have \(p(z > 3.01) = .5000 - .4987 = .0013\). Using \(p(z < -3.01)\) we see the same \(p = .0013\). Therefore, \(p(z < -3.01) + p(z > 3.01)\) yields \(p = .0026\), and since \(p < \alpha\) (i.e., .0026 < .05), we reject the null hypothesis. Remember that either the critical values or the p value may be used to decide if you will reject or fail to reject the null hypothesis. That is, both approaches – if rounded equivalently and appropriately – must result in the same decision. Both values were given here for completeness and illustrative value.

Probability Values for z statistics (p values)

Statistical computer programs (like R, jamovi, JASP, SPSS) will calculate probability values for us when they run the analyses. Indeed, that’s one of the biggest benefits of using such programs. However, we can find the statistical probabilities for common distributions in many programs and online apps. For example, jamovi has a module called “distrACTION” that will calculate probabilities for the normal, t, F, and chi-square distributions, among others. Using distrACTION, we will find the p value for the preceding z statistic.

  • Click distrACTION (after installing the module)
  • Choose Normal Distribution
  • For z statistics, enter Mean = 0 and SD = 1
  • Check “Compute Probability”
  • Enter x1 = -3.01 (the actual example z statistic based on the calculation above, but you could enter 3.01 as the absolute value of our z statistic)
  • Because z was negative, click the option for “\(P(X \le x1)\)” (but if you entered x1 = +3.01, you will need to click the option for “\(P(X \ge x1)\)”)
  • Either way, in the output you will see Probability = 0.00131 (if you set the option for at least 5 decimal places); this is the 1-tailed p value, that is, the area under the curve beyond the z value
  • If you want the 2-tailed p value (as we do), you need to double it (therefore, p = .00262) to get the area beyond z in both directions
  • Therefore, the area under the normal curve BEYOND the z statistic with an absolute value of 3.01 (that is, less than -3.01 and greater than +3.01) is .00262
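The same probabilities can be obtained outside of jamovi with base R’s pnorm() function (a minimal sketch using the z statistic of -3.01 from above):

  p_one_tail <- pnorm(-3.01)    # area below -3.01, about .00131
  p_two_tail <- 2 * p_one_tail  # area beyond -3.01 and +3.01, about .00262
  p_two_tail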

Cases B, C, and D: Confirmatory Analyses

Case B

In confirmatory analyses, the researcher specifies the direction of the expected differences. For Case B, Mary stated as her research hypothesis that the delegates’ mean on the Wechsler Adult Intelligence Scale was less than 100. This leads to the following statistical hypotheses:

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu < 100 \\ \end{align} \] Testing the null hypothesis with a 0.05 significance level, 0.91 power, and 0.53 effect size, Mary found the test statistic to be -3.01. Since this was less than the critical value (-1.645), she decided to reject the null hypothesis. We could also perform this process using the absolute value of the z statistic as long as we pay attention to whether the sample mean is below or above the hypothesized mean.

Case C

Case C tests the research hypothesis that the delegates’ mean is greater than 100.

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu > 100 \\ \end{align} \] The test statistic still equals -3.01. However, in this case, Mary fails to reject the null hypothesis because the critical value of the z statistic for the one tailed test with alpha = 0.05 is +1.645. This is an unusual case which would require the researcher to carefully reconsider the reasons for the research hypothesis. This is one of the major reasons not to use one tailed tests. That is, if we predict the wrong direction for the difference we are left with inconclusive results even when the difference is large. Surprisingly, this happens. A second reason to avoid a one tailed test without really strong a priori empirical and theoretical support is that it is a less conservative approach, which therefore can result in more Type I errors (but more on this idea later).

Case D

In Case D, Mary tests the hypothesis that the mean intelligence level at the U.N. equals 100. But here her research hypothesis is that the mean = 100. So, technically these would be her statistical hypotheses:

\[ \begin{align} H_0&: \mu \ne 100 \\ H_A&: \mu = 100 \\ \end{align} \] However, the process we are discussing does not allow this approach. Therefore, Mary must use the same statistical hypotheses as Case A:

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] Therefore, this situation exactly parallels Case A, and results in the same test strategy described for it. However, in this case, we HOPE to fail to reject the null hypothesis. Therefore, Mary needs to fail to reject the null hypothesis to have evidence to support her research hypothesis. However, Mary rejects the null hypothesis of \(\mu=100\) and therefore has no evidence to conclude that \(\mu = 100\).

Note that failure to reject the null hypothesis IS NOT actually evidence that \(\mu=100\); we just have no evidence to reject it. There are many reasons we may fail to reject a null hypothesis—not only because the null hypothesis is actually true in the population. There are newer approaches to null hypothesis testing that are able to test the equality of means, but they are not widely used and require a different logic than the approach we are taking (which is still the approach most researchers take).

Table 12.1 Intelligence test scores from Mary Barth’s research

Case_ID  IQ_Score    Case_ID  IQ_Score    Case_ID  IQ_Score    Case_ID  IQ_Score
      1       104          9        72         17       113         25        83
      2       105         10        87         18       105         26       100
      3        61         11       123         19       116         27       104
      4        93         12        89         20        97         28        83
      5       111         13       102         21        74         29       100
      6        66         14        59         22        99         30        80
      7        74         15       115         23        77         31        86
      8       105         16        94         24        84         32        84

Figure 12a Histogram and Boxplot/Violin Plot with median, mean, and data points for data shown in Table 12.1

  out <- capture.output(
    jmv::descriptives(
    data = data,
    vars = IQ_Score,
    hist = TRUE,
    dens = TRUE,
    box = TRUE,
    violin = TRUE,
    dot = TRUE,
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

Figure 12b Descriptive results needed for possible tests of hypotheses based on the sample data shown in Table 12.1

  jmv::descriptives(
    data = data,
    vars = IQ_Score,
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE)

 DESCRIPTIVES

 Descriptives                            
 ─────────────────────────────────────── 
                              IQ_Score   
 ─────────────────────────────────────── 
   N                                32   
   Missing                           0   
   Mean                         92.031   
   Std. error mean              2.9226   
   95% CI mean lower bound      86.071   
   95% CI mean upper bound      97.992   
   Median                       93.500   
   Standard deviation           16.532   
   Variance                     273.32   
   IQR                          22.000   
   Range                        64.000   
   Minimum                      59.000   
   Maximum                      123.00   
   Skewness                   -0.20762   
   Std. error skewness         0.41446   
   Kurtosis                   -0.67077   
   Std. error kurtosis         0.80937   
 ─────────────────────────────────────── 
   Note. The CI of the mean assumes
   sample means follow a
   t-distribution with N - 1 degrees
   of freedom

THE SAMPLING DISTRIBUTION OF THE t STATISTIC

Introduction: A Comparison of z statistics and t statistics

In our discussions thus far our test statistic has been the z statistic. We were able to use the z statistic because: (a) we assumed that the dependent variable was normally distributed in the population of units of interest (so that under the Central Limit Theorem the sampling distribution of the sample means, and therefore their transformed z scores, would be normal) and (b) the population variance was known. Here the z statistic was written as presented in equation 12-1 (as represented by the \(\sigma\) in the formula):

\[ z = \frac {M_X-c_0} {\sigma / \sqrt{n}} = \frac{M_X-c_0}{\sigma_{M_X}} = \frac{M_X-c_0}{\text{std. error}} \] Because tabled values of z for the standard normal distribution exist, we were able to establish critical values prior to taking measurements on units, and p levels after the sample z statistic was obtained.

Frequently we will be able to meet condition (a) above, but not condition (b). That is, our measure will be normally distributed in the population, but the population variance will be unknown. In this case we can form a statistic known as the t statistic by replacing the known variance in the denominator of the z statistic by its sample estimate. Then, the t statistic may be written as:

\[ \begin{equation} t = \frac{M_X-c_0}{s_X/\sqrt{n}} = \frac{M_X-c_0}{s_{M_X}} = \frac{M_X-c_0}{s_{\overline{X}}} \tag{12-2} \end{equation} \] Note that the only thing that differs between the calculation of the z statistic and the t statistic for a given sample is that the z statistic uses the population standard deviation in its denominator while the t statistic uses the sample standard deviation. However, there is a much greater difference between these two statistics in terms of finding critical values. This is because the z statistic is independent of sample size, since, except for the sample mean, it is based on population parameters. That is, there is only one standard normal distribution regardless of the sample size. However, since the t statistic is partially based on the sample standard deviation, we find a different t distribution for each sample size.
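A short R sketch makes this comparison concrete for Mary’s data: the two statistics differ only in whether the known population standard deviation (15) or the sample standard deviation (16.532, from Figure 12b) appears in the denominator. (The variable names below are ours.)

  M_X <- 92.031; c0 <- 100; n <- 32
  z_stat <- (M_X - c0) / (15 / sqrt(n))       # about -3.01 (known sigma)
  t_stat <- (M_X - c0) / (16.532 / sqrt(n))   # about -2.73 (sample sd)
  c(z = z_stat, t = t_stat)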

THE FAMILY OF t DISTRIBUTIONS

The different t distributions can be more easily understood when we consider the theoretical function which generates them. Because the probability density function for the t statistic is not required for practical research work, we will not provide it here. However, examination of that formula shows that the only unknown value (besides t) needed to use the distribution is a quantity referred to as the degrees of freedom (usually denoted as v in the formula). We will find that for all of the states of affairs requiring t statistics that we will consider in this book, the degrees of freedom, v, will depend upon the sample size. For example, given a single group of subjects where our interest is in the population mean, the degrees of freedom are found as v = n – 1. Therefore, the formula will result in different t distributions for different sample sizes (and therefore different degrees of freedom).

Critical Values: A Table of t-Values

Using Table 9.1(b) the probability of drawing a mean from a given interval of the normal distribution can be found (see chapter 9). Here, since the normal distribution is symmetrically distributed, we are able to use this Table to find critical values for most levels of significance. For the t distribution, however, we would have to have a separate table like Table 9.1(b) for every sample size (hence an infinite, or at least very large, number of tables). Since this is impractical, only particular values of the t statistic are usually tabled. Such a table is presented as Table 12.2.

Table 12.2 Critical Values for the t distribution at several levels of significance (use \(\alpha\) for one-tailed, \(\alpha/2\) for two-tailed) with degrees of freedom as the left-most column

df   α=0.1000 α/2=0.2000   α=0.0500 α/2=0.1000   α=0.0250 α/2=0.0500   α=0.0100 α/2=0.0200   α=0.0050 α/2=0.0100   α=0.0005 α/2=0.0010
1 3.0777 6.3138 12.7062 31.8205 63.6567 636.6192
2 1.8856 2.9200 4.3027 6.9646 9.9248 31.5991
3 1.6377 2.3534 3.1824 4.5407 5.8409 12.9240
4 1.5332 2.1318 2.7764 3.7469 4.6041 8.6103
5 1.4759 2.0150 2.5706 3.3649 4.0321 6.8688
6 1.4398 1.9432 2.4469 3.1427 3.7074 5.9588
7 1.4149 1.8946 2.3646 2.9980 3.4995 5.4079
8 1.3968 1.8595 2.3060 2.8965 3.3554 5.0413
9 1.3830 1.8331 2.2622 2.8214 3.2498 4.7809
10 1.3722 1.8125 2.2281 2.7638 3.1693 4.5869
11 1.3634 1.7959 2.2010 2.7181 3.1058 4.4370
12 1.3562 1.7823 2.1788 2.6810 3.0545 4.3178
13 1.3502 1.7709 2.1604 2.6503 3.0123 4.2208
14 1.3450 1.7613 2.1448 2.6245 2.9768 4.1405
15 1.3406 1.7531 2.1314 2.6025 2.9467 4.0728
16 1.3368 1.7459 2.1199 2.5835 2.9208 4.0150
17 1.3334 1.7396 2.1098 2.5669 2.8982 3.9651
18 1.3304 1.7341 2.1009 2.5524 2.8784 3.9216
19 1.3277 1.7291 2.0930 2.5395 2.8609 3.8834
20 1.3253 1.7247 2.0860 2.5280 2.8453 3.8495
21 1.3232 1.7207 2.0796 2.5176 2.8314 3.8193
22 1.3212 1.7171 2.0739 2.5083 2.8188 3.7921
23 1.3195 1.7139 2.0687 2.4999 2.8073 3.7676
24 1.3178 1.7109 2.0639 2.4922 2.7969 3.7454
25 1.3163 1.7081 2.0595 2.4851 2.7874 3.7251
26 1.3150 1.7056 2.0555 2.4786 2.7787 3.7066
27 1.3137 1.7033 2.0518 2.4727 2.7707 3.6896
28 1.3125 1.7011 2.0484 2.4671 2.7633 3.6739
29 1.3114 1.6991 2.0452 2.4620 2.7564 3.6594
30 1.3104 1.6973 2.0423 2.4573 2.7500 3.6460
31 1.3095 1.6955 2.0395 2.4528 2.7440 3.6335
32 1.3086 1.6939 2.0369 2.4487 2.7385 3.6218
33 1.3077 1.6924 2.0345 2.4448 2.7333 3.6109
34 1.3070 1.6909 2.0322 2.4411 2.7284 3.6007
35 1.3062 1.6896 2.0301 2.4377 2.7238 3.5911
36 1.3055 1.6883 2.0281 2.4345 2.7195 3.5821
37 1.3049 1.6871 2.0262 2.4314 2.7154 3.5737
38 1.3042 1.6860 2.0244 2.4286 2.7116 3.5657
39 1.3036 1.6849 2.0227 2.4258 2.7079 3.5581
40 1.3031 1.6839 2.0211 2.4233 2.7045 3.5510
50 1.2987 1.6759 2.0086 2.4033 2.6778 3.4960
60 1.2958 1.6706 2.0003 2.3901 2.6603 3.4602
70 1.2938 1.6669 1.9944 2.3808 2.6479 3.4350
80 1.2922 1.6641 1.9901 2.3739 2.6387 3.4163
100 1.2901 1.6602 1.9840 2.3642 2.6259 3.3905
150 1.2872 1.6551 1.9759 2.3515 2.6090 3.3566
200 1.2858 1.6525 1.9719 2.3451 2.6006 3.3398
500 1.2832 1.6479 1.9647 2.3338 2.5857 3.3101
1000 1.2824 1.6464 1.9623 2.3301 2.5808 3.3003
100000~z 1.2816 1.6449 1.9600 2.3264 2.5759 3.2906

Table 12.3 Critical Values for many degrees of freedom (left-most column) of the t distribution (use \(\alpha\) for one-tailed, \(\alpha/2\) for two-tailed)

In Table 12.2, t values are given for different degrees of freedom (sample sizes), abbreviated as “df” in the left hand column, and for the probabilities most frequently used in parameter estimation and hypothesis testing (e.g., .10, .05, .025, .01). At the top of the table the probabilities (i.e., levels of significance) are presented for one tailed and two tailed tests. Like the normal distribution, the t distribution is symmetric with a mean of zero, and, as for the z scores in the top of Table 9.1(b), only the positive values of the t scores are given in Table 12.2. Therefore, negative critical values are found by finding the associated positive t value and then making this value negative. The notation t(p, v) will be used to indicate the t value at the pth percentile (i.e., cumulative probability p) of a t distribution with v degrees of freedom.

To illustrate the t notation and the use of Table 12.2, let us consider an example where we are ready to establish critical values in a test of the null hypothesis H0: μ = 50 under conditions where the variance is unknown. Then, at the .05 level of significance with a sample of 20 units, the degrees of freedom would be 19 (i.e., df = n – 1), and the two tailed critical values would be found as -2.093 and +2.093. These critical values would be written as: t(.025, 19) = -2.093, and t(.975, 19) = +2.093. Here, if the alternate hypothesis were HA: μ > 50, the one tailed critical value would be found as t(.95, 19) = 1.729, but if the alternate hypothesis were HA: μ < 50, the one tailed critical value would be t(.05, 19) = -1.729.
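These critical values can also be obtained directly from R’s qt() function rather than from Table 12.2 (a minimal sketch for the df = 19 example above):

  qt(.975, df = 19)   #  2.093  upper two-tailed critical value
  qt(.025, df = 19)   # -2.093  lower two-tailed critical value
  qt(.95,  df = 19)   #  1.729  one-tailed critical value for H_A: mu > 50
  qt(.05,  df = 19)   # -1.729  one-tailed critical value for H_A: mu < 50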

A Plot of Two t distributions and the Standard Normal Distribution

Figure 12e shows the standard normal distribution and two t distributions, one t distribution with 5 degrees of freedom, and one t distribution with 10 degrees of freedom. In Figure 12e, you can see that there are small differences between the t distributions and the standard normal distribution. These differences are largest when the degrees of freedom are small but decrease sharply as the degrees of freedom increase. This can be seen in Table 12.2 where the t values are larger when the degrees of freedom are small but become smaller and more homogeneous in each column as the degrees of freedom increase.

As the degrees of freedom become infinite, the t distribution becomes the standard normal distribution. This can be seen by considering the t values in the last row of Table 12.2 where the degrees of freedom (df) are huge (i.e., df = 100000). Here, the critical values are the same as those found using the z scores in Table 9.1(b). For example, in Table 12.2 with df = 100000 and \(\alpha = .05\), the critical values are \(\pm 1.960\) for a two tailed test, and are equal to +1.645 or -1.645 for the two possible one tailed tests. These are the same z scores that we found using Table 9.1(b), that is, \(t_{\alpha}(\infty) = z_{(\alpha)}\).

Figure 12d Comparison of probabilities of a few example z and t statistics (with t degrees of freedom in brackets)

tails     alpha   z        t(10000)  t(1000)  t(100)   t(40)    t(31)    t(30)    t(10)
1-tailed  0.050   1.6449   1.6450    1.6464   1.6602   1.6839   1.6955   1.6973   1.8125
2-tailed  0.050   1.9600   1.9602    1.9623   1.9840   2.0211   2.0395   2.0423   2.2281
1-tailed  0.010   2.3263   2.3267    2.3301   2.3642   2.4233   2.4528   2.4573   2.7638
2-tailed  0.010   2.5758   2.5763    2.5808   2.6259   2.7045   2.7440   2.7500   3.1693
1-tailed  0.001   3.0902   3.0910    3.0984   3.1737   3.3069   3.3749   3.3852   4.1437
2-tailed  0.001   3.2905   3.2915    3.3003   3.3905   3.5510   3.6335   3.6460   4.5869

Figure 12e The standard normal distribution compared to two t distributions

Other Uses of the t statistic

In the preceding introduction to the t distribution we transformed the sample mean into a t statistic and compared it to the z statistic for the situation where we were interested in the mean of a population. In the following chapters you will be introduced to different sample values that can be transformed into t statistics. We will find that the t distribution can be used as the test statistic not only when we are interested in the population mean, but also when we are interested in the difference between population means of related and independent scores, or the population correlation coefficient.

STUDENT’S t TEST: A TEST OF THE NULL HYPOTHESIS THAT THE POPULATION MEAN EQUALS A CONSTANT: \(H_0: \mu = c_0\)

In this section we will discuss the conditions under which you would use Student’s t test to test the null hypothesis that the population mean is equal to a constant. Here we will examine the same data set that we considered in the last section for the z statistic (i.e., Mary Barth’s United Nations study), but we will slightly change the scenario. The change is that here Mary Barth has decided to use a measure of intelligence that has been found to be highly correlated with the Wechsler Adult Intelligence Scale, but for which there are no large-scale norm data. To give this instrument a name, we shall call it the World Adult Intelligence Scale (WAIS). Mary decided to use this instrument because it required much less time to administer, was available in different languages, and had the same mean and standard deviation for each language. Therefore, in this study we have all of the conditions that were present in the preceding section, except that the population variance is unknown. Let us now consider the elements of hypothesis testing given this information.

State of Affairs

The state of affairs that must exist before you can consider the test statistic presented below is as follows:

  1. You are able to select a random sample of a single group of units.
  2. You are interested in the mean of the population from which you will randomly draw your sample of units.
  3. The variance in the population is unknown. (This condition is emphasized because it differentiates this state of affairs from the one found in the preceding section regarding the z distribution.)

Test Statistic In the Sampling Distribution

The test statistic which will be used for the four cases considered in this section is (as shown in equation 12-2):

\[ \begin{equation} t = \frac {M_X - c_0} {s_X / \sqrt{n}} \end{equation} \] When the null hypothesis is true, the sampling distribution of this statistic follows a t distribution with (n – 1) degrees of freedom, where n is the sample size.

Assumptions

The preceding statistical test will be valid when:

  1. The units are independent of one another. That is, the score that one unit receives does not affect the score that another unit receives.
  2. The dependent variable is normally distributed in the population of units.

Violation of the Assumptions

The importance of the preceding assumptions is as follows:

  1. The assumption that the units are independent of one another is extremely important because if it is violated the level of significance (i.e., the probability of rejecting a true null hypothesis) can increase dramatically (e.g., from .05 to .40).
  2. Given a small sample size, a violation of the assumption of normality does not seriously affect the level of significance for the two tailed test. However, it may be a problem for one tailed tests (see Srivastava, 1959). Given large samples, the assumption of normality in the population can generally be ignored. Here a reasonably large sample is one that is greater than or equal to twenty-five units.

Case A: An Exploratory Analysis where the Research Hypothesis agrees with \(H_A: \mu \ne c_0\)

Research Problem

The research problem that we will consider in each of the following cases is: Is the mean on the World Adult Intelligence Scale (WAIS) of the delegates to the United Nations equal to 100?

Research Hypothesis

The mean on the World Adult Intelligence Scale of the delegates to the United Nations differs from 100.

Statistical Hypotheses:

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \]

Determine Valid and Reliable Measures of the Dependent and Independent Variables

Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in chapter 11.

Level of Significance

The probability of rejecting a true null hypothesis (i.e., the probability of making a Type I error) was set at .05.

Power

Mary Barth decided that she would like her power to be at least .85. That is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.

A Priori Effect Size

In a single group exploratory analysis where the variance is unknown we will rely on the work of Cohen (1977) to aid us in selecting an a priori effect size.

For experiments with two groups, Cohen describes three different effect sizes: “small” is \(d = .20\), “medium” is \(d = .50\), and “large” is \(d = .80\). As a last resort, these can be used as guides in exploratory studies where information is lacking.

If we denote the single group effect size of Equation (11-2) as \(d_S\), then the relationship of \(d_S\) to Cohen’s two group effect size, d, is:

\[ \begin{align} d &= d_S \sqrt{2} \\ &\text{or} \\ d_S &= \frac{d}{\sqrt{2}} \tag{12-4} \end{align} \] Therefore, in terms of the single group effect size \(d_S\), Cohen’s small, medium, and large effect sizes become:

\[ \begin{align} \text{Small } d_S &= .20/\sqrt{2} \approx 0.14 \\ \text{Medium } d_S &= .50/\sqrt{2} \approx 0.35 \\ \text{Large } d_S &= .80/\sqrt{2} \approx 0.57 \end{align} \] In light of Cohen’s suggestions for effect size, Mary Barth selected a large a priori effect size (i.e., \(d = 0.80\); \(d_S = 0.57\)) as one that she felt would be present among the U.N. delegates. Therefore, we will use \(d_S = 0.57\) when we look at the tables below for the large standardized mean difference effect size.

Sample Size

Tables 12.4a-d were calculated using the pwr package in R (based on Cohen, 1977); they list the sample size needed to detect a given effect size (d) in the single group case, with statistical power in the left-most column. These are the same values obtained from the jamovi Power module. To use these tables for a single group, simply use the single group effect size d as the column value. In Tables 12.4a-d, “\(\alpha/2\)” symbolizes the level of significance for a two tailed test, and “\(\alpha\)” symbolizes the level of significance for a one tailed test. Mary found that she needed approximately 27 subjects when \(\alpha/2 = .05\), \(d = 0.80\) (here, \(d_S = 0.57\)), and \(power = .85\). However, since Mary could conveniently collect information on 32 subjects, she decided to proceed with this number. In Table 12.4c we see that by using 32 subjects Mary’s power is about .90.
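Because the tables were generated with the pwr package, any individual entry can be reproduced directly (a minimal sketch, assuming pwr is installed; we use the d = 0.6 column, the closest tabled value to Mary’s \(d_S = 0.57\)):

  library(pwr)
  pwr.t.test(d = 0.6, sig.level = .05, power = .85,
             type = "one.sample", alternative = "two.sided")
  # The returned n, rounded up, matches the 27 listed in Table 12.4c
  # (power = .85 row, d = 0.6 column).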


Table 12.4a N to detect d by Single Sample t test for \(\alpha/2 = .01\) (\(\alpha = .005\))

Power   d=0.1   d=0.2   d=0.3   d=0.4   d=0.5   d=0.6   d=0.7   d=0.8   d=1   d=1.2
0.25 365 94 44 26 18 14 11 9 7 6
0.5 667 170 78 45 30 22 17 14 10 8
0.6 804 204 93 54 36 26 20 16 12 9
0.67 913 231 105 61 40 29 22 18 13 10
0.7 965 244 111 64 42 31 23 19 13 11
0.75 1060 268 121 70 46 33 25 20 14 11
0.8 1172 296 134 77 51 36 28 22 16 12
0.85 1309 330 149 85 56 40 31 24 17 13
0.9 1492 376 169 97 63 45 34 27 19 14
0.95 1785 449 202 115 75 53 40 32 22 16
0.99 2407 605 271 154 100 71 53 41 28 21

Table 12.4b N to detect d by Single Sample t test for \(\alpha/2 = .02\) (\(\alpha = .01\))

Power   d=0.1   d=0.2   d=0.3   d=0.4   d=0.5   d=0.6   d=0.7   d=0.8   d=1   d=1.2
0.25 276 71 34 20 14 11 9 7 6 5
0.5 544 139 63 37 25 18 14 12 9 7
0.6 669 170 77 45 30 22 17 14 10 8
0.67 768 195 88 51 34 25 19 15 11 9
0.7 816 206 94 54 36 26 20 16 11 9
0.75 904 228 103 60 39 28 22 17 12 10
0.8 1007 254 115 66 43 31 24 19 13 10
0.85 1134 286 129 74 48 35 26 21 15 11
0.9 1305 329 148 85 55 39 30 24 16 12
0.95 1580 397 178 102 66 47 35 28 19 14
0.99 2168 544 244 139 90 63 47 37 25 18

Table 12.4c N to detect d by Single Sample t test for \(\alpha/2 = .05\) (\(\alpha = .025\))

Power   d=0.1   d=0.2   d=0.3   d=0.4   d=0.5   d=0.6   d=0.7   d=0.8   d=1   d=1.2
0.25 167 44 21 13 9 7 6 5 4 4
0.5 387 98 45 26 18 13 10 9 6 5
0.6 492 125 57 33 22 16 13 10 7 6
0.67 578 146 66 38 26 18 14 12 8 7
0.7 620 157 71 41 27 20 15 12 9 7
0.75 696 176 80 46 30 22 17 13 10 7
0.8 787 199 90 52 34 24 19 15 10 8
0.85 900 227 102 59 38 27 21 17 12 9
0.9 1053 265 119 68 44 32 24 19 13 10
0.95 1302 327 147 84 54 39 29 23 16 12
0.99 1840 462 207 117 76 54 40 31 21 15

Table 12.4d N to detect d by Single Sample t test for \(\alpha/2 = .10\) (\(\alpha = .05\))

Power   d=0.1   d=0.2   d=0.3   d=0.4   d=0.5   d=0.6   d=0.7   d=0.8   d=1   d=1.2
0.25 93 25 12 8 6 5 4 3 3 3
0.5 272 69 32 19 13 9 8 6 5 4
0.6 362 92 42 24 16 12 9 8 6 5
0.67 436 110 50 29 19 14 11 9 6 5
0.7 472 119 54 31 21 15 12 9 7 5
0.75 540 136 62 36 23 17 13 10 7 6
0.8 620 156 71 41 27 19 15 12 8 6
0.85 721 182 82 47 31 22 17 13 9 7
0.9 858 216 97 55 36 26 19 15 11 8
0.95 1084 272 122 70 45 32 24 19 13 10
0.99 1579 396 177 100 65 46 34 27 18 13

Critical Values

In Table 12.2 the critical values of the t statistic for the two tailed test with \(\alpha = .05\) are t(.025, 30) = -2.042 and t(.975, 30) = +2.042, where 30 is the degrees of freedom used and .025 and .975 are the cumulative probabilities that mark off the middle 95% of the distribution. Mary used df = 30 because she used an older table that only listed df = 30 and df = 40 but did not list df = 31. When this is the case, first consider the critical value associated with degrees of freedom less than you have (e.g., here, 30). If your calculated test statistic is greater than the critical value for this smaller degrees of freedom, then your calculated t statistic will also be greater than the critical value based on your actual degrees of freedom. However, if your calculated test statistic falls between the two critical values given in the table (for degrees of freedom both above and below your actual degrees of freedom), you can generally use linear interpolation to estimate the actual critical value. However, in Table 12.2, we can see that the critical value for the t statistic at df = 31 is 2.0395.

At this point the a priori parameters of hypothesis testing have been established (i.e., your “bet” is made).

Randomly Select and Measure the Sample Units

The measurements of the 32 randomly selected United Nations delegates are shown in Table 12.1. (The same data was used here as was used for the z statistic so that you could compare the two procedures.)

Check: Outliers

There are several ways to check for outliers on a single variable. Common ways are to use z scores, boxplots, and histograms.

The key for us is that we are not simply trying to identify cases to remove from our dataset. We are looking for cases to investigate further. We do not want to remove any cases without justification. The best justification is that the case does not belong to our stated population and therefore should not have been in our sample. However, there are other reasons that we might feel justified removing a case. For example, if studying job satisfaction, we might not want to include people who have only worked 6 months or are very near retirement. Or if studying students, we might decide we should not include students on academic probation. Ideally, we would have delimited our population before we collected data so such cases would never have been included. But we cannot always think of all the possibilities before collecting data.

When we cannot justifiably remove a case, then we might try running analyses both with and without outliers. When there are no differences in the results and conclusions, then we can easily leave the outliers in the dataset. However, when the results change with and without the outlier, we will need to decide which is the most appropriate data to analyze and report. But we recommend reporting that an outlier does change the results and providing a brief description of the differences. Always remember, outliers can both help and hurt analyses. Just as it is not fair to remove an outlier to improve your results without reporting that, it is also not fair to leave an outlier in your data knowing that it is helping your results without reporting that. We are looking for results with good external validity. Any results that change due to one or two outliers are not stable results and should be reported as such. The best way to combat outliers is to have large sample sizes. Individual cases have much less influence on results when there are more cases in the data.

z Scores

We have calculated z scores for each case. Frequently, any z score outside the range of -3 < z < 3 is considered an outlier. Our view, however, based on the description above, is that we must adjust our comparison value to reflect our sample size. With very small samples, for example, less than 50, we recommend using -2 < z < 2 as our decision rule (any |z| > 2 is considered an outlier). As sample size gets larger, maybe up to 200 or 300, we recommend considering any score outside -2.5 < z < 2.5 to be an outlier. Then above 200 or 300, use |z| > 3 as the rule. The rationale behind this recommendation is that the z score we choose is related to probability. We know that about 5% of scores are beyond a z score of ±2 (i.e., outside of the range -2 < z < 2). We also know, based on the probabilities of the normal curve, that less than half a percent (i.e., 0.5% or .005) of scores are beyond ±3. If we only have 100 cases, then we wouldn’t expect even one case to be beyond ±3. But we can have cases that are clearly outliers in a dataset with 100 or fewer cases – they just wouldn’t have z scores over ±3. Similarly, using ±2 when you have too many cases would result in too many outliers being identified (e.g., 5% of a sample size of 500 would be 25 outliers expected just due to the normal curve probabilities).

In the z scores below, we can see that none of the cases have z scores over 2.0 in absolute value. Therefore, we do not have any concerns about outliers.

Case_ID IQ_Score z_Score
1 104 0.724
2 105 0.784
3 61 -1.877
4 93 0.059
5 111 1.147
6 66 -1.575
7 74 -1.091
8 105 0.784
9 72 -1.212
10 87 -0.304
11 123 1.873
12 89 -0.183
13 102 0.603
14 59 -1.998
15 115 1.389
16 94 0.119
Case_ID IQ_Score z_Score
17 113 1.268
18 105 0.784
19 116 1.450
20 97 0.301
21 74 -1.091
22 99 0.422
23 77 -0.909
24 84 -0.486
25 83 -0.546
26 100 0.482
27 104 0.724
28 83 -0.546
29 100 0.482
30 80 -0.728
31 86 -0.365
32 84 -0.486
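For readers who want to reproduce these z scores outside of jamovi, a minimal R sketch follows (it assumes the Table 12.1 scores have been entered into a vector named IQ_Score; the vector is typed out here so the sketch is self-contained):

  IQ_Score <- c(104, 105, 61, 93, 111, 66, 74, 105, 72, 87, 123, 89, 102, 59,
                115, 94, 113, 105, 116, 97, 74, 99, 77, 84, 83, 100, 104, 83,
                100, 80, 86, 84)
  z_Score <- (IQ_Score - mean(IQ_Score)) / sd(IQ_Score)   # same idea as scale()
  round(z_Score, 3)        # first value is 0.724, matching the table above
  any(abs(z_Score) > 2)    # FALSE: no case is flagged by the |z| > 2 rule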

Boxplot

Recall the boxplots from chapter 5, reproduced below, Figure 5g(ii) and Figure 5n. In Figure 5g(ii) you can see that each boxplot has a circle outside the box and whiskers. Any score that is \(1.5*IQR\) above or below the box (i.e., the middle 50% of scores) is considered an outlier and identified with the small circle (o). Any score that is \(3*IQR\) above or below the box is considered an extreme value and marked with an asterisk (*). In Figure 5n, you can see that the leptokurtic distribution (on the left) has multiple extreme values and a single outlier (case 30).

We consider both outliers and extreme values to be outliers, but we always start by considering the most extreme values first. With normal, leptokurtic, and even skewed distributions, a certain number of outliers is expected. Just because these values are outliers is not cause to remove them. But we should investigate to make sure the cases belong to the population we defined and to make sure the data are correct. Further, we want to look for other problems the outliers may be causing. For example, sometimes an outlier or group of outliers makes a distribution skewed. Outliers can often have a bigger impact than a violation of the assumptions, so we want to check.

  out <- capture.output(
    jmv::descriptives(
    formula = Scores ~ Shape,
    data = data,
    box = TRUE,
    dot = TRUE,
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

  junk <- capture.output(
    jmv::descriptives(
    formula = Scores ~ Ratings,
    data = data,
    box = TRUE,
    dot = TRUE,
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

Mary’s data resulted in the following boxplot.

junk <- capture.output(
  
  jmv::descriptives(
    data = data,
    vars = IQ_Score,
    box = TRUE,
    dot = TRUE,
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

Note that jamovi can provide a table for Extreme Values (we actually select an option called “outliers” to obtain the table in the results). The important thing to know about this table is that it simply lists the most extreme highest and lowest values for a variable and does not provide any diagnosis about whether they are actually outliers. Here is the Extreme Values table for Mary Barth’s data.

  jmv::descriptives(
    data = data,
    vars = vars(IQ_Score, z_Score),
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE,
    extreme = TRUE)

 DESCRIPTIVES

 EXTREME VALUES

 Extreme values of IQ_Score                
 ───────────────────────────────────────── 
                   Row number    Value     
 ───────────────────────────────────────── 
   Highest    1            11    123.000   
              2            19    116.000   
              3            15    115.000   
              4            17    113.000   
              5             5    111.000   
   Lowest     1            14     59.000   
              2             3     61.000   
              3             6     66.000   
              4             9     72.000   
              5             7     74.000   
 ───────────────────────────────────────── 


 Extreme values of z_Score                 
 ───────────────────────────────────────── 
                   Row number    Value     
 ───────────────────────────────────────── 
   Highest    1            11     1.8730   
              2            19     1.4500   
              3            15     1.3890   
              4            17     1.2680   
              5             5     1.1470   
   Lowest     1            14    -1.9980   
              2             3    -1.8770   
              3             6    -1.5750   
              4             9    -1.2120   
              5             7    -1.0910   
 ───────────────────────────────────────── 

Check: Assumptions

Prior to conducting a statistical test with a sample data set, it is wise to check the data for potential outliers and for potential violations of the statistical test’s assumptions, and to compute descriptive statistics. We will have additional assumptions to check with future statistical tests we discuss, but for now the assumption we need to investigate is normality. We should also consider whether the cases were randomly and independently sampled, but this requires a logical analysis rather than a statistical test.

We have seen that we can use histograms and Q-Q plots to investigate normality. We can also simply review the skewness and kurtosis statistics provided from our descriptive statistics (see the next step). The Normal Q-Q Plot does not show strong patterns that suggest nonnormality. In particular, the dots line up pretty well on the line in the Q-Q Plot. However, the Histogram shows some potential for a bimodal distribution. Due to idiosyncrasies that occur when creating these graphs, it is not necessarily true that the distribution is bimodal. However, it is worth checking further.

out <- capture.output(
    
  jmv::descriptives(
    data = data,
    vars = vars(IQ_Score),
    n = FALSE,
    qq = TRUE,
    hist = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE,
    extreme = FALSE))

out <- capture.output(
  
  jmv::descriptives(
    data = data,
    vars = vars(IQ_Score),
    n = FALSE,
    hist = TRUE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE,
    extreme = FALSE))

Sometimes the bimodality is due to the variable itself, but sometimes it is due to another variable that is more difficult to diagnose. For example, if you have height data for men and women, the histogram here could indicate that (e.g., the taller mound on the left could be for women, who are on average shorter, and the mound on the right could display the men in the sample, who are generally taller). In the case of Mary Barth’s data, the two mounds could reflect a group of U.N. delegates who have had training in how to take tests versus a group of delegates who have not. If such a training variable had been Mary Barth’s independent variable, it would be more appropriate to analyze the distribution of the data separately by group. However, if you have not collected data on the training variable (i.e., if it were not a variable of interest in your study), there would be no way to diagnose this cause.

Compute Descriptive Statistics.

Here is some example output. We want to review the descriptive statistics to become more familiar with our variables. You want to become as familiar as you can with your data in order to make better interpretations of your results. One of our colleagues frequently said “Data don’t speak to strangers.” Get to know your data well. This would include both descriptive statistical information as well as graphical information (which we have looked at above when considering outliers and assumptions). There are certainly more ways to get to know your data (e.g., frequency tables, group-by-group breakdowns when appropriate), so don’t limit yourself to only those things we are showing you here.

jmv::descriptives(
    data = data,
    vars = IQ_Score,
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE,
    extreme = FALSE)

 DESCRIPTIVES

 Descriptives                            
 ─────────────────────────────────────── 
                              IQ_Score   
 ─────────────────────────────────────── 
   N                                32   
   Missing                           0   
   Mean                         92.031   
   Std. error mean              2.9226   
   95% CI mean lower bound      86.071   
   95% CI mean upper bound      97.992   
   Median                       93.500   
   Standard deviation           16.532   
   Variance                     273.32   
   IQR                          22.000   
   Range                        64.000   
   Minimum                      59.000   
   Maximum                      123.00   
   Skewness                   -0.20762   
   Std. error skewness         0.41446   
   Kurtosis                   -0.67077   
   Std. error kurtosis         0.80937   
   Shapiro-Wilk W              0.97668   
   Shapiro-Wilk p              0.69899   
 ─────────────────────────────────────── 
   Note. The CI of the mean assumes
   sample means follow a
   t-distribution with N - 1 degrees
   of freedom

Calculate the Test Statistic.

In Figure 12h(i), using the One-Sample t test procedure to calculate the t statistic, the test statistic is shown to be -2.7266 using the mean from descriptive statistics. There are a couple ways to think about this test. First, we can compare the sample mean to the Null Hypothesis mean directly, which is the most common way this test is performed in jamovi. When we run the one-sample t test in jamovi we need to enter 100 as the “test value” in order to make this comparison. The resulting output follows.

Figure 12h(i) Output from One-Sample t Test for the data in 12a compared to 100

jmv::ttestOneS(
    data = data,
    vars = IQ_Score,
    testValue = 100,
    norm = TRUE,
    qq = TRUE,
    meanDiff = TRUE,
    ci = TRUE,
    effectSize = FALSE,
    desc = TRUE,
    plots = TRUE)

 ONE SAMPLE T-TEST

 One Sample T-Test                                                                                      
 ────────────────────────────────────────────────────────────────────────────────────────────────────── 
                              Statistic    df        p          Mean difference    Lower      Upper     
 ────────────────────────────────────────────────────────────────────────────────────────────────────── 
   IQ_Score    Student's t      -2.7266    31.000    0.01043            -7.9688    -13.929    -2.0082   
 ────────────────────────────────────────────────────────────────────────────────────────────────────── 
   Note. Hₐ μ ≠ 100


 Normality Test (Shapiro-Wilk)      
 ────────────────────────────────── 
               W          p         
 ────────────────────────────────── 
   IQ_Score    0.97668    0.69899   
 ────────────────────────────────── 
   Note. A low p-value suggests
   a violation of the
   assumption of normality


 Descriptives                                               
 ────────────────────────────────────────────────────────── 
               N     Mean      Median    SD        SE       
 ────────────────────────────────────────────────────────── 
   IQ_Score    32    92.031    93.500    16.532    2.9226   
 ────────────────────────────────────────────────────────── 

Second, we can calculate a difference score where we subtract the Null Hypothesis mean value from every case in the dataset. This results in the data shown in Figure 12g. This approach requires an extra step, of course: computing the difference score in the dataset. However, doing the one-sample t test in this way creates results that are more similar to what we will do with the paired-samples t test: comparing whether the paired difference is different from zero.

Figure 12g Data from Table 12.1 with 100 subtracted from every value (for t test compared to 0)

Two tables placed side by side.
Case_ID IQ_Score z_Score IQ_Score_100
1 104 0.724 4
2 105 0.784 5
3 61 -1.877 -39
4 93 0.059 -7
5 111 1.147 11
6 66 -1.575 -34
7 74 -1.091 -26
8 105 0.784 5
9 72 -1.212 -28
10 87 -0.304 -13
11 123 1.873 23
12 89 -0.183 -11
13 102 0.603 2
14 59 -1.998 -41
15 115 1.389 15
16 94 0.119 -6
Case_ID IQ_Score z_Score IQ_Score_100
17 113 1.268 13
18 105 0.784 5
19 116 1.450 16
20 97 0.301 -3
21 74 -1.091 -26
22 99 0.422 -1
23 77 -0.909 -23
24 84 -0.486 -16
25 83 -0.546 -17
26 100 0.482 0
27 104 0.724 4
28 83 -0.546 -17
29 100 0.482 0
30 80 -0.728 -20
31 86 -0.365 -14
32 84 -0.486 -16

Performing the one-sample t test in this way requires us to enter zero as the “test value” instead of 100. Subtracting 100 from each score also means that we must convert our thinking from this Null and Alternative Hypothesis

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] to this: \[ \begin{align} H_0&: \mu - 100 = 0 \\ H_A&: \mu - 100 \ne 0 \\ \end{align} \] The output that results from this approach is shown in Figure 12h(ii). Note that the important statistical information is all still the same. That is, the p value is still p = .010. The mean difference, t statistic, and the confidence interval of the mean difference are still exactly the same. In fact, the only results that have changed are the descriptive Mean and Median.

Figure 12h(ii) Output from One-Sample t Test for the data in 12g(i) compared to 0

jmv::ttestOneS(
    data = data,
    vars = IQ_Score_100,
    testValue = 0,
    norm = TRUE,
    qq = FALSE,
    meanDiff = TRUE,
    ci = TRUE,
    effectSize = FALSE,
    desc = TRUE,
    plots = FALSE)

 ONE SAMPLE T-TEST

 One Sample T-Test                                                                                          
 ────────────────────────────────────────────────────────────────────────────────────────────────────────── 
                                  Statistic    df        p          Mean difference    Lower      Upper     
 ────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   IQ_Score_100    Student's t      -2.7266    31.000    0.01043            -7.9688    -13.929    -2.0082   
 ────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   Note. Hₐ μ ≠ 0


 Normality Test (Shapiro-Wilk)          
 ────────────────────────────────────── 
                   W          p         
 ────────────────────────────────────── 
   IQ_Score_100    0.97668    0.69899   
 ────────────────────────────────────── 
   Note. A low p-value suggests a
   violation of the assumption of
   normality


 Descriptives                                                     
 ──────────────────────────────────────────────────────────────── 
                   N     Mean       Median     SD        SE       
 ──────────────────────────────────────────────────────────────── 
   IQ_Score_100    32    -7.9688    -6.5000    16.532    2.9226   
 ──────────────────────────────────────────────────────────────── 

Make A Decision About the Null Hypothesis.

The decision was made to reject the null hypothesis because the absolute value of the test statistic was greater than the absolute value of the critical value, that is, \(|-2.7266| = 2.7266 > 2.042\). This same conclusion was reached using the p value. The two-tailed p value was calculated using jamovi in the section below. The result was a p value of .0104, which is less than the level of significance of .05; therefore, the null hypothesis was rejected.

Construct a Confidence Interval and calculate an actual effect size.

The two tailed 100(1 – .05)% = 95% confidence interval was found using the equation:

Equation 12-5

\[ \begin{equation} M_X - t_{(1-\alpha/2,\,n-1)} \left(\frac{s_X}{\sqrt{n}}\right) < \mu < M_X + t_{(1-\alpha/2,\,n-1)} \left(\frac{s_X}{\sqrt{n}}\right) \tag{12-5} \end{equation} \] Here we have \(M_X = 92.03\), \(s_X = 16.5324\) (from Figure 12h(i)), \(t_{(.975,31)} \approx t_{(.975,30)} = 2.042\) (using Mary’s critical value based on df = 30), and n = 32.

\[ 92.03 - 2.042(16.5324/\sqrt{32}) < \mu < 92.03 + 2.042(16.5324/\sqrt{32}) \] This simplifies to 86.06 < μ < 98.00. Mary was confident that the true population mean fell between 86.06 and 98.00. Here you should note that the null hypothesis mean of 100 is not in the confidence interval, and it shouldn’t be, since we rejected the null hypothesis.
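
As a check on this arithmetic, the interval can be reproduced in R (a sketch assuming the same data frame; note that qt(.975, 31) gives the exact df = 31 critical value of about 2.04 rather than the tabled df = 30 value, so the bounds match the jmv interval of -13.929 to -2.008 once 100 is added back):

M  <- mean(data$IQ_Score)              # 92.031
s  <- sd(data$IQ_Score)                # 16.532
n  <- length(data$IQ_Score)            # 32
tc <- qt(.975, df = n - 1)             # two-tailed critical value with df = 31
c(lower = M - tc * s / sqrt(n),        # about 86.07
  upper = M + tc * s / sqrt(n))        # about 97.99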

Alternatively, we can use the confidence interval to make a decision about the null hypothesis. That is, because the hypothesized mean of 100 is not in our confidence interval, we decide that it is not a reasonable population mean based on the sample data we collected. Therefore, we reject the null hypothesis that the mean for the population represented by our sample is 100.

Note also that Equation 12-5 differs from what we calculated in Chapter 11. That is, the equations in chapter 11, most notably equations 11-5, 11-6, and 11-7 used a z distribution critical value in order to calculate the confidence interval. It is more common practice, however, to recognize that we have “small” samples in most research and instead calculate the confidence intervals using a critical value from the t distribution as was done in equation 12-5 above.

The actual standardized mean difference effect size for this analysis can be calculated as the absolute value of the Mean Difference, |-7.96875|, divided by the standard deviation for the original variable, 16.53244. Therefore, d = 7.96875 / 16.53244 = 0.482. This is interpreted as almost a half standard deviation difference.
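
This calculation can be checked directly in R before looking at the jmv output below (a sketch assuming the same data frame):

# Cohen's d: |mean difference| divided by the sample standard deviation
d <- abs(mean(data$IQ_Score) - 100) / sd(data$IQ_Score)
d                                      # about 0.482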

jmv::ttestOneS(
    data = data,
    vars = IQ_Score,
    testValue = 100,
    norm = FALSE,
    qq = FALSE,
    meanDiff = TRUE,
    ci = FALSE,
    effectSize = TRUE,
    desc = TRUE,
    plots = FALSE)

 ONE SAMPLE T-TEST

 One Sample T-Test                                                                                            
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
                              Statistic    df        p          Mean difference                 Effect Size   
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   IQ_Score    Student's t      -2.7266    31.000    0.01043            -7.9688    Cohen's d       -0.48201   
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   Note. Hₐ μ ≠ 100


 Descriptives                                               
 ────────────────────────────────────────────────────────── 
               N     Mean      Median    SD        SE       
 ────────────────────────────────────────────────────────── 
   IQ_Score    32    92.031    93.500    16.532    2.9226   
 ────────────────────────────────────────────────────────── 

Probability Values for t Statistics (p values)

jamovi will calculate probability values using the distrACTION module.

  1. df = your degrees of freedom (here, 31)
  2. lambda (\(\lambda\)) = 0
  3. Check “Compute probability”
  4. Set x1 = your t statistic (here, -2.7266)
  5. Click \(P(X \le x1)\)
  • In our example, the one-tailed p value for t = -2.7266 is p = .0052. That is, the area under the curve to the left of t = -2.7266 is p = .0052.
  • Therefore, in our example, the two-tailed p value for t = -2.7266 is p = .0104. That is, the combined area under the curve to the left of t = -2.7266 plus the area to the right of t = +2.7266 is p = .0104, or about 1.04%.

Note that the probability above t = +2.7266 is the same as the probability below t = -2.7266, so we could have used the positive version of the t statistic here. But if we did, we would need to switch to the option \(P(X \ge x1)\).

Also note that for the same t statistic value with the same degrees of freedom, you can calculate \(p_{1tail} = (p_{2tail}/2)\) or \(p_{2tail} = (p_{1tail}*2)\), which is literally what we did above.
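
The same p values can also be obtained in base R with the pt() function, which returns the lower-tail area of the t distribution (a sketch):

pt(-2.7266, df = 31)                       # one-tailed p, about .0052
2 * pt(-2.7266, df = 31)                   # two-tailed p, about .0104
pt(2.7266, df = 31, lower.tail = FALSE)    # same one-tailed p from the positive t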

Case B: A Confirmatory Analysis where the Research Hypothesis agrees with \(H_A: \mu < c_0\)

Research Hypothesis:

The mean on the World Adult Intelligence Scale of the delegates to the United Nations is less than 100.

Statistical Hypotheses:

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu < 100 \\ \end{align} \]

Determine valid and reliable measures of the dependent and independent variables.

Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in chapter 11.

Level of Significance.

The probability of rejecting a true null hypothesis (i.e., the probability of making a Type I error) was set at .05.

Power.

Mary Barth decided that she would like her power to be at least .85, that is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.

A priori effect size.

In a confirmatory study the a priori effect size is calculated in the same manner as for the z statistic, i.e., using equation (11-2) of chapter 11. This equation can be used because previous information is usually available to assist the researcher in estimating an a priori effect size. For example, Mary Barth wanted to be sure of detecting a population mean as low as 92. The mean of 92 was chosen based on her knowledge of intelligence and of decisions made at the United Nations. Similarly, previous use of her newly created intelligence test indicated that a good estimate of its population standard deviation is 16.1. If we substitute these values into equation (11-2) we have:

\[ d= \frac{|92-100|}{16.1} = \frac{8}{16.1} = .4969 ≈ .50 \]

Sample Size.

In a confirmatory analysis sample size may be found using equation (11-3) or Table 12.4. Sample size was found using equation (11-3) of chapter 11 to be:

\[ n = \left[ \frac {16.1(1.645 + 1.04)}{8} \right]^2 = 29.20 ≈ 30 \] Here, σ = 16.1, α = .05, z(.95) = 1.645, 1 – β = .85, z(.85) = 1.04, and |μA – μ0| = 8. To verify this result, Mary used equation (12-9) to find d for Table 12.4 as:

\[ d = d_s \sqrt{2} = 0.4969 * 1.41421 = 0.7027 ≈ 0.7 \] In Table 12.4 Mary found that with α = .05, d = 0.70, and power = .85, she needed 30 subjects. However, since Mary could conveniently collect information on 32 subjects, she decided to proceed with this number. In Table 12.4 we may use linear interpolation to estimate Mary’s power to be .88.
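
For comparison, base R’s power.t.test() solves for n directly. Because it uses the t distribution rather than the z approximation of equation (11-3), it returns a slightly larger sample size (this sketch plugs in the values Mary used above):

power.t.test(delta = 8, sd = 16.1, sig.level = .05, power = .85,
             type = "one.sample", alternative = "one.sided")
# n comes out a little above the z-based value of 29.2 (round up to the next whole subject)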

Critical Value.

In Table 12.2 the critical value of the t statistic for the one-tailed test with α = .05 is t(.05,30) = -1.697. This critical value is based on 30 degrees of freedom (see the discussion of this point for the two-tailed t test above).

Box 12.1
At this point the a priori parameters of hypothesis testing have been established (i.e., your "bet" is made).

Randomly Select and Measure the Sample Units.

The measurements of the 32 randomly selected United Nations delegates are shown in Table 12.1. (The same data was used here as was used for the z statistic so that you could compare the two procedures.)

Check: Outliers.

Same as that described above for Case A.

Check: Assumptions.

Same as that described above for Case A.

Compute Descriptive Statistics.

Same as that described above for Case A.

Calculate the Test Statistic.

In Figure 12b the test statistic is found to be -2.7266 using the mean from Descriptive Statistics.

Make A Decision About the Null Hypothesis.

The decision was made to reject the null hypothesis because the test statistic (t = -2.7266) was less than the critical value (t = -1.697). This same conclusion was reached using the p value. The one-tailed p value was calculated using jamovi above. The result was a p value of .0052, which is less than the level of significance of .05; therefore, the null hypothesis was rejected.

In the output in Figure 12h(ii) above, the significance (Sig. 2-tailed) is reported as a two-tailed p = .010. Because a two-tailed p value includes both tails, but we are performing just a one-tailed test now, we simply divide the two-tailed p value by 2. Therefore, the output provides us with a one-tailed p value of .010/2 = .005. Note that if we need more accuracy, we can request that more decimal places be displayed in the output.

Construct a Confidence Interval and calculate an actual effect size.

The one tailed 100(1 – .05)% = 95% confidence interval is found using the equation:

Equation 12-6 \[ \begin{equation} \mu < M_X + t_{(1-\alpha,\,n-1)} \left(\frac{s_X}{\sqrt{n}}\right) \tag{12-6} \end{equation} \] Here we have \(M_X = 92.03\), \(s_X = 16.5324\) (from Figure 12h(i)), \(t_{(.95,31)} \approx t_{(.95,30)} = 1.697\), and n = 32,

\[ \begin{align} \mu &< 92.03 + 1.697(16.5324/\sqrt{32}) \\ \mu &< 96.99 \end{align} \] Mary was confident that the true population mean fell below 96.99. This confidence interval does not contain the population mean in the null hypothesis (i.e., 100) and it shouldn’t because we rejected the null hypothesis.
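
This one-sided upper bound can likewise be checked in R (a sketch; qt(.95, 31) gives approximately 1.696 rather than the tabled 1.697 for df = 30):

M <- mean(data$IQ_Score)               # 92.031
s <- sd(data$IQ_Score)                 # 16.532
n <- length(data$IQ_Score)             # 32
M + qt(.95, df = n - 1) * s / sqrt(n)  # one-sided upper bound, about 96.99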

The actual standardized mean difference effect size here is calculated as \(d = 7.96875 / 16.53244 = 0.482\).

Figure 12h(i) Descriptive statistics to calculate the probability of a given t statistic

jmv::ttestOneS(
    data = data,
    vars = IQ_Score,
    testValue = 100,
    hypothesis = "lt",
    norm = FALSE,
    qq = FALSE,
    meanDiff = TRUE,
    ci = FALSE,
    effectSize = TRUE,
    desc = TRUE,
    plots = FALSE)

 ONE SAMPLE T-TEST

 One Sample T-Test                                                                                            
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
                              Statistic    df        p          Mean difference                 Effect Size   
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   IQ_Score    Student's t      -2.7266    31.000    0.00522            -7.9688    Cohen's d       -0.48201   
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────── 
   Note. Hₐ μ < 100


 Descriptives                                               
 ────────────────────────────────────────────────────────── 
               N     Mean      Median    SD        SE       
 ────────────────────────────────────────────────────────── 
   IQ_Score    32    92.031    93.500    16.532    2.9226   
 ────────────────────────────────────────────────────────── 

Case C: A Confirmatory Analysis where the Research Hypothesis agrees with \(H_A: \mu > c_0\)

Research Hypothesis:

The mean on the World Adult Intelligence Scale of the delegates to the United Nations is greater than 100.

Statistical Hypotheses:

\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu > 100 \\ \end{align} \] Using the same significance level, power, and a priori effect size, Mary’s test statistic again equals -2.7266. However, in this unusual case, Mary fails to reject the null hypothesis. The test statistic (t = -2.7266) was less than the critical value (t = 1.697). Also, the p value (.9948) is greater than the alpha level (.05). This case was unusual because Mary was so wrong about the direction of the difference; that is, Mary thought the delegates’ mean would be over 100, but her statistical test informed her that it was less than 100. Her theory may have been very wrong here and, therefore, may need to be reconsidered.
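
The upper-tailed p value Mary obtained in this case can be verified in R (a sketch):

pt(-2.7266, df = 31, lower.tail = FALSE)   # area above t = -2.7266, about .9948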

If this had been a two-tailed test (i.e., a non-directional null hypothesis like Case A), then this situation might also represent what some researchers would call a Type III error. A Type III error is a correct rejection of the null hypothesis, resulting in a correct conclusion that the population mean differs from the hypothesized value. However, because the sample was flawed in such an extreme way, the sample mean fell on the wrong side of the hypothesized value. This is different from having the wrong theoretical expectation about the direction of the difference, but both can result in an incorrect directional conclusion.

Case D: A Confirmatory Analysis where the Research Hypothesis agrees with \(H_0: \mu = c_0\)

In this case the researcher believes that the null hypothesis is correct (i.e., that the mean intelligence level at the U.N. is 100). However, except for the research hypothesis and the selection of sample size, the elements of hypothesis testing in this case are exactly like those found for Case A, where the researcher believes that the population mean on the World Adult Intelligence Scale differs from 100. As for all confirmatory studies, in the selection of a sample size for this case the researcher will have past information with which to calculate the a priori effect size and therefore the sample size, using Table 12.4 or equation (11-3). Here we will assume, however, that the researcher has prior information which leads her to an a priori effect size (d) of .80. Therefore, our presentation would be exactly like that found for Case A, so only the research hypothesis need be stated here.

Research Hypothesis.

The mean on the World Adult Intelligence Scale of the delegates to the United Nations is equal to 100.

Figure 12e Descriptive statistics and test of normality for Figures 5p through 5s

jmv::descriptives(
    data = data,
    vars = vars(Normal, Uniform, Positive_Skew, Negative_Skew),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                           
 ────────────────────────────────────────────────────────────────────────────────────── 
                              Normal       Uniform     Positive_Skew    Negative_Skew   
 ────────────────────────────────────────────────────────────────────────────────────── 
   N                               1000        1000             1000             1000   
   Missing                            0           0                0                0   
   Mean                          74.925      75.000           75.000           75.000   
   Std. error mean              0.33156     0.31623          0.31623          0.31623   
   95% CI mean lower bound       74.274      74.379           74.379           74.379   
   95% CI mean upper bound       75.575      75.621           75.621           75.621   
   Median                        75.287      74.929           73.915           76.085   
   Standard deviation            10.485      10.000           10.000           10.000   
   Variance                      109.93      100.00           100.00           100.00   
   IQR                           13.807      17.153           14.320           14.320   
   Range                         66.149      34.663           55.005           55.005   
   Minimum                       43.759      57.883           58.298           36.697   
   Maximum                       109.91      92.546           113.30           91.702   
   Skewness                   -0.077413    0.048441          0.59242         -0.59242   
   Std. error skewness         0.077344    0.077344         0.077344         0.077344   
   Kurtosis                    0.024702     -1.1617         -0.12905         -0.12905   
   Std. error kurtosis          0.15453     0.15453          0.15453          0.15453   
   Shapiro-Wilk W               0.99825     0.95694          0.96652          0.96652   
   Shapiro-Wilk p               0.40537    < .00001         < .00001         < .00001   
 ────────────────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution with N - 1
   degrees of freedom

Figure 12e(i) Descriptive statistics and test of normality for Figures 5t through 5v (also with Normal)

jmv::descriptives(
    data = data,
    vars = vars(Normal, Leptokurtic, Platykurtic, Bimodal),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                       
 ────────────────────────────────────────────────────────────────────────────────── 
                              Normal       Leptokurtic    Platykurtic    Bimodal    
 ────────────────────────────────────────────────────────────────────────────────── 
   N                               1000           1000           1000        1000   
   Missing                            0              0              0           0   
   Mean                          74.925         75.000         75.000      74.940   
   Std. error mean              0.33156        0.31623        0.31623     0.44614   
   95% CI mean lower bound       74.274         74.379         74.379      74.064   
   95% CI mean upper bound       75.575         75.621         75.621      75.815   
   Median                        75.287         75.472         75.466      74.422   
   Standard deviation            10.485         10.000         10.000      14.108   
   Variance                      109.93         100.00         100.00      199.04   
   IQR                           13.807         10.661         15.432      23.351   
   Range                         66.149         108.63         42.594      73.782   
   Minimum                       43.759         20.964         53.519      39.008   
   Maximum                       109.91         129.59         96.113      112.79   
   Skewness                   -0.077413      -0.075346      -0.068537    0.022198   
   Std. error skewness         0.077344       0.077344       0.077344    0.077344   
   Kurtosis                    0.024702         3.2206       -0.85946    -0.90470   
   Std. error kurtosis          0.15453        0.15453        0.15453     0.15453   
   Shapiro-Wilk W               0.99825        0.96382        0.98224     0.98000   
   Shapiro-Wilk p               0.40537       < .00001       < .00001    < .00001   
 ────────────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution with N
   - 1 degrees of freedom

TRANSFORMATIONS

Introduction

In using a z statistic or a t statistic to test a hypothesis about a population mean, the assumption is made that the dependent variable is normally distributed in the population. When we considered the consequences of violating this assumption, we found that the z and t test statistics were robust to this violation; that is, the level of significance and power remain close to their prespecified values even when the population is non-normal. We found that these test statistics were particularly robust when the sample size is large and when a two-tailed alternative hypothesis is being used. However, when the normality assumption is violated and the sample size is small, and/or a one-tailed alternative hypothesis is being considered, it may be prudent to consider a transformation of the data which will make them more nearly normal (there is, for example, a quantile-based transformation that makes the data very nearly normal).

Data transformations are often justified because of the arbitrary choice of the original metric of the measurement. What, for example, is the correct metric with which to measure temperature, viscosity, reaction time, quality of product, and so forth? This implies, however, that if you transform your data, you must interpret your results in terms of your transformed variables (see Games & Lucas, 1966). This is an important point. If you measure achievement using the number of items correct on a test but then transform those scores using a logarithm, then all your statistical analyses must be interpreted as log(items), not as items. Or if you use a square root transformation, then all your statistics must be interpreted based on the square root of items. Sometimes this can make it very difficult to interpret your results in a way that makes sense. For example, in a one-sample t test where the null hypothesis is perhaps \(H_0: \mu = 100\), the hypothesized value (100) would need to change because the scale of the transformed data is different from the scale of the original data. Additionally, you risk overfitting your statistical model when you transform the scores, meaning that the results may apply only to the idiosyncrasies of the sample and not to the population.

Given a successful transformation, that is, one where the transformed data is normally distributed, your level of significance and power will be more likely to be close to their prespecified levels. In statistical parlance the prespecified levels of significance and power are referred to as the nominal levels and the levels of significance and power that actually exist are referred to as the actual levels.

Box-Cox Transformations

The Box-Cox transformation is a particular way of changing the shape of a data distribution (note that there is a newer extension of Box-Cox by Yeo and Johnson, 2000, but Box-Cox remains quite popular). Since it raises X to the power a, statisticians also refer to it as a power transformation. The Box-Cox transformation is defined as:

\[ \begin{equation} X_T = \frac {X^a - 1}{a} \text { (if } a \ne 0 \text{)}\\ X_T = \ln{(X)} \text { (if } a = 0 \text{)} \tag{12-7} \end{equation} \] where \(X_T\) denotes the transformed value of X. Box and Cox (1964) provide a method of finding the value of “a” for equation (12-7). The process involves finding the value of “a” which maximizes the Box-Cox statistic from the equation:

\[ \begin{equation} \text{Box-Cox Statistic} = \left(-\frac{n}{2}\right) \ln\left[\left(\frac{1}{n}\right) \sum (X_T-M_{XT})^2\right] + (a-1) \sum \ln{(X)} \tag{12-8} \end{equation} \] where \(M_{XT}\) is the mean of the transformed X’s (i.e., the mean of the \({X_T}\)). Unfortunately, it is not easy to calculate values of the Box-Cox statistic by hand. However, there are examples available of how to do it (e.g., Osborne, 2010, available at https://pareonline.net/pdf/v15n12.pdf). jamovi can perform transformations using the compute and transform functions. Often researchers will try several common transformations and choose the one that provides the best result (e.g., square root, log, exponential).
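
As a minimal sketch of this search (the vector name x and the grid of candidate values are illustrative; x must contain only positive values), equations (12-7) and (12-8) can be evaluated in R over a range of values of a and the maximizing value selected:

# Box-Cox statistic (Equation 12-8) for a candidate exponent "a"
boxcox_stat <- function(x, a) {
  xt <- if (a == 0) log(x) else (x^a - 1) / a          # Equation 12-7
  n  <- length(x)
  (-n / 2) * log((1 / n) * sum((xt - mean(xt))^2)) + (a - 1) * sum(log(x))
}

a_grid <- seq(-5, 5, by = 0.1)                         # the -5 to +5 range discussed below
stats  <- sapply(a_grid, function(a) boxcox_stat(x, a))
a_grid[which.max(stats)]                               # the value of "a" that maximizes the statistic

# The MASS alternative mentioned below (fits an intercept-only model to the variable):
# MASS::boxcox(lm(x ~ 1), lambda = seq(-5, 5, 0.1))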

Figures 12j and 12l show the Box-Cox transformation along with other common transformations for positively and negatively skewed data, respectively. The square root transformation (\(\sqrt{X}\)), logarithmic transformation (\(\log{X}\)), and inverse transformation (\(1 / X\)) are frequently used for positively skewed data. Reflections of these are often used for negatively skewed data (e.g., \(\sqrt{\max(X)+1-X}\), \(\log(\max(X)+1-X)\), and \(1 / [\max(X)+1-X]\)). Reflection essentially flips the skew horizontally before transforming. A square or cube transformation is also sometimes used with negatively skewed data.
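
In R, these common transformations might be computed along the following lines (a sketch; x denotes a positively skewed variable and y a negatively skewed one, and the log and inverse forms require strictly positive values):

# Positively skewed data
sqrt_x <- sqrt(x)
log_x  <- log(x)
inv_x  <- 1 / x

# Negatively skewed data: reflect first, then apply the same transformations
y_ref  <- max(y) + 1 - y
sqrt_y <- sqrt(y_ref)
log_y  <- log(y_ref)
inv_y  <- 1 / y_ref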

Lambda (\(\lambda\)) was chosen near -2 for the positively skewed X_Rating variable and near 4.5 for the negatively skewed Y_Rating variable. These values were obtained by trying several values for a in Equation 12-8 above and finding the one that maximized the Box-Cox statistic. The \(\lambda\) value for the Box-Cox statistic is usually a value between -5 and +5. Start with negative values of Lambda if your data are positively skewed and with positive values of Lambda if your data are negatively skewed. We have used the MASS package boxcox() function to do these calculations (which calculates the Box-Cox statistic in a slightly different way than Equation 12-8, but with the same results).

Figure 12i Maximum of the Box-Cox statistic and associated Lambda value (and a plot of the values from -5 to +5)

lambda BoxCox_Eq12.8 MASS_BoxCox
26 -2.5 -18.518 5.1270
27 -2.4 -18.485 5.1602
28 -2.3 -18.459 5.1858
29 -2.2 -18.441 5.2037
30 -2.1 -18.431 5.2138
31 -2.0 -18.429 5.2159
32 -1.9 -18.435 5.2101
33 -1.8 -18.449 5.1963
34 -1.7 -18.471 5.1743
35 -1.6 -18.501 5.1440
36 -1.5 -18.540 5.1055

Table 12.5 Positively skewed data set (from Figure 5f) and three transformations of that data

ID X_RATING BoxCox SqRoot Log Inverse
1 10 0.4950 3.1623 2.3026 0.1000
2 10 0.4950 3.1623 2.3026 0.1000
3 11 0.4959 3.3166 2.3979 0.0909
4 11 0.4959 3.3166 2.3979 0.0909
5 11 0.4959 3.3166 2.3979 0.0909
6 11 0.4959 3.3166 2.3979 0.0909
7 11 0.4959 3.3166 2.3979 0.0909
8 12 0.4965 3.4641 2.4849 0.0833
9 12 0.4965 3.4641 2.4849 0.0833
10 12 0.4965 3.4641 2.4849 0.0833
11 12 0.4965 3.4641 2.4849 0.0833
12 12 0.4965 3.4641 2.4849 0.0833
13 12 0.4965 3.4641 2.4849 0.0833
14 13 0.4970 3.6056 2.5649 0.0769
15 13 0.4970 3.6056 2.5649 0.0769
16 13 0.4970 3.6056 2.5649 0.0769
17 13 0.4970 3.6056 2.5649 0.0769
18 14 0.4974 3.7417 2.6391 0.0714
19 14 0.4974 3.7417 2.6391 0.0714
20 14 0.4974 3.7417 2.6391 0.0714
21 15 0.4978 3.8730 2.7081 0.0667
22 16 0.4980 4.0000 2.7726 0.0625
23 17 0.4983 4.1231 2.8332 0.0588
24 18 0.4985 4.2426 2.8904 0.0556
25 20 0.4988 4.4721 2.9957 0.0500

Figure 12j Boxplots of a positively skewed data set and three transformations of that data (all on the same scale for comparison of shape, showing Box-Cox most symmetric)

out <- capture.output(
  
  jmv::descriptives(
    formula = Transform ~ Type,
    data = data,
    box = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

Figure 12k Maximum of the Box-Cox statistic and associated Lambda value (and a plot of the values from -5 to +5)

Table 12.6 Negatively skewed data set (from Figure 5f) and three transformations of that data

Figure 12l Boxplots of a negatively skewed data set and three transformations of that data (all on the same scale for comparison of shape, showing Box-Cox most symmetric)

out <- capture.output(
    
    jmv::descriptives(
    formula = Transform ~ Type,
    data = data,
    box = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

SUMMARY

In this chapter we have considered the z and t test statistics used to test the null hypothesis that a population mean is equal to a constant. We found that the z statistic is used when the population variance is known and that the t statistic is used when the population variance is estimated using the sample variance. Each of these test statistics was illustrated in an exploratory mode of analysis, where the research hypothesis was nondirectional, and in three confirmatory modes, where specific research hypotheses were specified. For each test statistic we found that the assumption of independent units was necessary and that the test statistics were not robust to violations of this assumption. We also found that the assumption of normality of the dependent variable was common to both the z statistic and the t statistic, and that under most conditions both of these statistics were robust to violations of this assumption. We completed this chapter by considering transformations that may be used to make skewed data more nearly normally distributed. Here we found the Box-Cox transformation to be of value.

Procedures

  • Find the one-tailed and two-tailed p values of a z statistic (note that by default jamovi will not perform z tests, though there is a module that will, but it will calculate the p values as described in this chapter)
  • Find the two-tailed p value of a t statistic
  • Compute the t statistic for the test of the hypothesis H0: μ = c0
  • Use the Box-Cox transformation

Chapter 12 Appendix A Study Guide for Single Sample t Statistics

Single-sample t test

Analyses to Run

• Use a SCALE variable Y.
• Run descriptive statistics for Y.
• Run a one-sample t test for Y = some appropriate value.

Using the output, respond to the following items

  1. Describe the shape of Y by referring to all of the following
       1. the histogram
       2. the boxplot
       3. the Q-Q plot (to describe whether or not Y is relatively normal)
       4. the numeric statistical information about shape provided by EXPLORE (e.g., skewness and kurtosis), which you should report and interpret

  2. Using a confidence interval and/or t statistic for skewness (based on a t critical value with N-1 degrees of freedom), report whether you conclude that Y is symmetric in the population (and if not, why not).

  3. Using a confidence interval and/or t statistic for kurtosis (based on a t critical value with N-1 degrees of freedom), report whether you conclude that Y is mesokurtic in the population (and if not, why not).

  4. Using statistical significance (e.g., p value for Shapiro-Wilk if the sample is less than a couple hundred), report whether you conclude that Y is normally distributed in the population.

  5. Discuss any similarities and/or differences you find among the various information used for the shape of Y. That is, do all numeric and graphical results suggest the same shape for the distribution of Y?

  6. Which would be a better measure of central tendency for Y: mean or median? Why?

  7. Using statistical significance (i.e., a Sig. or p value), report whether you conclude that the Y variable is normally distributed in the population. That is, provide your evidence for whether the assumption that the variable is normally distributed is tenable (i.e., defensible, believable, reasonable).

  8. Report whether the Y distribution appears to have any outliers of concern. Provide your evidence and rationale.

Using the ONE-SAMPLE T TEST output, respond to the following items

  1. Provide the most appropriate research question for this analysis

  2. Using BOTH appropriate symbols AND words, provide the non-directional (two-tailed) statistical Null Hypothesis for this single-sample t test.

  3. Report and interpret the difference between the sample mean and the hypothesized population mean.

  4. How many cases were involved in this study?

  5. Report and interpret the estimated population standard deviation for the variable Y

  6. Report and interpret the estimated population variance for the variable Y

  7. Report and interpret the estimated population standard error of the mean for the variable Y

  8. Show or explain how the standard error of the mean for the variable Y is calculated (don’t just give the formula, fill it in with actual numbers from the results).

  9. Using a t critical value, show or explain how to calculate the 95% confidence interval for the mean for the variable Y? Indicate clearly what critical t and what degrees of freedom you used for this answer.

  10. Does the confidence interval in the previous item contain the hypothesized mean value (i.e., test value) used for the one-sample t test?

  11. Using a t critical value, show or explain how to calculate the 95% confidence interval for the mean difference between the sample estimate and the hypothesized population parameter for the variable Y?

  12. Does the confidence interval in the previous item contain the hypothesized mean difference (i.e., 0)? What does that mean in terms of statistical null hypothesis significance testing?

  13. Using a two-tailed level of significance of α = .05, was there a statistically significant difference between the estimated population mean for Y and the hypothesized population mean? Use a confidence interval for the mean difference as evidence to explain your answer.

  14. Using a two-tailed level of significance of α = .05, was there a statistically significant difference between the estimated population mean for Y and the hypothesized population mean? Use the calculated t statistic compared to a t critical value as evidence to explain your answer.

  15. Using a two-tailed level of significance of α = .05, was there a statistically significant difference between the estimated population mean for Y and the hypothesized population mean? Use a Sig. or p value to explain your answer.

  16. What type of hypothesis testing error might you have made when reaching your decision about the single sample t Null Hypothesis? Or was there definitely no error? Explain.

  17. Show or explain how the t statistic for the variable Y is calculated in this analysis (i.e., what numbers are actually used in the formula to get these results). Recall that for the one-sample t test, the standard error for the mean is also the standard error of the mean difference.

  18. Show or explain how to calculate the standardized mean difference effect size (Cohen’s d) for the difference between the estimated population mean (i.e., the actual sample mean) and the hypothesized population mean for the variable Y. Recall that for the one-sample t test, the standard deviation for the scores is also the standard deviation of the mean difference.

Using ALL the output in this section above, respond to the following item

  1. Interpret the results for the one-sample t test in an APA-style report to answer the research question and to describe in detail the results. Whether statistically significant or not, use descriptive statistics, (e.g., means, standard deviations, mean differences, effect sizes, and/or confidence intervals), inferential statistics, degrees of freedom, and statistical significance to describe the size and direction of the difference between variables. Be sure to discuss assumptions and outliers and their potential impact.

Citation

Please cite as:
Barcikowski, R. S., & Brooks, G. P. (2025). The Stat-Pro book:
A guide for data analysts (revised edition) [Unpublished manuscript].
Department of Educational Studies, Ohio University.
https://people.ohio.edu/brooksg/Rmarkdown/

This is a revision of an unpublished textbook by Barcikowski (1987).
This revision updates some text and uses R and JAMOVI as the primary
tools for examples. The textbook has been used as the primary textbook
in Ohio University EDRE 7200: Educational Statistics courses for 
most semesters 1987-1991 and again 2018-2025.