In this chapter we will consider two test statistics which are used with single groups of units. Please note that although the test statistics we present use descriptive statistics (i.e., means and variances) that are commonly found with interval and ratio data, many statisticians (e.g., Nunnally, 1967, pp. 24-25) argue strongly that they should not be limited to such scales of measurement (i.e., they can be used with ordinal data). Nunnally's argument is convincing, we generally agree with it, and it is not unusual to find parametric statistics used with ordinal data. We warn, however, that it is critically important that you understand the meaning of your results in terms of their scale of measurement. For example, beyond the careful interpretations required when using these statistics with ordinal data, there are times when even interval data require careful interpretation because of the lack of a meaningful zero in the scale.
We will begin by briefly reviewing the elements of hypothesis testing for the z statistic. We will then consider a new statistic, called Student’s t statistic, and its probability distribution. An assumption that is made in the use of both the z statistic and the t statistic is that the dependent variable follows a normal distribution in the population of interest. We will conclude this chapter by considering transformations that are useful when this assumption is violated. (The section on transformations is an optional section, which can be included in classes where time permits.)
In the presentation of the z and t test statistics, different alternative hypotheses and different conditions (referred to as the “state of affairs”) will be considered, but only one data set will be used. That is, the same data set will be used for all of the presentations considered here, but each time the data will be considered with a different story, i.e., the scenario will change. For example, the same data set will be used for the t statistic with a given state of affairs and four different alternative hypotheses. This is done in order to emphasize the importance of the alternative hypothesis in determining how a test statistic is interpreted, and of the state of affairs in selecting a test statistic.
The example presented in this section is based on the study discussed in chapter 11 concerning the mean intelligence level of the delegates to the United Nations. Our discussion will focus on the four possible research hypotheses that were open to Mary Barth in that study. Before we consider these cases, let us consider the state of affairs that must be present before you can use the test statistic to be described, the test statistic itself, the assumptions required for the test statistic to be valid, what it means to violate these assumptions, and the research problem that is common to all of the cases.
The conditions (i.e., the “state of affairs”) that must exist before you can use the following test statistic are:
The test statistic which will be used for the four cases considered in this section is:
\[ \begin{equation} z = \frac {M_X - c_0} {\sigma / \sqrt{n}} \tag{12-1} \end{equation} \] where
Recall that \(\sigma / \sqrt{n} = \sigma_X / \sqrt{n} = \sigma_{M_X} = \sigma_{\overline{X}}\), which is called the standard error of the mean. If we happen to know the standard error of the mean rather than the standard deviation or sample size, we could use the standard error directly in formula (12-1) instead of the standard deviation and \(\sqrt{n}\).
Mathematicians and mathematical statisticians who study the distributions and the probabilities of the statistics we use have warned us that certain statistics can only be used confidently when specific mathematical assumptions have been met. That is, statistical tests are only valid when their assumptions have been met. A valid statistical test is one whose level of significance and power are as specified by the researcher, that is, alpha and power have not changed because of a violation of one or more assumptions. The preceding statistical test will be valid when:
The importance of the preceding assumptions is as follows:
A robust statistical test is one that is valid even under violations of one or more of its assumptions (or it has no assumptions). Here statisticians would say that the z test is robust to violations of its normality assumption.
The research problem that we will consider in each of the following cases is: Is the mean on the Wechsler Adult Intelligence Scale of the delegates to the United Nations equal to 100?
The mean on the Wechsler Adult Intelligence Scale of the delegates to the United Nations differs from 100.
Statistical Hypotheses: \[ \begin{align} H_0&: \mu = \mu_{Hypothesized} = \mu_H \\ H_0&: \mu = \mu_{Population} \\ H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] Note that both \(\mu\) and \(\mu_{Hypothesized}\) are means. \(\mu\) is the mean for the target population from which we are sampling (to get \(M_X\) or \(\overline{X}\)), whereas \(\mu_H\) is some known (or otherwise hypothesized) population mean. Therefore, we are testing whether our target population mean equals some known, hypothesized, or otherwise important value. We essentially want to know whether OUR population mean equals some important value in order to decide whether OUR population is the same as the hypothesized population (because we assume the other distributional characteristics are the same: variance, skewness, kurtosis).
Mary set her level of significance at 0.05 and power at 0.85. She decided to establish an a priori effect size of 8 points or more. That is, she would like to detect a U.N. delegate population mean on the Wechsler Adult Intelligence Scale that differed from the population mean of all adults by 8 points or more. From equation (11-2), her a priori effect size was d = 0.53. She calculated her sample size, using equation (11-3), as follows:
\[ n = \left[ \frac {15(1.96+1.04)} {8} \right] ^2 = 31.64 \approx 32 \] Here, \(\sigma = 15\), \(\alpha_P = .05/2 = .025\), \(z_{(.975)} = 1.96\), \(1 - \beta = .85\), \(z_{(.85)} = 1.04\), and \(|\mu_A - \mu_0| = 8\).
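The sample-size arithmetic above can be checked in a few lines of code. Python is used here as a neutral sketch (the text's own software is R/jamovi); the numbers are exactly those given in the text.

```python
import math

# Sketch of the sample-size calculation from equation (11-3), using the
# values stated in the text: sigma = 15, z(.975) = 1.96 (alpha/2 = .025),
# z(.85) = 1.04 (power = .85), and |mu_A - mu_0| = 8.
sigma = 15.0
z_alpha = 1.96    # two-tailed critical z for alpha = .05
z_power = 1.04    # z corresponding to power = .85
diff = 8.0        # smallest mean difference worth detecting

n_exact = (sigma * (z_alpha + z_power) / diff) ** 2
n = math.ceil(n_exact)   # sample sizes are always rounded up

print(n_exact, n)   # about 31.64, so n = 32
```

Note that the required n is rounded up, never down, so that the planned power is at least the requested value.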
In Table 9.1(b) the critical values of the z statistic for the two tailed test with \(\alpha = .05\) are -1.96 and +1.96 (i.e., \(z_{(.025)} = -1.96\) and \(z_{(.975)} = +1.96\)).
At this point the a priori parameters of hypothesis testing have been established (i.e., your “bet” is made).
The measurements of the 32 randomly selected United Nations delegates are shown in Table 12.1. Note that, in this case, we know that the population \(\mu = 100\) and population \(\sigma = 15\) based on how the Wechsler scores are scaled.
In Figure 12b the test statistic is found to be -3.01 using the information from the Descriptive Statistics results shown below. Note that for the z statistic we use the population \(\sigma_X = 15\) not the sample standard deviation or standard error in Figure 12b. Also, we use the known population mean \(\mu = 100\) for both our statistical null hypothesis and statistic calculation.
\[ \begin{align} z &= \frac {M_X - \mu_{Hypothesized}} {\sigma_{M_X}} \\ z &= \frac {\overline{X} - \mu_{Population}} {\sigma_X/\sqrt{n}} \\ z &= \frac {92.031 - 100} {15/\sqrt{32}} \\ z &= \frac {-7.969} {15/5.6569} \\ z &= \frac {-7.969} {2.6517} \\ z &= -3.0052 \\ z &\approx -3.01 \end{align} \]
For this two tailed scenario, the statistical hypotheses were: \[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] The decision was made to reject the null hypothesis because the absolute value of the test statistic (\(|-3.01| = 3.01\)) was greater than the absolute value of the critical value (i.e., 3.01 > 1.96). We could also reason that -3.01 is less than -1.96 (\(-3.01 < -1.96\)), but many people find this approach less intuitive (although some prefer it). The same decision is reached when the p value is considered. In this case, using Table 9.1(a), we have \(p(z > 3.01) = .5000 - .4987 = .0013\). Using \(p(z < -3.01)\) we find the same \(p = .0013\). Therefore, \(p(z < -3.01) + p(z > 3.01)\) yields \(p = .0026\), and since \(p < \alpha\) (i.e., .0026 < .05), we reject the null hypothesis. Remember that either the critical values or the p value may be used to decide whether you will reject or fail to reject the null hypothesis. That is, both approaches (if rounded equivalently and appropriately) must result in the same decision. Both values were given here for completeness and illustrative value.
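The z statistic and its two-tailed p value can be reproduced with a short sketch. Python is used for illustration; the `norm_cdf` helper is our own (the standard normal CDF written with the error function, so no statistics library is needed), and the inputs are the values reported above.

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution function via erf.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mean_x = 92.031   # sample mean from the descriptive statistics
mu_0 = 100.0      # hypothesized population mean
sigma = 15.0      # known population standard deviation (Wechsler scaling)
n = 32

z = (mean_x - mu_0) / (sigma / math.sqrt(n))
p_two_tailed = 2.0 * norm_cdf(-abs(z))

print(round(z, 2))   # -3.01
print(p_two_tailed)  # about .0027
```

The exact p here (about .0027) differs slightly from the tabled .0026 because the table lookup rounds z to -3.01 before finding the tail areas; either way p < .05 and the decision is the same.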
Statistical computer programs (like R, jamovi, JASP, SPSS) will calculate probability values for us when they run the analyses. Indeed, that’s one of the biggest benefits of using such programs. However, we can find the statistical probabilities for common distributions in many programs and online apps. For example, jamovi has a module called “distrACTION” that will calculate probabilities for the normal, t, F, and chi-square distributions, among others. Using distrACTION, we will find the p value for the preceding z statistic.
In confirmatory analyses, the researcher specifies the direction of the expected differences. For Case B, Mary stated as her research hypothesis that the delegates’ mean on the Wechsler Adult Intelligence Scale was less than 100. This leads to the following statistical hypothesis:
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu < 100 \\ \end{align} \] Testing the null hypothesis with a 0.05 significance level, 0.91 power, and 0.53 effect size, Mary found the test statistic to be -3.01. Since this was less than the critical value (-1.645), she decided to reject the null hypothesis. We could also perform this process using the absolute value of the z statistic as long as we pay attention to whether the sample mean is below or above the hypothesized mean.
Case C tests the research hypothesis that the delegates’ mean is greater than 100.
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu > 100 \\ \end{align} \] The test statistic still equals -3.01. However, in this case, Mary fails to reject the null hypothesis because the critical value of the z statistic for the one tailed test with \(\alpha = .05\) is +1.645. This is an unusual case which would require the researcher to carefully reconsider the reasons for the research hypothesis. This is one of the major reasons not to use one tailed tests. That is, if we predict the wrong direction for the difference, we are left with inconclusive results even when the difference is large. Surprisingly, this happens. A second reason to avoid a one tailed test without really strong a priori empirical and theoretical support is that it is a less conservative approach, which therefore can result in more Type I errors (but more on this idea later).
In Case D, Mary considers the research hypothesis that the mean intelligence level at the U.N. equals 100. So, technically, these would be her statistical hypotheses:
\[ \begin{align} H_0&: \mu \ne 100 \\ H_A&: \mu = 100 \\ \end{align} \] However, the process we are discussing does not allow this approach. Therefore, Mary must use the same statistical hypotheses as Case A:
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] Therefore, this situation exactly parallels Case A, and results in the same test strategy described for it. However, in this case, we HOPE to fail to reject the null hypothesis; that is, Mary needs to fail to reject the null hypothesis to have evidence supporting her research hypothesis. In fact, Mary rejects the null hypothesis of \(\mu=100\) and therefore has no evidence to conclude that \(\mu = 100\).
Note that failure to reject the null hypothesis IS NOT actually evidence that \(\mu=100\); we simply have no evidence to reject it, and there are many reasons we may fail to reject a null hypothesis, not only because the null hypothesis is actually true in the population. There are newer approaches to null hypothesis testing that are able to test the equality of means, but they are not widely used and require a different logic than the approach we are taking (which is still the approach most researchers take).
```r
# Plots for IQ_Score (the data from Table 12.1). capture.output() discards
# the printed text so that only the plots are displayed; the numeric
# summaries are requested separately below.
out <- capture.output(
  jmv::descriptives(
    data = data,
    vars = IQ_Score,
    hist = TRUE,       # histogram
    dens = TRUE,       # density curve
    box = TRUE,        # boxplot
    violin = TRUE,     # violin plot
    dot = TRUE,        # dot plot
    boxMean = TRUE,    # mark the mean on the boxplot
    n = FALSE,         # suppress the numeric summaries here
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))

# Numeric summaries for IQ_Score, including the standard error and the
# 95% confidence interval of the mean.
jmv::descriptives(
  data = data,
  vars = IQ_Score,
  variance = TRUE,
  range = TRUE,
  se = TRUE,
  ci = TRUE,
  iqr = TRUE,
  skew = TRUE,
  kurt = TRUE)
```
```
DESCRIPTIVES

Descriptives
───────────────────────────────────────
                            IQ_Score
───────────────────────────────────────
N                                 32
Missing                            0
Mean                          92.031
Std. error mean               2.9226
95% CI mean lower bound       86.071
95% CI mean upper bound       97.992
Median                        93.500
Standard deviation            16.532
Variance                      273.32
IQR                           22.000
Range                         64.000
Minimum                       59.000
Maximum                       123.00
Skewness                    -0.20762
Std. error skewness          0.41446
Kurtosis                    -0.67077
Std. error kurtosis          0.80937
───────────────────────────────────────
Note. The CI of the mean assumes sample means follow a
t-distribution with N - 1 degrees of freedom
```
In our discussions thus far our test statistic has been the z statistic. We were able to use the z statistic because: (1) we assumed that the dependent variable was normally distributed in the population of units of interest (so that under the Central Limit Theorem the sampling distribution of the sample means, and therefore their transformed z scores, would be normal), and (2) the population variance was known. Here the z statistic was written as presented in equation (12-1) (as represented by the \(\sigma\) in the formula):
\[ z = \frac {M_X-c_0} {\sigma / \sqrt{n}} = \frac{M_X-c_0}{\sigma_{M_X}} = \frac{M_X-c_0}{\text{std. error}} \] Because tabled values of z for the standard normal distribution exist, we were able to establish critical values prior to taking measurements on units, and p levels after the sample z statistic was obtained.
Frequently we will be able to meet condition (1) above, but not condition (2). That is, our measure will be normally distributed in the population, but the population variance will be unknown. In this case we can form a statistic known as the t statistic by replacing the known standard deviation in the denominator of the z statistic with its sample estimate. Then, the t statistic may be written as:
\[ \begin{equation} t = \frac{M_X-c_0}{s_X/\sqrt{n}} = \frac{M_X-c_0}{s_{M_X}} = \frac{M_X-c_0}{s_{\overline{X}}} \tag{12-2} \end{equation} \] Note that the only thing that differs between the calculation of the z statistic and the t statistic for a given sample is that the z statistic uses the population standard deviation in its denominator while the t statistic uses the sample standard deviation. However, there is a much greater difference between these two statistics in terms of finding critical values. This is because the z statistic is independent of sample size since, except for the sample mean, it is based on population parameters. That is, there is only one standard normal distribution regardless of the sample size. However, since the t statistic is partially based on the sample standard deviation, we find a different t distribution for each sample size.
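To make the contrast with the z calculation concrete, here is a sketch of equation (12-2) applied to the same sample statistics reported in the descriptives output (mean 92.031, s = 16.532, n = 32). Python is used for illustration; the point is simply that only the denominator changes.

```python
import math

mean_x = 92.031
mu_0 = 100.0
s_x = 16.532      # sample standard deviation (population sigma unknown)
n = 32

se = s_x / math.sqrt(n)    # estimated standard error of the mean
t = (mean_x - mu_0) / se   # t statistic on n - 1 = 31 degrees of freedom

print(round(se, 3))   # about 2.922 (the output shows 2.9226 from the unrounded s)
print(round(t, 2))    # -2.73
```

Compare this with the z statistic of -3.01 computed earlier with the known \(\sigma = 15\): the same data give a different statistic, referred to a different (t) distribution.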
The different t distributions can be more easily understood when we consider the theoretical function which generates them. Because the probability density function for the t statistic is not required for practical research work, we will not present it here. It is enough to know that the only unknown value (besides t) needed to specify the distribution is a quantity referred to as the degrees of freedom (usually denoted as v). We will find that for all of the states of affairs requiring t statistics that we will consider in this book, the degrees of freedom, v, will depend upon sample size. For example, given a single group of subjects where our interest is in the population mean, the degrees of freedom are found as v = n - 1. Therefore, different sample sizes (and therefore different degrees of freedom) result in different t distributions.
Using Table 9.1(b), the probability of drawing a mean from a given interval of the normal distribution can be found (see chapter 9). Here, since the normal distribution is symmetric, we are able to use this table to find critical values for most levels of significance. For the t distribution, however, we would need a separate table like Table 9.1(b) for every sample size (hence an infinite, or at least very large, number of tables). Since this is impractical, only particular values of the t statistic are usually tabled. Such a table is presented as Table 12.2.
Table 12.2. Critical values of the t distribution.

df | α=0.1000 α/2=0.2000 | α=0.0500 α/2=0.1000 | α=0.0250 α/2=0.0500 | α=0.0100 α/2=0.0200 | α=0.0050 α/2=0.0100 | α=0.0005 α/2=0.0010 |
---|---|---|---|---|---|---|
1 | 3.0777 | 6.3138 | 12.7062 | 31.8205 | 63.6567 | 636.6192 |
2 | 1.8856 | 2.9200 | 4.3027 | 6.9646 | 9.9248 | 31.5991 |
3 | 1.6377 | 2.3534 | 3.1824 | 4.5407 | 5.8409 | 12.9240 |
4 | 1.5332 | 2.1318 | 2.7764 | 3.7469 | 4.6041 | 8.6103 |
5 | 1.4759 | 2.0150 | 2.5706 | 3.3649 | 4.0321 | 6.8688 |
6 | 1.4398 | 1.9432 | 2.4469 | 3.1427 | 3.7074 | 5.9588 |
7 | 1.4149 | 1.8946 | 2.3646 | 2.9980 | 3.4995 | 5.4079 |
8 | 1.3968 | 1.8595 | 2.3060 | 2.8965 | 3.3554 | 5.0413 |
9 | 1.3830 | 1.8331 | 2.2622 | 2.8214 | 3.2498 | 4.7809 |
10 | 1.3722 | 1.8125 | 2.2281 | 2.7638 | 3.1693 | 4.5869 |
11 | 1.3634 | 1.7959 | 2.2010 | 2.7181 | 3.1058 | 4.4370 |
12 | 1.3562 | 1.7823 | 2.1788 | 2.6810 | 3.0545 | 4.3178 |
13 | 1.3502 | 1.7709 | 2.1604 | 2.6503 | 3.0123 | 4.2208 |
14 | 1.3450 | 1.7613 | 2.1448 | 2.6245 | 2.9768 | 4.1405 |
15 | 1.3406 | 1.7531 | 2.1314 | 2.6025 | 2.9467 | 4.0728 |
16 | 1.3368 | 1.7459 | 2.1199 | 2.5835 | 2.9208 | 4.0150 |
17 | 1.3334 | 1.7396 | 2.1098 | 2.5669 | 2.8982 | 3.9651 |
18 | 1.3304 | 1.7341 | 2.1009 | 2.5524 | 2.8784 | 3.9216 |
19 | 1.3277 | 1.7291 | 2.0930 | 2.5395 | 2.8609 | 3.8834 |
20 | 1.3253 | 1.7247 | 2.0860 | 2.5280 | 2.8453 | 3.8495 |
21 | 1.3232 | 1.7207 | 2.0796 | 2.5176 | 2.8314 | 3.8193 |
22 | 1.3212 | 1.7171 | 2.0739 | 2.5083 | 2.8188 | 3.7921 |
23 | 1.3195 | 1.7139 | 2.0687 | 2.4999 | 2.8073 | 3.7676 |
24 | 1.3178 | 1.7109 | 2.0639 | 2.4922 | 2.7969 | 3.7454 |
25 | 1.3163 | 1.7081 | 2.0595 | 2.4851 | 2.7874 | 3.7251 |
26 | 1.3150 | 1.7056 | 2.0555 | 2.4786 | 2.7787 | 3.7066 |
27 | 1.3137 | 1.7033 | 2.0518 | 2.4727 | 2.7707 | 3.6896 |
28 | 1.3125 | 1.7011 | 2.0484 | 2.4671 | 2.7633 | 3.6739 |
29 | 1.3114 | 1.6991 | 2.0452 | 2.4620 | 2.7564 | 3.6594 |
30 | 1.3104 | 1.6973 | 2.0423 | 2.4573 | 2.7500 | 3.6460 |
31 | 1.3095 | 1.6955 | 2.0395 | 2.4528 | 2.7440 | 3.6335 |
32 | 1.3086 | 1.6939 | 2.0369 | 2.4487 | 2.7385 | 3.6218 |
33 | 1.3077 | 1.6924 | 2.0345 | 2.4448 | 2.7333 | 3.6109 |
34 | 1.3070 | 1.6909 | 2.0322 | 2.4411 | 2.7284 | 3.6007 |
35 | 1.3062 | 1.6896 | 2.0301 | 2.4377 | 2.7238 | 3.5911 |
36 | 1.3055 | 1.6883 | 2.0281 | 2.4345 | 2.7195 | 3.5821 |
37 | 1.3049 | 1.6871 | 2.0262 | 2.4314 | 2.7154 | 3.5737 |
38 | 1.3042 | 1.6860 | 2.0244 | 2.4286 | 2.7116 | 3.5657 |
39 | 1.3036 | 1.6849 | 2.0227 | 2.4258 | 2.7079 | 3.5581 |
40 | 1.3031 | 1.6839 | 2.0211 | 2.4233 | 2.7045 | 3.5510 |
50 | 1.2987 | 1.6759 | 2.0086 | 2.4033 | 2.6778 | 3.4960 |
60 | 1.2958 | 1.6706 | 2.0003 | 2.3901 | 2.6603 | 3.4602 |
70 | 1.2938 | 1.6669 | 1.9944 | 2.3808 | 2.6479 | 3.4350 |
80 | 1.2922 | 1.6641 | 1.9901 | 2.3739 | 2.6387 | 3.4163 |
100 | 1.2901 | 1.6602 | 1.9840 | 2.3642 | 2.6259 | 3.3905 |
150 | 1.2872 | 1.6551 | 1.9759 | 2.3515 | 2.6090 | 3.3566 |
200 | 1.2858 | 1.6525 | 1.9719 | 2.3451 | 2.6006 | 3.3398 |
500 | 1.2832 | 1.6479 | 1.9647 | 2.3338 | 2.5857 | 3.3101 |
1000 | 1.2824 | 1.6464 | 1.9623 | 2.3301 | 2.5808 | 3.3003 |
100000 (≈ z) | 1.2816 | 1.6449 | 1.9600 | 2.3264 | 2.5759 | 3.2906 |
In Table 12.2, t values are given for different degrees of freedom (sample sizes), abbreviated as “df” in the left hand column, and for the probabilities most frequently used in parameter estimation and hypothesis testing (e.g., .10, .05, .025, .01). At the top of this table the probabilities (i.e., levels of significance) are presented for one tailed and two tailed tests. Like the normal distribution, the t distribution is symmetric with a mean of zero, and, as with the z scores at the top of Table 9.1(b), only the positive values of the t scores are given in Table 12.2. Therefore, negative critical values are found by finding the associated positive t value and making it negative. The notation t(p, v) will be used to indicate the percentile rank, p, of a t statistic from a t distribution with v degrees of freedom.
To illustrate the t notation and the use of Table 12.2, let us consider an example where we are ready to establish critical values in a test of the null hypothesis H0: μ = 50 under conditions where the variance was unknown. Then, at the .05 level of significance with a sample of 20 units, the degrees of freedom would be 19 (i.e., df = n - 1), and the two tailed critical values would be found as -2.093 and +2.093. These critical values would be written as: t(.025, 19) = -2.093, and t(.975, 19) = +2.093. Here, if the alternate hypothesis were HA: μ > 50, the one tailed critical value would be found as t(.95, 19) = 1.729, but if the alternate hypothesis were HA: μ < 50, the one tailed critical value would be t(.05, 19) = -1.729.
Figure 12e shows the standard normal distribution and two t distributions, one t distribution with 5 degrees of freedom, and one t distribution with 10 degrees of freedom. In Figure 12e, you can see that there are small differences between the t distributions and the standard normal distribution. These differences are largest when the degrees of freedom are small but decrease sharply as the degrees of freedom increase. This can be seen in Table 12.2 where the t values are larger when the degrees of freedom are small but become smaller and more homogeneous in each column as the degrees of freedom increase.
As the degrees of freedom become infinite, the t distribution becomes the standard normal distribution. This can be seen by considering the t values in the last row of Table 12.2 where the degrees of freedom (df) are huge (i.e., df = 100000). Here, the critical values are the same as those found using the z scores in Table 9.1(b). For example, in Table 12.2 with df = 100000 and \(\alpha = .05\), the critical values are \(\pm 1.960\) for a two tailed test, and are equal to +1.645 or -1.645 for the two possible one tailed tests. These are the same z scores that we found using Table 9.1(b); that is, \(t_{(\alpha, \infty)} = z_{(\alpha)}\).
tails | alpha | z | t(10000) | t(1000) | t(100) | t(40) | t(31) | t(30) | t(10) |
---|---|---|---|---|---|---|---|---|---|
1-tailed | 0.050 | 1.6449 | 1.6450 | 1.6464 | 1.6602 | 1.6839 | 1.6955 | 1.6973 | 1.8125 |
2-tailed | 0.050 | 1.9600 | 1.9602 | 1.9623 | 1.9840 | 2.0211 | 2.0395 | 2.0423 | 2.2281 |
1-tailed | 0.010 | 2.3263 | 2.3267 | 2.3301 | 2.3642 | 2.4233 | 2.4528 | 2.4573 | 2.7638 |
2-tailed | 0.010 | 2.5758 | 2.5763 | 2.5808 | 2.6259 | 2.7045 | 2.7440 | 2.7500 | 3.1693 |
1-tailed | 0.001 | 3.0902 | 3.0910 | 3.0984 | 3.1737 | 3.3069 | 3.3749 | 3.3852 | 4.1437 |
2-tailed | 0.001 | 3.2905 | 3.2915 | 3.3003 | 3.3905 | 3.5510 | 3.6335 | 3.6460 | 4.5869 |
In the preceding introduction to the t distribution we transformed the sample mean into a t statistic and compared it to the z statistic for the situation where we were interested in the mean of a population. In the following chapters you will be introduced to different sample values that can be transformed into t statistics. We will find that the t distribution can be used for the test statistic not only when we are interested in the population mean, but also when we are interested in the difference between population means of related and independent scores, or in the population correlation coefficient.
In this section we will discuss the conditions under which you would use Student's t test to test the null hypothesis that the population mean is equal to a constant. Here we will examine the same data set that we considered in the last section for the z statistic (i.e., Mary Barth's United Nations study), but we will slightly change the scenario. The change is that here Mary Barth has decided to use a measure of intelligence that has been found to be highly correlated with the Wechsler Adult Intelligence Scale, but for which there are no large scale norm data. To give this instrument a name, we shall call it the World Adult Intelligence Scale (WAIS). Mary decided to use this instrument because it required much less time to administer, was available in different languages, and had the same mean and standard deviation for each language. Therefore, in this study we have all of the conditions that were present in the preceding section, except that the population variance is unknown. Let us now consider the elements of hypothesis testing given this information.
The conditions (i.e., the “state of affairs”) that must exist before you can use the following test statistic are:
The test statistic which will be used for the four cases considered in this section is (as shown in equation 12-2):
\[ \begin{equation} t = \frac {M_X - c_0} {s_X / \sqrt{n} } \end{equation} \] When the null hypothesis is true, the sampling distribution of this statistic follows a t distribution with (n - 1) degrees of freedom, where n is the sample size.
The preceding statistical test will be valid when:
The importance of the preceding assumptions is as follows:
The research problem that we will consider in each of the following cases is: Is the mean on the World Adult Intelligence Scale (WAIS) of the delegates to the United Nations equal to 100?
The mean on the World Adult Intelligence Scale of the delegates to the United Nations differs from 100.
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \]
Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in chapter 11.
The probability of rejecting a true null hypothesis (i.e., the probability of making a Type I error) was set at .05.
Mary Barth decided that she would like her power to be at least .85. That is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.
In a single group exploratory analysis where the variance is unknown, we will rely on the work of Cohen (1977) to aid us in selecting an a priori effect size.
For experiments with two groups, Cohen describes three different effect sizes: “small” is \(d = .20\), “medium” is \(d = .50\), and “large” is \(d = .80\). As a last resort, these can be used as guides in exploratory studies where information is lacking.
If we denote the single group effect size of Equation (11-2) as \(d_S\), then the relationship of \(d_S\) to Cohen’s two group effect size, d, is:
\[ \begin{align} d &= d_S \sqrt{2} \\ &\text{or} \\ d_S &= \frac{d}{\sqrt{2}} \tag{12-4} \end{align} \] Therefore, in terms of the single-group effect size \(d_S\), Cohen's small, medium, and large effect sizes become:
\[ \begin{align} \text{Small } d_S &= .20/\sqrt{2} \approx 0.14 \\ \text{Medium } d_S &= .50/\sqrt{2} \approx 0.35 \\ \text{Large } d_S &= .80/\sqrt{2} \approx 0.57 \end{align} \] In light of Cohen's suggestions for effect size, Mary Barth selected a large a priori effect size (i.e., \(d = 0.80\); \(d_S = 0.57\)) as one that she felt would be present among the U.N. delegates. Therefore, we will use \(d_S = 0.57\) when we look at the tables below for the large standard mean difference effect size.
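These conversions are one-liners; a minimal sketch (in Python, for illustration) applying equation (12-4) to Cohen's two-group benchmarks:

```python
import math

# Cohen's two-group benchmarks converted to the single-group scale
# via equation (12-4): d_S = d / sqrt(2).
two_group = {"small": 0.20, "medium": 0.50, "large": 0.80}
single_group = {label: round(d / math.sqrt(2), 2)
                for label, d in two_group.items()}

print(single_group)   # {'small': 0.14, 'medium': 0.35, 'large': 0.57}
```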
Tables 12.4a-d were calculated using the pwr package in R (based on Cohen, 1977), listing sample sizes by effect size (d) for the single group case. These are the same values obtained from the jamovi Power module. To use these tables for a single group, enter them with the single-group effect size \(d_S\) in place of d. In Tables 12.4a-d, “\(\alpha/2\)” symbolizes the level of significance for a two tailed test, and “\(\alpha\)” symbolizes the level of significance for a one tailed test. Mary found that she needed approximately 27 subjects when \(\alpha/2 = .05\), \(d = 0.80\) (here, \(d_S = 0.57\)), and power = .85. However, since Mary could conveniently collect information on 32 subjects, she decided to proceed with this number. In Table 12.4c we see that by using 32 subjects Mary's power is about .90.
Table 12.4a. Sample sizes by power (rows) and effect size d (columns).

power↓ d→ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 365 | 94 | 44 | 26 | 18 | 14 | 11 | 9 | 7 | 6 |
0.5 | 667 | 170 | 78 | 45 | 30 | 22 | 17 | 14 | 10 | 8 |
0.6 | 804 | 204 | 93 | 54 | 36 | 26 | 20 | 16 | 12 | 9 |
0.67 | 913 | 231 | 105 | 61 | 40 | 29 | 22 | 18 | 13 | 10 |
0.7 | 965 | 244 | 111 | 64 | 42 | 31 | 23 | 19 | 13 | 11 |
0.75 | 1060 | 268 | 121 | 70 | 46 | 33 | 25 | 20 | 14 | 11 |
0.8 | 1172 | 296 | 134 | 77 | 51 | 36 | 28 | 22 | 16 | 12 |
0.85 | 1309 | 330 | 149 | 85 | 56 | 40 | 31 | 24 | 17 | 13 |
0.9 | 1492 | 376 | 169 | 97 | 63 | 45 | 34 | 27 | 19 | 14 |
0.95 | 1785 | 449 | 202 | 115 | 75 | 53 | 40 | 32 | 22 | 16 |
0.99 | 2407 | 605 | 271 | 154 | 100 | 71 | 53 | 41 | 28 | 21 |
Table 12.4b. Sample sizes by power (rows) and effect size d (columns).

power↓ d→ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 276 | 71 | 34 | 20 | 14 | 11 | 9 | 7 | 6 | 5 |
0.5 | 544 | 139 | 63 | 37 | 25 | 18 | 14 | 12 | 9 | 7 |
0.6 | 669 | 170 | 77 | 45 | 30 | 22 | 17 | 14 | 10 | 8 |
0.67 | 768 | 195 | 88 | 51 | 34 | 25 | 19 | 15 | 11 | 9 |
0.7 | 816 | 206 | 94 | 54 | 36 | 26 | 20 | 16 | 11 | 9 |
0.75 | 904 | 228 | 103 | 60 | 39 | 28 | 22 | 17 | 12 | 10 |
0.8 | 1007 | 254 | 115 | 66 | 43 | 31 | 24 | 19 | 13 | 10 |
0.85 | 1134 | 286 | 129 | 74 | 48 | 35 | 26 | 21 | 15 | 11 |
0.9 | 1305 | 329 | 148 | 85 | 55 | 39 | 30 | 24 | 16 | 12 |
0.95 | 1580 | 397 | 178 | 102 | 66 | 47 | 35 | 28 | 19 | 14 |
0.99 | 2168 | 544 | 244 | 139 | 90 | 63 | 47 | 37 | 25 | 18 |
Table 12.4c. Sample sizes by power (rows) and effect size d (columns).

power↓ d→ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 167 | 44 | 21 | 13 | 9 | 7 | 6 | 5 | 4 | 4 |
0.5 | 387 | 98 | 45 | 26 | 18 | 13 | 10 | 9 | 6 | 5 |
0.6 | 492 | 125 | 57 | 33 | 22 | 16 | 13 | 10 | 7 | 6 |
0.67 | 578 | 146 | 66 | 38 | 26 | 18 | 14 | 12 | 8 | 7 |
0.7 | 620 | 157 | 71 | 41 | 27 | 20 | 15 | 12 | 9 | 7 |
0.75 | 696 | 176 | 80 | 46 | 30 | 22 | 17 | 13 | 10 | 7 |
0.8 | 787 | 199 | 90 | 52 | 34 | 24 | 19 | 15 | 10 | 8 |
0.85 | 900 | 227 | 102 | 59 | 38 | 27 | 21 | 17 | 12 | 9 |
0.9 | 1053 | 265 | 119 | 68 | 44 | 32 | 24 | 19 | 13 | 10 |
0.95 | 1302 | 327 | 147 | 84 | 54 | 39 | 29 | 23 | 16 | 12 |
0.99 | 1840 | 462 | 207 | 117 | 76 | 54 | 40 | 31 | 21 | 15 |
Table 12.4d. Sample sizes by power (rows) and effect size d (columns).

power↓ d→ | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 1 | 1.2 |
---|---|---|---|---|---|---|---|---|---|---|
0.25 | 93 | 25 | 12 | 8 | 6 | 5 | 4 | 3 | 3 | 3 |
0.5 | 272 | 69 | 32 | 19 | 13 | 9 | 8 | 6 | 5 | 4 |
0.6 | 362 | 92 | 42 | 24 | 16 | 12 | 9 | 8 | 6 | 5 |
0.67 | 436 | 110 | 50 | 29 | 19 | 14 | 11 | 9 | 6 | 5 |
0.7 | 472 | 119 | 54 | 31 | 21 | 15 | 12 | 9 | 7 | 5 |
0.75 | 540 | 136 | 62 | 36 | 23 | 17 | 13 | 10 | 7 | 6 |
0.8 | 620 | 156 | 71 | 41 | 27 | 19 | 15 | 12 | 8 | 6 |
0.85 | 721 | 182 | 82 | 47 | 31 | 22 | 17 | 13 | 9 | 7 |
0.9 | 858 | 216 | 97 | 55 | 36 | 26 | 19 | 15 | 11 | 8 |
0.95 | 1084 | 272 | 122 | 70 | 45 | 32 | 24 | 19 | 13 | 10 |
0.99 | 1579 | 396 | 177 | 100 | 65 | 46 | 34 | 27 | 18 | 13 |
In Table 12.2 the critical values of the t statistic for the two tailed test with \(\alpha = .05\) are t(.025, 30) = -2.042 and t(.975, 30) = +2.042, where 30 is the degrees of freedom used and .025 and .975 are the percentile points that bound the middle 95% of the distribution. Mary used df = 30 because she used an older table that only listed df = 30 and df = 40 but did not list df = 31. When this is the case, first consider the critical value associated with fewer degrees of freedom than you have (here, 30). If your calculated test statistic exceeds the critical value for this smaller degrees of freedom, it will also exceed the critical value based on your actual degrees of freedom. However, if your calculated test statistic falls between the two critical values given in the table (for degrees of freedom both above and below your actual degrees of freedom), you can generally use linear interpolation to estimate the actual critical value. In Table 12.2, we can see that the critical value for the t statistic at df = 31 is 2.0395.
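The linear interpolation just described can be sketched numerically, using the two tailed \(\alpha = .05\) column of Table 12.2 (tabled values for df = 30 and df = 40):

```python
# Linear interpolation between tabled critical values, using the
# two-tailed alpha = .05 column of Table 12.2.
df_lo, t_lo = 30, 2.0423
df_hi, t_hi = 40, 2.0211
df = 31   # actual degrees of freedom

t_interp = t_lo + (df - df_lo) / (df_hi - df_lo) * (t_hi - t_lo)
print(round(t_interp, 4))   # 2.0402, close to the tabled 2.0395
```

The interpolated value (2.0402) is slightly larger than the exact tabled value (2.0395), so decisions based on it are, if anything, slightly conservative.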
At this point the a priori parameters of hypothesis testing have been established (i.e., your “bet” is made).
The measurements of the 32 randomly selected United Nations delegates are shown in Table 12.1. (The same data was used here as was used for the z statistic so that you could compare the two procedures.)
There are several ways to check for outliers on a single variable. Common ways are to use z scores, boxplots, and histograms.
The key for us is that we are not simply trying to identify cases to remove from our dataset. We are looking for cases to investigate further. We do not want to remove any cases without justification. The best justification is that the case does not belong to our stated population and therefore should not have been in our sample. However, there are other reasons that we might feel justified removing a case. For example, if studying job satisfaction, we might not want to include people who have only worked 6 months or are very near retirement. Or if studying students, we might decide we should not include students on academic probation. Ideally, we would have delimited our population before we collected data so such cases would never have been included. But we cannot always think of all the possibilities before collecting data.
When we cannot justifiably remove a case, we might try running analyses both with and without the outliers. When there are no differences in the results and conclusions, we can comfortably leave the outliers in the dataset. However, when the results change with and without the outlier, we will need to decide which data are the most appropriate to analyze and report. In that case we recommend reporting that an outlier changes the results and providing a brief description of the differences. Always remember, outliers can both help and hurt analyses. Just as it is not fair to remove an outlier to improve your results without reporting that, it is also not fair to leave an outlier in your data knowing that it is helping your results without reporting that. We are looking for results with good external validity. Any results that change due to one or two outliers are not stable results and should be reported as such. The best way to combat outliers is to have large sample sizes; individual cases have much less influence on results when there are more cases in the data.
We have calculated z scores for each case. Frequently, any z score outside the range of -3 < z < 3 is considered an outlier. Our view, however, based on the description above, is that we must adjust our comparison value to reflect our sample size. With very small samples, for example fewer than 50 cases, we recommend using -2 < z < 2 as our decision rule (any |z| > 2 is considered an outlier). As sample size gets larger, up to perhaps 200 or 300, we recommend considering any score outside -2.5 < z < 2.5 to be an outlier. Above 200 or 300, use |z| > 3 as the rule. The rationale behind this recommendation is that the z score we choose is related to probability. We know that about 5% of scores are beyond a z score of ±2 (i.e., outside of the range -2 < z < 2). We also know, based on the probabilities of the normal curve, that less than half a percent (i.e., 0.5% or .005) of scores are beyond ±3. If we only have 100 cases, then we wouldn’t expect even one case to be beyond ±3. But we can have cases that are clearly outliers in a dataset with 100 or fewer cases – they just wouldn’t have z scores beyond ±3. Similarly, using ±2 when you have many cases would result in too many outliers being identified (e.g., in a sample of 500 we would expect about 25 cases beyond ±2 just due to the normal curve probabilities).
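The sample-size-dependent rule described above is easy to apply in code. A minimal sketch in base R, using hypothetical scores (the cutoffs are this section's recommendations):

```r
# Flag potential outliers with a z-score cutoff tied to sample size
flag_outliers <- function(x) {
  n <- length(x)
  # cutoffs follow this section's recommendation
  cutoff <- if (n < 50) 2 else if (n <= 300) 2.5 else 3
  z <- (x - mean(x)) / sd(x)
  which(abs(z) > cutoff)   # positions of flagged cases
}

scores <- c(95, 102, 88, 110, 74, 99, 101, 93, 160)  # hypothetical data
flag_outliers(scores)  # flags only the score of 160
```

Remember that a flagged case is a case to investigate, not automatically a case to delete.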
In the z scores below, we can see that none of the cases have z scores beyond ±2.0. Therefore, we do not have any concerns about outliers.
Recall the boxplots from chapter 5, reproduced below, Figure 5g(ii) and Figure 5n. In Figure 5g(ii) you can see that each boxplot has a circle outside the box and whiskers. Any score that is \(1.5*IQR\) above or below the box (i.e., the middle 50% of scores) is considered an outlier and identified with a small circle (o). Any score that is \(3*IQR\) above or below the box is considered an extreme value and marked with an asterisk (*). In Figure 5n, you can see that the leptokurtic distribution (on the left) has multiple extreme values and a single outlier (case 30).
We consider both outliers and extreme values to be outliers, but we always start by considering the most extreme values first. With normal, leptokurtic, and even skewed distributions, a certain number of outliers is expected. Just because these values are outliers is not cause to remove them. But we should investigate to make sure the cases belong to the population we defined and to make sure the data are correct. Further, we want to look for other problems the outliers may be causing. For example, sometimes an outlier or group of outliers makes a distribution skewed. Outliers can often have a bigger impact than a violation of the assumptions, so we want to check.
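The boxplot fences can also be computed directly. A minimal sketch in base R with hypothetical scores (programs differ slightly in how they compute quartiles, so borderline cases may not match jamovi's plot exactly):

```r
# Tukey fences: outliers beyond 1.5*IQR, extreme values beyond 3*IQR
x <- c(55, 60, 62, 65, 66, 68, 70, 71, 73, 75, 120)  # hypothetical scores
q <- quantile(x, c(.25, .75))   # edges of the box (middle 50% of scores)
iqr <- q[2] - q[1]

outlier <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr  # 'o' in the plot
extreme <- x < q[1] - 3 * iqr   | x > q[2] + 3 * iqr    # '*' in the plot

x[outlier]  # 120 is flagged
x[extreme]  # and is far enough out to count as an extreme value
```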
out <- capture.output(
jmv::descriptives(
formula = Scores ~ Shape,
data = data,
box = TRUE,
dot = TRUE,
boxMean = TRUE,
n = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE))
junk <- capture.output(
jmv::descriptives(
formula = Scores ~ Ratings,
data = data,
box = TRUE,
dot = TRUE,
boxMean = TRUE,
n = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE))
Mary’s data resulted in the following boxplot.
junk <- capture.output(
jmv::descriptives(
data = data,
vars = IQ_Score,
box = TRUE,
dot = TRUE,
boxMean = TRUE,
n = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE))
Note that jamovi can provide a table for Extreme Values (we actually select an option called “outliers” to obtain the table in the results). The important thing to know about this table is that it simply lists the most extreme highest and lowest values for a variable and does not provide any diagnosis about whether they are actually outliers. Here is the Extreme Values table for Mary Barth’s data.
jmv::descriptives(
data = data,
vars = vars(IQ_Score, z_Score),
n = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
extreme = TRUE)
DESCRIPTIVES
EXTREME VALUES
Extreme values of IQ_Score
─────────────────────────────────────────
Row number Value
─────────────────────────────────────────
Highest 1 11 123.000
2 19 116.000
3 15 115.000
4 17 113.000
5 5 111.000
Lowest 1 14 59.000
2 3 61.000
3 6 66.000
4 9 72.000
5 7 74.000
─────────────────────────────────────────
Extreme values of z_Score
─────────────────────────────────────────
Row number Value
─────────────────────────────────────────
Highest 1 11 1.8730
2 19 1.4500
3 15 1.3890
4 17 1.2680
5 5 1.1470
Lowest 1 14 -1.9980
2 3 -1.8770
3 6 -1.5750
4 9 -1.2120
5 7 -1.0910
─────────────────────────────────────────
Prior to conducting a statistical test with a sample data set, it is wise to check the data for potential outliers and for potential violations of the statistical test’s assumptions, and to compute descriptive statistics. We will have additional assumptions to check with future statistical tests we discuss, but for now the assumption we need to investigate is normality. We should also consider whether the cases were randomly and independently sampled, but this requires a logical analysis rather than a statistical test.
We have seen that we can use histograms and Q-Q plots to investigate normality. We can also simply review the skewness and kurtosis statistics provided from our descriptive statistics (see the next step). The Normal Q-Q Plot does not show strong patterns that suggest nonnormality. In particular, the dots line up pretty well on the line in the Q-Q Plot. However, the Histogram shows some potential for a bimodal distribution. Due to idiosyncrasies that occur when creating these graphs, it is not necessarily true that the distribution is bimodal. However, it is worth checking further.
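For readers working outside jamovi, base R offers the same checks: `shapiro.test()` for the Shapiro-Wilk test and `qqnorm()`/`qqline()` for the Normal Q-Q plot. A sketch using simulated data (not Mary's actual scores):

```r
set.seed(12)
x <- rnorm(32, mean = 92, sd = 16)  # simulated sample, not Mary's data

sw <- shapiro.test(x)  # null hypothesis: the population is normal
sw$p.value             # a large p value gives no evidence of nonnormality

qqnorm(x)              # dots near the line suggest normality
qqline(x)
```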
out <- capture.output(
jmv::descriptives(
data = data,
vars = vars(IQ_Score),
n = FALSE,
qq = TRUE,
hist = FALSE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
extreme = FALSE))
out <- capture.output(
jmv::descriptives(
data = data,
vars = vars(IQ_Score),
n = FALSE,
hist = TRUE,
missing = FALSE,
mean = FALSE,
median = FALSE,
sd = FALSE,
min = FALSE,
max = FALSE,
extreme = FALSE))
Sometimes the bimodality is due to the variable itself, but sometimes it is due to another variable that is more difficult to diagnose. For example, if you have height data for men and women, the histogram here could indicate that (e.g., the taller mound on the left could be for women, who are on average shorter, and the mound on the right could display the men in the sample, who are generally taller). In the case of Mary Barth’s data, the two mounds could represent a group of U.N. delegates who have had training in how to take tests versus a group of delegates who have not. In Mary Barth’s case, the training was her independent variable. In her case, it may be more appropriate to analyze the distribution of the data separately by group. However, if you have not collected data on the training variable (i.e., if it were not a variable of interest in your study), there would be no way to diagnose this cause.
Here is some example output. We want to review the descriptive statistics to become more familiar with our variables. You want to become as familiar as you can with your data in order to make better interpretations of your results. One of our colleagues frequently said, “Data don’t speak to strangers.” Get to know your data well. This would include both descriptive statistical information as well as graphical information (which we have looked at above when considering outliers and assumptions). There are certainly more ways to get to know your data (e.g., frequency tables, group-by-group breakdowns when appropriate), so don’t limit yourself to only those things we are showing you here.
jmv::descriptives(
data = data,
vars = IQ_Score,
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE,
extreme = FALSE)
DESCRIPTIVES
Descriptives
───────────────────────────────────────
IQ_Score
───────────────────────────────────────
N 32
Missing 0
Mean 92.031
Std. error mean 2.9226
95% CI mean lower bound 86.071
95% CI mean upper bound 97.992
Median 93.500
Standard deviation 16.532
Variance 273.32
IQR 22.000
Range 64.000
Minimum 59.000
Maximum 123.00
Skewness -0.20762
Std. error skewness 0.41446
Kurtosis -0.67077
Std. error kurtosis 0.80937
Shapiro-Wilk W 0.97668
Shapiro-Wilk p 0.69899
───────────────────────────────────────
Note. The CI of the mean assumes
sample means follow a
t-distribution with N - 1 degrees
of freedom
In Figure 12h(i), using the One-Sample t test procedure to calculate the t statistic, the test statistic is shown to be -2.7266 using the mean from descriptive statistics. There are a couple of ways to think about this test. First, we can compare the sample mean to the Null Hypothesis mean directly, which is the most common way this test is performed in jamovi. When we run the one-sample t test in jamovi we need to enter 100 as the “test value” in order to make this comparison. The resulting output follows.
jmv::ttestOneS(
data = data,
vars = IQ_Score,
testValue = 100,
norm = TRUE,
qq = TRUE,
meanDiff = TRUE,
ci = TRUE,
effectSize = FALSE,
desc = TRUE,
plots = TRUE)
ONE SAMPLE T-TEST
One Sample T-Test
──────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Lower Upper
──────────────────────────────────────────────────────────────────────────────────────────────────────
IQ_Score Student's t -2.7266 31.000 0.01043 -7.9688 -13.929 -2.0082
──────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ ≠ 100
Normality Test (Shapiro-Wilk)
──────────────────────────────────
W p
──────────────────────────────────
IQ_Score 0.97668 0.69899
──────────────────────────────────
Note. A low p-value suggests
a violation of the
assumption of normality
Descriptives
──────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────
IQ_Score 32 92.031 93.500 16.532 2.9226
──────────────────────────────────────────────────────────
Second, we can calculate a difference score where we subtract the Null Hypothesis mean value from every case in the dataset. This results in the data shown in Figure 12g. This approach requires an extra step, of course: computing the difference score in the dataset. However, doing the one-sample t test in this way creates results that are more similar to what we will do with the paired-samples t test: comparing whether the paired difference is different from zero.
Performing the one-sample t test in this way requires us to enter zero as the “test value” instead of 100. Subtracting 100 from each score also means that we must convert our thinking from this Null and Alternative Hypothesis
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu \ne 100 \\ \end{align} \] to this: \[ \begin{align} H_0&: \mu - 100 = 0 \\ H_A&: \mu - 100 \ne 0 \\ \end{align} \] The output that results from this approach is shown in Figure 12h(ii). Note that the important statistical information is all still the same. That is, the p value is still p = .010. The mean difference, t statistic, and the confidence interval of the mean difference are still exactly the same. In fact, the only result that has changed is the Mean.
jmv::ttestOneS(
data = data,
vars = IQ_Score_100,
testValue = 0,
norm = TRUE,
qq = FALSE,
meanDiff = TRUE,
ci = TRUE,
effectSize = FALSE,
desc = TRUE,
plots = FALSE)
ONE SAMPLE T-TEST
One Sample T-Test
──────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Lower Upper
──────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ_Score_100 Student's t -2.7266 31.000 0.01043 -7.9688 -13.929 -2.0082
──────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ ≠ 0
Normality Test (Shapiro-Wilk)
──────────────────────────────────────
W p
──────────────────────────────────────
IQ_Score_100 0.97668 0.69899
──────────────────────────────────────
Note. A low p-value suggests a
violation of the assumption of
normality
Descriptives
────────────────────────────────────────────────────────────────
N Mean Median SD SE
────────────────────────────────────────────────────────────────
IQ_Score_100 32 -7.9688 -6.5000 16.532 2.9226
────────────────────────────────────────────────────────────────
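The equivalence of the two approaches is easy to verify with base R's `t.test()` function, which performs the same one-sample t test. A sketch with hypothetical scores:

```r
set.seed(7)
iq <- round(rnorm(32, mean = 92, sd = 16))  # hypothetical whole-number scores

t1 <- t.test(iq, mu = 100)       # raw scores against the test value 100
t2 <- t.test(iq - 100, mu = 0)   # difference scores against zero

all.equal(t1$statistic, t2$statistic)  # TRUE: the t statistic is unchanged
all.equal(t1$p.value, t2$p.value)      # TRUE: so is the p value
```

Only the reported mean shifts by 100; the test itself is identical.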
The decision was made to reject the null hypothesis because the absolute value of the test statistic was greater than the absolute value of the critical values, that is, \(|-2.7266| = 2.7266 > 2.042\). The same conclusion was reached using the p value. The two-tailed p value was calculated using jamovi in the section below. The result was a p value of .0104, which is less than the level of significance of .05; therefore, the null hypothesis was rejected.
The two tailed 100(1 – .05)% = 95% confidence interval was found using the equation:
Equation 12-5
\[ \begin{equation} M_X-t_{(1-\alpha⁄2,n-1)} (\frac{s_X}{\sqrt{n}}) < \mu < M_X+t_{(1-\alpha⁄2,n-1)} (\frac{s_X}{\sqrt{n}}) \tag{12-5} \end{equation} \] Here, \(M_X = 92.03\), \(s_X = 16.5324\) (from Figure 12h(i)), \(t_{(.975,31)} ≈ t_{(.975,30)} = 2.042\) (using Mary’s critical value based on df = 30), and n = 32.
\[ 92.03-2.042(16.5324⁄\sqrt{32})<μ<92.03+2.042(16.5324⁄\sqrt{32}) \] This simplifies to 86.06 < μ < 98.00. Mary was 95% confident that the true population mean fell between 86.06 and 98.00. Here you should note that the null hypothesis mean of 100 is not in the confidence interval, and it shouldn’t be, since we rejected the null hypothesis.
Alternatively, we can use the confidence interval to make a decision about the null hypothesis. That is, because the hypothesized mean of 100 is not in our confidence interval, we decide that it is not a reasonable population mean based on the sample data we collected. Therefore, we reject the null hypothesis that the mean for the population represented by our sample is 100.
Note also that Equation 12-5 differs from what we calculated in Chapter 11. That is, the equations in chapter 11, most notably equations 11-5, 11-6, and 11-7 used a z distribution critical value in order to calculate the confidence interval. It is more common practice, however, to recognize that we have “small” samples in most research and instead calculate the confidence intervals using a critical value from the t distribution as was done in equation 12-5 above.
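Equation 12-5 is simple to verify numerically. A sketch in base R using the summary statistics reported above, with the exact df = 31 critical value (so the bounds match jamovi's 86.071 and 97.992 rather than the hand calculation based on df = 30):

```r
m  <- 92.031   # sample mean (Figure 12h(i))
s  <- 16.532   # sample standard deviation
n  <- 32
tc <- qt(.975, n - 1)              # two-tailed critical value, df = 31

me <- tc * s / sqrt(n)             # margin of error
c(lower = m - me, upper = m + me)  # close to jamovi's 86.071 and 97.992
```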
The actual standardized mean difference effect size for this analysis can be calculated as the absolute value of the Mean Difference, |-7.96875|, divided by the standard deviation for the original variable, 16.53244. Therefore, d = 7.96875 / 16.53244 = 0.482. This is interpreted as almost a half standard deviation difference.
jmv::ttestOneS(
data = data,
vars = IQ_Score,
testValue = 100,
norm = FALSE,
qq = FALSE,
meanDiff = TRUE,
ci = FALSE,
effectSize = TRUE,
desc = TRUE,
plots = FALSE)
ONE SAMPLE T-TEST
One Sample T-Test
────────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Effect Size
────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ_Score Student's t -2.7266 31.000 0.01043 -7.9688 Cohen's d -0.48201
────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ ≠ 100
Descriptives
──────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────
IQ_Score 32 92.031 93.500 16.532 2.9226
──────────────────────────────────────────────────────────
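The effect size arithmetic is easy to confirm by hand:

```r
mean_diff <- -7.9688   # mean difference from the output above
sd_x <- 16.532         # standard deviation of IQ_Score

d <- abs(mean_diff) / sd_x
round(d, 3)            # 0.482, matching the Cohen's d jamovi reports
```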
jamovi will calculate probability values using the distrACTION module.
Note that the probability above t = +2.7266 is the same as the probability below t = -2.7266 (the t distribution is symmetric), so we could have used the positive version of the t statistic here. But if we did, we would need to switch to the option \(P(X \ge x1)\).
Also note that for the same t statistic value with the same degrees of freedom, you can calculate \(p_{1tail} = (p_{2tail}/2)\) or \(p_{2tail} = (p_{1tail}*2)\), which is literally what we did above.
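These probabilities can be confirmed directly from the t distribution; `pt()` is the cumulative t distribution function in base R:

```r
t_stat <- -2.7266
df <- 31

p_one <- pt(t_stat, df)  # lower-tail (one-tailed) probability, about .0052
p_two <- 2 * p_one       # two-tailed p value, about .0104

c(one_tail = p_one, two_tail = p_two)
```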
The mean on the World Adult Intelligence Scale of the delegates to the United Nations is less than 100.
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu < 100 \\ \end{align} \]
Although test norms were not available for the WAIS, Mary did find many studies that supported the reliability and validity of the instrument. The reliability and validity of the independent variable (membership in the United Nations) were discussed in chapter 11.
The probability of rejecting a true null hypothesis (i.e. the probability of making a Type I error) was set at .05.
Mary Barth decided that she would like her power to be at least .85, that is, she decided that she would like to be able to reject the null hypothesis at least 85 times out of 100 when the null hypothesis was false.
In a confirmatory study a priori effect size is calculated in the same manner as was found for the z statistic, i.e., using equation (11-2) of chapter 11. This equation can be used because previous information is usually available to assist the researcher in estimating an a priori effect size. For example, Mary Barth wanted to be sure of detecting a population mean as low as 92. The mean of 92 was chosen based on her knowledge of intelligence and of decisions made at the United Nations. Similarly, previous use of her newly created intelligence test indicated that a good estimate of its population standard deviation is 16.1. If we substitute these values into equation (11-2) we have:
\[ d= \frac{|92-100|}{16.1} = \frac{8}{16.1} = .4969 ≈ .50 \]
In a confirmatory analysis sample size may be found using equation (11-3) or Table 12.4. Sample size was found using equation (11-3) of chapter 11 to be:
\[ n = \left[ \frac {16.1(1.645 + 1.04)}{8} \right]^2 = 29.20 ≈ 30 \] Here, σ = 16.1, α = .05, z(.95) = 1.645, 1 – β = .85, z(.85) = 1.04, and |μA – μ0| = 8. To verify this result, Mary used equation (12-9) to find d for Table 12.4 as:
\[ d = d_s \sqrt{2} = 0.4969 * 1.41421 = 0.7027 ≈ 0.7 \] In Table 12.4 Mary found that with α = .05, d = 0.70, and power = .85 she needed 30 subjects. However, since Mary could conveniently collect information on 32 subjects, she decided to proceed with this number. In Table 12.4 we may use linear interpolation to estimate Mary’s power to be .88.
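Mary's sample size and power calculations can be checked with `power.t.test()` from base R's stats package; small differences from Table 12.4 and equation (11-3) are expected because those are based on the normal (z) approximation rather than the exact t distribution:

```r
# sample size required for power = .85, one-tailed alpha = .05
power.t.test(delta = 8, sd = 16.1, sig.level = .05, power = .85,
             type = "one.sample", alternative = "one.sided")

# power actually achieved with the n = 32 Mary collected
res <- power.t.test(n = 32, delta = 8, sd = 16.1, sig.level = .05,
                    type = "one.sample", alternative = "one.sided")
res$power  # comfortably above the .85 Mary required
```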
In Table 12.2 the critical value of the t statistic for the one-tailed test with α = .05 is t(.05, 30) = -1.697. This critical value is based on 30 degrees of freedom (see the discussion of this point for the two-tailed t test above).
Box 12.1
At this point the a priori parameters of hypothesis testing have been established (i.e., your "bet" is made).
The measurements of the 32 randomly selected United Nations delegates are shown in Table 12.1. (The same data was used here as was used for the z statistic so that you could compare the two procedures.)
Same as that described above for Case A.
Same as that described above for Case A.
Same as that described above for Case A.
In Figure 12b the test statistic is found to be -2.7266 using the mean from Descriptive Statistics.
The decision was made to reject the null hypothesis because the test statistic (t = -2.7266) was less than the critical value (t = -1.697). The same conclusion was reached using the p value. The one-tailed p value was calculated using jamovi above. The result was a p value of .0052, which is less than the level of significance of .05; therefore, the null hypothesis was rejected.
In the output in Figure 12h(ii) above, the significance (Sig. 2-tailed) is reported as a two-tailed p = .010. Because a two-tailed p value includes both tails, but we are performing just a one-tailed test now, we simply divide the two-tailed p value by 2. Therefore, the output provides us with a one-tailed p value of .010/2 = .005. Note that if we need more accuracy, we can obtain additional decimal places from the output.
The one tailed 100(1 – .05)% = 95% confidence interval is found using the equation:
Equation 12-6 \[ \begin{equation} \mu < M_X + t_{(1-\alpha,n-1)} * (s_X⁄\sqrt{n}) \tag{12-6} \end{equation} \] Here, \(M_X = 92.03\), \(s_X = 16.5324\) (from Figure 12h(i)), \(t_{(.95,31)} ≈ t_{(.95,30)} = 1.697\), and n = 32:
\[ \begin{align} \mu &< 92.03 + 1.697(16.5324/√32) \\ \mu &< 96.99 \end{align} \] Mary was confident that the true population mean fell below 96.99. This confidence interval does not contain the population mean in the null hypothesis (i.e., 100) and it shouldn’t because we rejected the null hypothesis.
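Again, the calculation is easy to verify in base R (using the exact df = 31 critical value):

```r
m <- 92.031; s <- 16.532; n <- 32
tc <- qt(.95, n - 1)           # one-tailed critical value, df = 31

upper <- m + tc * s / sqrt(n)  # one-sided upper confidence bound
round(upper, 2)                # about 96.99
```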
The actual standardized mean difference effect size here is calculated as \(d = 7.96875 / 16.53244 = 0.482\).
jmv::ttestOneS(
data = data,
vars = IQ_Score,
testValue = 100,
hypothesis = "lt",
norm = FALSE,
qq = FALSE,
meanDiff = TRUE,
ci = FALSE,
effectSize = TRUE,
desc = TRUE,
plots = FALSE)
ONE SAMPLE T-TEST
One Sample T-Test
────────────────────────────────────────────────────────────────────────────────────────────────────────────
Statistic df p Mean difference Effect Size
────────────────────────────────────────────────────────────────────────────────────────────────────────────
IQ_Score Student's t -2.7266 31.000 0.00522 -7.9688 Cohen's d -0.48201
────────────────────────────────────────────────────────────────────────────────────────────────────────────
Note. Hₐ μ < 100
Descriptives
──────────────────────────────────────────────────────────
N Mean Median SD SE
──────────────────────────────────────────────────────────
IQ_Score 32 92.031 93.500 16.532 2.9226
──────────────────────────────────────────────────────────
The mean on the World Adult Intelligence Scale of the delegates to the United Nations is greater than 100.
\[ \begin{align} H_0&: \mu = 100 \\ H_A&: \mu > 100 \\ \end{align} \] Using the same significance level, power, and a priori effect size, Mary’s test statistic again equals -2.7266. However, in this unusual case, Mary fails to reject the null hypothesis. The test statistic (t = -2.7266) was less than the critical value (t = +1.697). Also, the p value (.9948) is greater than the alpha level (.05). This case was unusual because Mary was so wrong about the direction of the difference; that is, Mary thought the delegates’ mean would be over 100, but her statistical test informed her that it was less than 100. Her theory may have been very wrong here and, therefore, may need to be reconsidered.
If this had been a two-tailed test (i.e., a non-directional alternative hypothesis like Case A), then this situation might also represent what some researchers call a Type III error. A Type III error occurs when the null hypothesis is correctly rejected, leading to the correct conclusion that the population means are indeed different, but the direction of the difference is identified incorrectly. Here, because the sample was flawed in such an extreme way, the sample showed the wrong group as having the larger mean. This is different than having the wrong theoretical expectation about which group should have the larger mean, but both potentially result in an incorrect conclusion about the direction of the effect.
In this case the researcher believes that the null hypothesis is correct (i.e., that the mean intelligence level at the U.N. is 100). However, except for the research hypothesis and the selection of sample size, the elements of hypothesis testing in this case are all exactly like those found for Case A, where the researcher believes that the population mean on the World Adult Intelligence Scale differs from 100. As for all confirmatory studies, in selecting a sample size for this case the researcher will have past information with which to calculate the a priori effect size, and therefore the sample size, using Table 12.4 or equation (11-3). Here we will assume, however, that the researcher has prior information that leads her to an a priori effect size (d) of .80. Therefore, our presentation would be exactly like that found for Case A, so only the research hypothesis need be stated here.
The mean on the World Adult Intelligence Scale of the delegates to the United Nations is equal to 100.
jmv::descriptives(
data = data,
vars = vars(Normal, Uniform, Positive_Skew, Negative_Skew),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────────────────────────────────────────────────────────
Normal Uniform Positive_Skew Negative_Skew
──────────────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 74.925 75.000 75.000 75.000
Std. error mean 0.33156 0.31623 0.31623 0.31623
95% CI mean lower bound 74.274 74.379 74.379 74.379
95% CI mean upper bound 75.575 75.621 75.621 75.621
Median 75.287 74.929 73.915 76.085
Standard deviation 10.485 10.000 10.000 10.000
Variance 109.93 100.00 100.00 100.00
IQR 13.807 17.153 14.320 14.320
Range 66.149 34.663 55.005 55.005
Minimum 43.759 57.883 58.298 36.697
Maximum 109.91 92.546 113.30 91.702
Skewness -0.077413 0.048441 0.59242 -0.59242
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis 0.024702 -1.1617 -0.12905 -0.12905
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.99825 0.95694 0.96652 0.96652
Shapiro-Wilk p 0.40537 < .00001 < .00001 < .00001
──────────────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution with N - 1
degrees of freedom
jmv::descriptives(
data = data,
vars = vars(Normal, Leptokurtic, Platykurtic, Bimodal),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
──────────────────────────────────────────────────────────────────────────────────
Normal Leptokurtic Platykurtic Bimodal
──────────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 74.925 75.000 75.000 74.940
Std. error mean 0.33156 0.31623 0.31623 0.44614
95% CI mean lower bound 74.274 74.379 74.379 74.064
95% CI mean upper bound 75.575 75.621 75.621 75.815
Median 75.287 75.472 75.466 74.422
Standard deviation 10.485 10.000 10.000 14.108
Variance 109.93 100.00 100.00 199.04
IQR 13.807 10.661 15.432 23.351
Range 66.149 108.63 42.594 73.782
Minimum 43.759 20.964 53.519 39.008
Maximum 109.91 129.59 96.113 112.79
Skewness -0.077413 -0.075346 -0.068537 0.022198
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis 0.024702 3.2206 -0.85946 -0.90470
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.99825 0.96382 0.98224 0.98000
Shapiro-Wilk p 0.40537 < .00001 < .00001 < .00001
──────────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution with N
- 1 degrees of freedom
In using a z statistic or a t statistic to test a hypothesis about a population mean, the assumption is made that the dependent variable is normally distributed in the population. When we considered the consequences of violating this assumption, we found that the z and t test statistics were robust to this violation; that is, the level of significance and power remain close to their prespecified values even when the population is non-normal. We found that these test statistics were particularly robust when the sample size is large and when a two-tailed alternative hypothesis is being used. However, when the normality assumption is violated and the sample size is small, and/or a one-tailed alternative hypothesis is being considered, it may be prudent to consider a transformation of the data that will make them more nearly normal (indeed, there is a quantile transformation that makes the data very nearly normal).
Data transformations are often justified because of the arbitrary choice of the original metric of the measurement. What, for example, is the correct metric with which to measure temperature, viscosity, reaction time, quality of product, and so forth? This implies, however, that if you transform your data, you must interpret your results in terms of your transformed variables (see Games & Lucas, 1966). This is an important point. If you measure achievement using the number of items correct on a test but then transform those scores using a logarithm, then all your statistical analyses must be interpreted in terms of log(items), not items. Or if you use a square root transformation, then all your statistics must be interpreted based on the square root of items. Sometimes this can make it very difficult to interpret your results in a way that makes sense. For example, in a one-sample t test where the null hypothesis is perhaps \(H_0: \mu = 100\), the hypothesized value (100) would need to change because the scale of the transformed data differs from the scale of the original data. Additionally, you risk overfitting your statistical model when you transform the scores, meaning that the results may apply only to the idiosyncrasies of the sample and not to the population.
Given a successful transformation, that is, one where the transformed data are normally distributed, your levels of significance and power will be more likely to be close to their prespecified levels. In statistical parlance the prespecified levels of significance and power are referred to as the nominal levels, and the levels of significance and power that actually exist are referred to as the actual levels.
Box-Cox Transformations
The Box-Cox transformation is a particularly flexible way of changing the shape of a data distribution (note that Yeo and Johnson, 2000, provide a newer extension of Box-Cox, but Box-Cox remains quite popular). Because it raises X to the power a, statisticians also refer to it as a power transformation. The Box-Cox transformation is defined as:
\[ \begin{equation} X_T = \frac{X^a - 1}{a} \text{ (if } a \ne 0 \text{)}\\ X_T = \ln(X) \text{ (if } a = 0 \text{)} \tag{12-7} \end{equation} \] where \(X_T\) denotes the transformed value of X. Box and Cox (1964) provide a method of finding the value of “a” for Equation (12-7). The process involves finding the value of “a” which maximizes the Box-Cox statistic from the equation:
\[ \begin{equation} \text{Box-Cox Statistic} = \left(-\frac{n}{2}\right) \ln\left[\left(\frac{1}{n}\right) \sum (X_T-M_{XT})^2\right] + (a-1) \sum \ln{(X)} \tag{12-8} \end{equation} \] where \(M_{XT}\) is the mean of the transformed X’s (i.e., the mean of the \({X_T}\)). Note that the quantity inside the logarithm is the variance of the transformed scores, so it is always positive. Unfortunately, it is not easy to calculate values of the Box-Cox statistic by hand. However, worked examples are available (e.g., Osborne, 2010, available at https://pareonline.net/pdf/v15n12.pdf). jamovi can perform transformations using its compute and transform functions. Often researchers will try several common transformations and choose the one that provides the best result (e.g., square root, log, exponential).
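Equations 12-7 and 12-8 are easy to script. The sketch below is a plain-Python illustration (the function names are ours, not from the chapter): `boxcox_transform` applies Equation 12-7 to a single score, and `boxcox_statistic` evaluates the Equation 12-8 statistic for a candidate exponent a (the quantity inside the logarithm is the variance of the transformed scores, which is positive).

```python
import math

def boxcox_transform(x, a):
    """Equation 12-7: Box-Cox transform of a positive value x."""
    if a == 0:
        return math.log(x)           # ln(X) when the exponent is 0
    return (x ** a - 1) / a

def boxcox_statistic(xs, a):
    """Equation 12-8 evaluated for exponent a on positive data xs."""
    n = len(xs)
    xt = [boxcox_transform(x, a) for x in xs]
    m = sum(xt) / n                  # mean of the transformed scores
    var = sum((v - m) ** 2 for v in xt) / n
    return (-n / 2) * math.log(var) + (a - 1) * sum(math.log(x) for x in xs)
```

Trying a grid of values for a and keeping the one with the largest statistic reproduces the search that Box and Cox describe.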
Figures 12j and 12l show the Box-Cox transformation along with other common transformations for positively and negatively skewed data, respectively. The square root transformation (\(\sqrt{X}\)), logarithmic transformation (\(log{X}\)), and inverse transformation (\(1 / X\)) are frequently used for positively skewed data. Reflections of these are often used for negatively skewed data (e.g., \(\sqrt{max(X+1)-X}\), \(log(max(X+1)-X)\), and \(1 / [max(X+1)-X]\)). Reflection essentially flips the skew horizontally before transforming. A square or cube transformation is also sometimes used with negatively skewed data.
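The reflection step for negatively skewed scores can be scripted directly. This is an illustrative sketch with made-up scores; the helper name is ours.

```python
import math

def reflect(xs):
    """Flip the skew: max(X) + 1 - X, so every reflected value stays positive."""
    c = max(xs) + 1
    return [c - x for x in xs]

# Hypothetical negatively skewed scores
scores = [2, 7, 8, 9, 9, 10]
reflected = reflect(scores)                     # now positively skewed
sqrt_scores = [math.sqrt(v) for v in reflected]
log_scores = [math.log(v) for v in reflected]
inverse_scores = [1 / v for v in reflected]
```

Remember that statistics computed on these values are on the reflected scale, so the low and high ends of the scale have traded places.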
Lambda (\(\lambda\), the exponent called “a” in Equation 12-8) was chosen near -2 for the positively skewed X_Rating variable and near 4.5 for the negatively skewed Y_Rating variable. These values were obtained by trying several values of a in Equation 12-8 above and finding the one that maximized the Box-Cox statistic. The value of \(\lambda\) is usually between -5 and +5. Start with negative values of lambda if your data are positively skewed and with positive values of lambda if your data are negatively skewed. We used the boxcox() function from the MASS package to do these calculations (it calculates the Box-Cox statistic in a slightly different way than Equation 12-8, but with the same results).
lambda | BoxCox_Eq12.8 | MASS_BoxCox |
---|---|---|
-2.5 | -18.518 | 5.1270 |
-2.4 | -18.485 | 5.1602 |
-2.3 | -18.459 | 5.1858 |
-2.2 | -18.441 | 5.2037 |
-2.1 | -18.431 | 5.2138 |
-2.0 | -18.429 | 5.2159 |
-1.9 | -18.435 | 5.2101 |
-1.8 | -18.449 | 5.1963 |
-1.7 | -18.471 | 5.1743 |
-1.6 | -18.501 | 5.1440 |
-1.5 | -18.540 | 5.1055 |
ID | X_RATING | BoxCox | SqRoot | Log | Inverse |
---|---|---|---|---|---|
1 | 10 | 0.4950 | 3.1623 | 2.3026 | 0.1000 |
2 | 10 | 0.4950 | 3.1623 | 2.3026 | 0.1000 |
3 | 11 | 0.4959 | 3.3166 | 2.3979 | 0.0909 |
4 | 11 | 0.4959 | 3.3166 | 2.3979 | 0.0909 |
5 | 11 | 0.4959 | 3.3166 | 2.3979 | 0.0909 |
6 | 11 | 0.4959 | 3.3166 | 2.3979 | 0.0909 |
7 | 11 | 0.4959 | 3.3166 | 2.3979 | 0.0909 |
8 | 12 | 0.4965 | 3.4641 | 2.4849 | 0.0833 |
9 | 12 | 0.4965 | 3.4641 | 2.4849 | 0.0833 |
10 | 12 | 0.4965 | 3.4641 | 2.4849 | 0.0833 |
11 | 12 | 0.4965 | 3.4641 | 2.4849 | 0.0833 |
12 | 12 | 0.4965 | 3.4641 | 2.4849 | 0.0833 |
13 | 12 | 0.4965 | 3.4641 | 2.4849 | 0.0833 |
14 | 13 | 0.4970 | 3.6056 | 2.5649 | 0.0769 |
15 | 13 | 0.4970 | 3.6056 | 2.5649 | 0.0769 |
16 | 13 | 0.4970 | 3.6056 | 2.5649 | 0.0769 |
17 | 13 | 0.4970 | 3.6056 | 2.5649 | 0.0769 |
18 | 14 | 0.4974 | 3.7417 | 2.6391 | 0.0714 |
19 | 14 | 0.4974 | 3.7417 | 2.6391 | 0.0714 |
20 | 14 | 0.4974 | 3.7417 | 2.6391 | 0.0714 |
21 | 15 | 0.4978 | 3.8730 | 2.7081 | 0.0667 |
22 | 16 | 0.4980 | 4.0000 | 2.7726 | 0.0625 |
23 | 17 | 0.4983 | 4.1231 | 2.8332 | 0.0588 |
24 | 18 | 0.4985 | 4.2426 | 2.8904 | 0.0556 |
25 | 20 | 0.4988 | 4.4721 | 2.9957 | 0.0500 |
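The two tables above can be reproduced from Equations 12-7 and 12-8. The sketch below is our own plain-Python code (not the MASS boxcox() call used in the text): it evaluates the Equation 12-8 statistic over the same grid of lambda values for the 25 X_RATING scores, confirming that the maximum falls at lambda = -2.0, and then recomputes the first row of the transformation table.

```python
import math

# The 25 X_RATING values from the table above
xs = [10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12,
      13, 13, 13, 13, 14, 14, 14, 15, 16, 17, 18, 20]

def boxcox_stat(xs, a):
    """Equation 12-8 for a nonzero exponent a."""
    n = len(xs)
    xt = [(x ** a - 1) / a for x in xs]
    m = sum(xt) / n
    var = sum((v - m) ** 2 for v in xt) / n
    return (-n / 2) * math.log(var) + (a - 1) * sum(math.log(x) for x in xs)

grid = [round(-2.5 + 0.1 * i, 1) for i in range(11)]   # -2.5, -2.4, ..., -1.5
best = max(grid, key=lambda a: boxcox_stat(xs, a))      # maximized at -2.0

# First row of the transformation table (X = 10, lambda = -2):
boxcox_10 = (10 ** -2 - 1) / -2    # 0.4950
sqrt_10 = math.sqrt(10)            # 3.1623
log_10 = math.log(10)              # 2.3026 (the Log column is a natural log)
inv_10 = 1 / 10                    # 0.1000
```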
```r
# Boxplots of the transformed scores by type of transformation, via the jmv
# package from R; capture.output() captures the printed text so that only
# the boxplots are displayed.
out <- capture.output(
  jmv::descriptives(
    formula = Transform ~ Type,
    data = data,
    box = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE))
```
In this chapter we have considered the z and t test statistics used to test the null hypothesis that a population mean is equal to a constant. We found that the z statistic is used when the population variance is known and that the t statistic is used when the population variance is estimated using the sample variance. Each of these test statistics was illustrated in an exploratory mode of analysis, where the research hypothesis was nondirectional, and in three confirmatory modes, where specific research hypotheses were specified. For each test statistic we found that the assumption of independent units was necessary and that the test statistics were not robust to violations of this assumption. We also found that the assumption of normality of the dependent variable was common to both the z statistic and the t statistic, and that under most conditions both of these statistics were robust to violations of this assumption. We completed this chapter by considering transformations that may be used to make skewed data more nearly normally distributed. Here we found the Box-Cox transformation to be of value.
Procedures
Single-sample t test
Analyses to Run
• Use a SCALE variable Y.
• Run descriptive statistics for Y.
• Run a one-sample t test for Y = some appropriate value.
Using the output, respond to the following items
Report and interpret the numeric statistical information about shape provided by EXPLORE (e.g., skewness and kurtosis).
Using a confidence interval and/or t statistic for skewness (based on a t critical value with N-1 degrees of freedom), report whether you conclude that Y is symmetric in the population (and if not, why not).
Using a confidence interval and/or t statistic for kurtosis (based on a t critical value with N-1 degrees of freedom), report whether you conclude that Y is mesokurtic in the population (and if not, why not).
Using statistical significance (e.g., p value for Shapiro-Wilk if the sample is less than a couple hundred), report whether you conclude that Y is normally distributed in the population.
Discuss any similarities and/or differences you find among the various information used for the shape of Y. That is, do all numeric and graphical results suggest the same shape for the distribution of Y?
Which would be a better measure of central tendency for Y: mean or median? Why?
Using statistical significance (i.e., a Sig. or p value), report whether you conclude that the Y variable is normally distributed in the population. That is, provide your evidence for whether the assumption that the variable is normally distributed is tenable (i.e., defensible, believable, reasonable).
Report whether the Y distribution appears to have any outliers of concern. Provide your evidence and rationale.
Using the ONE-SAMPLE T TEST output, respond to the following items
Provide the most appropriate research question for this analysis
Using BOTH appropriate symbols AND words, provide the non-directional (two-tailed) statistical Null Hypothesis for this single sample t test.
Report and interpret the difference between the sample mean and the hypothesized population mean.
How many cases were involved in this study?
Report and interpret the estimated population standard deviation for the variable Y
Report and interpret the estimated population variance for the variable Y
Report and interpret the estimated population standard error of the mean for the variable Y
Show or explain how the standard error of the mean for the variable Y is calculated (don’t just give the formula, fill it in with actual numbers from the results).
Using a t critical value, show or explain how to calculate the 95% confidence interval for the mean of the variable Y. Indicate clearly what critical t and what degrees of freedom you used for this answer.
Does the confidence interval in the previous item contain the hypothesized mean value (i.e., test value) used for the one-sample t test?
Using a t critical value, show or explain how to calculate the 95% confidence interval for the mean difference between the sample estimate and the hypothesized population parameter for the variable Y.
Does the confidence interval in the previous item contain the hypothesized mean difference (i.e., 0)? What does that mean in terms of statistical null hypothesis significance testing?
Using a two-tailed level of significance of \(\alpha\) = .05, was there a statistically significant difference between the estimated population mean for Y and the hypothesized population mean? Use a confidence interval for the mean difference as evidence to explain your answer.
Using a two-tailed level of significance of \(\alpha\) = .05, was there a statistically significant difference between the estimated population mean for Y and the hypothesized population mean? Use the calculated t statistic compared to a t critical value as evidence to explain your answer.
Using a two-tailed level of significance of \(\alpha\) = .05, was there a statistically significant difference between the estimated population mean for Y and the hypothesized population mean? Use a Sig. or p value to explain your answer.
What type of hypothesis testing error might you have made when reaching your decision about the single sample t Null Hypothesis? Or was there definitely no error? Explain.
Show or explain how the t statistic for the variable Y is calculated in this analysis (i.e., what numbers are actually used in the formula to get these results). Recall that for the one-sample t test, the standard error for the mean is also the standard error of the mean difference.
Show or explain how to calculate the standardized mean difference effect size (Cohen’s d) for the difference between the estimated population mean (i.e., the actual sample mean) and the hypothesized population mean for the variable Y. Recall that for the one-sample t test, the standard deviation for the scores is also the standard deviation of the mean difference.
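The arithmetic asked for in the last several items can be sketched in a few lines. The numbers here are hypothetical (not output from the chapter's data); the point is the structure of the calculations.

```python
import math

# Hypothetical one-sample t test: n cases, sample mean M, sample SD s,
# and hypothesized population mean mu0 (the "test value").
n, M, s, mu0 = 25, 13.2, 2.5, 12.0

se = s / math.sqrt(n)        # standard error of the mean: 2.5 / 5 = 0.50
t = (M - mu0) / se           # one-sample t statistic: 1.2 / 0.50 = 2.40
d = (M - mu0) / s            # Cohen's d: mean difference in SD units = 0.48

# A 95% confidence interval for the mean is M +/- t_crit * se, where
# t_crit comes from a t table with n - 1 degrees of freedom.
```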
Using ALL the output in this section above, respond to the following item
Please cite as:
Barcikowski, R. S., & Brooks, G. P. (2025). The Stat-Pro book:
A guide for data analysts (revised edition) [Unpublished manuscript].
Department of Educational Studies, Ohio University.
https://people.ohio.edu/brooksg/Rmarkdown/
This is a revision of an unpublished textbook by Barcikowski (1987).
This revision updates some text and uses R and JAMOVI as the primary
tools for examples. The textbook has been used as the primary textbook
in Ohio University EDRE 7200: Educational Statistics courses for
most semesters 1987-1991 and again 2018-2025.