INTRODUCTION

In this chapter, we will begin our study of inferential statistics by considering its cornerstone, the random sample. We will examine three methods of selecting a random sample, and we will consider a theoretical distribution known as the sampling distribution. We will also consider the role the sampling distribution plays in determining the properties of statistics that are considered to be good estimates of their population parameters. Here, we will define and illustrate the criteria by which statistics known as point estimates are judged.

Your jamovi objective will be to generate a list of N random numbers from which you will use the first n numbers to select a random sample.

RANDOM SAMPLES

Definition and Purposes

In chapter 4, a random sample was defined as a sample of n units from a population of N units, where each of the possible samples of n units has the same probability of being selected. Random sampling has two purposes in experiments: it allows us to make inferences about population parameters based on sample statistics, and it controls pretreatment systematic differences between the units allocated to different treatments. In the following sections, we describe three methods for selecting a random sample from a finite population. In each case, we assume that the population has been defined, that each unit (for example, person) in the population has been numbered, and that units will be selected at random to form a simple random sample. The list of population units is called an accessible population or, more commonly, a sampling frame.

Method 1: The “Fish bowl” Shuffle

The following steps describe what might be called a “layman’s” method of selecting a random sample.

  1. Number the units in the population.
  2. Write the number of each unit on a separate slip of paper.
  3. Put the slips of paper in a container (for example, a fishbowl, hat, or box).
  4. Thoroughly shuffle the slips of paper.
  5. Remove one slip of paper from the container and record the number.
  6. Repeat steps 4 and 5 until you have obtained the sample size you need.

One problem with this method of selecting a random sample is that the slips of paper tend to stick together. When this happens, the sample that is selected will not be a random sample. This problem occurred during the Vietnam War, when the birth dates of young men eligible for the draft were put on plastic balls. These balls were put into a bowl and then shuffled. The young men were drafted in the order that their birth dates were drawn from the bowl. It was discovered, however, that once a date from one month was selected, an unusually large number of dates from that month followed. This indicated that the sampling method was not yielding a truly random sample of birth dates. A computer-generated method of drawing a random sample was then adopted.

Method 2: Table of Random Numbers

Table 10.1 contains a random collection of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 arranged in groups of nine. Each individual digit is randomly generated, not the group of nine as a whole. But because each digit is random, you can also form random 2-digit, 3-digit, 4-digit, and so forth, numbers by combining adjacent digits into larger numbers. The following steps show how to select a random sample of 10 subjects from a population of 100 subjects using Table 10.1.

Table 10.1 Random Numbers

Row Column1 Column2 Column3 Column4 Column5 Column6 Column7
1 101933318 905711576 671522731 108278632 438427426 919306592 539186364
2 674788178 438475141 119343377 626120793 282029804 279955782 807968849
3 852286888 109790763 855296230 306052532 671918934 446408478 552601949
4 635172620 637847535 318490860 936689237 998810016 385507061 450243822
5 819035120 383364335 311977125 197698225 395595207 268246391 530128931
6 201006283 294389988 652050753 909936874 138768344 443431671 121174750
7 115682247 884502614 064866906 853708718 685719367 346331449 014700711
8 886112681 953650363 605938044 790133197 636019472 345755195 266106203
9 725401822 064510586 846789078 453868392 401090001 618616979 304767833
10 807565837 062639585 889965096 507219732 256882474 225256838 541426545
11 327489316 182112333 461354158 930220027 368283670 310786904 773878600
12 331399171 772452873 573858998 464923955 981713218 305581352 864046964
13 635551285 132472875 743362628 524340887 080944058 486383376 777353849
14 771017636 512898727 879681051 458204319 572248299 508832986 265536543
15 939949716 607507358 820094882 372900166 402798895 321403201 865435153
16 990727226 307341312 251346956 704626702 872273400 033053599 276143660
17 000377442 176452501 872189314 279276567 693179186 589556407 462106132
18 478807807 534842576 066962177 049322335 515537151 412818445 020955777
19 100486257 714813518 329671352 797625534 393871273 762891649 743720164
20 801288852 387305354 433075084 095646566 106899139 375583663 098070750
21 338412564 095343641 714610386 964260504 004133151 222475642 077873557
22 376037848 341173985 079986646 784481796 108582201 676311486 723148437
23 469888202 970606163 506385357 070536180 781226492 549994861 080771457
24 403316982 635898235 566528112 251188380 180035513 715611739 007980888
25 199779357 605976429 526513420 538564205 998723011 286539617 515564055
26 014223073 441082761 549387434 178440249 752565032 843527323 419731919
27 341913548 746369269 826527253 705660700 353202136 100483117 969108222
28 626884804 015887409 392746106 479768693 285104477 358732383 607682630
29 269906390 061742638 101490118 320193932 293369157 692981470 945575141
30 613236639 106041713 707311287 906243223 701949868 065016309 771244403
31 131523374 649063230 814408793 217861931 041006439 947854668 673607627
32 527809157 995961681 142221938 205626278 350822853 771559654 105434012
33 474224383 121453978 002778731 653328151 672922026 702346771 818633028
34 160931627 125490591 254660148 927046225 961276565 140828585 268392333
35 842925948 125134134 204003422 823347552 327574625 573817917 817092039
36 966441348 439762091 831935981 477940167 538031844 911010397 601858259
37 359568539 954020176 186342940 832259394 605614816 033000464 369846405
38 473393191 807938060 912144713 198265128 082052685 906312064 942904108
39 391406612 622964900 830862122 246992036 624596058 046264815 361553151
40 118065552 541246696 690344589 089960450 282576926 871278767 736304521
41 724649810 772285811 903458917 800892057 801052863 424530086 734548304
42 501834881 099858580 901359898 849805082 216321002 663969277 913510852
43 183208200 844938544 474828413 777514273 890882186 346320580 596376308
44 663259921 215505831 854961955 604998708 860630190 044503747 614740697
45 543708555 829856999 256388623 476380286 066796475 249614066 324314161
46 075973299 625185879 777269244 993391552 130887387 588444190 571200729
47 044265718 525441576 440861107 075482470 637558116 111284118 766221093
48 932779799 328876565 643767732 658761197 378225086 429408280 387484595
49 127264933 322367220 723023648 490085913 623134925 155691547 908608850
50 728377036 112769525 064139205 774327378 902723197 568274943 233969621
  1. Roll a die to determine which column to use. Or you could flip a coin 6 times and count the number of TAILS you toss. In our example, we rolled a 5, and so we will start with column 5 of random numbers.

  2. Close your eyes and touch your computer screen inside the table. See what 2-digit number you are pointing at. When we did this, our finger was pointing to the number 15 (Row 17, Column 4, the second and third digits). Therefore, we will use row 15 of column 5 as our starting place. Note that there are only 50 rows in this table, so if you point at a two-digit number larger than 50, first try reversing it (e.g., 51 becomes 15). If that still does not produce a two-digit number of 50 or less, close your eyes and point again until you obtain a number of 50 or less.

  3. In our example, the starting point was the number 40, the first 2-digit number in Row 15 and Column 5 (or 4 or 402 or 4027, etc., depending on how large a number you need). We will need 2-digit numbers because we have 100 people in our population (where 00 would indicate person 100). Simply choose the number of digits appropriate for your population size. If a number is unusable, simply discard it and move on (e.g., if we had only 39 people in our population instead of 100, we would discard the 40 and move on as follows).

  4. Approach A. Now we choose our next 9 two-digit numbers by moving either across the row (left or right, as long as you are consistent) or up/down the column. If we move to the right, our next 2-digit numbers are: 27, 98, 89, 53, 21, 40, 32, 01 (i.e., 1), and 86. However, because 40 was already used, it is unusable the second time, so we pick an additional number: 54. We could instead move down the column and choose, after 40: 87, 69, 51, 39, 10, 00 (i.e., 100), 10, 78, and 18. Again, the repeated 10 is unusable, so we choose another: 99. You could toss a coin to decide whether you will move by rows or by columns from the starting point found in step 3.

  5. Approach B. As you can see, Approach A may require that a large number of random numbers be considered before n unique random numbers are found. The number of random numbers considered can be reduced by using modular arithmetic to find numbers that mathematicians describe as being “congruent modulo N.” Using this approach, each random number is divided by the population size, N. The remainder is then treated as a randomly selected ID number, with remainders of zero taken to represent the number N. Our starting point using Approach B is now the number 402, since we need numbers larger than the population size of 100. Each number that we consider is divided by 100, our N, and the remainder is taken to represent a selected ID. When we divided 402 by 100, the remainder was 02, and so the first number in our sample was 2. The following remainders were found for the three-digit numbers in the row following 402: 98, 95, 21, 03, 01, 65, 35, 53, 90. Therefore, the subjects with those ID numbers were selected to form our random sample. Note that when we reached the end of the row, we moved to the beginning of the next row (Row 16, Column 1).
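
If you would like to check this arithmetic, the following minimal R sketch reproduces Approach B. The stream of three-digit numbers is typed in from Table 10.1 starting at 402 (Row 15, Column 5), and N = 100 as in the example.

  # Approach B: remainders modulo N, with remainder 0 representing person N
  N <- 100
  stream <- c(402, 798, 895, 321, 403, 201, 865, 435, 153, 990)
  ids <- stream %% N     # remainders after dividing by N
  ids[ids == 0] <- N     # a remainder of zero represents person N
  unique(ids)            # drop any repeats: 2 98 95 21 3 1 65 35 53 90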

Method 3: Random Number Generator

In the following steps, you are shown how to generate a random sample of 10 cases from a population of 20 cases using jamovi. We will use the data in Figure 5a for this example. The process involves creating one or two new columns of numbers. If you do not have an ID number, you will need to create one simply as an ordered list of the numbers from one to your population size (i.e., 1 to N = 20). The second new column contains a list of twenty numbers that have been generated at random from a uniform distribution. The end points of this uniform distribution will be set at zero and N (here, 20), although the particular end points do not matter because only the order of the numbers is used. Then the cases will be ranked by their random numbers, which places the cases in random order.

For this example we will use the data in Figure 5a (hopefully you have it available as a dataset you can open, but if not, you can enter it fairly easily). We will use only the 20 cases in group 1 (the Yes-Pill cases). See Figure 10a. Even though we already have an ID number, we will create one for illustrative purposes. We will use jamovi’s UNIF function to create random uniform numbers.

Figure 10a Data from Original Cholesterol data (Figure 5a) for example of using a random number function

Row ID1 HT1 CHOL1
1 225 64 210
2 736 63 215
3 1291 64 263
4 906 63 220
5 494 62 330
6 796 65 198
7 1637 65 260
8 2871 66 198
9 3349 65 250
10 2522 63 179
11 828 67 243
12 1309 62 247
13 2349 64 210
14 544 64 272
15 1326 66 210
16 59 62 253
17 473 67 297
18 1251 60 195
19 2271 66 158
20 1797 68 150

Steps:

  1. Open or Create the dataset
  2. Compute a new variable called IDNUM using the ROW() function
  3. Compute a new variable called RANDNUM using the UNIF(0, MAX(ROW)) function. If you are using a program where you can set a seed, setting one is useful so that you can recreate the same list of numbers again if needed.
  4. The values in the RANDNUM column are now the randomly generated uniform numbers.
  5. Because they are random, we can use them to choose a randomly selected list of cases. See Figure 10b. (You can sort ascending or descending; it does not matter.)
  6. If you need 10 randomly chosen cases, then you would choose the 10 cases (IDs) with the smallest RANDNUMs. To make this easier with bigger datasets, you can rank the RANDNUMs and simply choose the cases with ranks less than or equal to the desired number of cases. Here, following Figure 10b, we chose ID1: 225, 906, 59, 2522, 2349, 1309, 1251, 494, 3349, and 1797, which correspond to the new IDNUMs of 1, 4, 16, 10, 13, 12, 18, 5, 9, and 20 (see the R sketch after this list for an equivalent computation).
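
If you prefer to work in R rather than through the jamovi interface, the following minimal sketch carries out the same steps; it is an R analogue of the procedure above, not the point-and-click steps themselves, and it assumes the N = 20 cases of Figure 10a.

  # Random selection of n of N cases by ranking uniform random numbers
  set.seed(42)                  # any seed makes the draw reproducible
  N <- 20; n <- 10
  idnum   <- 1:N                # step 2: IDNUM from a row counter
  randnum <- runif(N)           # step 3: one uniform number per case
  ranknum <- rank(randnum)      # step 6: rank the random numbers
  idnum[ranknum <= n]           # the n cases with the smallest ranks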

Figure 10b Sorted Data for example of random number function

Row ID1 HT1 CHOL1 IDNUM RANDNUM RANKNUM
1 225 64 210 1 2.9367 1
2 736 63 215 2 18.2085 19
3 1291 64 263 3 13.7589 13
4 906 63 220 4 3.0573 2
5 494 62 330 5 9.3301 8
6 796 65 198 6 18.4668 20
7 1637 65 260 7 11.2445 11
8 2871 66 198 8 13.8210 15
9 3349 65 250 9 9.3310 9
10 2522 63 179 10 3.2675 4
11 828 67 243 11 12.8963 12
12 1309 62 247 12 6.3586 6
13 2349 64 210 13 6.3192 5
14 544 64 272 14 16.3514 16
15 1326 66 210 15 17.1935 17
16 59 62 253 16 3.0860 3
17 473 67 297 17 17.2506 18
18 1251 60 195 18 6.8150 7
19 2271 66 158 19 13.7665 14
20 1797 68 150 20 9.4818 10

You can use a similar process to generate random numbers that correspond to row numbers in a list you do not have in jamovi. Let’s say your list has 300 names. You will need to create a column in jamovi with 300 cases (the column can be mostly missing values, but you will need some value in row 300 of the Data Editor). So scroll down as far as you can and enter values until you get to row 300 and enter something. Go to COMPUTE and use UNIF(1, 300) as the formula. Then rank the random numbers and choose the row numbers to use in your list.
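
If the list lives entirely outside your data file, base R can produce the row numbers in a single call. This short sketch assumes a 300-name list and a desired sample of 10.

  # Ten distinct row numbers between 1 and 300
  set.seed(7)
  sort(sample(1:300, size = 10))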

SAMPLING WITH AND WITHOUT REPLACEMENT

Sampling Without Replacement

The sampling methods described above are examples of what is known as sampling without replacement. In sampling without replacement, a number is sampled at random but is not replaced in the population, so it cannot be chosen again. Therefore, sampling without replacement is a process that yields no repeats of numbers in the sample. Sampling without replacement is commonly used in practical applications and is the method used throughout this book.

Sampling With Replacement

In sampling with replacement, a number is randomly selected from a population of numbers and recorded. The number selected is then returned to the population, and a second number is randomly chosen and recorded. This process is repeated until a given sample size is obtained. Sampling with replacement can be done with the “fish bowl shuffle” by returning each slip of paper to the fish bowl and shaking the bowl before selecting another slip. Using a table of random numbers, repeats of a number are simply kept in the sample. In practice, almost all sampling is done without replacement. However, there are situations where sampling with replacement is used; in particular, it underlies a relatively new robust statistical approach called “bootstrapping.”
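
In R, the two schemes differ only in the replace argument of the sample() function, as the following small sketch shows; the population of ten numbers is illustrative.

  # Contrasting the two sampling schemes with base R's sample()
  set.seed(1)
  population <- 1:10
  sample(population, size = 5, replace = FALSE)  # no repeats possible
  sample(population, size = 5, replace = TRUE)   # repeats may occur (bootstrap-style)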

RANDOM ASSIGNMENT

You used the preceding sampling procedures to randomly select units. You can also use them for random assignment. After you have done your random selection, proceed as follows:

  • Use one of the preceding methods of finding a random sample to obtain a random order of your treatments. For example, if you had four treatments, you might find the following random order for the treatment numbers: 2, 1, 4, 3.
  • Use one of the preceding methods of finding a random sample to randomly select a unit and then place this unit into your first randomly ordered treatment (in our example, treatment number 2). Then place the next unit that is randomly selected into the next treatment. Continue this process of randomly selecting a unit and placing it into a treatment until all of your units have been placed.
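
For those working in R, a minimal sketch of this assignment procedure follows; the 20 units and four treatments of five units each are illustrative assumptions, not values from the text.

  # Random assignment: randomly order the treatments, then deal randomly
  # ordered units into them
  set.seed(11)
  units      <- 1:20
  trt_order  <- sample(1:4)                     # random order of the treatments
  assignment <- split(sample(units),            # randomly ordered units ...
                      rep(trt_order, each = 5)) # ... dealt into the treatments
  assignment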

SAMPLING DISTRIBUTIONS

A sampling distribution is a probability distribution of a statistic. Remember that a probability distribution is a theoretical distribution. In this sense, a sampling distribution is based on values of a statistic that has been calculated for each of an infinite number of random samples of size n. Statistics that are commonly considered in sampling distributions are the mean, variance, skewness, kurtosis, correlation coefficient, regression coefficient, proportions, and different test statistics. (We will consider the sampling distributions of different test statistics as we introduce them, starting in chapter 12.)

Sampling distributions of statistics are based on an infinite number of samples. We can obtain an idea of what a sampling distribution is, however, by considering an illustration based on a finite number of samples. For example, we can simulate a sampling distribution of the variance using 100 sample variances based on samples of IQ scores from the population of high school students in New York City. We obtain the variances needed for the sampling distribution by calculating the variance in each of 100 samples of students. For this example, each sample contains 30 students. To calculate a given variance, we select a random sample of 30 students and administer an IQ test to them. We then calculate the variance of the resulting IQ scores, and repeat this process for 100 such samples. When we finish, we have 100 variances, which we can put into a frequency distribution.

In this example, our frequency distribution would meet our definition of a sampling distribution if it were based on an infinite number of samples and we scaled the y-axis with probabilities instead of frequencies. We could theoretically obtain an infinite number of samples of students by:

  • drawing a random sample of students
  • calculating their variance
  • putting the students back into the population
  • drawing another random sample of students
  • repeating these steps ad infinitum.

The ad infinitum (i.e., continuing the process infinitely) is the reason this distribution is called a theoretical distribution.
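
A finite version of this process is easy to simulate. The sketch below mirrors the variance example above, with 100 samples of n = 30 each; treating IQ scores as normal with mean 100 and standard deviation 15 is an assumption made here purely for illustration.

  # Simulated (finite) sampling distribution of the variance:
  # 100 samples of n = 30, one variance recorded per sample
  set.seed(123)
  variances <- replicate(100, var(rnorm(30, mean = 100, sd = 15)))
  hist(variances, main = "100 sample variances, n = 30")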

The Importance of the Sampling Distribution

The sampling distribution is important because mathematical statisticians can tell what shape the sampling distributions of many statistics will take (for example, normal, positively skewed, and so on). Furthermore, statisticians can tell what the mean and variance of these distributions will be.

Central Limit Theorem

If we know the mean, variance, and shape of a sampling distribution of a statistic, we can make inferential statements, that is, statements concerning parameter estimation and significance testing. A good example of this is the sampling distribution of the mean. A theorem, known as the Central Limit Theorem, states that:

If a population has a finite variance \(\sigma^2\) and mean \(\mu\), then the sampling distribution of the mean approaches a normal distribution as n (the sample size of the random samples upon which the sample means are calculated) increases. That is, when n is very large, the sampling distribution of the mean is approximately normal. Furthermore, the mean of the sampling distribution of the mean is \(\mu\), and the variance of the sampling distribution of the mean is \(\sigma^2/n\).

The variance of the sampling distribution of the mean is called the variance of the mean, and the standard deviation of the sampling distribution of the mean is called the standard error of the mean. Note that in this theorem nothing is said about the population distribution; that is, the population distribution can take any shape. If the population is known to have a normal distribution, however, the sampling distribution of the mean will be normal with any size sample.

An Example

We begin to see some of the practical applications of the Central Limit Theorem when we consider a sample of 25 that we believe was selected at random from a population whose mean is 50 and whose variance is 100. If the sample did come from this population, we know that the mean of the sampling distribution of such sample means is 50, and the variance of this sampling distribution is 100/25 = 4, with a standard deviation of 2.

Given this information, our knowledge of standard scores, and the normal probability distribution, we can ask and answer the following estimation and significance test questions. Use the sampling distribution of the mean shown in figure 10c to assist you in considering these questions.

Question 1

Between what two sample means would we expect 95% of the sample means to fall?

Answer 1

Given table 9.1(b), in the standard normal probability distribution, we would expect 95% of the scores to fall between z(.025) = -1.960 and z(.975) = 1.960. In the sampling distribution of the mean, sample means are our scores; and so a z statistic (a z score for a group) for a given mean is written as:

\[ \begin{equation} z_X = \frac {(M_X - \mu_X)} {\sigma_{M_X}} \\ \text{or} \\ z_X = \frac {(\overline{X} - \mu)} {\sigma_\overline{X}} \\ \tag{10-1} \end{equation} \] Here, \(\mu\) = 50 and \(\sigma_M\) = 2. We must consider z at z(.025) = -1.96 and z(.975) = 1.96, and solve for the values of M. In so doing, we have:

\[ \begin{align} 1.96 &= (M-50)/2 \\ 3.92 &= M-50 \\ 53.92 &= M \\ \\ -1.96 &= (M-50)/2 \\ -3.92 &= M-50 \\ 46.08 &= M \end{align} \] Therefore, we would expect that 95% of our sample means would fall between 46.08 and 53.92, that is, p(46.08 < M < 53.92) = .95.
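
The interval can be verified directly with R’s normal quantile function, using the mean of 50 and standard error of 2 from the example above.

  # The middle 95% of the sampling distribution of the mean
  qnorm(c(.025, .975), mean = 50, sd = 2)   # 46.08 and 53.92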

Question 2

Between what two sample means would we expect 99% of the sample means to fall?

Answer 2

Given table 9.1(b), in the standard normal distribution, we would expect 99% of the standard scores to fall between z(.005) = -2.576 and z(.995) = 2.576. Therefore, using the procedure described for answer 1, we have:

\[ \begin{align} 2.576 &= (M-50)/2 \\ 5.152 &= M-50 \\ 55.152 &= M \\ \\ -2.576 &= (M-50)/2 \\ -5.152 &= M-50 \\ 44.848 &= M \end{align} \]

Therefore, we would expect that 99% of our sample means would fall between 44.848 and 55.152, that is, p(44.848 < M < 55.152) = .99.

Question 3

If a sample mean of 52.4 were obtained, what would be a good estimate of the population mean? (In the following section, we will describe some criteria that help decide what good means.)

Answer 3

The sample mean of 52.4 is a good estimate of the population mean.

Question 4

If a sample mean of 52.4 were obtained, would it be reasonable to assume that the mean of the sampling distribution was 50? In other words, would it be reasonable to assume that the sample came from a population where the mean was 50?

Answer 4a

If the population mean was 50, we would expect that 95% of the samples randomly selected from this population have means that fall between 53.92 and 46.08. Therefore, since 52.4 falls within this interval, it would be reasonable to assume that the population mean was 50.

Answer 4b

The mean 52.4 differs from the suspected population mean by 2.4 points. We can answer question 4 in another way by asking: What is the probability of obtaining a sample mean that differs by 2.4 points or more from a population value of 50? Here, we are asking for the probability of obtaining a sample mean that is less than 47.6 (that is, 50 - 2.4) or greater than 52.4 (that is, 50 + 2.4). We can easily answer this question by transforming the score of 52.4 into a z score and then finding the area above it, using table 9.1(a):

\[ z = \frac {52.4-50} {2} = 1.2 \]

\[ p(z>1.2)=.5000-.3849=.1151 \] Also, we can find the area below the z score corresponding to the score of 47.6 (which because of the symmetry of the normal distribution is the same as that area for z > 1.2) as:

\[ z = \frac{47.6-50} {2} = -1.2 \]

\[ p(z < -1.2) = .1151 \]

Therefore, the probability of obtaining a sample mean that is more than +2.4 points from a population mean of 50 is .2302 or \(p(z < -1.2) + p(z > 1.2) = .1151 + .1151 = .2302\). Because this probability is high, we can conclude that if the population mean was 50, it would be reasonable to expect a sample whose mean was 52.4. (In the next chapter, we will discuss what is meant by a high probability in this situation.)

Question 5

If a sample mean of 56 were obtained, would it be reasonable to assume that the mean of the sampling distribution was 50? In other words, would it be reasonable to assume that the mean of the population was 50?

Answer 5a

If the population mean were 50, we would expect that 95% of the sample means would fall between 46.08 and 53.92, and that 99% of the sample means would fall between 44.848 and 55.152. Since a sample mean of 56 is outside both of these ranges, we might conclude that the mean of the sampling distribution (the population mean) is not 50.

Answer 5b

As in answer 4b, we can use table 9.1(a) to find the probability of obtaining a sample mean that differs by 6 points (that is, between 44 and 56) from a population mean of 50. Here, the z score for a sample mean of 56 is:

\[ z = \frac{56-50} {2} = 3.00 \]

\[ p(z > 3.00) = .5000 - .4987 = .0013 \]

Similarly, the z score for the sample mean of 44 is:

\[ z = \frac{44-50}{2} = -3.00 \] \[ p(z < -3.00) = .0013 \] Therefore, if the population mean is 50, the probability of obtaining a sample mean that differs from this population mean by 6 or more points is p(z < -3.00) + p(z > 3.00) = .0013 + .0013 = .0026. Since this probability is small, we might conclude that a sample mean of 56 came from a population whose mean is greater than 50.
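
Both of these two-tailed probabilities can be checked with R’s normal distribution function; note that R’s exact values differ slightly from those based on the rounded table entries.

  # Two-tailed probabilities for answers 4b and 5b;
  # lower.tail = FALSE gives the area above the z score
  2 * pnorm(1.2, lower.tail = FALSE)   # about .2301 (sample mean 52.4)
  2 * pnorm(3.0, lower.tail = FALSE)   # about .0027 (sample mean 56)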

THE ROLE OF THE SAMPLING DISTRIBUTION IN INFERENTIAL STATISTICS

The preceding questions and answers gave you a sense of the role that a sampling distribution (here, of the mean) plays in inferential statistics. It is an extremely important role, although it stays primarily in the background. We do not have to actually construct the sampling distribution of a statistic; we only need to know what its shape is and what its parameters are. Based on our knowledge of the sampling distribution, we can make a priori probability statements about an unknown sample statistic (as in questions 1 and 2) or an inference about a population parameter (as in questions 3, 4, and 5).

In the next chapter, we will further consider such questions when we consider the importance of the sampling distribution to hypothesis testing. For now, however, we will consider the question that is implied in question 3:

What criteria can be used to help decide whether a sample statistic is a good estimate of its population parameter? Said another way: What properties do we look for in the statistics whose sampling distributions we consider? We will consider the formal definitions of these criteria and examine them for commonly used statistics.

POINT ESTIMATES

Definitions

A statistic is said to be a point estimate when it is used to infer the value of a population parameter. The equation used to derive the statistic is called the estimator. The following criteria are frequently used to evaluate a statistic:

Unbiased

A statistic is said to be unbiased when the mean of its sampling distribution is its population parameter.

Consistent

A statistic is said to be consistent when the probability that it is close to its population parameter increases as the sample size increases.

Efficient

Kendall and Buckland (1976, p. 47) described efficiency as follows:

The concept of efficiency in statistical estimation is due to Fisher (1921) and is an attempt to measure objectively the relative merits of several possible estimators.

The criterion adopted by Fisher was that of variance, an estimator being regarded as more “efficient” than another if it has smaller variance; and if there exists an estimator with minimum variance v the efficiency of another estimator of variance v1 is defined as the ratio of v/v1. It was implicit in this development that the estimator should obey certain criteria such as consistency. For small samples, where other considerations such as bias enter, the concept of efficiency may require extension or modification.

Sufficient

The definition of a sufficient statistic is beyond the scope of this book; suffice it to say that a sufficient statistic contains all of the information in its sample relative to the estimation of its population parameter.

SAMPLING DISTRIBUTIONS AND ESTIMATES

In this section, we will create finite sampling distributions whose properties and estimates we can examine more closely. This exercise will enable us to better conceptualize what a sampling distribution is, and what the properties of its scores (estimates) are. In this regard, we will consider samples from a uniform population distribution.

The uniform distribution was chosen to illustrate that the sampling distributions of most statistics based on samples from this distribution are not uniform. Indeed, considering the Central Limit Theorem, we know that the sampling distribution of the mean for large samples will be close enough to a normal distribution for most practical purposes.

The Discrete Uniform Probability Distribution

As was the case for the normal distribution, instead of considering a given population with a uniform distribution and a fixed sample size, we will consider a given uniform probability distribution that represents all sample sizes. Consider the discrete uniform probability distribution shown in figure 10d, which has a lower boundary of 0 and an upper boundary of 1000. The probability of sampling a given number from this distribution is 1/1000, since there are 1000 equally likely integers (0 through 999). The following population parameters have been derived by mathematical statisticians for any discrete uniform probability distribution:

\[ \begin{align} Mean = \mu &= (a+(b-1))/2 \\ Median = Md. &= (a+(b-1))/2 \\ Variance = \sigma^2 &= (b^2-1)/12 \\ Skewness = b_1 &= 0 \end{align} \] Here, \(a\) is the lower boundary, and \(b\) is the upper boundary. Therefore, for the discrete uniform probability distribution shown in figure 10d, we have (for \(a = 0\) and \(b = 1000\)) that:

\[ \begin{align} Mean &= \mu = &&(0+(1000-1))/2 &&= 499.5 \\ Median&=Md. = &&(0+(1000-1))/2 &&= 499.5 \\ Variance &= \sigma^2 = &&(1000^2-1)/12 &&= 83333.25 \\ Skewness &= b_1 = && &&= 0 \end{align} \] We will select numbers at random from this theoretical probability distribution.
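
If you want to see these parameters emerge empirically, the following R sketch draws a large number of values from this distribution and computes their mean and variance.

  # Drawing from the discrete uniform distribution of figure 10d
  set.seed(2025)
  draws <- sample(0:999, size = 1e6, replace = TRUE)
  mean(draws)   # close to 499.5
  var(draws)    # close to 83333.25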

Figure 10c Sampling distribution of the Mean

Figure 10d A discrete uniform probability distribution with lower boundary at 0 and upper boundary at 1000

Finite Sampling Distributions Based on Different Sample Sizes

The Raw Data and Its Statistics

To illustrate the properties of sampling distributions and their estimates, 1000 random samples from the uniform probability distribution shown in figure 10d were generated for sample sizes of 5, 10, 25, and 50 units.

The 1000 samples of size 5 are partially shown in figure 10e. Similarly, figure 10f shows the means, variances, standard deviations, ranges, and medians for the 1000 samples of size 5. Although the summary statistics for the samples of sizes 5, 10, 25, and 50 units will be examined, only the observations based on samples of size 5 are shown here to keep the presentation less cluttered.

The mean of the scores in Sample 1 in figure 10e is 549.4, which is the first mean shown in the Mean column of figure 10f. Also, the variance for the first sample in figure 10e is shown in the first row of figure 10f as 51179; the standard deviation is 226.228; the range is 559; and the median is 481. In this manner, the statistics for a given sample of figure 10e are found in the corresponding row of figure 10f.

Descriptive Statistics

We sent the data (the statistics) from each of the samples across each of the sample sizes to jamovi’s Descriptives analysis, run here through the jmv package in R (see chapter 4). jamovi found descriptive statistics for the means derived from samples of size 5, 10, 25, and 50. jamovi also found descriptive statistics for the medians, variances, standard deviations, and ranges based on samples of size 5, 10, 25, and 50. Figure 10g shows the resulting descriptive statistics for the means based on different sample sizes. The descriptive statistics for the medians, variances, standard deviations, and ranges are shown in figures 10h, 10i, 10j, and 10k, respectively.

Histograms

To further illustrate the features of the mean, we constructed finite sampling distributions using R’s histogram capabilities (see chapter 4) for the 1000 samples of each sample size. These finite sampling distributions are displayed in the histograms of figure 10l. For example, in figure 10l, histogram A represents the sampling distribution of the mean based on 1000 samples with 5 units in each sample. Histogram B has 10 units per sample mean; histogram C has 25 units per sample mean; and histogram D has 50 units per sample mean.

Variance Bar plots

We also constructed variance bar plots for each of the statistics across each of the sample sizes. Figures 10m, 10n, 10o, and 10p show these variance bar plots for the means, medians, standard deviations, and ranges, respectively.

For example, figure 10m shows from left to right the four variance bar plots of the 1000 means based on samples of size 5, 10, 25, and 50. The small lines extending from each bar plot just above and below the mean of the statistic, referred to as “whiskers,” represent the standard errors of the statistic. For example, in figure 10m, the first whisker above and below the mean represents one standard error (\(\sigma\) / √n ) from the mean, and the second whisker above or below the mean represents two standard errors (2\(\sigma\) / √n ) from the mean. (Note that the variance bar plots were not made for the variances, whose descriptive statistics are shown in figure 10i, because the variances are so large they require rescaling.)

What Do These Tables and Figures Illustrate?

Unbiased Estimates: Mean and Variance

Statisticians have found that the mean and variance are unbiased estimates of their population parameters. That is, if we could take an infinite number of samples for a given sample size, we would find that the means of the sampling distributions of these statistics would be their population parameters. Since the statistics illustrated here are based on only 1000 samples, we do not find them to be exactly equal to their population values, but they are close (1000 samples sounds like a lot, but for this kind of research we typically use 10,000 or more).

For example, the mean of the population is known to be 499.5, and the means of the sampling distributions reported in figure 10g are 505.18, 497.35, 498.00, and 498.13 for samples of size 5, 10, 25, and 50, respectively. The population variance is known to be 83333.25, and the means of the sampling distributions of the variance reported in figure 10i are 82894, 84184, 83525, and 84190 for samples of size 5, 10, 25, and 50, respectively.

Unbiased Estimate: The Variance

In chapter 5, we found that the population variance was calculated using equation (5-3) as:

\[ \sigma_X^2 = \sum \frac {(X-\mu)^2} {N} \]

The sample variance was found using the estimator, equation (5-2), as:

\[ s_X^2 = \sum \frac {(X-M_X)^2} {n-1} \] Here, a natural question to ask is: Why not use n instead of (n-1) as the denominator of the sample variance? The reason is that if n is used as the denominator of the sample variance, the mean of the sampling distribution of such variances is not the population variance; that is, the sample variance found with n as the denominator is biased. To have an unbiased estimate of the population variance, the estimator must consist of the sum of deviation scores squared, divided by (n-1).
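
This bias is easy to demonstrate by simulation. The sketch below draws many small samples from the uniform population considered above and compares the average of the two estimators: R’s var() uses the (n - 1) denominator, and rescaling it by (n - 1)/n gives the n-denominator version.

  # Average of the two variance estimators across many samples
  set.seed(99)
  n <- 5
  v_unbiased <- replicate(10000, var(sample(0:999, n, replace = TRUE)))
  mean(v_unbiased)                 # near the population variance, 83333.25
  mean(v_unbiased * (n - 1) / n)   # rescaled to an n denominator: too small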

A Biased Estimate: The Standard Deviation

The estimator for the standard deviation (that is, the square root of the unbiased estimator of the variance) yields a biased estimate of its population parameter. Fortunately, the bias of the standard deviation is small and can be considered to be negligible when n is greater than 20. The equation for the unbiased estimate of the population standard deviation is:

\[ \text{unbiased } s = \left[1 + \frac {1} {4(n-1)} \right]*s \]

This estimator is rarely used, however, because of the slight difference between its estimates and those found by taking the square root of the sample variance. The population standard deviation of the discrete uniform distribution that we have been considering is 288.67. In figure 10j, the means of the sampling distributions of standard deviations are 277.35, 286.49, 287.63, and 289.54 for samples of size 5, 10, 25, and 50, respectively. These standard deviations are all reasonably close to the population value, although the downward bias is most visible for the smallest sample size.
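
As a quick illustration, the correction can be written as a one-line R function; the value of s below is the n = 5 mean from figure 10j, and because the correction factor is only approximate, the result will not match the population value exactly.

  # The small-sample correction from the equation above
  unbiased_sd <- function(s, n) (1 + 1 / (4 * (n - 1))) * s
  unbiased_sd(s = 277.35, n = 5)   # applies a factor of 1.0625 when n = 5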

A Biased Estimate: The Range

The range is a biased estimate of its population value. This can easily be seen because the mean of the sampling distribution of the range depends upon the sample size. You can observe this relationship when you consider the means of the sampling distributions of the ranges for different sample sizes shown in figure 10k. In figure 10k, the means of the ranges increase from 667.46 to 961.44 as the sample size increases from 5 to 50. This is one reason why the range is not used as an estimate of its population parameter.

Figure 10e An example of the samples of size five which are drawn from a discrete uniform distribution

ID Sample Subject Score
1 1 1 646
2 1 2 336
3 1 3 389
4 1 4 481
5 1 5 895
6 2 1 877
7 2 2 727
8 2 3 637
9 2 4 836
10 2 5 438
11 3 1 277
12 3 2 355
13 3 3 852
14 3 4 385
15 3 5 915
146 1000 1 276
147 1000 2 808
148 1000 3 647
149 1000 4 765
150 1000 5 564

Figure 10f A sample of 20 sample means, variances, standard deviations, ranges, and medians based on the corresponding samples of size five as shown in figure 10e

Sample Mean Variance SD Range Median
1 549.4 51179 226.228 559 481
2 703 30781 175.444 439 727
3 556.8 90994 301.652 638 385
4 451.6 50790 225.366 593 455
5 624.2 53789 231.924 539 555
6 739 47223 217.309 565 704
7 503 111688 334.197 764 374
8 541.6 71011 266.479 705 579
9 421.2 24368 156.103 396 389
10 253 63757 252.5 610 151
11 427.8 121012 347.868 858 366
12 263.4 22750 150.832 376 309
13 489.4 64700 254.363 625 391
14 219.6 48903 221.14 503 177
15 454 156216 395.241 920 312
16 401.2 45586 213.508 514 288
17 526.4 61298 247.585 665 472
998 744.8 44470 210.878 568 801
999 440 82129 286.582 718 408
1000 612 44563 211.098 532 647

Figure 10g Descriptive statistics for 1000 sample means based on samples of size 5, 10, 25, and 50

  jmv::descriptives(
    data = data,
    vars = vars(mean05, mean10, mean25, mean50),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                  
 ───────────────────────────────────────────────────────────────────────────── 
                              mean05        mean10      mean25      mean50     
 ───────────────────────────────────────────────────────────────────────────── 
   N                                1000        1000        1000        1000   
   Missing                             0           0           0           0   
   Mean                           505.18      497.35      498.00      498.13   
   Std. error mean                4.1113      2.8558      1.7978      1.2824   
   95% CI mean lower bound        497.11      491.75      494.48      495.61   
   95% CI mean upper bound        513.25      502.96      501.53      500.64   
   Median                         504.42      494.71      498.95      498.83   
   Standard deviation             130.01      90.309      56.852      40.552   
   Variance                        16903      8155.8      3232.2      1644.5   
   IQR                            181.82      126.42      75.283      53.512   
   Range                          701.49      490.69      426.49      266.88   
   Minimum                        153.20      245.34      269.65      369.05   
   Maximum                        854.68      736.03      696.15      635.93   
   Skewness                   -0.0069409    0.017420    0.012627    0.077119   
   Std. error skewness          0.077344    0.077344    0.077344    0.077344   
   Kurtosis                     -0.41671    -0.37078    0.078074    0.048292   
   Std. error kurtosis           0.15453     0.15453     0.15453     0.15453   
   Shapiro-Wilk W                0.99669     0.99656     0.99840     0.99882   
   Shapiro-Wilk p                0.03408     0.02743     0.48969     0.76603   
 ───────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution
   with N - 1 degrees of freedom

Figure 10h Descriptive statistics for 1000 sample medians based on samples of size 5, 10, 25, and 50

  jmv::descriptives(
    data = data,
    vars = vars(median05, median10, median25, median50),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                 
 ──────────────────────────────────────────────────────────────────────────── 
                              median05     median10    median25    median50   
 ──────────────────────────────────────────────────────────────────────────── 
   N                               1000        1000        1000        1000   
   Missing                            0           0           0           0   
   Mean                          507.33      497.82      496.05      497.27   
   Std. error mean               5.9243      4.3500      3.0085      2.1690   
   95% CI mean lower bound       495.70      489.28      490.15      493.01   
   95% CI mean upper bound       518.95      506.36      501.95      501.53   
   Median                        506.98      496.43      491.76      497.03   
   Standard deviation            187.34      137.56      95.138      68.588   
   Variance                       35097       18923      9051.2      4704.4   
   IQR                           289.02      209.14      133.48      94.076   
   Range                         933.47      793.53      573.63      402.54   
   Minimum                       24.191      98.627      211.33      305.58   
   Maximum                       957.66      892.16      784.96      708.12   
   Skewness                   -0.055444    0.040829     0.11089    0.088909   
   Std. error skewness         0.077344    0.077344    0.077344    0.077344   
   Kurtosis                    -0.73310    -0.41119    -0.20510    -0.11654   
   Std. error kurtosis          0.15453     0.15453     0.15453     0.15453   
   Shapiro-Wilk W               0.98933     0.99639     0.99757     0.99842   
   Shapiro-Wilk p              < .00001     0.02069     0.14606     0.50094   
 ──────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution
   with N - 1 degrees of freedom

Figure 10i Descriptive statistics for 1000 sample variances based on samples of size 5, 10, 25, and 50

  jmv::descriptives(
    data = data,
    vars = vars(var05, var10, var25, var50),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                    
 ─────────────────────────────────────────────────────────────────────────────── 
                              var05        var10        var25        var50       
 ─────────────────────────────────────────────────────────────────────────────── 
   N                               1000         1000         1000         1000   
   Missing                            0            0            0            0   
   Mean                           82894        84184        83525        84190   
   Std. error mean               1318.7       820.09       510.39       344.59   
   95% CI mean lower bound        80306        82574        82523        83513   
   95% CI mean upper bound        85481        85793        84526        84866   
   Median                         81496        83094        82875        84627   
   Standard deviation             41700        25933        16140        10897   
   Variance                   1.7389e+9    6.7254e+8    2.6050e+8    1.1874e+8   
   IQR                            57779        35994        22042        14103   
   Range                         226376       160305       102012        64108   
   Minimum                       1803.9        20291        42198        54434   
   Maximum                       228179       180596       144210       118541   
   Skewness                     0.37698      0.16218      0.15533      0.10568   
   Std. error skewness         0.077344     0.077344     0.077344     0.077344   
   Kurtosis                    -0.18064     -0.28044     0.056525    -0.091433   
   Std. error kurtosis          0.15453      0.15453      0.15453      0.15453   
   Shapiro-Wilk W               0.98461      0.99543      0.99710      0.99733   
   Shapiro-Wilk p              < .00001      0.00436      0.06761      0.09904   
 ─────────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution
   with N - 1 degrees of freedom

Figure 10j Descriptive statistics for 1000 sample standard deviations based on samples of size 5, 10, 25, and 50

  jmv::descriptives(
    data = data,
    vars = vars(sd05, sd10, sd25, sd50),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                 
 ──────────────────────────────────────────────────────────────────────────── 
                              sd05        sd10        sd25        sd50        
 ──────────────────────────────────────────────────────────────────────────── 
   N                              1000        1000        1000         1000   
   Missing                           0           0           0            0   
   Mean                         277.35      286.49      287.63       289.54   
   Std. error mean              2.4452      1.4522     0.89116      0.59597   
   95% CI mean lower bound      272.55      283.64      285.88       288.37   
   95% CI mean upper bound      282.14      289.34      289.38       290.71   
   Median                       285.47      288.26      287.88       290.91   
   Standard deviation           77.325      45.923      28.181       18.846   
   Variance                     5979.1      2108.9      794.16       355.18   
   IQR                          103.90      62.490      38.251       24.375   
   Range                        435.21      282.52      174.33       110.99   
   Minimum                      42.473      142.45      205.42       233.31   
   Maximum                      477.68      424.97      379.75       344.30   
   Skewness                   -0.31898    -0.25021    -0.13913    -0.077364   
   Std. error skewness        0.077344    0.077344    0.077344     0.077344   
   Kurtosis                   -0.24573    -0.22036    0.019787     -0.12256   
   Std. error kurtosis         0.15453     0.15453     0.15453      0.15453   
   Shapiro-Wilk W              0.98943     0.99395     0.99724      0.99761   
   Shapiro-Wilk p             < .00001     0.00046     0.08495      0.15525   
 ──────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution
   with N - 1 degrees of freedom

Figure 10k Descriptive statistics for 1000 sample ranges based on samples of size 5, 10, 25, and 50

  jmv::descriptives(
    data = data,
    vars = vars(range05, range10, range25, range50),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                
 ─────────────────────────────────────────────────────────────────────────── 
                              range05     range10     range25     range50    
 ─────────────────────────────────────────────────────────────────────────── 
   N                              1000        1000        1000        1000   
   Missing                           0           0           0           0   
   Mean                         667.46      824.18      924.56      961.44   
   Std. error mean              5.6594      3.3525      1.6232     0.84428   
   95% CI mean lower bound      656.36      817.61      921.37      959.79   
   95% CI mean upper bound      678.57      830.76      927.74      963.10   
   Median                       688.00      842.57      936.09      966.57   
   Standard deviation           178.97      106.02      51.330      26.699   
   Variance                      32029       11239      2634.8      712.82   
   IQR                          244.76      145.93      61.926      33.887   
   Range                        884.67      523.29      332.57      172.87   
   Minimum                      100.89      473.00      666.85      826.94   
   Maximum                      985.57      996.29      999.42      999.81   
   Skewness                   -0.51438    -0.77897     -1.2853     -1.2381   
   Std. error skewness        0.077344    0.077344    0.077344    0.077344   
   Kurtosis                   -0.30067     0.13049      1.9565      2.0920   
   Std. error kurtosis         0.15453     0.15453     0.15453     0.15453   
   Shapiro-Wilk W              0.97194     0.94964     0.90481     0.91424   
   Shapiro-Wilk p             < .00001    < .00001    < .00001    < .00001   
 ─────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution
   with N - 1 degrees of freedom

Figure 10k(i) Descriptive statistics for 1000 sample MADs based on samples of size 5, 10, 25, and 50

  jmv::descriptives(
    data = data,
    vars = vars(mad05, mad10, mad25, mad50),
    variance = TRUE,
    range = TRUE,
    se = TRUE,
    ci = TRUE,
    iqr = TRUE,
    skew = TRUE,
    kurt = TRUE,
    sw = TRUE)

 DESCRIPTIVES

 Descriptives                                                                 
 ──────────────────────────────────────────────────────────────────────────── 
                              mad05       mad10       mad25       mad50       
 ──────────────────────────────────────────────────────────────────────────── 
   N                              1000        1000        1000         1000   
   Missing                           0           0           0            0   
   Mean                         298.59      336.62      356.70       366.81   
   Std. error mean              3.9416      3.1057      2.2723       1.5807   
   95% CI mean lower bound      290.86      330.52      352.24       363.71   
   95% CI mean upper bound      306.33      342.71      361.16       369.92   
   Median                       291.30      334.92      355.96       368.50   
   Standard deviation           124.64      98.212      71.856       49.986   
   Variance                      15536      9645.5      5163.3       2498.6   
   IQR                          181.16      145.01      99.431       65.639   
   Range                        617.35      543.67      444.58       285.62   
   Minimum                      20.421      75.288      149.58       218.81   
   Maximum                      637.77      618.96      594.17       504.43   
   Skewness                    0.25259    0.099745     0.10954    -0.080518   
   Std. error skewness        0.077344    0.077344    0.077344     0.077344   
   Kurtosis                   -0.51973    -0.51154    -0.13483    -0.092645   
   Std. error kurtosis         0.15453     0.15453     0.15453      0.15453   
   Shapiro-Wilk W              0.98843     0.99418     0.99814      0.99818   
   Shapiro-Wilk p             < .00001     0.00064     0.34452      0.36828   
 ──────────────────────────────────────────────────────────────────────────── 
   Note. The CI of the mean assumes sample means follow a t-distribution
   with N - 1 degrees of freedom

Figure 10l Histograms for 1000 sample means based on samples of size 5, 10, 25, and 50

Figure 10m Variance bar plots for 1000 sample means based on samples of size 5, 10, 25, and 50

Figure 10n Variance bar plots for 1000 sample medians based on samples of size 5, 10, 25, and 50

Figure 10o Variance bar plots for 1000 sample standard deviations based on samples of size 5, 10, 25, and 50

Figure 10p Variance bar plots for 1000 sample ranges based on samples of size 5, 10, 25, and 50

Figure 10q Variance bar plots for 1000 samples which illustrate that the sample mean is a more efficient estimator than the sample median

Consistent Estimates

The statistics shown in the tables and figures are all based on consistent estimators, and this fact is the most striking feature of these tables and figures. In all cases, as sample size increases, the variability of the sample estimates decreases.

This is vividly shown for all of the statistics in their variance bar plots. For example, the variance bar plots of the sample means in figure 10m shrink dramatically as the sample size upon which a given mean is based increases. These bars reflect the sampling variances in figure 10g of 16903, 8155.8, 3232.2, and 1644.5 for means based on samples of size 5, 10, 25, and 50, respectively (remember that the variance of the sample means is called the “variance of the mean”).

The sampling distributions shown in the histograms of figure 10l illustrate the consistency of the sample mean by having fewer bars with larger frequencies (that is, less spread) as the sample size increases. For example, in figure 10l, there are 9 bars when the sample size is 5, but in the sampling distribution based on 50 units per sample, we have only 4 bars and two bars dominate the others with frequencies that are greater than or equal to 12.

Efficient Estimates: Mean Versus Median

In a uniform distribution, both the population mean and the population median are equal. Therefore, you might ask: In a uniform distribution, should one use the estimate of the mean or of the median to measure the center of the distribution? Since both the mean and the median are consistent and unbiased estimates, the answer to this question is found when you consider the relative efficiency of these two statistics.

Statisticians have shown that for symmetric distributions such as the normal and the uniform, the sampling distribution of the mean has a smaller standard deviation than does the sampling distribution of the median. That is, the standard error of the mean is smaller than the standard error of the median. This fact is vividly displayed in the variance bar plots shown in figure 10q. In figure 10q, the first four variance bars are based on sample means, and the second four variance bars are based on sample medians. You can see that for both statistics the variance bars decrease as the sample size increases. For the same sample sizes, however, the variance bars of the means are always smaller than the variance bars of the medians.
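
The same comparison can be made by simulation. The sketch below draws 1000 samples of size 25 from the uniform population considered above and compares the standard deviations of the resulting means and medians (that is, their standard errors).

  # Relative efficiency: means vary less than medians at the same n
  set.seed(5)
  samples <- replicate(1000, sample(0:999, 25, replace = TRUE))
  sd(apply(samples, 2, mean))     # standard error of the mean (smaller)
  sd(apply(samples, 2, median))   # standard error of the median (larger)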

Inefficient Estimation: The Median

For a symmetric population distribution such as those just described, the sampling distribution of the mean will have a smaller standard error than will the sampling distribution of the median. For this reason, the mean should be used when the population distribution is symmetric. In a skewed distribution, however, the mean, even with its smaller standard error, gives a “false” impression of the center of the distribution. In this case, the median, because it is actually in the center of the scores, may be regarded as providing more useful information. (The terms false and useful require further definition, which is beyond the scope of this book.)

SUMMARY

This chapter explained how to acquire a random sample of units. Drawing a random sample using slips of paper and a fishbowl (or some other container) frequently leads to nonrandom samples because it is difficult to mix the slips of paper well enough for the draws to be considered random. A better method is to use a table of random numbers, although this becomes an arduous task with large samples. The most convenient method is to have jamovi generate the random numbers.

Two methods of sampling used by statisticians were also discussed. The method used most often in practice was referred to as sampling without replacement. Using this method, once a number is drawn it is not replaced into the population. Therefore, in using sampling without replacement a sample does not contain repeats of a random number. The second method of sampling was referred to as sampling with replacement. Using this method each time a number is chosen it is replaced in the population and therefore could be chosen again. Sampling with replacement usually yields samples with repeats of numbers.

Next, a theoretical probability distribution called the sampling distribution was explained. A sampling distribution is a probability distribution of a given statistic, where the statistic is calculated on samples of a given size. Examples demonstrated the important role the sampling distribution plays as a basis for statistical testing. This role is discussed in detail in the next chapter. This chapter focused on the sampling distribution’s role in helping to illustrate criteria that are used to judge estimates of population parameters. Statistics are frequently evaluated to see if they are unbiased, consistent, efficient, and/or sufficient. These properties were defined and the first three were illustrated.

Chapter 10 Appendix A Study Guide for Z Statistics

Z Statistics

No output is needed here, but you will need a standard normal distribution table or calculator.

  1. What is the critical value for the Z statistic if you want to use a two-tailed level of significance of .05?

  2. What is the critical value for the Z statistic if you want to use a one-tailed level of significance of .05?

  3. What is the probability of getting the following Z statistics, or larger, as an absolute value (that is, that far or farther away from ZERO in both directions; that is, below -Z and above +Z) if the null hypothesis is true?

     a. Z = 2.2
     b. Z = 1.7

  4. Calculate the Z statistic from the following given information:

     a. the null hypothesis is H0: \(\mu = 150\)
     b. the known population standard deviation is \(\sigma = 16\)
     c. the sample mean was 155
     d. the sample size was 64

  5. Calculate the 95% confidence interval around the sample mean in the previous item using the standard normal distribution probabilities (i.e., finding the appropriate Z critical value and using known population parameters).

Citation

Please cite as:
Barcikowski, R. S., & Brooks, G. P. (2025). The Stat-Pro book:
A guide for data analysts (revised edition) [Unpublished manuscript].
Department of Educational Studies, Ohio University.
https://people.ohio.edu/brooksg/Rmarkdown/

This is a revision of an unpublished textbook by Barcikowski (1987).
This revision updates some text and uses R and JAMOVI as the primary
tools for examples. The textbook has been used as the primary textbook
in Ohio University EDRE 7200: Educational Statistics courses for 
most semesters 1987-1991 and again 2018-2025.