In this chapter, we will begin our study of inferential statistics by considering its cornerstone, the random sample. We will examine three methods of selecting a random sample, and we will consider a theoretical distribution known as the sampling distribution. We will also consider the role the sampling distribution plays in determining the properties of statistics that are considered to be good estimates of their population parameters. Here, we will define and illustrate criteria by which statistics that are known as point estimates are judged.
Your jamovi objective will be to generate a list of N random numbers from which you will use the first n numbers to select a random sample.
In chapter 4, a random sample was defined as a sample of n units from a population of N units, where each of the possible samples of n units has the same probability of being selected. Random sampling has two purposes in experiments. It allows us to make inferences about population parameters based on sample statistics, and it controls pretreatment systematic differences between the units allocated to different treatments. In the following sections, we describe three methods for selecting a random sample from a finite population. In each case, we assume that the population has been defined, that each unit (for example, person) in the population has been numbered, and that the units selected at random will form a simple random sample. The numbered list of population units is called an accessible population or, more commonly, a sampling frame.
The following steps describe what might be called a “layman’s” method of selecting a random sample.
One problem with this method of selecting a random sample is that the slips of paper tend to stick together. When this happens, the sample that is selected will not be a random sample. This problem occurred during the Vietnam War draft lottery, when the birth dates of young men eligible for the draft were put on plastic balls. These balls were put into a bowl and then shuffled, and the young men were drafted in the order that their birth dates were drawn from the bowl. It was discovered, however, that once a date from one month was selected, an unusually large number of dates from that month followed. This indicated that the sampling method was not yielding a truly random sample of birth dates. A computer-generated method of drawing a random sample was then adopted.
Table 10.1 contains a random collection of the digits 0 through 9 arranged in groups of 9. Each individual digit is randomly generated; the grouping into 9-digit blocks is only for readability. Because each digit is random, you can also form random 2-digit, 3-digit, 4-digit, and larger numbers by combining adjacent digits. The following steps show how to select a random sample of 10 subjects from a population of 100 subjects using Table 10.1.
Row | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 |
---|---|---|---|---|---|---|---|
1 | 101933318 | 905711576 | 671522731 | 108278632 | 438427426 | 919306592 | 539186364 |
2 | 674788178 | 438475141 | 119343377 | 626120793 | 282029804 | 279955782 | 807968849 |
3 | 852286888 | 109790763 | 855296230 | 306052532 | 671918934 | 446408478 | 552601949 |
4 | 635172620 | 637847535 | 318490860 | 936689237 | 998810016 | 385507061 | 450243822 |
5 | 819035120 | 383364335 | 311977125 | 197698225 | 395595207 | 268246391 | 530128931 |
6 | 201006283 | 294389988 | 652050753 | 909936874 | 138768344 | 443431671 | 121174750 |
7 | 115682247 | 884502614 | 064866906 | 853708718 | 685719367 | 346331449 | 014700711 |
8 | 886112681 | 953650363 | 605938044 | 790133197 | 636019472 | 345755195 | 266106203 |
9 | 725401822 | 064510586 | 846789078 | 453868392 | 401090001 | 618616979 | 304767833 |
10 | 807565837 | 062639585 | 889965096 | 507219732 | 256882474 | 225256838 | 541426545 |
11 | 327489316 | 182112333 | 461354158 | 930220027 | 368283670 | 310786904 | 773878600 |
12 | 331399171 | 772452873 | 573858998 | 464923955 | 981713218 | 305581352 | 864046964 |
13 | 635551285 | 132472875 | 743362628 | 524340887 | 080944058 | 486383376 | 777353849 |
14 | 771017636 | 512898727 | 879681051 | 458204319 | 572248299 | 508832986 | 265536543 |
15 | 939949716 | 607507358 | 820094882 | 372900166 | 402798895 | 321403201 | 865435153 |
16 | 990727226 | 307341312 | 251346956 | 704626702 | 872273400 | 033053599 | 276143660 |
17 | 000377442 | 176452501 | 872189314 | 279276567 | 693179186 | 589556407 | 462106132 |
18 | 478807807 | 534842576 | 066962177 | 049322335 | 515537151 | 412818445 | 020955777 |
19 | 100486257 | 714813518 | 329671352 | 797625534 | 393871273 | 762891649 | 743720164 |
20 | 801288852 | 387305354 | 433075084 | 095646566 | 106899139 | 375583663 | 098070750 |
21 | 338412564 | 095343641 | 714610386 | 964260504 | 004133151 | 222475642 | 077873557 |
22 | 376037848 | 341173985 | 079986646 | 784481796 | 108582201 | 676311486 | 723148437 |
23 | 469888202 | 970606163 | 506385357 | 070536180 | 781226492 | 549994861 | 080771457 |
24 | 403316982 | 635898235 | 566528112 | 251188380 | 180035513 | 715611739 | 007980888 |
25 | 199779357 | 605976429 | 526513420 | 538564205 | 998723011 | 286539617 | 515564055 |
26 | 014223073 | 441082761 | 549387434 | 178440249 | 752565032 | 843527323 | 419731919 |
27 | 341913548 | 746369269 | 826527253 | 705660700 | 353202136 | 100483117 | 969108222 |
28 | 626884804 | 015887409 | 392746106 | 479768693 | 285104477 | 358732383 | 607682630 |
29 | 269906390 | 061742638 | 101490118 | 320193932 | 293369157 | 692981470 | 945575141 |
30 | 613236639 | 106041713 | 707311287 | 906243223 | 701949868 | 065016309 | 771244403 |
31 | 131523374 | 649063230 | 814408793 | 217861931 | 041006439 | 947854668 | 673607627 |
32 | 527809157 | 995961681 | 142221938 | 205626278 | 350822853 | 771559654 | 105434012 |
33 | 474224383 | 121453978 | 002778731 | 653328151 | 672922026 | 702346771 | 818633028 |
34 | 160931627 | 125490591 | 254660148 | 927046225 | 961276565 | 140828585 | 268392333 |
35 | 842925948 | 125134134 | 204003422 | 823347552 | 327574625 | 573817917 | 817092039 |
36 | 966441348 | 439762091 | 831935981 | 477940167 | 538031844 | 911010397 | 601858259 |
37 | 359568539 | 954020176 | 186342940 | 832259394 | 605614816 | 033000464 | 369846405 |
38 | 473393191 | 807938060 | 912144713 | 198265128 | 082052685 | 906312064 | 942904108 |
39 | 391406612 | 622964900 | 830862122 | 246992036 | 624596058 | 046264815 | 361553151 |
40 | 118065552 | 541246696 | 690344589 | 089960450 | 282576926 | 871278767 | 736304521 |
41 | 724649810 | 772285811 | 903458917 | 800892057 | 801052863 | 424530086 | 734548304 |
42 | 501834881 | 099858580 | 901359898 | 849805082 | 216321002 | 663969277 | 913510852 |
43 | 183208200 | 844938544 | 474828413 | 777514273 | 890882186 | 346320580 | 596376308 |
44 | 663259921 | 215505831 | 854961955 | 604998708 | 860630190 | 044503747 | 614740697 |
45 | 543708555 | 829856999 | 256388623 | 476380286 | 066796475 | 249614066 | 324314161 |
46 | 075973299 | 625185879 | 777269244 | 993391552 | 130887387 | 588444190 | 571200729 |
47 | 044265718 | 525441576 | 440861107 | 075482470 | 637558116 | 111284118 | 766221093 |
48 | 932779799 | 328876565 | 643767732 | 658761197 | 378225086 | 429408280 | 387484595 |
49 | 127264933 | 322367220 | 723023648 | 490085913 | 623134925 | 155691547 | 908608850 |
50 | 728377036 | 112769525 | 064139205 | 774327378 | 902723197 | 568274943 | 233969621 |
Roll a die to determine which column to use, or flip a coin six times and count the number of TAILS you toss. In our example, we rolled a 5, so we will start with column 5 of random numbers.
Close your eyes and touch your computer screen inside the table, and see what 2-digit number you are pointing at. When we did this, our finger was pointing at the number 15 (Row 17, Column 4, the second and third digits). Therefore, we will use row 15 of column 5 as our starting place. Note that there are only 50 rows in this table, so if you point at a two-digit number larger than 50, first try reversing its digits (e.g., 51 becomes 15). If that still does not produce a number of 50 or less, close your eyes and point again until it does.
In our example, the starting point was the number 40, the first 2-digit number in Row 15 and Column 5 (we could equally have read 4, 402, 4027, and so on, depending on how large a number we needed). We need 2-digit numbers because we have 100 people in our population (where 00 indicates person 100). Simply choose the number of digits appropriate for your population size. If a number is unusable, simply discard it and move on (e.g., if we had only 39 people in our population instead of 100, we would discard the 40 and continue as follows).
Approach A. Now we choose our next 9 two-digit numbers by either moving across the row (left or right, as long as you are consistent) or up/down the column. If we move to the right, our next 2-digit numbers are: 27, 98, 89, 53, 21, 40, 32, 01 (i.e., 1), and 86. However, because 40 was already used, it is unusable the second time, so we pick an additional number: 54. We could instead move down the column and choose, after 40: 87, 69, 51, 39, 10, 00 (i.e., 100), 10, 78, and 18. Again, the repeated 10 is unusable, so we choose another: 99. You could toss a coin to decide whether you will move by rows or by columns from the starting point found in step 3.
Approach B. As you can see, Approach A may require that a large number of random numbers be considered before n unique random numbers are found. The number of random numbers considered can be reduced by using modular arithmetic to find numbers that mathematicians describe as being “congruent modulo N.” Using this approach, each random number is divided by the population size, N, and the remainder is taken as the random number, with remainders of zero representing the number N. Our starting point using Approach B is now the number 402, since we need numbers larger than the population size of 100. Each number that we consider is divided by 100, our N, and the remainder represents a selected ID. When we divided 402 by 100, the remainder was 02, so the first number in our sample was 2. The following remainders were found for the three-digit numbers in the row following 402: 98, 95, 21, 03, 01, 65, 35, 53, 90. Therefore, the subjects with those ID numbers were selected for our random sample. Note that when we reached the end of the row, we moved to the beginning of the next row (Row 16, Column 1).
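If you prefer to let software do the modular arithmetic, the following is a minimal R sketch of Approach B. The vector `stream` simply transcribes the 3-digit numbers read from Table 10.1 starting at Row 15, Column 5; the object names are ours, chosen for illustration.

```r
# Approach B in R: remainders modulo N, with 0 standing for unit N.
# `stream` transcribes 3-digit numbers from Table 10.1 (Row 15, Column 5 onward).
N <- 100
stream <- c(402, 798, 895, 321, 403, 201, 865, 435, 153, 990)

ids <- stream %% N   # remainder after dividing each number by N
ids[ids == 0] <- N   # a remainder of 0 represents unit N
unique(ids)          # drop any repeats; these are the sampled IDs
```

Note that `unique()` plays the same role as discarding repeated numbers by hand.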
In the following steps, you are shown how to generate a random sample of 10 cases from a population of 20 cases using jamovi. We will use the data in Table 5a for this example. The process involves creating one or two new columns of numbers. If you do not have an ID number, you will need to create one, simply an ordered list of the numbers from one to your population size (i.e., 1 to N = 20). The second new column contains a list of twenty numbers generated at random from a uniform distribution (the end points of the distribution do not matter, because only the ranks of the random numbers are used; in our example the random numbers ran between zero and twenty). Then the random numbers are ranked, and the ranks place the cases in random order.
For this example we will use the data in Table 5a (hopefully you have it available as a dataset you can open, but if not you can enter it fairly easily). We will use only the 20 cases in group 1 (the Yes-Pill cases); see figure 10a. Even though we already have an ID number (ID1), we will create one for illustrative purposes. We will use jamovi’s UNIF function to create random uniform numbers.
Row | ID1 | HT1 | CHOL1 |
---|---|---|---|
1 | 225 | 64 | 210 |
2 | 736 | 63 | 215 |
3 | 1291 | 64 | 263 |
4 | 906 | 63 | 220 |
5 | 494 | 62 | 330 |
6 | 796 | 65 | 198 |
7 | 1637 | 65 | 260 |
8 | 2871 | 66 | 198 |
9 | 3349 | 65 | 250 |
10 | 2522 | 63 | 179 |
11 | 828 | 67 | 243 |
12 | 1309 | 62 | 247 |
13 | 2349 | 64 | 210 |
14 | 544 | 64 | 272 |
15 | 1326 | 66 | 210 |
16 | 59 | 62 | 253 |
17 | 473 | 67 | 297 |
18 | 1251 | 60 | 195 |
19 | 2271 | 66 | 158 |
20 | 1797 | 68 | 150 |
Steps:
1. Create the column IDNUM containing the numbers 1 to 20.
2. Use COMPUTE to create the column RANDNUM with the UNIF function.
3. Create the column RANKNUM containing the ranks of RANDNUM.
4. The cases whose RANKNUM values are 1 through 10 form the random sample, as shown below.
Row | ID1 | HT1 | CHOL1 | IDNUM | RANDNUM | RANKNUM |
---|---|---|---|---|---|---|
1 | 225 | 64 | 210 | 1 | 2.9367 | 1 |
2 | 736 | 63 | 215 | 2 | 18.2085 | 19 |
3 | 1291 | 64 | 263 | 3 | 13.7589 | 13 |
4 | 906 | 63 | 220 | 4 | 3.0573 | 2 |
5 | 494 | 62 | 330 | 5 | 9.3301 | 8 |
6 | 796 | 65 | 198 | 6 | 18.4668 | 20 |
7 | 1637 | 65 | 260 | 7 | 11.2445 | 11 |
8 | 2871 | 66 | 198 | 8 | 13.8210 | 15 |
9 | 3349 | 65 | 250 | 9 | 9.3310 | 9 |
10 | 2522 | 63 | 179 | 10 | 3.2675 | 4 |
11 | 828 | 67 | 243 | 11 | 12.8963 | 12 |
12 | 1309 | 62 | 247 | 12 | 6.3586 | 6 |
13 | 2349 | 64 | 210 | 13 | 6.3192 | 5 |
14 | 544 | 64 | 272 | 14 | 16.3514 | 16 |
15 | 1326 | 66 | 210 | 15 | 17.1935 | 17 |
16 | 59 | 62 | 253 | 16 | 3.0860 | 3 |
17 | 473 | 67 | 297 | 17 | 17.2506 | 18 |
18 | 1251 | 60 | 195 | 18 | 6.8150 | 7 |
19 | 2271 | 66 | 158 | 19 | 13.7665 | 14 |
20 | 1797 | 68 | 150 | 20 | 9.4818 | 10 |
You can use a similar process to generate random numbers that correspond to row numbers in a list you do not have in jamovi. Let’s say your list has 300 names. You will need to create a column in jamovi with 300 cases (the column can be mostly missing values, but you will need some value in row 300 of the Data Editor), so scroll down and enter a value in row 300. Go to COMPUTE and use UNIF(1,300) as the formula. Then rank the random numbers, and the rows whose ranks are 1 through n give the positions to select from your list.
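The same ranking logic can be written directly in R. This is a sketch of the idea rather than the jamovi point-and-click steps; the variable names mirror the columns created above.

```r
# Random ordering via ranks: the cases whose ranks are 1..n form the sample.
set.seed(123)          # any seed; used here only so the result is reproducible
N <- 20; n <- 10
randnum <- runif(N)    # analogous to jamovi's UNIF()
ranknum <- rank(randnum)
which(ranknum <= n)    # row numbers of the n randomly selected cases
```

For the 300-name list, replace `N <- 20` with `N <- 300`; the rows whose ranks are 1 through n are the positions to pull from your list.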
The sampling methods described above are examples of what is known as sampling without replacement. In sampling without replacement, a number is sampled at random but is not replaced in the population, so it cannot be chosen again. Therefore, sampling without replacement yields no repeats of numbers in the sample. It is commonly used in practical applications and is the method used throughout this book.
In sampling with replacement, a number is randomly selected from a population of numbers and recorded. The number selected is then returned to the population, and a second number is randomly chosen and recorded. This process is repeated until a given sample size is obtained. Sampling with replacement can be done with the “fish bowl shuffle” by replacing each slip of paper in the fish bowl and shaking the bowl before selecting another slip. Using a table of random numbers, repeats of a number are kept in the sample. In practice, almost all sampling is done without replacement. However, there are situations where sampling with replacement is used, in particular a relatively new robust statistical approach called “bootstrapping.”
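A small R sketch makes the with-replacement idea concrete, using the first ten CHOL1 values from figure 10a as the “population.” The resampling shown is the core idea behind the bootstrap; treat it as an illustration, not a full bootstrap analysis.

```r
# Sampling with replacement: the resampling scheme that underlies bootstrapping.
chol <- c(210, 215, 263, 220, 330, 198, 260, 198, 250, 179)  # from figure 10a

set.seed(1)
sample(chol, replace = TRUE)   # one resample; repeats can (and do) occur

# Bootstrap flavor: resample many times and look at the resampled means.
boot_means <- replicate(2000, mean(sample(chol, replace = TRUE)))
sd(boot_means)   # a bootstrap estimate of the standard error of the mean
```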
You used the preceding sampling procedures to randomly select units. You can also use them for random assignment: after you have randomly selected your units, randomly order them and split the ordered list among the treatments, as in the sketch below.
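One hedged way to carry out such a random assignment in R, assuming two equal-sized treatment groups:

```r
# Random assignment: shuffle the selected units, then split the shuffled list.
set.seed(7)
units <- 1:10               # the 10 randomly selected units
shuffled <- sample(units)   # a random permutation of the unit numbers
treatment1 <- shuffled[1:5]
treatment2 <- shuffled[6:10]
```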
A sampling distribution is a probability distribution of a statistic. Remember that a probability distribution is a theoretical distribution. In this sense, a sampling distribution is based on values of a statistic that has been calculated for each of an infinite number of random samples of size n. Statistics that are commonly considered in sampling distributions are the mean, variance, skewness, kurtosis, correlation coefficient, regression coefficient, proportions, and different test statistics. (We will consider the sampling distributions of different test statistics as we introduce them, starting in chapter 12.)
Sampling distributions of statistics are based on an infinite number of samples. We can obtain an idea of what a sampling distribution is, however, by considering an illustration based on a finite number of samples. For example, we can simulate a sampling distribution of the variance using 100 sample variances based on samples of IQ scores from the population of high school students in New York City. We obtain the variances needed for the sampling distribution by calculating the variance in each of 100 samples of students. For this example, each sample contains 30 students. To calculate a given variance, we select a random sample of 30 students and administer an IQ test to them. We then calculate the variance of the resulting IQ scores, and repeat this process for 100 such samples. When we finish, we have 100 variances, which we can put into a frequency distribution.
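The IQ example can be mimicked in R. Because we do not have the New York City population, the sketch below assumes IQ scores are normal with mean 100 and standard deviation 15; only the mechanics of “many samples, one variance each” matter here.

```r
# Simulated sampling distribution of the variance: 100 samples of n = 30.
set.seed(42)
variances <- replicate(100, var(rnorm(30, mean = 100, sd = 15)))
hist(variances, main = "100 sample variances (n = 30)")
```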
In this example, our frequency distribution would meet the definition of a sampling distribution if it contained an infinite number of samples and we scaled the y-axis with probabilities instead of frequencies. We could theoretically obtain an infinite number of samples of students by repeatedly selecting a sample, computing its variance, returning the students to the population, and sampling again, ad infinitum.
The ad infinitum (i.e., continuing the process infinitely) is the reason this distribution is called a theoretical distribution.
The sampling distribution is important because mathematical statisticians can tell what shape the sampling distributions of many statistics will take (for example, normal, positively skewed, and so on). Furthermore, statisticians can tell what the mean and variance of these distributions will be.
If we know the mean, variance, and shape of a sampling distribution of a statistic, we can make inferential statements, that is, statements concerning parameter estimation and significance testing. A good example of this is the sampling distribution of the mean. A theorem, known as the Central Limit Theorem, states that:
If a population has a finite variance \(\sigma^2\) and mean \(\mu\), then the sampling distribution of the mean approaches a normal distribution as n (the sample size of the random samples upon which the sample means are calculated) increases. That is, when n is very large, the sampling distribution of the mean is approximately normal. Furthermore, the mean of the sampling distribution of the mean is \(\mu\), and the variance of the sampling distribution of the mean is \(\sigma^2/n\).
The variance of the sampling distribution of the mean is called the variance of the mean, and the standard deviation of the sampling distribution of the mean is called the standard error of the mean. Note that in this theorem nothing is said about the population distribution; that is, the population distribution can take any shape. If the population is known to have a normal distribution, however, the sampling distribution of the mean will be normal with any size sample.
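A quick empirical check of the theorem, using a deliberately non-normal (exponential) population as an illustrative assumption:

```r
# The sd of many sample means should be close to sigma / sqrt(n).
set.seed(99)
n <- 25
means <- replicate(10000, mean(rexp(n, rate = 1)))  # population sd = 1
sd(means)      # close to 1 / sqrt(25) = 0.2
hist(means)    # roughly normal despite the skewed population
```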
We begin to see some of the practical applications of the Central Limit Theorem when we consider a sample of 25 that we believe was selected at random from a population whose mean is 50 and whose variance is 100. If the sample did come from this population, we know that the mean of the sampling distribution of such sample means is 50, and the variance of this sampling distribution is 100/25 = 4, with a standard deviation of 2.
Given this information, our knowledge of standard scores, and the normal probability distribution, we can ask and answer the following estimation and significance test questions. Use the sampling distribution of the mean shown in figure 10c to assist you in considering these questions.
Between what two sample means would we expect 95% of the sample means to fall?
Given table 9.1(b), in the standard normal probability distribution, we would expect 95% of the scores to fall between z(.025) = -1.960 and z(.975) = 1.960. In the sampling distribution of the mean, sample means are our scores; and so a z statistic (a z score for a group) for a given mean is written as:
\[ \begin{equation} z_X = \frac {(M_X - \mu_X)} {\sigma_{M_X}} \\ \text{or} \\ z_X = \frac {(\overline{X} - \mu)} {\sigma_\overline{X}} \\ \tag{10-1} \end{equation} \] Here, \(\mu\) = 50 and \(\sigma_M\) = 2. We must consider z at z(.025) = -1.96 and z(.975) = 1.96, and solve for the values of M. In so doing, we have:
\[ \begin{align} 1.96 &= (M-50)/2 \\ 3.92 &= M-50 \\ 53.92 &= M \\ \\ -1.96 &= (M-50)/2 \\ -3.92 &= M-50 \\ 46.08 &= M \end{align} \] Therefore, we would expect that 95% of our sample means would fall between 46.08 and 53.92, that is, p(46.08 < M < 53.92) = .95.
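The same limits can be obtained in one line with R’s `qnorm()` function, given the values assumed in this example:

```r
# 95% limits for the sample mean, with mu = 50 and standard error 2.
mu <- 50; se <- 2
mu + qnorm(c(.025, .975)) * se   # 46.08 and 53.92
```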
Between what two sample means would we expect 99% of the sample means to fall?
Given table 9.1(b), in the standard normal distribution, we would expect 99% of the standard scores to fall between z(.005) = -2.576 and z(.995) = 2.576. Therefore, using the procedure described for answer 1, we have:
\[ \begin{align} 2.576 &= (M-50)/2 \\ 5.152 &= M-50 \\ 55.152 &= M \\ \\ -2.576 &= (M-50)/2 \\ -5.152 &= M-50 \\ 44.848 &= M \end{align} \]
Therefore, we would expect that 99% of our sample means would fall between 44.848 and 55.152, that is, p(44.848 < M < 55.152) = .99.
If a sample mean of 52.4 were obtained, what would be a good estimate of the population mean? (In the following section, we will describe some criteria that help decide what good means.)
The sample mean of 52.4 is a good estimate of the population mean.
If a sample mean of 52.4 were obtained, would it be reasonable to assume that the mean of the sampling distribution was 50? In other words, would it be reasonable to assume that the sample came from a population where the mean was 50?
If the population mean was 50, we would expect that 95% of the samples randomly selected from this population have means that fall between 53.92 and 46.08. Therefore, since 52.4 falls within this interval, it would be reasonable to assume that the population mean was 50.
The mean 52.4 differs from the suspected population mean by 2.4 points. We can answer question 4 in another way by asking: What is the probability of obtaining a sample mean that differs by 2.4 points or more from a population value of 50? Here, we are asking for the probability of obtaining a sample mean that is less than 47.6 (that is, 50 - 2.4) or greater than 52.4 (that is, 50 + 2.4). We can easily answer this question by transforming the score of 52.4 into a z score and then finding the area above it, using table 9.1(a):
\[ z = \frac {52.4-50} {2} = 1.2 \]
\[ p(z>1.2)=.5000-.3849=.1151 \] Also, we can find the area below the z score corresponding to the score of 47.6 (which, because of the symmetry of the normal distribution, is the same as the area for z > 1.2) as:
\[ z = \frac{47.6-50} {2} = -1.2 \]
\[ p(z < -1.2) = .1151 \]
Therefore, the probability of obtaining a sample mean that differs by 2.4 points or more from a population mean of 50 is .2302, that is, \(p(z < -1.2) + p(z > 1.2) = .1151 + .1151 = .2302\). Because this probability is high, we can conclude that if the population mean were 50, it would be reasonable to expect a sample whose mean was 52.4. (In the next chapter, we will discuss what is meant by a high probability in this situation.)
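R’s `pnorm()` reproduces this two-tailed probability directly (the tiny discrepancy from .2302 comes from the rounded table areas):

```r
# Two-tailed probability of a sample mean at least 2.4 points from 50.
z <- (52.4 - 50) / 2
2 * pnorm(-abs(z))   # 0.2301
```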
If a sample mean of 56 were obtained, would it be reasonable to assume that the mean of the sampling distribution was 50? In other words, would it be reasonable to assume that the mean of the population was 50?
If the population mean were 50, we would expect that 95% of the sample means would fall between 46.08 and 53.92, and that 99% of the sample means would fall between 44.848 and 55.152. Since a sample mean of 56 is outside both of these ranges, we might conclude that the mean of the sampling distribution (the population mean) is not 50.
As in answer 4b, we can use table 9.1(a) to find the probability of obtaining a sample mean that differs by 6 points or more from a population mean of 50 (that is, a mean of 44 or less, or of 56 or more). Here, the z score for a sample mean of 56 is:
\[ z = \frac{56-50} {2} = 3.00 \]
\[ p(z > 3.00) = .5000 - .4987 = .0013 \]
Similarly, the z score for the sample mean of 44 is:
\[ z = \frac{44-50}{2} = -3.00 \] \[ p(z < -3.00) = .0013 \] Therefore, if the population mean is 50, the probability of obtaining a sample mean that differs from this population mean by 6 or more points is p(z < -3.00) + p(z > 3.00) = .0013 + .0013 = .0026. Since this probability is small, we might conclude that a sample mean of 56 came from a population whose mean is greater than 50.
The preceding questions and answers gave you a sense of the role that a sampling distribution (here, of the mean) plays in inferential statistics. It is an extremely important role although it stays primarily in the background. We do not have to actually find the sampling distribution of a statistic, we only have to know what its shape is and what its parameters are. Based on our knowledge of the sampling distribution we can make a priori probability statements about an unknown sample statistic (as in questions 1 and 2) or an inference about a population parameter (as in questions 3, 4, and 5).
In the next chapter, we will further consider such questions when we consider the importance of the sampling distribution to hypothesis testing. For now, however, we will consider the question that is implied in question 3:
What criteria can be used to help decide if a sample statistic is a good estimate of its population parameter? Said another way: What are the properties of statistics of which we want to consider the sampling distributions? We will consider the formal definitions of these criteria and examine them for commonly used statistics.
A statistic is said to be a point estimate when it is used to infer the value of a population parameter. The equation used to derive the statistic is called the estimator. The following criteria are frequently used to evaluate a statistic:
A statistic is said to be unbiased when the mean of its sampling distribution is its population parameter.
A statistic is said to be consistent when the probability that it is close to its population parameter increases as the sample size increases.
Kendall and Buckland (1976, p. 47) described efficiency as follows:
The concept of efficiency in statistical estimation is due to Fisher (1921) and is an attempt to measure objectively the relative merits of several possible estimators.
The criterion adopted by Fisher was that of variance, an estimator being regarded as more “efficient” than another if it has smaller variance; and if there exists an estimator with minimum variance v the efficiency of another estimator of variance v1 is defined as the ratio of v/v1. It was implicit in this development that the estimator should obey certain criteria such as consistency. For small samples, where other considerations such as bias enter, the concept of efficiency may require extension or modification.
The definition of a sufficient statistic is beyond the scope of this book; suffice it to say that a sufficient statistic contains all of the information in its sample relative to the estimation of its population parameter.
In this section, we will create finite sampling distributions whose properties and estimates we can examine more closely. This exercise will enable us to better conceptualize what a sampling distribution is, and what the properties of its scores (estimates) are. In this regard, we will consider samples from a uniform population distribution.
The uniform distribution was chosen to illustrate that the sampling distributions of most statistics based on samples from this distribution are not uniform. Indeed, considering the Central Limit Theorem, we know that the sampling distribution of the mean for large samples will be close enough to a normal distribution for most practical purposes.
As was the case for the normal distribution, instead of considering a given population with a uniform distribution and a fixed sample size, we will consider a given uniform probability distribution that represents all sample sizes. Consider the discrete uniform probability distribution shown in figure 10d, which has a lower boundary of a = 0 and an upper boundary of b = 1000 and contains the 1000 integers 0 through 999. The probability of sampling a given number from this distribution is therefore 1/1000, since each of the 1000 numbers has an equal chance of being selected. The following population parameters have been derived by mathematical statisticians for any such discrete uniform probability distribution:
\[ \begin{align} Mean = \mu &= (a+(b-1))/2 \\ Median = Md. &= (a+(b-1))/2 \\ Variance = \sigma^2 &= ((b-a)^2-1)/12 \\ Skewness = b_1 &= 0 \end{align} \] Here, \(a\) is the lower boundary and \(b\) is the upper boundary (the largest value in the distribution is \(b-1\)). Therefore, for the discrete uniform probability distribution shown in figure 10d, we have (for \(a = 0\) and \(b = 1000\)) that:
\[ \begin{align} Mean &= \mu = &&(0+(1000-1))/2 &&= 499.5 \\ Median&=Md. = &&(0+(1000-1))/2 &&= 499.5 \\ Variance &= \sigma^2 = &&(1000^2-1)/12 &&= 83333.25 \\ Skewness &= b_1 = && &&= 0 \end{align} \] We will select numbers at random from this theoretical probability distribution.
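These parameter values are easy to verify numerically in R by enumerating the 1000 integers of figure 10d:

```r
# Verifying the discrete uniform parameters for the integers 0..999.
x <- 0:999
mean(x)                       # 499.5
median(x)                     # 499.5
sum((x - mean(x))^2) / 1000   # population variance: 83333.25
```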
### The Raw Data and Its Statistics
To illustrate the properties of sampling distributions and their estimates, 1000 random samples from the uniform probability distribution shown in figure 10d were generated for sample sizes of 5, 10, 25, and 50 units.
The 1000 samples of size 5 are partially shown in figure 10e. Similarly, figure 10f shows the means, variances, standard deviations, ranges, and medians for the 1000 samples of size 5. Although the summary statistics for the samples of sizes 5, 10, 25, and 50 units will be examined, only the observations based on samples of size 5 are shown here to keep the presentation less cluttered.
The mean of the scores in Sample 1 in figure 10e is 549.4, which is the first mean shown in column 2 (labeled Mean) of figure 10f. Also, the variance for the first sample in figure 10e is shown in the first row of figure 10f as 51179; the standard deviation is 226.228; the range is 559; and the median is 481. In this manner, the statistics for a given sample of figure 10e are found in the corresponding row of figure 10f.
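For readers who want to reproduce figures 10e and 10f, the following R sketch generates 1000 samples of size 5 from the distribution of figure 10d and computes the same five statistics per sample. The seed and object names are ours, not part of the original figures.

```r
# 1000 samples of size 5 and their per-sample statistics, as in figure 10f.
set.seed(2025)
stats5 <- t(replicate(1000, {
  s <- sample(0:999, size = 5, replace = TRUE)
  c(mean = mean(s), var = var(s), sd = sd(s),
    range = diff(range(s)), median = median(s))
}))
head(stats5)   # one row of statistics per sample
```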
We sent the statistics from each of the samples, across each of the sample sizes, to the descriptives program in jamovi’s jmv package for R (see chapter 4). jamovi found descriptive statistics for the means derived from samples of size 5, 10, 25, and 50, and likewise for the medians, variances, standard deviations, and ranges. Figure 10g shows the resulting descriptive statistics for the means based on the different sample sizes. The descriptive statistics for the medians, variances, standard deviations, and ranges are shown in figures 10h, 10i, 10j, and 10k, respectively.
To further illustrate the features of the mean, we constructed finite sampling distributions using R’s histogram capabilities (see chapter 4) for the 1000 samples of each sample size. These finite sampling distributions are displayed in the histograms of figure 10l. For example, in figure 10l, histogram A represents the sampling distribution of the mean based on 1000 samples with 5 units in each sample. Histogram B has 10 units per sample mean; histogram C has 25 units per sample mean; and histogram D has 50 units per sample mean.
We also constructed variance bar plots for each of the statistics across each of the sample sizes. Figures 10m, 10n, 10o, and 10p show these variance bar plots for the means, medians, standard deviations, and ranges, respectively.
For example, figure 10m shows from left to right the four variance bar plots of the 1000 means based on samples of size 5, 10, 25, and 50. The small lines extending from each bar plot just above and below the mean of the statistic, referred to as “whiskers,” represent the standard errors of the statistic. For example, in figure 10m, the first whisker above and below the mean represents one standard error (\(\sigma/\sqrt{n}\)) from the mean, and the second whisker above or below the mean represents two standard errors (\(2\sigma/\sqrt{n}\)) from the mean. (Note that variance bar plots were not made for the variances, whose descriptive statistics are shown in figure 10i, because the variances are so large they would require rescaling.)
Statisticians have found that the mean and variance are unbiased estimates of their population parameters. That is, if we could take an infinite number of samples for a given sample size, we would find that the means of the sampling distributions of these statistics would be their population parameters. Since the statistics illustrated here are based on only 1000 samples, we do not find them to be exactly equal to their population values, but they are close (1000 samples sounds like a lot, but for this kind of research we typically use 10,000 or more).
For example, the mean of the population is known to be 499.5, and the sampling distribution means reported in figure 10g are 505.18, 497.35, 498.00, and 498.13 for samples of size 5, 10, 25, and 50, respectively. The population variance is known to be 83333.25, and the sampling distribution means reported in figure 10i are 82894, 84184, 83525, and 84190 for samples of size 5, 10, 25, and 50, respectively.
In chapter 5, we found that the population variance was calculated using equation (5-3) as:
\[ \sigma_X^2 = \frac {\sum (X-\mu)^2} {N} \]
The sample variance was found using the estimator, equation (5-2), as:
\[ s_X^2 = \frac {\sum (X-M_X)^2} {n-1} \] Here, a natural question to ask is: Why not use n instead of (n-1) as the denominator of the sample variance? The reason is that if n is used as the denominator, the mean of the sampling distribution of such variances is not the population variance; that is, the sample variance found with n as the denominator is biased. To have an unbiased estimate of the population variance, the estimator must consist of the sum of squared deviation scores divided by (n-1).
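This bias is easy to see by simulation. The sketch below computes both versions of the sample variance for many samples from the uniform population of figure 10d, whose variance is 83333.25:

```r
# Dividing by n is biased; dividing by (n - 1) is not.
set.seed(11)
n <- 5
both <- replicate(10000, {
  s <- sample(0:999, n, replace = TRUE)
  c(n_denom = sum((s - mean(s))^2) / n,   # biased version
    n1_denom = var(s))                    # R's var() divides by n - 1
})
rowMeans(both)   # n_denom falls well below 83333.25; n1_denom is close
```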
The estimator for the standard deviation (that is, the square root of the unbiased estimator of the variance) yields a biased estimate of its population parameter. Fortunately, the bias of the standard deviation is small and can be considered to be negligible when n is greater than 20. The equation for the unbiased estimate of the population standard deviation is:
\[ \text{unbiased } s = \left[1 + \frac {1} {4(n-1)} \right] s \]
This estimator is rarely used, however, because of the slight difference between its estimates and those found by taking the square root of the sample variance. The population standard deviation of the discrete uniform distribution that we have been considering is 288.67. In figure 10j, the means of the sampling distributions of standard deviations are 277.35, 286.49, 287.63, and 289.54 for samples of size 5, 10, 25, and 50, respectively. These values are all reasonably close to the population value, and the bias shrinks as n grows.
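Applying the correction to the figure 10j value for samples of size 5 shows its effect. Bear in mind that the correction factor is an approximation derived for normal populations, so for this uniform population it is only roughly right.

```r
# Small-sample correction applied to the mean sd for n = 5 (figure 10j).
# The factor is an approximation derived for normal populations.
s <- 277.35; n <- 5
(1 + 1 / (4 * (n - 1))) * s   # about 294.7, versus the population 288.67
```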
The range is a biased estimate of its population value. This is easily seen because the mean of the sampling distribution of the range depends on the sample size. You can observe this relationship in the means of the sampling distributions of the ranges for different sample sizes shown in figure 10k, where the means of the ranges increase from 667.46 to 961.44 as the sample size increases from 5 to 50. This is one reason why the range is not used as an estimate of its population parameter.
ID | Sample | Subject | Score |
---|---|---|---|
1 | 1 | 1 | 646 |
2 | 1 | 2 | 336 |
3 | 1 | 3 | 389 |
4 | 1 | 4 | 481 |
5 | 1 | 5 | 895 |
6 | 2 | 1 | 877 |
7 | 2 | 2 | 727 |
8 | 2 | 3 | 637 |
9 | 2 | 4 | 836 |
10 | 2 | 5 | 438 |
11 | 3 | 1 | 277 |
12 | 3 | 2 | 355 |
13 | 3 | 3 | 852 |
14 | 3 | 4 | 385 |
15 | 3 | 5 | 915 |
… | … | … | … |
… | … | … | … |
… | … | … | … |
4996 | 1000 | 1 | 276 |
4997 | 1000 | 2 | 808 |
4998 | 1000 | 3 | 647 |
4999 | 1000 | 4 | 765 |
5000 | 1000 | 5 | 564 |
Sample | Mean | Variance | SD | Range | Median |
---|---|---|---|---|---|
1 | 549.4 | 51179 | 226.228 | 559 | 481 |
2 | 703 | 30781 | 175.444 | 439 | 727 |
3 | 556.8 | 90994 | 301.652 | 638 | 385 |
4 | 451.6 | 50790 | 225.366 | 593 | 455 |
5 | 624.2 | 53789 | 231.924 | 539 | 555 |
6 | 739 | 47223 | 217.309 | 565 | 704 |
7 | 503 | 111688 | 334.197 | 764 | 374 |
8 | 541.6 | 71011 | 266.479 | 705 | 579 |
9 | 421.2 | 24368 | 156.103 | 396 | 389 |
10 | 253 | 63757 | 252.5 | 610 | 151 |
11 | 427.8 | 121012 | 347.868 | 858 | 366 |
12 | 263.4 | 22750 | 150.832 | 376 | 309 |
13 | 489.4 | 64700 | 254.363 | 625 | 391 |
14 | 219.6 | 48903 | 221.14 | 503 | 177 |
15 | 454 | 156216 | 395.241 | 920 | 312 |
16 | 401.2 | 45586 | 213.508 | 514 | 288 |
17 | 526.4 | 61298 | 247.585 | 665 | 472 |
… | … | … | … | … | … |
… | … | … | … | … | … |
… | … | … | … | … | … |
998 | 744.8 | 44470 | 210.878 | 568 | 801 |
999 | 440 | 82129 | 286.582 | 718 | 408 |
1000 | 612 | 44563 | 211.098 | 532 | 647 |
jmv::descriptives(
data = data,
vars = vars(mean05, mean10, mean25, mean50),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
─────────────────────────────────────────────────────────────────────────────
mean05 mean10 mean25 mean50
─────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 505.18 497.35 498.00 498.13
Std. error mean 4.1113 2.8558 1.7978 1.2824
95% CI mean lower bound 497.11 491.75 494.48 495.61
95% CI mean upper bound 513.25 502.96 501.53 500.64
Median 504.42 494.71 498.95 498.83
Standard deviation 130.01 90.309 56.852 40.552
Variance 16903 8155.8 3232.2 1644.5
IQR 181.82 126.42 75.283 53.512
Range 701.49 490.69 426.49 266.88
Minimum 153.20 245.34 269.65 369.05
Maximum 854.68 736.03 696.15 635.93
Skewness -0.0069409 0.017420 0.012627 0.077119
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis -0.41671 -0.37078 0.078074 0.048292
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.99669 0.99656 0.99840 0.99882
Shapiro-Wilk p 0.03408 0.02743 0.48969 0.76603
─────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution
with N - 1 degrees of freedom
jmv::descriptives(
data = data,
vars = vars(median05, median10, median25, median50),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
────────────────────────────────────────────────────────────────────────────
median05 median10 median25 median50
────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 507.33 497.82 496.05 497.27
Std. error mean 5.9243 4.3500 3.0085 2.1690
95% CI mean lower bound 495.70 489.28 490.15 493.01
95% CI mean upper bound 518.95 506.36 501.95 501.53
Median 506.98 496.43 491.76 497.03
Standard deviation 187.34 137.56 95.138 68.588
Variance 35097 18923 9051.2 4704.4
IQR 289.02 209.14 133.48 94.076
Range 933.47 793.53 573.63 402.54
Minimum 24.191 98.627 211.33 305.58
Maximum 957.66 892.16 784.96 708.12
Skewness -0.055444 0.040829 0.11089 0.088909
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis -0.73310 -0.41119 -0.20510 -0.11654
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.98933 0.99639 0.99757 0.99842
Shapiro-Wilk p < .00001 0.02069 0.14606 0.50094
────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution
with N - 1 degrees of freedom
jmv::descriptives(
data = data,
vars = vars(var05, var10, var25, var50),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
───────────────────────────────────────────────────────────────────────────────
var05 var10 var25 var50
───────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 82894 84184 83525 84190
Std. error mean 1318.7 820.09 510.39 344.59
95% CI mean lower bound 80306 82574 82523 83513
95% CI mean upper bound 85481 85793 84526 84866
Median 81496 83094 82875 84627
Standard deviation 41700 25933 16140 10897
Variance 1.7389e+9 6.7254e+8 2.6050e+8 1.1874e+8
IQR 57779 35994 22042 14103
Range 226376 160305 102012 64108
Minimum 1803.9 20291 42198 54434
Maximum 228179 180596 144210 118541
Skewness 0.37698 0.16218 0.15533 0.10568
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis -0.18064 -0.28044 0.056525 -0.091433
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.98461 0.99543 0.99710 0.99733
Shapiro-Wilk p < .00001 0.00436 0.06761 0.09904
───────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution
with N - 1 degrees of freedom
jmv::descriptives(
data = data,
vars = vars(sd05, sd10, sd25, sd50),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
────────────────────────────────────────────────────────────────────────────
sd05 sd10 sd25 sd50
────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 277.35 286.49 287.63 289.54
Std. error mean 2.4452 1.4522 0.89116 0.59597
95% CI mean lower bound 272.55 283.64 285.88 288.37
95% CI mean upper bound 282.14 289.34 289.38 290.71
Median 285.47 288.26 287.88 290.91
Standard deviation 77.325 45.923 28.181 18.846
Variance 5979.1 2108.9 794.16 355.18
IQR 103.90 62.490 38.251 24.375
Range 435.21 282.52 174.33 110.99
Minimum 42.473 142.45 205.42 233.31
Maximum 477.68 424.97 379.75 344.30
Skewness -0.31898 -0.25021 -0.13913 -0.077364
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis -0.24573 -0.22036 0.019787 -0.12256
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.98943 0.99395 0.99724 0.99761
Shapiro-Wilk p < .00001 0.00046 0.08495 0.15525
────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution
with N - 1 degrees of freedom
jmv::descriptives(
data = data,
vars = vars(range05, range10, range25, range50),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
───────────────────────────────────────────────────────────────────────────
range05 range10 range25 range50
───────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 667.46 824.18 924.56 961.44
Std. error mean 5.6594 3.3525 1.6232 0.84428
95% CI mean lower bound 656.36 817.61 921.37 959.79
95% CI mean upper bound 678.57 830.76 927.74 963.10
Median 688.00 842.57 936.09 966.57
Standard deviation 178.97 106.02 51.330 26.699
Variance 32029 11239 2634.8 712.82
IQR 244.76 145.93 61.926 33.887
Range 884.67 523.29 332.57 172.87
Minimum 100.89 473.00 666.85 826.94
Maximum 985.57 996.29 999.42 999.81
Skewness -0.51438 -0.77897 -1.2853 -1.2381
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis -0.30067 0.13049 1.9565 2.0920
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.97194 0.94964 0.90481 0.91424
Shapiro-Wilk p < .00001 < .00001 < .00001 < .00001
───────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution
with N - 1 degrees of freedom
jmv::descriptives(
data = data,
vars = vars(mad05, mad10, mad25, mad50),
variance = TRUE,
range = TRUE,
se = TRUE,
ci = TRUE,
iqr = TRUE,
skew = TRUE,
kurt = TRUE,
sw = TRUE)
DESCRIPTIVES
Descriptives
────────────────────────────────────────────────────────────────────────────
mad05 mad10 mad25 mad50
────────────────────────────────────────────────────────────────────────────
N 1000 1000 1000 1000
Missing 0 0 0 0
Mean 298.59 336.62 356.70 366.81
Std. error mean 3.9416 3.1057 2.2723 1.5807
95% CI mean lower bound 290.86 330.52 352.24 363.71
95% CI mean upper bound 306.33 342.71 361.16 369.92
Median 291.30 334.92 355.96 368.50
Standard deviation 124.64 98.212 71.856 49.986
Variance 15536 9645.5 5163.3 2498.6
IQR 181.16 145.01 99.431 65.639
Range 617.35 543.67 444.58 285.62
Minimum 20.421 75.288 149.58 218.81
Maximum 637.77 618.96 594.17 504.43
Skewness 0.25259 0.099745 0.10954 -0.080518
Std. error skewness 0.077344 0.077344 0.077344 0.077344
Kurtosis -0.51973 -0.51154 -0.13483 -0.092645
Std. error kurtosis 0.15453 0.15453 0.15453 0.15453
Shapiro-Wilk W 0.98843 0.99418 0.99814 0.99818
Shapiro-Wilk p < .00001 0.00064 0.34452 0.36828
────────────────────────────────────────────────────────────────────────────
Note. The CI of the mean assumes sample means follow a t-distribution
with N - 1 degrees of freedom
The statistics shown in the tables and figures are all based on consistent estimators, and this fact is the most striking feature of these tables and figures. In all cases, as sample size increases, the variability of the sample estimates decreases.
This is vividly shown for all of the statistics in their variance bar plots. For example, the variance bar plots of the sample means in figure 10m shrink dramatically as the sample size upon which a given mean is based increases. These bars reflect the sampling variances in figure 10g of 16903, 8155.8, 3232.2, and 1644.5 for means based on samples of size 5, 10, 25, and 50, respectively (remember that the variance of the sample means is called the “variance of the mean”).
The sampling distributions shown in the histograms of figure 10l illustrate the consistency of the sample mean by having fewer bars with larger frequencies (that is, less spread) as the sample size increases. For example, in figure 10l there are 9 bars when the sample size is 5, but in the sampling distribution based on 50 units per sample there are only 4 bars, two of which dominate the others with frequencies greater than or equal to 12.
In a uniform distribution, both the population mean and the population median are equal. Therefore, you might ask: In a uniform distribution, should one use the estimate of the mean or of the median to measure the center of the distribution? Since both the mean and the median are consistent and unbiased estimates, the answer to this question is found when you consider the relative efficiency of these two statistics.
Statisticians have shown that for symmetric distributions the sampling distribution of the mean has a smaller standard deviation than does the sampling distribution of the median. That is, the standard error of the mean is smaller than the standard error of the median. This fact is vividly displayed using the variance bar plots shown in figure 10q. In figure 10q, the first four variance bars are based on sample means, and the second four variance bars are based on sample medians. You can see that for both statistics the variance bar plots decrease as the sample size increases. For the same sample sizes, however, the variance bar plots of the means are always smaller than the variance bar plots of the medians.
For a symmetric population distribution, the sampling distribution of the mean will always have a smaller standard error than will the sampling distribution of the median. For this reason, the mean should be used when the population distribution is symmetric. In a skewed distribution, however, the mean, even with its smaller standard error, gives a “false” impression of the center of the distribution. In this case the median, because it sits at the center of the ordered scores, may be regarded as providing more useful information. (The terms false and useful require further definition, which is beyond the scope of this book.)
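The relative efficiency is easy to confirm by simulation with the uniform population of figure 10d; the standard errors below should approximate the standard deviations reported in figures 10g and 10h for samples of size 25.

```r
# Empirical standard errors of the mean and the median, n = 25.
set.seed(3)
sim <- replicate(10000, {
  s <- sample(0:999, 25, replace = TRUE)
  c(mean = mean(s), median = median(s))
})
sd(sim["mean", ])     # near 57.7 (cf. figure 10g)
sd(sim["median", ])   # larger, near 95 (cf. figure 10h)
```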
This chapter explained how to acquire a random sample of units. Drawing a random sample using slips of paper and a fishbowl (or some other container) frequently leads to nonrandom samples because it is difficult to mix the slips of paper so they can be considered random. A better method is to use a table of random numbers, although this becomes an arduous task with large samples. jamovi can generate numbers at random.
Two methods of sampling used by statisticians were also discussed. The method used most often in practice was referred to as sampling without replacement. Using this method, once a number is drawn it is not replaced into the population. Therefore, in using sampling without replacement a sample does not contain repeats of a random number. The second method of sampling was referred to as sampling with replacement. Using this method each time a number is chosen it is replaced in the population and therefore could be chosen again. Sampling with replacement usually yields samples with repeats of numbers.
Next, a theoretical probability distribution called the sampling distribution was explained. A sampling distribution is a probability distribution of a given statistic, where the statistic is calculated on samples of a given size. Examples demonstrated the important role the sampling distribution plays as a basis for statistical testing. This role is discussed in detail in the next chapter. This chapter focused on the sampling distribution’s role in helping to illustrate criteria that are used to judge estimates of population parameters. Statistics are frequently evaluated to see if they are unbiased, consistent, efficient, and/or sufficient. These properties were defined and the first three were illustrated.
Z Statistics
No output is needed here, but you will need a standard normal distribution table or calculator.
What is the critical value for the Z statistic if you want to use a two-tailed level of significance of .05?
What is the critical value for the Z statistic if you want to use a one-tailed level of significance of .05?
What is the probability of getting the following Z statistics, or larger, as an absolute value (that is, that far or farther away from ZERO in both directions; that is, below -Z and above +Z) if the null hypothesis is true?
Please cite as:
Barcikowski, R. S., & Brooks, G. P. (2025). The Stat-Pro book:
A guide for data analysts (revised edition) [Unpublished manuscript].
Department of Educational Studies, Ohio University.
https://people.ohio.edu/brooksg/Rmarkdown/
This is a revision of an unpublished textbook by Barcikowski (1987).
This revision updates some text and uses R and jamovi as the primary
tools for examples. The textbook has been used as the primary textbook
in Ohio University EDRE 7200: Educational Statistics courses for
most semesters 1987-1991 and again 2018-2025.