Notes on Modern Physics and Ionizing Radiation


II. The Analysis of Experimental Data




Scientific experiments, whether instructional exercises or original research, routinely involve the measurement of multiple quantities. Typically, some of those quantities are controlled by the experimenter (the "causes" or "independent variables") while others are simply observed (the "effects" or "dependent variables"). In many situations, the connections among the directly measured quantities are most readily understood by comparing values that can be calculated from them. For example, you would expect the power delivered by a single-cylinder gasoline engine to be closely related to the displacement volume of the cylinder, but you may have directly measured only the diameter and stroke length. When we use mathematics to infer a quantity from direct measurements of other quantities, we often describe the result as an "indirect measurement."
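As a sketch of such an indirect measurement, the engine-cylinder example above can be worked out directly. The bore and stroke values here are hypothetical, chosen only to illustrate the calculation:

```python
import math

# Directly measured quantities (hypothetical values):
bore = 6.60      # cylinder diameter, cm
stroke = 5.80    # stroke length, cm

# Indirect measurement: displacement volume of one cylinder,
# V = (pi / 4) * bore**2 * stroke
displacement = math.pi / 4.0 * bore**2 * stroke
print(f"Displacement volume: {displacement:.1f} cm^3")
```

The displacement was never measured directly; its value, and its uncertainty, are inherited from the bore and stroke measurements through the formula.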

The heart of any measurement process is deciding whether the quantity being measured is greater than, equal to, or less than some reference value. The simplest type of measurement involves the direct physical comparison of a standard to the unknown quantity (e.g., measuring the length of an object with a ruler, or the temperature with a conventional mercury thermometer). Such "analog" measurements provide the experimenter with an immediate, intuitive feel for their uncertainty, because the experimenter must directly judge the comparison between the object and the standard. The markings on the scale will have some non-zero width, the object may not have well-defined boundaries (where is the edge of a cotton ball, anyway?), and, depending on the spacing between the finest markings on the scale and the visual acuity of the experimenter, it may well be possible to honestly estimate more precisely than the scale is marked (although rarely more precisely than to the tenth of the finest markings).

The raw data of an experiment are sometimes digital in nature: counted radioactive decays, response bar pressings by a rat, etc. In such cases, provided that the apparatus functions properly, there will be no doubt about the number observed for a specific experiment (although, as we will discuss below, a different number might be observed if the experiment were repeated). Often, however, the data are analog signals, such as the temperature, which is converted by some suitable transducer into a voltage or current proportional to the physical quantity being measured, and then converted by an electronic analog-to-digital converter ("ADC") into a number that measures the electrical quantity in some suitable units. This may be done with custom electronics or with a commercial digital meter. In either case, the uncertainty in the result is less likely to be intuitively obvious to the experimenter than the uncertainty of an analog measurement. We expand on this topic in section H.
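A minimal sketch of the digitizing step may make the hidden uncertainty concrete. The converter parameters here are hypothetical (a 10-bit ADC spanning 0 to 5 V):

```python
# Hypothetical 10-bit ADC with a 0-5 V input range.
FULL_SCALE_V = 5.0
N_BITS = 10
LEVELS = 2 ** N_BITS            # 1024 quantization levels

def adc_reading(voltage):
    """Integer code the ADC would report for a given analog voltage."""
    code = int(voltage / FULL_SCALE_V * LEVELS)
    return min(max(code, 0), LEVELS - 1)   # clamp to the valid code range

def code_to_voltage(code):
    """Voltage implied by an ADC code, good only to within one step."""
    return code * FULL_SCALE_V / LEVELS

# The quantization step sets a floor on the uncertainty of every reading:
step = FULL_SCALE_V / LEVELS    # roughly 4.9 mV per count
print(adc_reading(2.500), step)
```

The experimenter sees only the integer code; the rounding buried in `int(...)` contributes an uncertainty of about one step that is easy to overlook because no scale is ever read by eye.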

Laboratory work provides more than just practical experience working with the equipment. Critical in the education of a scientist is the development of an understanding of the limitations of experimental results. The first stage of this error analysis is the identification of sources of error in measured quantities, and of their relative impact on the possible conclusions. The second stage is using the "rules of thumb" for significant figures. (It is wise to carry at least one extra figure through all intermediate calculations, rounding off to the correct number of significant figures only at the end. This practice of using a "guard digit" reduces the compounding of round-off errors with the unavoidable experimental fluctuations, and in this day of cheap calculators adds little to the work or to the likelihood of mistakes.) In the third stage of error analysis, more exact error estimates are propagated through the calculations in notebooks and written lab reports, and the reports indicate, first, what specific steps, not part of the original statement of the experiment, were taken to reduce the impact of the major sources of error and, second, what additional steps others might take.
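The value of the guard digit is easy to demonstrate. In this sketch (the measured values are hypothetical), two nearly equal readings are compared; rounding the intermediate values too early destroys the very difference being computed:

```python
# Keep a guard digit through intermediate steps; round only at the end.
x, y = 1.2345, 1.2311    # hypothetical readings, good to about 4 figures

# Full precision until the final rounding:
careful = round((x - y) / y * 100, 2)    # percent difference

# Rounding each intermediate value to 3 significant figures first:
hasty = round((round(x, 2) - round(y, 2)) / round(y, 2) * 100, 2)

print(careful, hasty)    # the early rounding wipes out the difference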

The focus of this chapter is basic statistical error analysis, the methods used to analyze data and to communicate honestly the quality of experimental results. The estimation of systematic errors can be refined with the aid of statistical techniques, but it remains a subject calling for educated judgement. The mathematical and philosophical rationale for some of the methods is discussed. No attempt is made to familiarize the student with particular computer routines for performing the calculations.

The references cited throughout this chapter are given in full in the Bibliography.


A. Distributions and True Values

The crux of modern science is the reproducible experiment. By this we do not mean that the numerical results would be exactly the same every time, if the experiment were repeated, but rather that the variations would be consistent with the idea of random (statistical) errors. Thus, repeated measurements of the same quantity mostly yield results close to the "true value," and only rarely yield results that are very different. Random errors must be distinguished from sheer mistakes and from systematic errors. There are two possible origins of random errors: the unavoidable variations in judging instrumental readings while measuring a well-defined quantity, and "variations that arise from chance associated with the sampling of a (non-uniform) population or of any random variable" (Rajnak, 1976). The word "error" has three primary meanings in the present context: first, the deviation of an individual measurement from the "true value," second, the uncertainty of a result, and third, the discrepancy between a result and the "true value." Usually we will intend the second meaning, but do be alert to judge from context which of the three is intended.

One way of presenting the results of a series of measurements is the "histogram," or "spectrum": the range of possible results is split evenly into intervals; the result of each measurement is tallied to the interval containing it; finally, the tally is graphed, as shown in Fig. 1a. The "distribution curve" for an experiment is the continuous curve that we expect as the limiting histogram when more and more measurements are taken and the results are shown tallied and graphed in smaller and smaller intervals; see Fig. 1b.
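The tallying procedure just described can be sketched in a few lines. The readings below are hypothetical, standing in for repeated measurements of a single quantity:

```python
# Tally repeated measurements into equal intervals (a histogram).
def histogram(values, lo, hi, nbins):
    """Count how many values fall in each of nbins equal intervals on [lo, hi)."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / width)] += 1
    return counts

# Hypothetical repeated readings of one quantity:
readings = [9.78, 9.81, 9.83, 9.80, 9.79, 9.82, 9.81, 9.80, 9.84, 9.77]
print(histogram(readings, 9.75, 9.85, 5))
```

Shrinking the intervals and adding measurements is exactly the limiting process that turns the tallied counts into the smooth distribution curve of Fig. 1b.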

Many measurements yield distribution curves that are smooth, single-peaked, and symmetrical. Sometimes the curve is fairly accurately described by the standard bell-shaped curve, the "Gaussian" or "normal" distribution. Often, however, the distribution curve that describes the results will have a more complicated form. Any experiment that involves counting radioactive decays, for example, will have a distribution curve that is not symmetrical, since there is a smallest number of counts that can be observed in any given period of time: zero! (See Appendix D.) The process of digitizing an analog signal (see section H) involves rounding-off, which exhibits a flat-topped distribution, one of the few such in experimental science.
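A quick simulation shows why a counting distribution cannot be symmetrical near zero. The source strength and decay probability here are hypothetical; each trial counts decays among many nuclei, each with a small chance of decaying in the interval:

```python
import random

random.seed(1)   # reproducible illustration

def counts_in_interval(n_nuclei=2000, p_decay=0.0015):
    """Decays observed in one counting interval (hypothetical source)."""
    return sum(1 for _ in range(n_nuclei) if random.random() < p_decay)

trials = [counts_in_interval() for _ in range(500)]
print(min(trials), max(trials))   # the count can never fall below zero
```

The distribution of `trials` piles up just above zero and trails off toward large counts, the asymmetric (Poisson) shape discussed in Appendix D.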


Figure 1: (a) Histogram                (b) Distribution

If the distribution curve is symmetrical we have no problem agreeing that the true value for the measurement is the central value. On the other hand, it is not so obvious how to identify the true value for an asymmetrical distribution. The three common "measures of central tendency" are the mode (Fig. 2a), the median (Fig. 2b), and the expectation value (Fig. 2c). The mode is the most likely result, namely that value for which the distribution curve is at its peak. The median is the value that splits the area under the distribution curve equally. (It is an even chance whether any given result will be above or below the median.) The expected value is just the limit (as the number of measurements increases to infinity) of the average value, defined in the usual way as the sum of the results divided by their number; in other words the expectation is the "population mean." We ordinarily regard the expectation value as the best indication of "central tendency" for a distribution, and will therefore be interested in the average of the measurements actually made.

(a) Mode                         (b) Median                         (c) Expectation

Figure 2: Which is the true value?
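For a finite data set, all three measures of central tendency are a few lines of standard-library Python. The counts below are hypothetical, chosen so the three measures disagree slightly, as they do for any asymmetrical distribution:

```python
from statistics import mean, median, mode

# Hypothetical counted decays in repeated 10-second intervals:
counts = [3, 5, 4, 6, 4, 2, 4, 7, 5, 4]

print("mode:", mode(counts))       # most frequent value (peak of the histogram)
print("median:", median(counts))   # splits the ordered data in half
print("mean:", mean(counts))       # sample average, estimating the expectation
```

Here the mode and median are 4 while the mean is pulled up to 4.4 by the long tail of high counts, exactly the disagreement sketched in Fig. 2.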


B. Confidence Intervals and Precision

The "precision" of a measurement is essentially the width of the peak in the distribution curve, and the "relative precision," where that idea is used, corresponds to the width of the curve divided by the true value (often expressed as a percentage). There are several ways to describe mathematically the width of the curve: the "full width at half maximum" (Fig. 3a), the "probable error" (Fig. 3b), and the "standard deviation" (Fig. 3c) are three very common methods. Each of the width parameters is defined using the distribution curve, which describes an infinite number of measurements. The probable error may be defined as half of the narrowest full width that encompasses half the area under the distribution, as shown by the speckled area in the central part of Fig. 3(b). For a single-peaked distribution, this narrowest region has end points at which the distribution curve is equally high.

(a) F.W.H.M.                         (b) P.E.                         (c) S.D.

Figure 3: Parameters for Distribution Width
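For the special case of the Gaussian distribution, the three width parameters are simply proportional to one another: FWHM = 2√(2 ln 2) σ ≈ 2.355 σ, and PE ≈ 0.6745 σ. A short check of both relations, using only the standard library:

```python
import math
from statistics import NormalDist

sigma = 1.0   # standard deviation of a normal distribution

# FWHM: solve exp(-x**2 / (2 sigma**2)) = 1/2 for the half-height points.
fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma    # about 2.3548 sigma

# Probable error: half-width enclosing 50% of the area. For a symmetric
# curve this is the distance from the mean to the 75th-percentile point.
pe = NormalDist(0.0, sigma).inv_cdf(0.75)              # about 0.6745 sigma

print(fwhm, pe)
```

These conversion factors hold only for the Gaussian; for an asymmetric distribution the three parameters must each be evaluated from the curve itself.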

The definition of the distribution (or "population") standard deviation, σ (sigma), is "the square root of the average value of the squares of the deviations of the data from the true value," as shown in Eq. 1:

    σ = √[ Σᵢ (xᵢ − μ)² / N ]                (1)

where μ is the true value and N is the number of measurements.

This is the distribution's root mean square (rms) deviation. In some cases you will be interested in the rms deviation of your data set, and in other cases you will want to estimate the distribution standard deviation, based on your data. Although the FWHM and the PE are always reasonable, there are experimental situations in which the true distribution curve gives an infinite value for the standard deviation!
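The distinction between the rms deviation of a data set and an estimate of the distribution standard deviation can be made explicit in code. The readings and the assumed true value below are hypothetical:

```python
import math

def rms_deviation(data, true_value):
    """Root-mean-square deviation of the data from a known true value (Eq. 1)."""
    return math.sqrt(sum((x - true_value) ** 2 for x in data) / len(data))

def sample_std(data):
    """Estimate of the distribution standard deviation from the data alone:
    deviations are taken from the sample mean, with the usual N - 1 divisor."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))

# Hypothetical repeated readings, with an assumed true value of 9.80:
data = [9.78, 9.81, 9.83, 9.80, 9.79]
print(rms_deviation(data, 9.80), sample_std(data))
```

The N − 1 divisor in `sample_std` compensates for the fact that the sample mean is itself computed from the same data, so the deviations from it are slightly too small on average.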

Typical experimental situations involve so few measurements that the distribution curve is only roughly suggested. When analyzing experimental data, therefore, you can determine only estimates for the expectation value or any of the width parameters. The usual practice is to quote the result of the measurements as a single number, the "best estimate of the true value," and to give some indication of the width of the distribution. The two common formats are to specify a tolerance and to specify upper and lower limits for the true value. For example:

    L = (5.04 ± 0.03) cm

or equivalently,

    5.01 cm ≤ L ≤ 5.07 cm