In this chapter, we will try to discover what John Tukey, a well-known statistician, meant when he said, “A summary picture is worth a thousand summary words.” Our objective in this chapter is to learn graphical techniques that will help us further understand our data. We will concentrate on creating and interpreting several types of graphs. It’s a good idea to use one or more of these graphics as a routine part of any data analysis you perform. You may not include every graph you produce in your written report of results, but each one helps you, as an analyst, understand your data and your results better.
A number of jamovi functions provide relevant graphs as part of their output (sometimes by default, but sometimes only when you request the appropriate option). For example, among the procedures we will discuss the most, here are some of the graphs available:
Types of Graphs Provided
In this chapter we will look at graphs that are available in basic jamovi (and also available in R). Additional modules can be added to jamovi that provide many more graphical options, such as vijPlots, JJStatsPlot, FlexPlot, and surveymv. We have included some plots from esci in this chapter, but it offers others as well. One very nice plot, called a raincloud plot, is available in surveymv, but not available using that package in R.
With charts and graphs, just like in your statistical analyses, you need to pay attention to the level of measurement of your variables. Some graphs make no sense for some kinds of variables. Some graphs in this section are most appropriate when you have nominal variables, some for ordinal variables with relatively few categories (e.g., not an ordinal variable that provides the ranking of 450 students in a high school graduating class), and some for scale variables.
With several of the graphing procedures, you must pay attention to whether you are graphing variables or groups. That is, there is an option for many of the graph procedures to choose between “groups of cases” and “separate variables.” Even if you have just one variable, you will typically need to choose “separate variables” in order for the graph to work.
Many of the procedures also have options for “simple” or “clustered” data. The clustering provides additional complexity to the graph, but may be useful in some circumstances. Similarly, several of the procedures allow “paneling” that shows multiple copies of the same graph across multiple groups. Such graphs can usually be paneled by rows or columns. The nice thing about paneling is that the axes match automatically, making it much easier to make comparisons across groups.
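As a rough illustration of why matched axes matter when paneling, the sketch below draws one histogram per group in base R using shared bin breaks and a shared y-limit. The data are simulated, and the names `score` and `grp` are hypothetical, not variables from this chapter. jamovi matches the axes for you, so this only shows the idea behind it.

```r
# Simulated (hypothetical) data: one scale variable and a two-level grouping factor.
set.seed(1)
score <- c(rnorm(20, mean = 210, sd = 40), rnorm(20, mean = 235, sd = 40))
grp   <- factor(rep(c("A", "B"), each = 20))

# Shared bin breaks computed from ALL cases, so every panel uses the same x-axis.
shared_breaks <- pretty(range(score), n = 8)

# Count cases per bin within each group (no drawing yet) to find a common y-limit.
counts <- lapply(split(score, grp), function(x) {
  hist(x, breaks = shared_breaks, plot = FALSE)$counts
})
shared_ylim <- c(0, max(unlist(counts)))

# Draw the panels in one row; identical breaks and ylim make them directly comparable.
old_par <- par(mfrow = c(1, nlevels(grp)))
for (g in levels(grp)) {
  hist(score[grp == g], breaks = shared_breaks, ylim = shared_ylim,
       main = paste("Group", g), xlab = "score")
}
par(old_par)
```

If each panel were allowed to pick its own breaks and limits, bars of the same height could represent different counts, which is the comparison problem that automatic axis matching avoids.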
We will use the Chapter 4 Cholesterol data for many of the examples below. Recall that CHOL was the cholesterol score, HT was height measured in inches, and GROUP indicated whether participants were prescribed an herbal supplement pill to help lower their cholesterol (in the commands below these appear as Cholesterol, Height, and Pill_Group). We will also use the X RATING and Y RATING variables from the student teacher rating data for two student teachers (student teacher X and student teacher Y) in Chapter 5. The headings for each of the graphs displayed below indicate what types of variables were used in the example. Each example also provides the commands used.
jmv::descriptives(
    data = data,
    vars = Pill_Group,
    freq = TRUE,
    bar = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
Frequencies of Pill_Group
Pill_Group    Counts    % of Total    Cumulative %
1 Yes-Pill    20        50.000        50.000
0 No-Pill     20        50.000        100.000
jmv::descriptives(
    data = data,
    vars = Height,
    bar = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    formula = Height ~ Pill_Group,
    bar = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    formula = Pill_Group ~ Height,
    bar = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
Note that when two interval or ratio variables are measured on very different scales, it may not be appropriate to graph them on the same axes. Doing so can produce hard-to-read results, such as the confidence interval shown for the HT variable. The same caution applies to many other graphs of variables measured on very different scales.
jmv::descriptives(
    data = data,
    formula = Cholesterol ~ Pill_Group,
    desc = "rows",
    bar = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    vars = Cholesterol,
    box = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    vars = Cholesterol,
    box = TRUE,
    dot = TRUE,
    dotType = "stack",
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    vars = Cholesterol,
    box = TRUE,
    violin = TRUE,
    dot = TRUE,
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    formula = Cholesterol ~ Pill_Group,
    box = TRUE,
    violin = TRUE,
    dot = TRUE,
    dotType = "stack",
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::descriptives(
    data = data,
    formula = Scores ~ Rating,
    box = TRUE,
    dot = TRUE,
    dotType = "stack",
    boxMean = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
jmv::ttestOneS(
    data = data,
    vars = Cholesterol,
    students = FALSE,
    testValue = 200,
    plots = TRUE)
One Sample T-Test
Note. Hₐ: μ ≠ 200
Note that the error bar plot for one group and one variable is not very interesting.
data = data,
outcome_variable = Cholesterol,
alpha = "0.05")
Outcome variable    M         LL        UL        Mdn       s         N     Missing
Cholesterol         223.65    209.24    238.06    219.00    45.064    40    0
data = data,
pairs = list( list(
students = FALSE,
plots = TRUE)
Paired Samples T-Test
Note. Hₐ: μ(Measure 1 - Measure 2) ≠ 0
jmv::ttestIS(
    data = data,
    formula = Cholesterol ~ Pill_Group,
    vars = Cholesterol,
    students = FALSE,
    plots = TRUE)
Independent Samples T-Test
Note. Hₐ: μ(1 Yes-Pill) ≠ μ(0 No-Pill)
jmv::descriptives(
    data = data,
    vars = Cholesterol,
    hist = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
Note that there are many ways to create these graphs, and the results do not always look the same. Compare this histogram with the next one.
data = data,
outcome_variable = Cholesterol,
show_details = FALSE,
mark_mean = FALSE,
mark_median = FALSE,
mark_sd = FALSE,
mark_quartiles = FALSE,
mark_z_lines = FALSE,
mark_percentile = "0",
histogram_bins = "12",
es_plot_width = "500",
es_plot_height = "400",
ymin = "auto",
ymax = "auto",
breaks = "auto",
xmin = "auto",
xmax = "auto",
xbreaks = "auto",
ylab = "auto",
xlab = "auto",
axis.text.y = "14",
axis.title.y = "15",
axis.text.x = "14",
axis.title.x = "15",
fill_regular = "#008DF9",
fill_highlighted = "#E20134",
color = "black"
Outcome variable    M         Mdn       s         Minimum    Maximum    25th      75th      N     Missing
Cholesterol         223.65    219.00    45.064    146.00     330.00     193.75    255.00    40    0
jmv::descriptives(
    data = data,
    formula = Cholesterol ~ Pill_Group,
    hist = TRUE,
    n = FALSE,
    missing = FALSE,
    mean = FALSE,
    median = FALSE,
    sd = FALSE,
    min = FALSE,
    max = FALSE)
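One reason histograms of the same variable can look different is the choice of bins. The base R sketch below tallies the same simulated scores with wide and with narrow bins, without drawing anything. The data and the bin widths of 40 and 10 are assumptions made only for illustration. They are not the chapter's cholesterol data or the defaults of any jamovi module.

```r
# Simulated (hypothetical) scores; not the chapter's cholesterol data.
set.seed(42)
x <- rnorm(40, mean = 220, sd = 45)

# A common range for both binnings, rounded outward to multiples of 40.
lo <- floor(min(x) / 40) * 40
hi <- ceiling(max(x) / 40) * 40

# Two binning choices over the same range: wide bins vs. narrow bins.
wide   <- hist(x, breaks = seq(lo, hi, by = 40), plot = FALSE)
narrow <- hist(x, breaks = seq(lo, hi, by = 10), plot = FALSE)

# Same 40 cases, but a different number of bars and a different tallest bar.
length(wide$counts)
length(narrow$counts)
max(wide$counts)
max(narrow$counts)
```

Both tallies account for every case, so neither is wrong. The wide bins smooth over detail, while the narrow bins show more detail along with more sampling noise.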
data = data,
aVar = Cholesterol,
group = NULL,
facet = NULL,
colorPalette = "jmv")
data = data,
aVar = Cholesterol,
group = NULL,
facet = NULL,
normalCurve = TRUE,
binWidth = 20,
binBoundary = 0,
colorPalette = "jmv")
data = data,
aVar = Cholesterol,
group = NULL,
facet = Pill_Group,
normalCurve = FALSE,
binWidth = 20,
binBoundary = 0,
colorPalette = "jmv")
data = data,
aVar = Cholesterol,
group = NULL,
facet = Pill_Group,
histtype = "density",
normalCurve = TRUE,
binWidth = 20,
binBoundary = 0,
colorPalette = "jmv")
data = data,
vars = Cholesterol)
data = data,
vars = Cholesterol,
group = Pill_Group)
scatr::scat(
    data = data,
    x = 'Weight',
    y = 'Cholesterol')
scatr::scat(
    data = data,
    x = 'Weight',
    y = 'Cholesterol',
    line = 'linear',
    se = TRUE,
    marg = 'dens')  # marginal plots: 'dens' here; 'box' is used in the next example
scatr::scat(
    data = data,
    x = 'Weight',
    y = 'Cholesterol',
    group = 'Pill_Group',
    marg = 'box',
    line = 'linear')
scatr::scat(
    data = data,
    x = 'Pill_Grp',
    y = 'Cholesterol',
    line = 'linear')
Note that this nominal-by-scale variable scatterplot ONLY makes sense when the nominal variable is numeric and binary (i.e., dichotomous, two-group). This is also the only time a correlation calculated for nominal and scale variables makes sense.
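The reason a correlation with a numeric binary variable is meaningful is that it carries the same information as an equal-variance independent-samples t-test on the two groups. The sketch below checks that relationship in base R with simulated data. The names `group01` and `outcome` are hypothetical and the values are not the chapter's data.

```r
# Simulated (hypothetical) data: a 0/1 group code and a scale outcome.
set.seed(7)
group01 <- rep(c(0, 1), each = 20)
outcome <- rnorm(40, mean = 220 - 10 * group01, sd = 45)

# Pearson r with a numeric 0/1 variable is the point-biserial correlation.
r <- cor(group01, outcome)
n <- length(outcome)

# Convert r to a t statistic with n - 2 degrees of freedom.
t_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Student's (equal-variance) t-test comparing group 1 with group 0.
tt <- t.test(outcome[group01 == 1], outcome[group01 == 0], var.equal = TRUE)

# The two t statistics agree (up to floating-point error).
all.equal(t_from_r, unname(tt$statistic))
```

With three or more unordered categories there is no single numeric coding that gives this equivalence, which is why the scatterplot and the correlation only make sense in the two-group case.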
scatr::scat(
    data = data,
    x = 'Height',
    y = 'Weight',
    line = 'linear')
Note that, if you look carefully, this ordinal-like–by-scale variable scatterplot shows vertical “lines” of dots at each of the discrete values in the more ordinal-like variable (Height). So when you see lines like that in your graphs it is probably for this reason. In some scatterplots they will show up diagonally instead of vertically or horizontally. Sometimes it is more or less obvious.
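A quick way to see where such stripes come from is to count how many distinct values each variable takes. The base R sketch below uses simulated heights rounded to whole inches. The variable names and values are hypothetical, and the jittered plot is just one common display option, not something the chapter's jamovi commands do.

```r
# Simulated (hypothetical) data: heights recorded to the nearest inch,
# weights recorded on a much finer scale.
set.seed(3)
height <- round(rnorm(40, mean = 67, sd = 3))
weight <- rnorm(40, mean = 160, sd = 25) + 2 * (height - 67)

# Few distinct x-values relative to the number of cases is what produces the stripes.
n_distinct_height <- length(unique(height))
n_distinct_weight <- length(unique(weight))

# Cases per distinct height: each entry corresponds to one vertical "line" of dots.
table(height)

# jitter() adds a little random noise for display only, which spreads the stripes out.
plot(jitter(height, amount = 0.3), weight,
     xlab = "height (jittered)", ylab = "weight")
```

Jittering changes only the picture. Any statistics should still be computed from the original, unjittered values.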
jmv::corrMatrix(
    data = data,
    vars = vars(Height, Cholesterol, Weight, Pill_Grp),
    sig = FALSE,
    plots = TRUE,
    plotDens = TRUE,
    plotStats = TRUE)
Correlation Matrix
               Height      Cholesterol    Weight      Pill_Grp
Height         —
Cholesterol    -0.01077    —
Weight         0.25085     0.68236        —
Pill_Grp       0.13194     -0.09551       -0.05055    —
Note that here we have a numeric binary variable (Pill_Grp) and three scale variables. The same warning as above applies: a nominal-by-scale variable scatterplot ONLY makes sense when the nominal variable is binary (i.e., dichotomous, two-group). This is also the only time a correlation calculated for nominal and scale variables makes sense.
But when we use the binary variable as a factor (where we have told jamovi it is a categorical nominal/ordinal variable), the matrix looks different: the results are paneled by group (and this will work for multiple groups).
jmv::corrMatrix(
    data = data,
    vars = vars(Weight, Cholesterol, Pill_Group),
    sig = FALSE,
    plots = TRUE,
    plotDens = TRUE,
    plotStats = TRUE)
Correlation Matrix
               Weight     Cholesterol    Pill_Group
Weight         —
Cholesterol    0.68236    —
Pill_Group     NaN ᵃ      NaN ᵃ          —
ᵃ Pearson correlation cannot be calculated for non-numeric values
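The same distinction between a numeric 0/1 code and a factor can be seen in base R. The sketch below is only an analogy, using simulated data and hypothetical names (`grp_num`, `grp_fac`, `y`). Where jamovi reports NaN for the factor, base R's `cor()` stops with an error, which is caught here and recorded as NA.

```r
# Simulated (hypothetical) data: a 0/1 code, the same grouping stored as a factor,
# and a scale variable.
set.seed(11)
grp_num <- rep(c(0, 1), each = 20)
grp_fac <- factor(grp_num, levels = c(1, 0), labels = c("Yes-Pill", "No-Pill"))
y <- rnorm(40, mean = 220, sd = 45)

# With the numeric 0/1 code, Pearson r is defined.
r_numeric <- cor(grp_num, y)

# With the factor, cor() refuses to run; capture the error instead of stopping.
r_factor <- tryCatch(cor(grp_fac, y), error = function(e) NA_real_)

# Recovering the 0/1 code from the factor labels makes the correlation computable again.
grp_back <- as.numeric(grp_fac == "Yes-Pill")
r_back <- cor(grp_back, y)
```

Here `r_back` reproduces `r_numeric`, so nothing about the data has changed between the two matrices. Only the declared variable type differs.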
This chapter showed a large number of the common graphs that can be produced.