4 Reporting

Using papaja, it is possible to run all your analyses from within an R Markdown document. Any output from R is included as you usually would using R Markdown. By default the R code will not be displayed in the final documents. If you wish to show off your code you need to set echo = TRUE in the chunk options.

This section describes how you can include the results of your analyses can be included in text. It starts with simple numerical values (like a sample mean) and then continues with more-complex output, like figures, tables, and the results from statistical models and hypothesis tests.

4.1 Numerical values

Consider a basic example, where you want to include a mean and a standard deviation into text: printnum() can be used to include numeric values in the text. The function performs the rounding and optionally omits leading zeros in accordance with APA guidelines:

Participants mean age was `r printnum(age_mean)` years ($SD = `r printnum(age_sd)`$).

The above line is rendered as follows:

Participants mean age was 22.84 years (\(SD = 3.78\)).

printnum() is quiet flexible and passes its arguments to the underlying base function formatC() (CAVE: We are currently considering to switch to format().), so changing the number of digits after the decimal point or the character to be used to indicate the decimal point (e.g., from a point "." to a comma ",") is always possible.

papaja automatically applies printnum() to any numbers that a printed in inline code chunks. Hence, `r age_mean` and `r printnum(age_mean)` are effectively identical and both yield “22.84”.

papaja additionally provides a shorthand version of printnum(), namely printp(), with defaults for correct typesetting of \(p\) values (e.g., \(p < .001\) instead of \(p = 0.000\)).

Typeset numerical values for greater control - printnum() - printp() as shorthand for p values

printnum(c(143234.34557, Inf))
## [1] "143,234.35" "$\\infty$"
printnum(42L, numerals = FALSE, capitalize = TRUE)
## [1] "Forty-two"
printp(c(1, 0.0008, 0))
## [1] "> .999" ".001"   "< .001"

4.2 Figures

As in any R Markdown document, you can include figures in your document. Figures can either consist of plots generated in R or external files. In accordance with APA guidelines, figures are not displayed in place but are deferred to the final pages of the document. To change this default behavior set the option figsintext in the YAML front matter to yes.

Figures are saved as 300 dpi PDF and PNG

  • e.g., my_manuscript_figures/figure-latex/chunk-name-1.pdf
  • Change defaults using chunk options (dev and dpi)

4.2.1 R-generated figures

Any plot that is generated by an R code chunk is automatically included in your document. Hence, a simple call to plot() is sufficient to create a figure:

(ref:simple-base-plot) A basic scatterplot of the cars dataset.


Figure 4.1: (ref:simple-base-plot)

Figure display size, resolution, file format and many other things can be controlled by setting the corresponding chunk options. Refer to the plot-related knitr chunk options for an overview of all options. In papaja-documents, by default, all figures as saved as vectorized PDF and pixel-based PNG files at a resolution of 300 DPI, which should in most cases be sufficient for print publication. When the target document format is PDF the vectorized PDF files are included; when the target document format is DOCX the pixel-based PNG files are included. The files can be found in the mydocument_files folder that is generated when mydocument.Rmd is knitted.

Note, if you define transparent colors in your plots (e.g., when you defining colors using rgb()), the rendered document will always display the pixel-based PNG files, regardless of the target document format. Given the lossless compression in PNG and the decent resolution of 300 DPI this should not be a problem in print. papaja plot functions

Factorial designs are ubiquotous in experimental psychology but notoriously cumbersome to visualize with R base graphics. That is why papaja provides a set of functions to facilitate plotting data from factorial designs.

The functions apa_beeplot(), apa_lineplot(), apa_barplot(), and the generic apa_factorial_plot() are intended to provide a convenient way to obtain a publication-ready figure while maintaining enough flexibility for customization. The basic arguments to these functions are all the same, so you only need to learn one syntax to generate different types of plots. Moreover, it is easy switch between the different types of plots by changing the name of the called function.

Consider the following example of a so-called beeswarm plot:

(ref:beeswarm-caption) An example beeswarm plot. Small points represent individual observations, large points represent condition means, and error bars represent 95% confidence intervals.

  data = mixed_data
  , id = "Subject"
  , dv = "Recall"
  , factors = c("Task", "Valence", "Dosage")

Figure 4.2: (ref:beeswarm-caption)

A data.frame that contains the data in long format is passed to the data argument. The other arguments expect the names of the columns that contain the subject identifier id, the dependent variable dv, and the between- or within-subjects factors of the design. Currently, zero to four factors are supported.

For each cell of the design, the functions plot

  1. central tendency,
  2. dispersion, and
  3. names of dependent variable, factors and levels, and optionally a legend.

Mesures of central tendency and dispersion default to the mean and 95% confidence intervals. However, the arguments tendency and dispersion can be used to overwrite these defaults and plot other statistics. For example, to plot within-subject rather than between-subject confidence intervals you can set dispersion = within_subjects_conf_int. If the data encompass multiple observations per participant-factor-level-combination, these observations are aggregated. By default, the condition means are computed but you can specify other aggregation functions via the fun_aggregate argument.

The visual elements of the plots can be customized by passing options to the arguments args_points, args_error_bars, args_legend, and so on. These arugments take named lists of arguments that are passed on to the respective graphics functions, such as points(), arrows(), and legend(). Consider the following code and the resulting Figure 4.3 for a customized version of the previous beeswarm plot.

(ref:customized-plot-caption) A customized beeswarm plot. Note that, by default, the swarm inherits parameters from the args_points argument. Small points represent individual observations, large points represent means, and error bars represent 95% within-subjects confidence intervals.

    data = mixed_data
  , id = "Subject"
  , dv = "Recall"
  , factors = c("Task", "Valence", "Dosage")
  , dispersion = within_subjects_conf_int
  , main = c(expression(italic(A)~"dosage"), expression(italic(B)~"dosage"), expression(italic(C)~"dosage"))
  , ylab = "Number of correctly recalled words"
  , args_points = list(cex = 2, bg = c("skyblue4", "indianred3", "seagreen4"), col = "white")
  , args_swarm = list(cex = 1.2)
  , args_error_bars = list(length = 0.02, col = "grey20")
  , args_legend = list(x = "bottom", inset = 0.05)
  , args_y_axis = list(las = 1)

Figure 4.3: (ref:customized-plot-caption)

  • Utilize variable labels

These plot functions also utilize variable labels, with some \(\LaTeX\) math support (see ?latex2exp::TeX)

variable_labels(cosmetic_surgery) <- c(
  Post_QoL = "Quality of life post surgery ($\\bar{x}_{post}$)"

  id = "ID"
  , dv = "Post_QoL"
  , factors = c("Reason", "Surgery", "Gender")
  , data = cosmetic_surgery
  # , ylab = "Quality of life post surgery"
  , las = 1
  , args_legend = list(x = 0.25, y = 30)
  , args_points = list(bg = c("skyblue2", "indianred1"))
  , args_error_bars = list(length = 0.03)
) Setting global plot options

As noted in the APA guidelines (p. 153) figures in an article should be consistent and in the same style. Hence, it is desireable to define plot styles once rather than every time a new plot is created. When using graphics plot functions global options can be set using par(). For example, the size of labels and symbols can be controled with cex:

(ref:par-in-chunk) A variant of the example beeswarm plot with larger labels and symbols.

par(cex = 1.2)
  data = mixed_data
  , id = "Subject"
  , dv = "Recall"
  , factors = c("Task", "Valence")

Figure 4.4: (ref:par-in-chunk)

However, the use of par() is restricted in knitr.

By default, knitr resets par() for every chunk; par() needs to be called anew in every chunk that creates a plot.

While there are advantages to this behavior it impedes defining a consistent plot style. Although not very convenient, it is possible to set global plot options in knitr and therefore also in papaja. The following example illustrates how to make knitr execute a specified call to par() before executing code in a given chunk. Here we rotate the \(y\)-axis label (las = 1), sets a regular font and increases the size of the main title (font.main = 1, cex.main = 1.05).

# Ensure that par()-settings are retained across chunks
knitr::opts_knit$set(global.par = TRUE)

# Define a function that sets desired global plot defaults
par(font.main = 1`, `cex.main = 1.05, las = 1)

The customizations apply to all subsequently generated plots. It is, thus, advisable to add this code at the top of the R Markdown document shortly after the YAML front matter.

These same limitations do not apply to users of ggplot2. Custom themes can be set as default as usual and are subsequently applied to all plots created with ggplot(). papaja provides a template theme_apa() that we feel is well suited to create publication-ready plots. The following example demonstrates how to set a default theme.

# Cosmetic surgery dataset from
# Field, A. P., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage.

    , aes(x = Base_QoL, y = Post_QoL, color = Reason)
  ) +
  geom_point() +
  geom_smooth(method = "lm") +
    x = "Baseline quality of life"
    , y = "Quality of life post surgery"
  ) +
  scale_color_brewer(palette = "Set1") +
  theme_apa(box = TRUE) +
  theme(legend.position = c(0.2, 0.8))


4.2.2 External files

4.2.3 Figure captions

The chunk option fig.cap sets a figure caption. In principle, it is possible to define a figure caption simply by passing a character string to fig.cap. However, we strongly recommend that you use text references—a feature of the bookdown package—instead. A text reference consist of a unique label and the corresponding text that it represents. Text references can be defined as follows:

This is a figure caption for my figure.

The text reference can now be inserted in the chunk options (or anywhere else), e.g. fig.cap = "(ref:my-figure-caption)". The use of text references has several advantages:

  1. The use of Markdown and \(\LaTeX\) syntax is not well supported in fig.cap. For example, \ and _ must be escaped to prevent either R or \(\LaTeX\) from erroring. Moreover, Markdown formatting and citation syntax is not supported at all. None of these limitations apply to text references.
  2. Long figure captions can impair the readability of the document when they are added directly to the chunk options.
  3. Text references are subject to RStudio spell checking, whereas the contents of fig.cap is not.
  4. Changing text reference does constitute a change to the code chunk and hence does not cause the cache of a chunk to be discarded.
  5. Text references can include inline code chunks.

We recommend to define the text reference for a figure caption just above the corresponding code chunk.

The figure caption defined in fig.cap is used for all figures that are created in the respective chunk. If multiple figures are created the caption will be duplicated. Unless this behavior is desired, only one figure should be created per chunk.
  • Caption is used for every figure in chunk
  • Only one figure per chunk
  • Combine multiple plots into one figure (e.g., layout())

If a given figure is tall or the figure caption is extensive, it may flow of the page of a PDF document. There are two approaches to accomodate long figure captions or tall figures. You can adjust the line spacing/font size of the caption or use a separate list of figure captions. Line spacing and font size

As mentioned Adjusting line spacing, you can adjust the line spacing but also the font size of figure captions. For single-spaced script-size figure captions, add the following to the YAML front matter:

  - \usepackage{setspace}
  - \captionsetup[figure]{font={stretch=1,scriptsize}}

For other font size options see the LaTeX Wikibook. List of figure captions

To add a list of figure captions after the reference section, add the following to the YAML front matter:

figurelist: yes

Additionally you can suppress the captions below all figures by adding LaTeX code to the document preamble:

  - \captionsetup[figure]{textformat=empty}

If you would rather suppress figure captions only where necessary you can instead use knitr to do this. To remove the caption below a figure set fig.cap = " " and instead set the figure short caption via the chunk option fig.scap = "My figure caption":

(ref:figure-caption) This is a long figure caption!

```{r fig.cap = " ", fig.scap = "(ref:figure-caption)", out.width = "\\textwidth"}
To ensure that fig.scap takes effect, knitr requires that the chunk specifies out.width, out.height, or fig.align, as explained here.

4.3 Tables

For prettier tables, I suggest you try apa_table(), which builds on knitr’s kable().

(ref:mixed-table-caption) Descriptive statistics of correct recall by dosage.

(ref:mixed-table-note) This table was created with apa_table().

descriptives <- mixed_data %>% group_by(Dosage) %>%
    Mean = mean(Recall)
    , Median = median(Recall)
    , SD = sd(Recall)
    , Min = min(Recall)
    , Max = max(Recall)
descriptives[, -1] <- printnum(descriptives[, -1])

  , caption = "(ref:mixed-table-caption)"
  , note = "(ref:mixed-table-note)"
  , escape = TRUE

Of course popular packages like xtable3 or tables can also be used to create tables when knitting PDF documents. These packages, however, cannot be used when you want to create Microsoft Word documents because they rely on \(\LaTeX\) for typesetting. apa_table() creates tables that conform to APA guidelines and are correctly rendered in PDF and Word documents. But don’t get too excited. In papaja, table formatting is somewhat limited for Word documents due to missing functionality in pandoc (e.g., it is not possible to have cells or headers span across multiple columns).

As required by the APA guidelines, tables are deferred to the final pages of the manuscript when creating a PDF. To place tables and figures in your text instead, set the figsintext parameter in the YAML header to yes or true, as I have done in this document. Again, this is not the case in Word documents due to limited pandoc functionality. The bottom line is, Word documents will be less polished than PDF. The resulting documents should suffice to enable collaboration with Wordy colleagues and prepare a journal submission with limited manual labor.

apa_table() extends knitr::kable()

  • Designed to create table designs from the APA manual
  • Does not perform any calculations
  • Can merge multiple data.frame of the same structure contained in a list
  • Much more powerful in PDF documents
    (see apa_table_example.Rmd)
  • Remember to set chunk option results = "asis"!

kableExtra is another powerful alternative

4.4 Results from statistical tests

Output from statistical models and tests needs to be properly formatted before it can be included in text. For this purpose, we created the function apa_print(). Simply drop the results from a statistical test (e.g. a \(t\) test or an ANOVA) into apa_print(), and you will get an output object (a list) that contains all you need to report the analysis according to APA guidelines. apa_print() currently supports output objects from a variety of statistical models/tests, including \(t\) tests, ANOVA, linear regression. For a full list of supported output objects, see Table ??.

afex_aov BFBayesFactorList* list summary.aovlist
anova BFBayesFactorTop* lm summary.glht*
Anova.mlm emmGrid* lsmobj* summary.glm
aov glht* summary_emm* summary.lm
aovlist glm summary.Anova.mlm summary.ref.grid*
BFBayesFactor* htest summary.aov

* Not fully tested, don’t trust blindly!

Process diagram illustrating the use of `apa_print()` methods and `apa_table()` to format results from statistical analyses according to APA guidelines. The formatted text and tables can be reported in a manuscript.

Figure 4.5: Process diagram illustrating the use of apa_print() methods and apa_table() to format results from statistical analyses according to APA guidelines. The formatted text and tables can be reported in a manuscript.

4.4.1 htest (t tests, correlations, etc.)

The results of a \(t\) test (regardless of the exact type of \(t\) test) are typically stored in an htest object.

# Cosmetic surgery dataset from
# Field, A. P., Miles, J., & Field, Z. (2012). Discovering statistics using R. London: Sage.

t_out <- t.test(
  x = cosmetic_surgery$Post_QoL
  , y = cosmetic_surgery$Base_QoL
  , paired = TRUE

# Let's see what the apa_print() output looks like
## $estimate
## [1] "$M_d = -3.95$, 95\\% CI $[-4.86$, $-3.04]$"
## $statistic
## [1] "$t(275) = -8.52$, $p < .001$"
## $full_result
## [1] "$M_d = -3.95$, 95\\% CI $[-4.86$, $-3.04]$, $t(275) = -8.52$, $p < .001$"
## $table

As you can see, apa_print() returns a list of four elements: List element estimate contains the sample estimate of mean difference; element statistic contains the result of the NHST with test statistic, dfs, and p value; element full_result contains both information combined. The fourth element, table, contains a table containing test results; while this is NULL for a \(t\) test, it will be quiet interesting for ANOVA or linear model results, as you will see in the next subsections.

4.4.2 lm (linear regression)

The results of a linear model are stored in an lm object.

lm_out <- lm(formula = Post_QoL ~ Base_QoL + BDI, data = cosmetic_surgery)
apa_lm <- apa_print(lm_out)

Again, apa_print returns a four-elements list. apa_lm$table contains a complete regression table that can be passed to apa_table() to get a full regression table, see Table @ref(tab:lm-table; remember to set the chunk option results = "asis").

  , caption = "A full regression table."
  , escape = FALSE
Table 4.1: A full regression table.
Predictor \(b\) 95% CI \(t(273)\) \(p\)
Intercept 18.50 \([13.10\), \(23.91]\) 6.74 < .001
Base QoL 0.59 \([0.50\), \(0.67]\) 13.23 < .001
BDI 0.17 \([0.11\), \(0.22]\) 6.08 < .001
lm_res <- lm(Post_QoL ~ Base_QoL + BDI, data = cosmetic_surgery)
lm_res_apa <- apa_print(lm_res, observed_predictors = TRUE)
## [1] "$b = 18.50$, 95\\% CI $[13.10$, $23.91]$"

\(b = 18.50\), 95% CI \([13.10\), \(23.91]\)

## [1] "$R^2 = .50$, 90\\% CI $[0.42$, $0.57]$, $F(2, 273) = 136.78$, $p < .001$"

\(R^2 = .50\), 90% CI \([0.42\), \(0.57]\), \(F(2, 273) = 136.78\), \(p < .001\)

Tables produced by apa_print() have variable labels

## A data.frame with 5 labelled columns:
##   predictor estimate                 ci statistic p.value
## 1 Intercept    18.50 $[13.10$, $23.91]$      6.74  < .001
## 2  Base QoL     0.59   $[0.50$, $0.67]$     13.23  < .001
## 3       BDI     0.17   $[0.11$, $0.22]$      6.08  < .001
## predictor: Predictor 
## estimate : $b$ 
## ci       : 95\% CI 
## statistic: $t(273)$ 
## p.value  : $p$

Tables produced by apa_print() have variable labels

## $predictor
## [1] "Predictor"
## $estimate
## [1] "$b$"
## $ci
## [1] "95\\% CI"
## $statistic
## [1] "$t(273)$"
## $p.value
## [1] "$p$"
variable_labels(letters) <- "Alphabet"
## Variable label     : Alphabet
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
## attr(,"class")
## [1] "papaja_labelled" "character"
variable_labels(cars$speed) <- "Speed [m/h]"
cars[1:15, "speed"]
## Variable label     : Speed [m/h]
##  [1]  4  4  7  7  8  9 10 10 10 11 11 12 12 12 12
## attr(,"class")
## [1] "papaja_labelled" "numeric"

Variable labels can be added to any vector or data.frame

variable_labels(cars) <- c(
  speed = "Speed [m/h]"
  , dist = "Stopping distance [ft]"
cars[1:3, ]
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4

Variable labels are attributes of the variables that make up the data.frame

  • Conflicting implementations
  • Ours is, in principle, compatible with Hmisc
## [1] "Stopping distance [ft]"
cars[1:15, "dist"]
## Variable label     : Stopping distance [ft]
##  [1]  2 10  4 22 16 10 18 26 34 17 28 14 20 24 28
## attr(,"class")
## [1] "papaja_labelled" "numeric" anova (model comparisons)

4.4.3 anova, Anova, Anova.mlm (Analysis of Variance – ANOVA)

Foreword – you can skip this if you are not interested in background. Amongst psychologists, analysis of variance is a still-popular and frequently used technique. In base R, however, analysis of variance is not implemented in a way that is too useful for psychologists, as the aov-function in the stats package does not provide the much sought-after Type-\(\rm{I\!I\!I}\) sums-of-squares. A variety of R packages has emerged to fill the gap, e.g. car (Fox & Weisberg, 2011), ez (Lawrence, 2016), and afex (Singmann et al., 2016). Unfortunately, each ANOVA function provides different output objects that need to be digested by apa_print(). To do so, we rely on the package authors to return standardized result objects (i.e., S3/S4 classes). For this reason apa_print() does not support results from the popular ezANOVA(). The object returned by this function has no class to inform us about its contents and, thus, cannot be processed. We recommend you try one of the other available packages, e.g. aov_ez() from the afex-package.


cosmetic_surgery_anova <- afex::aov_ez(
  data = cosmetic_surgery
  , dv = "Post_QoL"
  , id = "ID"
  , between = c("Clinic", "Gender")
apa_anova <- apa_print(cosmetic_surgery_anova)

Now, you can report the results of your analyses like so:

Clinic (`r apa_anova$full$Clinic`) and patient gender affected post-surgery quality of life, `r apa_anova$full$Gender`. However, the effect of gender differed by clinic, `r apa_anova$full$Clinic_Gender`.

Which will yield the following text:

Clinic (\(F(9, 256) = 19.98\), \(\mathit{MSE} = 47.58\), \(p < .001\), \(\hat{\eta}^2_G = .413\)) and patient gender affected post-surgery quality of life, \(F(1, 256) = 15.29\), \(\mathit{MSE} = 47.58\), \(p < .001\), \(\hat{\eta}^2_G = .056\). However, the effect of gender differed by clinic, \(F(9, 256) = 1.97\), \(\mathit{MSE} = 47.58\), \(p = .043\), \(\hat{\eta}^2_G = .065\).

Again, you can easily create a complete ANOVA table by passing recall_anova_results$table to apa_table(), see .

  , caption = "A really beautiful ANOVA table."
  , note = "Note that the column names contain beautiful mathematical copy: This is because the table has variable labels."
Table 4.2: A really beautiful ANOVA table.
Effect \(F\) \(\mathit{df}_1\) \(\mathit{df}_2\) \(\mathit{MSE}\) \(p\) \(\hat{\eta}^2_G\)
Clinic 19.98 9 256 47.58 < .001 .413
Gender 15.29 1 256 47.58 < .001 .056
Clinic \(\times\) Gender 1.97 9 256 47.58 .043 .065
Note. Note that the column names contain beautiful mathematical copy: This is because the table has variable labels. glht, lsmeans, ref.grid (post-hoc tests)

4.5 Caching expensive computations

If R codes take a long time to run, results can be cached

```{r heavy-computation, .highlight[cache = TRUE]}
# Imagine computationally expensive R code here

my_manuscript_cache directory is created that contains - Computed objects - List of loaded R packages in __packages

The next time the chunk is executed

  • If unchanged, cached results are loaded
  • If changed, cache is discarded and code is executed anew

Packages are not reloaded, which cause problems in subsequent uncached chunks

  • Do not include library()-calls in cached chunks!

Load cache into an interactive R session using lazyLoad()

If possible at all, avoid caching library()-calls. Subsequent use of functions from these packages in non-cached chunks will throw errors!

To avoid incorrect results due to caching, dependencies should be specified

```{r chunk1, cache = TRUE}
x <- 5

```{r chunk2, cache = TRUE, .highlight[dependson = “chunk1”]}
x <- x + 5

If code in chunk1 changes, chunk2 is recomputed

To avoid incorrect results due to caching, dependencies should be specified

knitr::opts_chunk$set(autodep = TRUE)

See knitr cache examples for details

Control when cache is discarded

  • Ignore changes to comments
knitr::opts_chunk$set(cache.comments = FALSE)
  • Discard cache when random seed changes
opts_chunk$set(cache.extra = knitr::rand_seed)

I would recommend to always set these options

  • Discard cache if data are modified
opts_chunk$set(cache.extra = file.info("data.csv")$mtime)
  • Discard cache if the R environment changes
  cache.extra = list(R.version, sessionInfo())

If time permits, always discard all cached results (i.e., delete my_manuscript_cache directory) and reknit the document before submission to ensure that all results are based on the current R code!

4.5.1 Inserting variables before they have been computed

R code in R Markdown documents in executed from top to bottom. Hence, variables are not available before the code chunk in which they are defined has been executed. However, sometimes it may be desireable to report a variable in a document before the corresponding code chunk. A common example is to include the sample size or focal test statistics in the abstract.

Cached code chunks store their results in a cache database in the subfolder my_document_cache (default). The function knitr::load_cache() can be used to access cache databases and report results of a chunk anywhere in the document. For the following example, we’ll assume that we want to report the sample size in the abstract. The sample size is calculated in the method section:

{r, sample-size, cache = TRUE} n <- nrow(cosmetic_surgery)

To report this variable in the abstract we need to use an inline code chunk in the YAML front matter field:

abstract: |
  This is the abstract.
  Our sample size was n = `r knitr::load_cache(label = "sample-size", object = "n")`.

When the document is first rendered the object n in the code chunk sample-size has not yet been computed and knitr::load_cache() will insert "NOT AVAILABLE" instead. The next time the document is rendered, the results of the previous run have been saved as cache database. These results will be loaded and inserted into the abstract.

For variables computed later in the document, knitr::load_cache() can only ever display the result of the previous run—if the cached code is changed, the document has to be rendered twice to display the new result.

  1. When you use xtable(), table captions are set to the left page margin.

Comments and Questions

Icons by Icons8