Building Statistical Applications

Mark Andrews

From toy examples to real analyses

The value of Shiny for researchers is wrapping actual statistical computations in interactive interfaces. This session: how to structure applications that do real work.

The problem with repeated computation

output$plot  <- renderPlot({ x <- rnorm(input$n); hist(x) })
output$stats <- renderPrint({ x <- rnorm(input$n); summary(x) })

rnorm(input$n) runs twice when input$n changes
Two different samples — plot and stats are inconsistent
For expensive operations this is a serious waste

Reactive expressions

samples <- reactive({ rnorm(input$n) })

output$plot  <- renderPlot({ hist(samples()) })
output$stats <- renderPrint({ summary(samples()) })

reactive({...}) creates a reactive expression whose result is cached
Called with () like a function
Re-runs lazily: only when called after one of its inputs has changed; otherwise returns the cached result
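The caching can be checked at the console without building an app. This is a minimal sketch, assuming the shiny package is installed. The names n (a reactiveVal standing in for input$n) and runs (a counter) are illustrative and not part of the original example; isolate() is used only so the expression can be read outside a reactive context.

```r
library(shiny)

# Stand-in for input$n: a reactive value we can set by hand (hypothetical name).
n <- reactiveVal(10)

# Counter recording how often the body of the reactive expression runs.
runs <- 0

samples <- reactive({
  runs <<- runs + 1
  rnorm(n())
})

# isolate() lets us read a reactive expression outside a reactive context.
first  <- isolate(samples())
second <- isolate(samples())

identical(first, second)  # both calls return the same cached sample
runs                      # number of times the body has run so far

# Changing the dependency invalidates the cache; the next call recomputes.
n(20)
third <- isolate(samples())
length(third)
runs
```

Comparing runs before and after n(20) shows that the body is re-executed only after its input changes, not on every call.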

The reactive graph with a shared expression

input$n  ──►  samples()  ──►  renderPlot   ──►  output$plot
                         └──►  renderPrint  ──►  output$stats

Fan-out shape: one reactive expression feeds two outputs
samples() re-runs once; both outputs get the same sample

Exploring a likelihood function

# observed: numeric data vector, assumed to be defined outside the server function
# Log-likelihood at the current slider values
ll <- reactive({
  sum(dnorm(observed, mean = input$mu, sd = input$sig, log = TRUE))
})
output$llvalue <- renderPrint({ cat(sprintf("log-lik: %.3f\n", ll())) })
output$llplot  <- renderPlot({
  # Log-likelihood over a grid of mu values, holding sd at input$sig
  mu_grid <- seq(-2, 6, length.out = 100)
  ll_grid <- sapply(mu_grid, function(m)
    sum(dnorm(observed, mean = m, sd = input$sig, log = TRUE)))
  plot(mu_grid, ll_grid, type = "l",
       xlab = expression(mu), ylab = "Log-likelihood")
  abline(v = input$mu, lty = 2, col = "red")  # current value of mu
})

Slider for mu tracks where you are on the likelihood curve
Red line shows current parameter value
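The computation behind the curve can be tried outside Shiny. This is a sketch only: observed is simulated here as hypothetical data, and loglik is an illustrative helper name, not something from the original app. Writing the log-likelihood as a plain function would also let the printed value and the grid share one definition.

```r
set.seed(1)

# Hypothetical data standing in for `observed`.
observed <- rnorm(50, mean = 2, sd = 1.5)

# Log-likelihood of a normal model as a plain function of mu and sigma.
loglik <- function(mu, sigma) {
  sum(dnorm(observed, mean = mu, sd = sigma, log = TRUE))
}

# Same grid as in the plot; sigma is held fixed, as input$sig is in the app.
mu_grid <- seq(-2, 6, length.out = 100)
ll_grid <- vapply(mu_grid, loglik, numeric(1), sigma = 1.5)

# For fixed sigma the curve is quadratic in mu, so it peaks at the grid
# point closest to the sample mean.
mu_hat <- mu_grid[which.max(ll_grid)]
c(grid_maximum = mu_hat, sample_mean = mean(observed))
```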

ggplot2 inside renderPlot

output$hist <- renderPlot({
  df <- data.frame(x = rnorm(input$n))
  ggplot(df, aes(x = x)) +
    geom_histogram(bins = input$bins, fill = "steelblue", colour = "white") +
    theme_minimal()
})

input$bins is read inside the ggplot call
No special treatment needed — ggplot is just R code inside a reactive context
renderPlot captures whatever the code draws, and prints a ggplot object if one is returned
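For context, the fragment above could sit in a complete app along these lines. This is a minimal sketch: the slider ranges, labels and default values are assumptions chosen for illustration, not taken from the original.

```r
library(shiny)
library(ggplot2)

# UI: two sliders feeding input$n and input$bins (ranges are illustrative).
ui <- fluidPage(
  sliderInput("n", "Sample size", min = 10, max = 1000, value = 100),
  sliderInput("bins", "Number of bins", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

server <- function(input, output, session) {
  output$hist <- renderPlot({
    df <- data.frame(x = rnorm(input$n))
    # The ggplot object is the last value of the block, so renderPlot prints it.
    ggplot(df, aes(x = x)) +
      geom_histogram(bins = input$bins, fill = "steelblue", colour = "white") +
      theme_minimal()
  })
}

app <- shinyApp(ui, server)
# In an interactive session, runApp(app) launches the application.
```

Because rnorm(input$n) is called inside renderPlot, moving the bins slider also draws a new sample; moving the sampling into a reactive expression would avoid that.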

Sampling distributions

\[ \bar{X} \sim \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right) \]

Exact for a normal population; approximately true for large $n$ otherwise (CLT)
Shiny makes it easy to show this empirically
Vary $n$ and watch the distribution of sample means narrow
Vary the population shape and watch the CLT take effect
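The simulation such an app would run can be sketched in base R. This is an illustration under stated assumptions: the population is exponential with rate 1 (mean 1, standard deviation 1), and sd_of_means, sizes and reps are hypothetical names and settings, not part of the original.

```r
set.seed(42)

# Standard deviation of the mean of n draws, estimated from `reps` replications.
# rexp(n) has mean 1 and standard deviation 1, and is clearly non-normal.
sd_of_means <- function(n, reps = 5000) {
  means <- replicate(reps, mean(rexp(n)))
  sd(means)
}

sizes     <- c(5, 20, 80)
empirical <- vapply(sizes, sd_of_means, numeric(1))
theory    <- 1 / sqrt(sizes)   # sigma / sqrt(n) with sigma = 1

# Empirical spread of the sample means next to the theoretical value.
round(cbind(n = sizes, empirical, theory), 3)
```

In a Shiny version, n and the population shape would come from inputs, and the replicate() call would sit inside a reactive expression so that a histogram and a summary share the same simulated means.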

Reactive linear regression

model <- reactive({
  fmla <- as.formula(paste("mpg ~", input$xvar))
  lm(fmla, data = mtcars)
})
output$regplot    <- renderPlot({ pred <- predict(model(), ...) ... })
output$coeftable  <- renderTable({ coef(summary(model())) }, rownames = TRUE)

model() is fitted once; both outputs read from it
Selecting a new predictor refits the model and updates both outputs simultaneously
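One way the pieces might fit into a complete app is sketched below. The plotting code is an assumption, since the original elides it: this version draws the fitted line with abline() rather than predict(), and builds the formula with reformulate(), a base R alternative to as.formula(paste(...)). The predictors vector and the input label are illustrative.

```r
library(shiny)

# Candidate predictors: every mtcars column except the response.
predictors <- setdiff(names(mtcars), "mpg")

ui <- fluidPage(
  selectInput("xvar", "Predictor", choices = predictors),
  plotOutput("regplot"),
  tableOutput("coeftable")
)

server <- function(input, output, session) {
  # Fitted once per change of input$xvar; shared by both outputs.
  model <- reactive({
    lm(reformulate(input$xvar, response = "mpg"), data = mtcars)
  })

  output$regplot <- renderPlot({
    plot(mtcars[[input$xvar]], mtcars$mpg, xlab = input$xvar, ylab = "mpg")
    abline(model(), col = "red")   # fitted line from the shared model
  })

  # rownames = TRUE keeps the coefficient names in the rendered table.
  output$coeftable <- renderTable(
    coef(summary(model())),
    rownames = TRUE
  )
}

app <- shinyApp(ui, server)
# In an interactive session, runApp(app) launches the application.
```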

Structuring server code

For applications beyond a few outputs, keep the server organised:

Define reactive expressions first (data, models)
Then define render blocks (plots, tables, text)
Separate observeEvent blocks for side effects

Clear names like data_clean, model_fitted, plot_main make the flow obvious.
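The layout above might look like this in practice. It is a skeleton under assumptions: the inputs cyl and reset, the output coef_table, and the choice of mtcars with mpg ~ wt are all hypothetical and serve only to show the ordering.

```r
library(shiny)

ui <- fluidPage(
  checkboxGroupInput("cyl", "Cylinders",
                     choices = c("4", "6", "8"), selected = c("4", "6", "8")),
  actionButton("reset", "Reset"),
  plotOutput("plot_main"),
  tableOutput("coef_table")
)

server <- function(input, output, session) {
  # 1. Reactive expressions: data first, then models built on the data.
  data_clean <- reactive({
    req(input$cyl)                       # wait until at least one box is ticked
    subset(mtcars, cyl %in% input$cyl)
  })
  model_fitted <- reactive({
    lm(mpg ~ wt, data = data_clean())
  })

  # 2. Render blocks: each one only reads from the reactives above.
  output$plot_main <- renderPlot({
    plot(mpg ~ wt, data = data_clean())
    abline(model_fitted())
  })
  output$coef_table <- renderTable(coef(summary(model_fitted())), rownames = TRUE)

  # 3. Side effects: kept apart from the computations.
  observeEvent(input$reset, {
    updateCheckboxGroupInput(session, "cyl", selected = c("4", "6", "8"))
  })
}

app <- shinyApp(ui, server)
```

Reading the server top to bottom then follows the reactive graph: inputs, then data, then model, then outputs.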

When to use reactive expressions

Use a reactive expression when:

The same computation is needed by more than one output
The computation is expensive (fitting a model, reading a file, running a simulation)
You want to isolate and name an intermediate result for clarity

For cheap single-use computations, inline code inside renderPlot is fine.

Summary

reactive({...}) caches a computation and shares it across outputs
Call reactive expressions with ()
Structure the server: reactive expressions first, render blocks second
ggplot2 works inside renderPlot without any special adaptation
The reactive graph makes the data flow explicit and efficient