(Update 18/03/2024: Fixing minor formatting issues)
Load libraries
Code
library(tidyverse); library(patchwork); library(tidybayes)
library(brms); library(marginaleffects); library(viridis)
library(easystats); library(kableExtra)
Set theme
Code
theme_set(theme_bw(16))
We’re first going to load all model objects stored in the stan
folder to avoid rerunning all those pesky Bayesian computations.
Code
# Make a vector of file paths
file_paths <- list.files(path = "stan/",
                         pattern = "\\.rds", full.names = TRUE)

# Make a vector of file names
file_names <- gsub(pattern = "\\.rds$", replacement = "",
                   x = basename(file_paths))

# Read all models into a list
mods_list <- lapply(file_paths, readRDS)

# Assign file names to list elements
names(mods_list) <- file_names
Rationale
I’ve been working a lot with nonlinear models lately. In many cases across ecology and evolution, trends do not go on forever, so modeling approaches that can account for boundaries and plateaus are very important in specific situations. In my case, I’m generally interested in how contaminant exposure affects the growth, reproduction, and behavior of invertebrates, and in its consequences for individual fitness. In this post I’m exploring how to use nonlinear mixed-effects models of growth to recover individual trajectories from simulated data. I’m mostly using the {brms}
package for now as it provides a very user-friendly syntax for fitting these models. The level of complexity I’m dealing with is veering dangerously toward a full Stan
implementation, especially when I’ll have to deal with jointly modeling fitness, but that’s a problem for future me!
Inspiration
A lot of the code in this document has been taken and adapted from Solomon Kurz’s work and his translation of Richard McElreath’s Statistical Rethinking book into {brms}
syntax. Specifically, the simulation of individual differences is directly inspired by working through Chapter 14, Section 1.3. Many thanks to them for putting out all this material in an open format!
Model description
I’m using the model described in Goussen et al. (2013), showing adaptation of C. elegans to uranium. The model assumes that nematode body-length grows according to the following Gompertz function:
\[ L = L_{inf} \times e^{\ln \left( \frac{L_0}{L_{inf}} \right) \times e^{-\alpha t}} \]
where \(L_{inf}\) is the asymptotic body-length, \(L_0\) is the body-length at hatching and \(\alpha\) the growth rate.
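A quick numerical sanity check of the boundary behavior (in Python here, purely for illustration): at \(t = 0\) the function returns exactly \(L_0\), and for large \(t\) it converges to \(L_{inf}\).

```python
import math

def gompertz(t, L_inf=1370.0, L_0=182.0, alpha=0.028):
    """Gompertz growth: L = L_inf * exp(ln(L_0 / L_inf) * exp(-alpha * t))."""
    return L_inf * math.exp(math.log(L_0 / L_inf) * math.exp(-alpha * t))

print(gompertz(0))    # equals L_0, since exp(ln(L_0/L_inf)) = L_0/L_inf
print(gompertz(1e6))  # approaches L_inf as the inner exponential vanishes
```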
Plugging in the values from the paper gives a sigmoid growth curve:
Code
# Store parameters for non-exposed populations
n_id = 10 # 10 individuals
obs = seq(0, 126, by = 24) # One observation every day
Linf = 1370
L0 = 182
alpha = 0.028
t = seq(0, 126, by = 1) # measure every h for 126 h
sigma = 100 # random noise

Gomp.fun = function(t, Linf, L0, alpha){
  Lhat = Linf * exp(log(L0/Linf)*exp(-alpha*t)) # predicted growth
  return(Lhat)
}

# Apply equation
set.seed(42)
df = crossing(obs, 1:n_id) %>%
  mutate(Lhat = Gomp.fun(obs, Linf = Linf, L0 = L0, alpha = alpha)) %>%
  mutate(L = rnorm(n(), Lhat, sigma))

df %>%
  ggplot(aes(y = L, x = obs)) +
  geom_point(alpha = .2, size = 2.5) +
  geom_function(fun = Gomp.fun,
                args = list(Linf = Linf, L0 = L0, alpha = alpha),
                color = "red", size = 1) +
  ylim(0, 1600) +
  labs(x = "Hours since hatching", y = "Body-length (µm)")
Simulating individual differences
We can use the brms
package to make simulations according to some prior distributions for our parameters. We first need to store the values of the growth parameters for each individual.
Data simulations
I’m using the same approach as above but now allow each individual to deviate from the population mean for all parameters: \(\mu_i \sim N(\mu + \text{offset}, \text{ } \sigma)\), where \(\mu\) and \(\mu_i\) are the population and individual means for a given parameter, and the offset is calculated as \(\text{offset} \sim N(0, \text{ } \sigma_i)\), where \(\sigma_i\) is the among-individual standard deviation for a given parameter. In our simulation, we assume a coefficient of variation of 10 % for all parameters, such that \(\sigma_i = 0.10 \times \mu\).
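The offset construction above can be checked numerically. A minimal sketch (in Python with the standard library only, for illustration; all names are mine) draws individual means around a population mean and recovers the 10 % CV:

```python
import random
import statistics

random.seed(42)

mu = 1370.0          # population mean (e.g. L_inf, in micrometers)
sigma_i = 0.10 * mu  # among-individual sd implied by a 10 % CV

# Individual means: mu_i = mu + offset, with offset ~ N(0, sigma_i)
mu_i = [mu + random.gauss(0.0, sigma_i) for _ in range(10_000)]

cv = statistics.stdev(mu_i) / statistics.mean(mu_i)
print(round(cv, 3))  # close to 0.10 by construction
```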
There are different ways to interpret the \(L_{inf}\) parameter. In Goussen et al.’s paper, it was assumed to be the maximum body-size for the population, meaning that with enough time all individuals would converge toward this value (most individuals, of course, will die before that!). Here, I am assuming that each individual has its own maximum size. This means that two individuals with the same growth rate measured for the same amount of time may still differ in their maximum length.
Code
n_id = 30 # 30 individuals
times = seq(0, 126, by = 24) # One observation every day
sd = 10 # random noise
L0_mu = 182 # initial length (micrometers)
Linf_mu = 1370 # maximal length (micrometers)
alpha_mu = 0.028 # Growth rate (hour^-1)

rho = 0 # Suppose all parameters are independent

mu = c(L0_mu, Linf_mu, alpha_mu)
sigmas = c(L0_mu*.1, Linf_mu*.1, alpha_mu*.1) # 10 % CV around the mean
rho_mat = matrix(c(1, rho, rho,
                   rho, 1, rho,
                   rho, rho, 1),
                 nrow = 3)
sigma = diag(sigmas) %*% rho_mat %*% diag(sigmas)

set.seed(42)
ID = MASS::mvrnorm(n_id, mu, sigma) %>%
  data.frame() %>%
  set_names("L0_i", "Linf_i", "alpha_i") %>%
  mutate(ID = 1:n_id)

# Simulate individual growth
df = ID %>%
  tidyr::expand(nesting(ID, L0_i, Linf_i, alpha_i),
                t = times) %>%
  mutate(Lhat = Linf_i * exp(log(L0_i/Linf_i)*exp(-alpha_i*t))) %>%
  mutate(L = rnorm(n(), Lhat, sd))

fig.title = df %>%
  ggplot(aes(y = L, x = t, group = ID)) +
  geom_point(alpha = .2, size = 2.5) +
  geom_line(alpha = .2, size = .5) +
  geom_function(fun = Gomp.fun,
                args = list(Linf = Linf, L0 = L0, alpha = alpha),
                color = "red", size = 1) +
  ylim(0, 1600) +
  labs(x = "Hours since hatching",
       y = expression(paste("Body-length (", mu, "m)")))

fig.title
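As an aside, the \(\Sigma = D\,R\,D\) construction used above (with \(D = \mathrm{diag}(\sigma)\)) puts \(\sigma_j^2\) on the diagonal and \(\rho\,\sigma_j\sigma_k\) off the diagonal. A quick numerical check of that identity (in Python, illustrative only):

```python
# Build Sigma = D @ R @ D entry by entry for the three growth parameters
sigmas = [18.2, 137.0, 0.0028]  # 10 % CV around (182, 1370, 0.028)
rho = 0.0                        # all parameters independent here

R = [[1.0 if j == k else rho for k in range(3)] for j in range(3)]
Sigma = [[sigmas[j] * R[j][k] * sigmas[k] for k in range(3)] for j in range(3)]

print(Sigma[0][0])  # variance of L0: sigma_1^2
print(Sigma[0][1])  # covariance of L0 and Linf: rho * sigma_1 * sigma_2 = 0 here
```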
Defining the model in brms
In brms
, we can use the standard R
formula syntax to specify the Gompertz model
Code
bf = bf(L ~
          Linf * exp(log(L0 / Linf) * exp(-alpha * t)), # Gompertz population curve
        L0 + Linf + alpha ~ 1 + (1|ID), # parameters with random effects
        nl = T)

priors = get_prior(bf, df)
priors %>% kable(digits = 2)
prior | class | coef | group | resp | dpar | nlpar | lb | ub | source
---|---|---|---|---|---|---|---|---|---
student_t(3, 0, 521.9) | sigma |  |  |  |  |  | 0 |  | default
 | b |  |  |  |  | alpha |  |  | default
 | b | Intercept |  |  |  | alpha |  |  | default
student_t(3, 0, 521.9) | sd |  |  |  |  | alpha | 0 |  | default
 | sd |  | ID |  |  | alpha |  |  | default
 | sd | Intercept | ID |  |  | alpha |  |  | default
 | b |  |  |  |  | L0 |  |  | default
 | b | Intercept |  |  |  | L0 |  |  | default
student_t(3, 0, 521.9) | sd |  |  |  |  | L0 | 0 |  | default
 | sd |  | ID |  |  | L0 |  |  | default
 | sd | Intercept | ID |  |  | L0 |  |  | default
 | b |  |  |  |  | Linf |  |  | default
 | b | Intercept |  |  |  | Linf |  |  | default
student_t(3, 0, 521.9) | sd |  |  |  |  | Linf | 0 |  | default
 | sd |  | ID |  |  | Linf |  |  | default
 | sd | Intercept | ID |  |  | Linf |  |  | default
These are default priors and may not be well-suited to our problem. brms
won’t let us sample from these default priors (the defaults on the regression coefficients are flat, improper priors), but we can easily illustrate the point by setting unrealistically uninformative priors (i.e. normal(0, 100) here) on these parameters. I’m using the lower bound argument lb = 0
to keep the growth positive.
Code
priors =
  # Intercept priors
  prior(normal(0, 100), nlpar = L0, class = b, lb = 0) +
  prior(normal(0, 100), nlpar = Linf, class = b, lb = 0) +
  prior(normal(0, 100), nlpar = alpha, class = b, lb = 0)
Next, we fit the model and simply specify sample_prior = "only"
in the brm()
function to get only the growth trends implied by the priors.
Code
# plotting
plot(conditional_effects(mods_list$gomp.prior.default,
ndraws = 100, spaghetti = T))
Not very good, right?! Let’s retry with more reasonable priors.
Code
priors =
  # Intercept priors
  prior(normal(200, 40), nlpar = L0, class = b, lb = 0) +
  prior(normal(1500, 300), nlpar = Linf, class = b, lb = 0) +
  prior(normal(.03, .006), nlpar = alpha, class = b, lb = 0) +
  # Random effects priors (informative priors with 20 % CV)
  prior(exponential(.025), nlpar = L0, class = sd, group = ID) +
  prior(exponential(.003), nlpar = Linf, class = sd, group = ID) +
  prior(exponential(170), nlpar = alpha, class = sd, group = ID) +
  # Residual prior
  prior(exponential(1), class = sigma)

# Plot priors
p1 = priors %>%
  parse_dist() %>%
  filter(class == "b") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Intercepts") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))

p2 = priors %>%
  parse_dist() %>%
  filter(class == "sd") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Among-individual variance") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))

p3 = priors %>%
  parse_dist() %>%
  filter(class == "sigma") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Residual variance") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))

(p1 + p2 + p3) + plot_layout(ncol = 1)
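As a side note, the exponential rates used for the sd priors follow from the fact that an Exponential(\(\lambda\)) distribution has mean \(1/\lambda\): each rate puts the prior mean of the corresponding sd at roughly 20 % of its intercept. A quick check (in Python, illustrative only):

```python
# (rate, intercept mean) pairs matching the sd priors above
targets = {
    "L0":    (0.025, 200.0),
    "Linf":  (0.003, 1500.0),
    "alpha": (170.0, 0.03),
}

for name, (rate, intercept) in targets.items():
    prior_mean_sd = 1.0 / rate      # mean of Exponential(rate)
    cv = prior_mean_sd / intercept  # implied CV of the random effect
    print(f"{name}: prior mean sd = {prior_mean_sd:.4g} (~{cv:.0%} of intercept)")
```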
Prior predictive checks
We can now check whether the model produces a more appropriate growth curve while sampling only from these new priors.
Inspecting the model and plotting predicted growth
Code
mods_list$gomp.prior
Family: gaussian
Links: mu = identity; sigma = identity
Formula: L ~ Linf * exp(log(L0/Linf) * exp(-alpha * t))
L0 ~ 1 + (1 | ID)
Linf ~ 1 + (1 | ID)
alpha ~ 1 + (1 | ID)
Data: df (Number of observations: 180)
Draws: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
total post-warmup draws = 2000
Multilevel Hyperparameters:
~ID (Number of levels: 30)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(L0_Intercept) 40.67 42.23 1.14 156.94 1.00 2507 1141
sd(Linf_Intercept) 332.87 334.35 8.69 1223.07 1.00 2706 955
sd(alpha_Intercept) 0.01 0.01 0.00 0.02 1.00 2892 1169
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
L0_Intercept 198.73 41.39 120.14 281.26 1.00 3330 1271
Linf_Intercept 1492.43 311.92 885.14 2091.52 1.00 3261 1357
alpha_Intercept 0.03 0.01 0.02 0.04 1.00 3684 1428
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 1.03 1.00 0.04 3.46 1.00 2297 1327
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
plot(conditional_effects(mods_list$gomp.prior,
ndraws = 100, spaghetti = T))
Code
re = crossing(t = seq(min(df$t),
                      max(df$t),
                      length.out = 100),
              ID = unique(df$ID)) %>%
  add_epred_draws(mods_list$gomp.prior, re_formula = NULL,
                  scale = "response", ndraws = 20)

re %>%
  ggplot(aes(y = .epred, x = t)) +
  geom_line(aes(y = .epred, x = t, group = .draw), size = .5, alpha = .5) +
  geom_point(data = df, aes(y = L, x = t, color = ID)) +
  facet_wrap(~ID, nrow = 6, ncol = 5) +
  scale_color_viridis() +
  ylab(expression(paste("Body-length (", mu, "m)"))) +
  xlab("Time") +
  theme_bw(12) +
  theme(legend.position = "none")
Fit model to simulated data
Next, we can fit the model to the simulated dataset
Inspecting the model and plotting predicted growth
Code
model_parameters(mods_list$gomp, effects = "all") %>%
kable(digits = 2)
Parameter | Effects | Component | Median | CI | CI_low | CI_high | pd | Rhat | ESS | Group |
---|---|---|---|---|---|---|---|---|---|---|
b_L0_Intercept | fixed | conditional | 179.45 | 0.95 | 170.68 | 453.99 | 1 | 16.77 | 2.01 | |
b_Linf_Intercept | fixed | conditional | 1362.79 | 0.95 | 0.08 | 1434.54 | 1 | 23.18 | 2.00 | |
b_alpha_Intercept | fixed | conditional | 0.03 | 0.95 | 0.00 | 0.03 | 1 | 32.25 | 2.00 | |
sd_ID__L0_Intercept | random | conditional | 18.31 | 0.95 | 1.50 | 27.59 | 1 | 1.23 | 6.34 | ID |
sd_ID__Linf_Intercept | random | conditional | 156.11 | 0.95 | 0.01 | 215.75 | 1 | 3.98 | 2.12 | ID |
sd_ID__alpha_Intercept | random | conditional | 0.00 | 0.95 | 0.00 | 0.00 | 1 | 1.51 | 3.46 | ID |
sigma | fixed | sigma | 9.75 | 0.95 | 8.36 | 139.79 | 1 | 20.58 | 2.01 |
Code
pp_check(mods_list$gomp, ndraws = 100)
Code
plot(conditional_effects(mods_list$gomp, ndraws = 100, spaghetti = T))
Code
re = crossing(t = seq(min(df$t),
                      max(df$t),
                      length.out = 100),
              ID = unique(df$ID)) %>%
  add_epred_draws(mods_list$gomp, re_formula = NULL,
                  scale = "response", ndraws = 20)

re %>%
  ggplot(aes(y = .epred, x = t)) +
  geom_line(aes(y = .epred, x = t, group = .draw), size = .5, alpha = .5) +
  geom_point(data = df, aes(y = L, x = t, color = ID)) +
  facet_wrap(~ID, nrow = 6, ncol = 5) +
  scale_color_viridis() +
  ylab(expression(paste("Body-length (", mu, "m)"))) +
  xlab("Time") +
  theme_bw(12) +
  theme(legend.position = "none")
The model runs, but there are a lot of divergent transitions and some poor convergence diagnostics. Let’s try reparametrizing to improve the sampling.
Model reparametrization
We can express all body-length parameters as a fraction of the maximum length to help the model converge. Dividing both sides of the Gompertz equation by \(L_{inf}\) gives
\[\begin{aligned} l &= L/L_{inf}\\ l_0 &= L_0/L_{inf}\\ l &= e^{\ln \left( l_0 \right) \times e^{-\alpha t}} \end{aligned}\]
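The scaled and unscaled curves should agree up to the factor \(L_{inf}\); a quick numerical check (in Python, illustrative only):

```python
import math

L_inf, L_0, alpha = 1370.0, 182.0, 0.028

def gomp(t):
    """Original Gompertz curve, in micrometers."""
    return L_inf * math.exp(math.log(L_0 / L_inf) * math.exp(-alpha * t))

def gomp_scaled(t):
    """Reparametrized curve: l = exp(ln(l_0) * exp(-alpha * t))."""
    l_0 = L_0 / L_inf
    return math.exp(math.log(l_0) * math.exp(-alpha * t))

for t in (0.0, 24.0, 126.0):
    assert abs(gomp(t) / L_inf - gomp_scaled(t)) < 1e-12
print("scaled curve matches L / L_inf")
```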
Code
ggplot() +
geom_function(fun = Gomp.fun,
args = list(Linf = 1, L0 = L0/Linf, alpha = alpha),
color = "red", size = 1) +
xlim(0, 126) + ylim(0, 1) +
labs(x = "Hours since hatching", y = "Scaled body-length")
Prior predictive checks
We need to reformulate the model formula and priors in order to get sensible outputs.
Code
df$l = with(df, L/Linf)

bf = bf(l ~
          exp(log(l0) * exp(-alpha * t)), # Gompertz population curve
        l0 + alpha ~ 1 + (1|ID), # parameters with random effects
        nl = T)
Defining and plotting priors
Code
priors =
  # Intercept priors
  prior(normal(.15, .03), nlpar = l0, class = b, lb = 0) +
  prior(normal(.03, .006), nlpar = alpha, class = b, lb = 0) +
  # Random effects priors (informative priors with 20 % CV)
  prior(exponential(34), nlpar = l0, class = sd, group = ID) +
  prior(exponential(170), nlpar = alpha, class = sd, group = ID) +
  # Residual prior
  prior(exponential(1), class = sigma)

# Plot priors
p1 = priors %>%
  parse_dist() %>%
  filter(class == "b") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Intercepts") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))

p2 = priors %>%
  parse_dist() %>%
  filter(class == "sd") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Among-individual variance") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))

p3 = priors %>%
  parse_dist() %>%
  filter(class == "sigma") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Residual variance") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))

(p1 + p2 + p3) + plot_layout(ncol = 1)
And now running the model on priors only
Model inspection
Code
model_parameters(mods_list$gomp.sc.prior, effects = "all") %>%
kable(digits = 2)
Parameter | Effects | Component | Median | CI | CI_low | CI_high | pd | Rhat | ESS | Group |
---|---|---|---|---|---|---|---|---|---|---|
b_l0_Intercept | fixed | conditional | 0.15 | 0.95 | 0.09 | 0.20 | 1 | 1 | 2769.57 | |
b_alpha_Intercept | fixed | conditional | 0.03 | 0.95 | 0.02 | 0.04 | 1 | 1 | 2270.10 | |
sd_ID__l0_Intercept | random | conditional | 0.02 | 0.95 | 0.00 | 0.12 | 1 | 1 | 2925.44 | ID |
sd_ID__alpha_Intercept | random | conditional | 0.00 | 0.95 | 0.00 | 0.02 | 1 | 1 | 2850.84 | ID |
sigma | fixed | sigma | 0.69 | 0.95 | 0.02 | 3.91 | 1 | 1 | 2914.29 |
Code
plot(conditional_effects(mods_list$gomp.sc.prior,
ndraws = 100, spaghetti = T))
Code
re = crossing(t = seq(min(df$t),
                      max(df$t),
                      length.out = 100),
              ID = unique(df$ID)) %>%
  add_epred_draws(mods_list$gomp.sc.prior, re_formula = NULL,
                  scale = "response", ndraws = 20)

re %>%
  ggplot(aes(y = .epred, x = t)) +
  geom_line(aes(y = .epred, x = t, group = .draw), size = .5, alpha = .5) +
  geom_point(data = df, aes(y = l, x = t, color = ID)) +
  facet_wrap(~ID, nrow = 6, ncol = 5) +
  scale_color_viridis() +
  ylab("Scaled body-length") +
  xlab("Time") +
  theme_bw(12) +
  theme(legend.position = "none")
Fitting model to data
As above, we rerun the scaled model on the scaled data.
Model inspection
Code
model_parameters(mods_list$gomp.sc, effects = "all") %>%
kable(digits = 2)
Parameter | Effects | Component | Median | CI | CI_low | CI_high | pd | Rhat | ESS | Group |
---|---|---|---|---|---|---|---|---|---|---|
b_l0_Intercept | fixed | conditional | 0.13 | 0.95 | 0.12 | 0.14 | 1 | 1.00 | 4283.00 | |
b_alpha_Intercept | fixed | conditional | 0.03 | 0.95 | 0.03 | 0.03 | 1 | 1.00 | 657.83 | |
sd_ID__l0_Intercept | random | conditional | 0.01 | 0.95 | 0.00 | 0.02 | 1 | 1.01 | 989.17 | ID |
sd_ID__alpha_Intercept | random | conditional | 0.01 | 0.95 | 0.00 | 0.01 | 1 | 1.00 | 733.22 | ID |
sigma | fixed | sigma | 0.04 | 0.95 | 0.03 | 0.04 | 1 | 1.00 | 3774.40 |
Code
plot(conditional_effects(mods_list$gomp.sc,
ndraws = 100, spaghetti = T))
Code
pp_check(mods_list$gomp.sc, ndraws = 100)
Code
re = crossing(t = seq(min(df$t),
                      max(df$t),
                      length.out = 100),
              ID = unique(df$ID)) %>%
  add_epred_draws(mods_list$gomp.sc, re_formula = NULL,
                  scale = "response", ndraws = 20)

re %>%
  ggplot(aes(y = .epred, x = t)) +
  geom_line(aes(y = .epred, x = t, group = .draw), size = .5, alpha = .5) +
  geom_point(data = df, aes(y = l, x = t, color = ID)) +
  facet_wrap(~ID, nrow = 6, ncol = 5) +
  scale_color_viridis() +
  ylab("Scaled body-length") +
  xlab("Time") +
  theme_bw(12) +
  theme(legend.position = "none")
Much better!
Assuming a trade-off between maximum size and growth-rate
So far, we have assumed that all our parameters are independent. However, trade-offs between various life-history traits are also likely to occur. One such trade-off is the compromise between fast growth and reproduction, whereby individuals that mature sooner reach a smaller asymptotic length; this is the case in collembolans exposed to cadmium, for example. We can easily modify our simulation to incorporate a similar trade-off in our data.
Data simulations and modeling
Code
n_id = 30 # 30 individuals
times = seq(0, 126, by = 24) # One observation every day
sd = 10 # random noise
L0_mu = 182 # initial length (micrometers)
Linf_mu = 1370 # maximal length (micrometers)
alpha_mu = 0.028 # Growth rate (hour^-1)

rho = -.7 # Assume strong negative correlation between Linf and alpha

mu = c(L0_mu, Linf_mu, alpha_mu)
sigmas = c(L0_mu*.1, Linf_mu*.1, alpha_mu*.1) # 10 % CV around the mean
rho_mat = matrix(c(1, 0, 0,
                   0, 1, rho,
                   0, rho, 1),
                 nrow = 3)
sigma = diag(sigmas) %*% rho_mat %*% diag(sigmas)

set.seed(42)
ID = MASS::mvrnorm(n_id, mu, sigma) %>%
  data.frame() %>%
  set_names("L0_i", "Linf_i", "alpha_i") %>%
  mutate(ID = 1:n_id)

# Plot
ID %>%
  select(-ID) %>%
  GGally::ggpairs() +
  theme_bw()
And now we simulate growth over 126 hours as before
Code
# Simulate individual growth
df = ID %>%
  tidyr::expand(nesting(ID, L0_i, Linf_i, alpha_i),
                t = times) %>%
  mutate(Lhat = Linf_i * exp(log(L0_i/Linf_i)*exp(-alpha_i*t))) %>%
  mutate(L = rnorm(n(), Lhat, sd))

df %>%
  ggplot(aes(y = L, x = t, group = ID)) +
  geom_point(alpha = .2, size = 2.5) +
  geom_line(alpha = .2, size = .5) +
  geom_function(fun = Gomp.fun,
                args = list(Linf = Linf_mu, L0 = L0_mu, alpha = alpha_mu),
                color = "red", size = 1) +
  ylim(0, 1600) +
  labs(x = "Hours since hatching", y = "Body-length (µm)")
Switching directly to fitting the model to the data, we make some small adjustments to the formula
Code
df$l = with(df, L/Linf_mu)

bf = bf(l ~
          linf * exp(log(l0/linf) * exp(-alpha * t)), # Gompertz population curve
        l0 + linf + alpha ~ 1 + (1|c|ID), # parameters with random effects
        nl = T)
We introduce a scaled version of \(L_{inf}\) into the formula and allow the random effects to be correlated via the (1|c|ID)
chunk. Next, we need to specify a strong prior on \(l_{inf}\) to constrain it around 1. We also specify a prior on the correlation matrix using an LKJ prior with parameter \(\eta > 1\), which places less mass on extreme correlations.
Code
priors =
  # Intercept priors
  prior(normal(.15, .03), nlpar = l0, class = b, lb = 0) +
  prior(normal(1, .03), nlpar = linf, class = b, lb = 0) +
  prior(normal(.03, .006), nlpar = alpha, class = b, lb = 0) +
  # Random effects priors (informative priors with 20 % CV)
  prior(exponential(34), nlpar = l0, class = sd, group = ID) +
  prior(exponential(5), nlpar = linf, class = sd, group = ID) +
  prior(exponential(170), nlpar = alpha, class = sd, group = ID) +
  # Residual prior
  prior(exponential(1), class = sigma) +
  # Correlation prior
  prior(lkj(2), class = cor)

# Plot priors for linf
priors %>%
  parse_dist() %>%
  filter(nlpar == "linf") %>%
  ggplot(aes(xdist = .dist_obj, y = format(.dist_obj))) +
  stat_dist_halfeye() +
  facet_wrap(~nlpar, scales = "free") +
  ggtitle("Scaled asymptotic length priors") +
  xlab("Value") + ylab("Density") +
  theme_bw(12) +
  theme(axis.text.y = element_text(angle = 90))
To get a better sense of what these priors imply, we can simulate the distribution of individual differences in \(l_{inf}\).
Code
linf_sim = data.frame(offsets = rnorm(n_id*100, 0, .2)) %>%
  mutate(linf_mu = rnorm(n(), (1 + offsets), sigma))

linf_sim %>%
  ggplot(aes(x = linf_mu)) +
  stat_halfeye() +
  xlim(-2, 3)
We now fit the model to the simulated data
Model inspection
Code
model_parameters(mods_list$gomp.to, effects = "all") %>%
kable(digits = 2)
Parameter | Effects | Component | Median | CI | CI_low | CI_high | pd | Rhat | ESS | Group |
---|---|---|---|---|---|---|---|---|---|---|
b_l0_Intercept | fixed | conditional | 0.13 | 0.95 | 0.12 | 0.14 | 1.00 | 1 | 2221.01 | |
b_linf_Intercept | fixed | conditional | 1.00 | 0.95 | 0.97 | 1.04 | 1.00 | 1 | 1853.79 | |
b_alpha_Intercept | fixed | conditional | 0.03 | 0.95 | 0.03 | 0.03 | 1.00 | 1 | 1865.96 | |
sd_ID__l0_Intercept | random | conditional | 0.01 | 0.95 | 0.01 | 0.02 | 1.00 | 1 | 2073.78 | ID |
sd_ID__linf_Intercept | random | conditional | 0.12 | 0.95 | 0.09 | 0.15 | 1.00 | 1 | 1506.85 | ID |
sd_ID__alpha_Intercept | random | conditional | 0.00 | 0.95 | 0.00 | 0.00 | 1.00 | 1 | 1640.96 | ID |
cor_ID__l0_Intercept__linf_Intercept | random | conditional | -0.33 | 0.95 | -0.62 | 0.02 | 0.97 | 1 | 1003.81 | ID |
cor_ID__l0_Intercept__alpha_Intercept | random | conditional | 0.07 | 0.95 | -0.30 | 0.43 | 0.64 | 1 | 1036.33 | ID |
cor_ID__linf_Intercept__alpha_Intercept | random | conditional | -0.72 | 0.95 | -0.86 | -0.46 | 1.00 | 1 | 3286.74 | ID |
sigma | fixed | sigma | 0.01 | 0.95 | 0.01 | 0.01 | 1.00 | 1 | 1478.80 |
Code
plot(conditional_effects(mods_list$gomp.to,
ndraws = 100, spaghetti = T))
Code
pp_check(mods_list$gomp.to, ndraws = 100)
Code
re = crossing(t = seq(min(df$t),
                      max(df$t),
                      length.out = 100),
              ID = unique(df$ID)) %>%
  add_epred_draws(mods_list$gomp.to, re_formula = NULL,
                  scale = "response", ndraws = 20)

re %>%
  ggplot(aes(y = .epred, x = t)) +
  geom_line(aes(y = .epred, x = t, group = .draw), size = .5, alpha = .5) +
  geom_point(data = df, aes(y = l, x = t, color = ID)) +
  facet_wrap(~ID, nrow = 6, ncol = 5) +
  scale_color_viridis() +
  ylab("Scaled body-length") +
  xlab("Time") +
  theme_bw(12) +
  theme(legend.position = "none")
This recovers the parameter values pretty well! Note that there is some moderate correlation between \(l_0\) and \(l_{inf}\) due to sampling, even though the true value is 0, and the model seems to slightly overestimate this value compared to the value found in the simulated data. Let’s now plot these correlations according to the model estimates.
Code
re = mods_list$gomp.to %>%
  spread_draws(# Population values
               b_l0_Intercept, b_linf_Intercept, b_alpha_Intercept,
               # Individual offsets
               r_ID__l0[ID,Intercept], r_ID__linf[ID,Intercept], r_ID__alpha[ID,Intercept],
               # Individual variances
               sd_ID__l0_Intercept, sd_ID__linf_Intercept, sd_ID__alpha_Intercept,
               sigma) %>%
  # Individual offsets converted onto the original length scale (in micrometers)
  mutate(L0_i = (b_l0_Intercept + r_ID__l0) * Linf_mu,
         Linf_i = (b_linf_Intercept + r_ID__linf) * Linf_mu,
         alpha_i = b_alpha_Intercept + r_ID__alpha) %>%
  # Population average distribution
  mutate(L0_dist = rnorm(n(), b_l0_Intercept, sd_ID__l0_Intercept)*Linf_mu,
         Linf_dist = rnorm(n(), b_linf_Intercept, sd_ID__linf_Intercept)*Linf_mu,
         alpha_dist = rnorm(n(), b_alpha_Intercept, sd_ID__alpha_Intercept))

# Summarize individual values into mean, lower and upper 95 % quantiles
re.mean = re %>%
  select(.chain, .iteration, .draw, L0_i, Linf_i, alpha_i) %>%
  mean_qi(L0_i, Linf_i, alpha_i)
# Plot population average (diagonal elements)
L0_dist = re %>%
  ggplot(aes(x = L0_dist)) +
  stat_histinterval(slab_color = "gray45",
                    outline_bars = TRUE) +
  labs(x = "", y = expression(L[0])) +
  theme_bw(12) +
  theme(aspect.ratio = 1)

Linf_dist = re %>%
  ggplot(aes(x = Linf_dist)) +
  stat_histinterval(slab_color = "gray45",
                    outline_bars = TRUE) +
  labs(x = "", y = "") +
  theme_bw(12) +
  theme(aspect.ratio = 1)

alpha_dist = re %>%
  ggplot(aes(x = alpha_dist)) +
  stat_histinterval(slab_color = "gray45",
                    outline_bars = TRUE) +
  labs(x = expression(alpha), y = "") +
  theme_bw(12) +
  theme(aspect.ratio = 1)

# Plot individual average with CI (lower diagonal elements)
corr1 = re.mean %>%
  ggplot(aes(x = L0_i, y = Linf_i)) +
  geom_errorbarh(aes(xmin = L0_i.lower, xmax = L0_i.upper)) +
  geom_errorbar(aes(ymin = Linf_i.lower, ymax = Linf_i.upper)) +
  geom_point(alpha = .8, size = 3) +
  labs(x = "", y = expression(L[inf])) +
  theme_bw(12) +
  theme(aspect.ratio = 1)

corr2 = re.mean %>%
  ggplot(aes(x = L0_i, y = alpha_i)) +
  geom_errorbarh(aes(xmin = L0_i.lower, xmax = L0_i.upper)) +
  geom_errorbar(aes(ymin = alpha_i.lower, ymax = alpha_i.upper)) +
  geom_point(alpha = .8, size = 3) +
  labs(x = expression(L[0]), y = expression(alpha)) +
  theme_bw(12) +
  theme(aspect.ratio = 1)

corr3 = re.mean %>%
  ggplot(aes(x = Linf_i, y = alpha_i)) +
  geom_errorbarh(aes(xmin = Linf_i.lower, xmax = Linf_i.upper)) +
  geom_errorbar(aes(ymin = alpha_i.lower, ymax = alpha_i.upper)) +
  geom_point(alpha = .8, size = 3) +
  labs(x = expression(L[inf]), y = "") +
  theme_bw(12) +
  theme(aspect.ratio = 1)
# Plot correlation estimate (upper diagonal elements)
dcorr1 = mods_list$gomp.to %>%
  spread_draws(`cor.*`, regex = TRUE) %>%
  ggplot(aes(x = cor_ID__l0_Intercept__linf_Intercept)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linewidth = 1, color = "black", linetype = "dashed") +
  xlim(-1, 1) +
  labs(x = "", y = "") +
  theme_bw(12) +
  theme(aspect.ratio = 1)

dcorr2 = mods_list$gomp.to %>%
  spread_draws(`cor.*`, regex = TRUE) %>%
  ggplot(aes(x = cor_ID__l0_Intercept__alpha_Intercept)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linewidth = 1, color = "black", linetype = "dashed") +
  xlim(-1, 1) +
  labs(x = "", y = "") +
  theme_bw(12) +
  theme(aspect.ratio = 1)

dcorr3 = mods_list$gomp.to %>%
  spread_draws(`cor.*`, regex = TRUE) %>%
  ggplot(aes(x = cor_ID__linf_Intercept__alpha_Intercept)) +
  stat_halfeye() +
  geom_vline(xintercept = 0, linewidth = 1, color = "black", linetype = "dashed") +
  xlim(-1, 1) +
  labs(x = "", y = "") +
  theme_bw(12) +
  theme(aspect.ratio = 1)

# Arrange plot into 3 x 3 grid
L0_dist + dcorr1 + dcorr2 +
  corr1 + Linf_dist + dcorr3 +
  corr2 + corr3 + alpha_dist +
  plot_layout(ncol = 3, nrow = 3, byrow = T)
Another possibility is to sample correlation estimates and plot them on a small grid, with each color corresponding to the correlation strength. Solution found on Matthew Kay’s and Solomon Kurz’s websites.
Code
levels = c("l[0]", "l[inf]", "alpha")

rho = as_draws_df(mods_list$gomp.to) %>%
  select(starts_with("cor_")) %>%
  slice_sample(n = 60 * 60) %>%
  bind_cols(crossing(x = 1:60, y = 1:60)) %>%
  pivot_longer(cols = -c(x:y)) %>%
  mutate(name = str_remove(name, "cor_ID__")) %>%
  separate(name, into = c("col", "row"), sep = "__") %>%
  mutate(
    col = case_when(
      col == "l0_Intercept" ~ "l[0]",
      col == "linf_Intercept" ~ "l[inf]",
      col == "alpha_Intercept" ~ "alpha"),
    row = case_when(
      row == "l0_Intercept" ~ "l[0]",
      row == "linf_Intercept" ~ "l[inf]",
      row == "alpha_Intercept" ~ "alpha")) %>%
  mutate(col = factor(col, levels = levels),
         row = factor(row, levels = levels))
rho %>%
  full_join(rename(rho, col = row, row = col),
            by = c("x", "y", "col", "row", "value")) %>%
  ggplot(aes(x = x, y = y, fill = value)) +
  geom_raster() +
  scale_fill_distiller(type = "div",
                       palette = "RdBu",
                       limits = c(-1, 1), name = expression(rho)) +
  # scale_fill_gradient2(expression(rho),
  #                      low = "#59708b", mid = "#FCF9F0", high = "#A65141", midpoint = 0,
  #                      labels = c(-1, "", 0, "", 1), limits = c(-1, 1)) +
  scale_x_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL, expand = c(0, 0)) +
  theme(strip.text = element_text(size = 12)) +
  facet_grid(row ~ col, labeller = label_parsed, switch = "y")
In conclusion
Mathematical description of the statistical models
Formally, our statistical model can be described with the following set of equations
Non-scaled model
\[\begin{aligned}
&\large \text{Model Likelihood}\\
&\text{Average growth rate}\\
L_{i,t} &\sim N(\mu_{i,t}, \text{ } \sigma)\\
\mu_{i,t} &= L_{inf_{i}} \times e^{\ln \left( \frac{L_{0_{i}}}{L_{inf_{i}}} \right) \times e^{-\alpha_i t}}\\
&\text{Among-individual covariance}\\
\begin{bmatrix} \mu_{L_{0_{i}}}\\ \mu_{L_{inf_{i}}}\\ \mu_{\alpha_i} \end{bmatrix} &\sim MVN\left(\begin{bmatrix} \mu_{L_{0}}\\ \mu_{L_{inf}}\\ \mu_{\alpha} \end{bmatrix}, \text{ } \Sigma\right)\\
\Sigma &= \begin{bmatrix} \sigma_{L_{0_{i}}} & 0 & 0\\ 0 & \sigma_{L_{inf_{i}}} & 0\\ 0 & 0 & \sigma_{\alpha_i} \end{bmatrix} R \begin{bmatrix} \sigma_{L_{0_{i}}} & 0 & 0\\ 0 & \sigma_{L_{inf_{i}}} & 0\\ 0 & 0 & \sigma_{\alpha_i} \end{bmatrix}\\
R &= \begin{bmatrix} 1 & \rho_{1,2} & \rho_{1,3}\\ \rho_{1,2} & 1 & \rho_{2,3}\\ \rho_{1,3} & \rho_{2,3} & 1 \end{bmatrix}\\
&\large \text{Priors}\\
&\text{Population intercepts}\\
\mu_{L_{0}} &\sim N(200, 40)\\
\mu_{L_{inf}} &\sim N(1500, 300)\\
\mu_{\alpha} &\sim N(0.03, 0.006)\\
&\text{Among-individual variances}\\
\sigma_{L_{0_{i}}} &\sim Exp(0.025)\\
\sigma_{L_{inf_{i}}} &\sim Exp(0.003)\\
\sigma_{\alpha_i} &\sim Exp(170)\\
\sigma &\sim Exp(1)\\
&\text{Among-individual correlation}\\
R &\sim LKJ(2)
\end{aligned}\]
Scaled model
Compared to the non-scaled model, the major difference here is that we put a strong prior around 1 for the \(l_{inf}\) parameter.
\[\begin{aligned}
&\large \text{Model Likelihood}\\
&\text{Average growth rate}\\
l_{i,t} &\sim N(\mu_{i,t}, \text{ } \sigma)\\
\mu_{i,t} &= l_{inf_{i}} \times e^{\ln \left( \frac{l_{0_{i}}}{l_{inf_{i}}} \right) \times e^{-\alpha_i t}}\\
&\text{Among-individual covariance}\\
\begin{bmatrix} \mu_{l_{0_{i}}}\\ \mu_{l_{inf_{i}}}\\ \mu_{\alpha_i} \end{bmatrix} &\sim MVN\left(\begin{bmatrix} \mu_{l_{0}}\\ \mu_{l_{inf}}\\ \mu_{\alpha} \end{bmatrix}, \text{ } \Sigma\right)\\
\Sigma &= \begin{bmatrix} \sigma_{l_{0_{i}}} & 0 & 0\\ 0 & \sigma_{l_{inf_{i}}} & 0\\ 0 & 0 & \sigma_{\alpha_i} \end{bmatrix} R \begin{bmatrix} \sigma_{l_{0_{i}}} & 0 & 0\\ 0 & \sigma_{l_{inf_{i}}} & 0\\ 0 & 0 & \sigma_{\alpha_i} \end{bmatrix}\\
R &= \begin{bmatrix} 1 & \rho_{1,2} & \rho_{1,3}\\ \rho_{1,2} & 1 & \rho_{2,3}\\ \rho_{1,3} & \rho_{2,3} & 1 \end{bmatrix}\\
&\large \text{Priors}\\
&\text{Population intercepts}\\
\mu_{l_{0}} &\sim N(0.15, 0.03)\\
\mu_{l_{inf}} &\sim N(1, 0.03)\\
\mu_{\alpha} &\sim N(0.03, 0.006)\\
&\text{Among-individual variances}\\
\sigma_{l_{0_{i}}} &\sim Exp(34)\\
\sigma_{l_{inf_{i}}} &\sim Exp(5)\\
\sigma_{\alpha_i} &\sim Exp(170)\\
\sigma &\sim Exp(1)\\
&\text{Among-individual correlation}\\
R &\sim LKJ(2)
\end{aligned}\]
Session info
Code
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Paris
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kableExtra_1.4.0 see_0.8.2 report_0.5.8
[4] parameters_0.21.5 performance_0.10.9 modelbased_0.8.7
[7] insight_0.19.8 effectsize_0.8.6 datawizard_0.9.1
[10] correlation_0.8.4 bayestestR_0.13.2 easystats_0.7.0
[13] viridis_0.6.5 viridisLite_0.4.2 marginaleffects_0.18.0
[16] brms_2.20.12 Rcpp_1.0.12 tidybayes_3.0.6
[19] patchwork_1.2.0 lubridate_1.9.3 forcats_1.0.0
[22] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[25] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[28] ggplot2_3.5.0 tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 tensorA_0.36.2.1 rstudioapi_0.15.0
[4] jsonlite_1.8.8 magrittr_2.0.3 farver_2.1.1
[7] rmarkdown_2.25 ragg_1.2.7 vctrs_0.6.5
[10] base64enc_0.1-3 htmltools_0.5.7 distributional_0.4.0
[13] curl_5.2.0 StanHeaders_2.32.5 htmlwidgets_1.6.4
[16] plyr_1.8.9 zoo_1.8-12 igraph_2.0.2
[19] mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
[22] colourpicker_1.3.0 Matrix_1.6-5 R6_2.5.1
[25] fastmap_1.1.1 shiny_1.8.0 digest_0.6.34
[28] GGally_2.2.1 colorspace_2.1-0 ps_1.7.6
[31] textshaping_0.3.7 crosstalk_1.2.1 labeling_0.4.3
[34] fansi_1.0.6 timechange_0.3.0 abind_1.4-5
[37] compiler_4.3.2 withr_3.0.0 backports_1.4.1
[40] inline_0.3.19 shinystan_2.6.0 ggstats_0.5.1
[43] QuickJSR_1.1.3 pkgbuild_1.4.3 MASS_7.3-60.0.1
[46] gtools_3.9.5 loo_2.7.0 tools_4.3.2
[49] httpuv_1.6.14 threejs_0.3.3 glue_1.7.0
[52] nlme_3.1-164 promises_1.2.1 grid_4.3.2
[55] cmdstanr_0.7.1 checkmate_2.3.1 reshape2_1.4.4
[58] generics_0.1.3 gtable_0.3.4 tzdb_0.4.0
[61] data.table_1.15.0 hms_1.1.3 xml2_1.3.6
[64] utf8_1.2.4 pillar_1.9.0 ggdist_3.3.1
[67] markdown_1.12 posterior_1.5.0 later_1.3.2
[70] lattice_0.22-5 tidyselect_1.2.0 miniUI_0.1.1.1
[73] knitr_1.45 arrayhelpers_1.1-0 gridExtra_2.3
[76] V8_4.4.2 svglite_2.1.3 stats4_4.3.2
[79] xfun_0.42 bridgesampling_1.1-2 matrixStats_1.2.0
[82] DT_0.32 rstan_2.32.5 stringi_1.8.3
[85] yaml_2.3.8 evaluate_0.23 codetools_0.2-19
[88] cli_3.6.2 RcppParallel_5.1.7 shinythemes_1.2.0
[91] xtable_1.8-4 systemfonts_1.0.5 munsell_0.5.0
[94] processx_3.8.3 coda_0.19-4.1 svUnit_1.0.6
[97] parallel_4.3.2 rstantools_2.4.0 ellipsis_0.3.2
[100] dygraphs_1.1.1.6 bayesplot_1.11.1 Brobdingnag_1.2-9
[103] mvtnorm_1.2-4 scales_1.3.0 xts_0.13.2
[106] rlang_1.1.3 shinyjs_2.1.0