# A step-by-step guide to confidence interval visual predictive checks (VPC) of NONMEM models

Last time, we focused on the percentile VPC, which was already quite an improvement to the scatter VPC but one may wonder when looking at a percentile VPC, how certain are we about the simulated median and the 10-90% distributions?

In reality, there is a confidence interval around these percentile distributions (e.g. how sure are we about the median) and when is a deviation from the observations acceptable? Since every time we run the Monte Carlo simulation, the samples taken will be different. In order to visualize this, a confidence interval VPC can be generated. Luckily, the majority of the steps will be equal to the ones that we have already performed for the percentile VPC.

Our confidence VPC will be developed on the same data as we did for the percentile VPC. For this example, data was generated in which opportunistic sampling around a time point (e.g. 2, 4, 6, 8, 12, 16h) was performed. This results in a scatter of observations around each time point and we will again use binning during this analysis.

### Analysis outline

When we use the PsN function vpc, our dataset used for modelling will be used as the basis for the simulation. This will result in every observation of the dataset being simulated 500/1000 times, depending on the chosen number of samples. The no_of_bins and bin_by_count will not be used in this tutorial since we will create our own binning function. In the end, we want to generate the 80% confidence/prediction interval of the simulated data with the 95% confidence interval around these simulated percentiles.

This is the PsN function to create the simulated dataset from Model1.mod:

`vpc -samples=500 -no_of_bins=8 -bin_by_count=1 -dir=vpc_Model1 Model1.mod`

### Getting started

In R, we first want to define the working directory of our ‘final’ model and the prediction and confidence interval we want to use at a later stage:

```##################################################################################
############################ CODING BLOCK ########################################
##################################################################################
# Run-Time Environment: R version 3.4.0
# Short title: Confidence VPC using ggplot2
#
#
##################################################################################
##################################################################################

################
# Clean workspace
rm(list=ls(all=TRUE))

library(reshape)
library(ggplot2)
library(xpose)
library(dplyr)

#########################################
# Set working directory of model file
setwd("...")

Number_of_simulations_performed <- 500 ## Used as 'samples' argument in PsN vpc function. Needs to be a round number.

#########################################
# Set the percentile intervals

PI <- 80 #specify prediction interval in percentage. Common is 80% interval (10-90%)
CI <- 95 #specify confidence interval in percentage. Common is 95%

# Make a vector for the upper and lower percentile
perc_PI <- c(0+(1-PI/100)/2, 1-(1-PI/100)/2)
perc_CI <- c(0+(1-CI/100)/2, 1-(1-CI/100)/2) ```

### Binning in R

Then, we need to define what kind of bins we want to use (how we specify which observations should be grouped together). We can, for example, divide the bins on different time intervals or by an equal number of observations. In this example we take the easy route, by manually defining the boundaries of the bins around each sampling time point.

```#########################################
# Specify the bin times manually
bin_times <- c(0,3,5,7,10,14,18)
Number_of_bins <- length(bin_times)-1```

This results in bins between 0-3h, 3-5h, 5-7h, etc. and creates an object which includes the number of bins.

### Distribution of observations

Since nothing has changed compared to the percentile VPC, the following code will be used to calculate the distribution of the observations. We again use the piping function from dplyr to calculate the statistics per bin:

```############################################################
######### Read dataset with original observations

Obs<- Obs[Obs\$CMT == 2,] # Remove dosing lines from the dataset

#########################################
######### Create bins in original dataset

### Set bins by backward for loop
for(i in Number_of_bins:1){
Obs\$BIN[Obs\$TIME <= bin_times[i+1]] <-i
}

############################################
## Calculate confidence intervals of the observations

obs_vpc <- Obs %>%
group_by(BIN) %>% ## Set the stratification identifiers (e.g. TIME, DOSE, CMT, BIN)
summarize(C_median = median(DV, na.rm = T), C_lower = quantile(DV, perc_PI, na.rm = T), C_upper = quantile(DV, perc_PI, na.rm = T))

## Get median time of each bin of observations for plotting purposes
bin_middle <- Obs %>%
group_by(BIN) %>%
summarize(bin_middle = median(TIME, na.rm = T))

# As a vector
bin_middle <- bin_middle\$bin_middle

obs_vpc\$TIME <- bin_middle ```

### Distribution of Simulated replicates

In order to create the confidence interval around our percentiles, we first need to calculate the percentiles for each iteration separately. We therefore load in the data, and add a new column with the value of the replicate.

```#########################################

## Search for the generated file by PsN
files <- list.files(pattern = "1.npctab.dta", recursive = TRUE, include.dirs = TRUE)

# This reads the VPC npctab.dta file and save it in dataframe_simulations
dataframe_simulations <- dataframe_simulations[dataframe_simulations\$MDV == 0,]

## Set the replicate number to the simulated dataset as a new column
dataframe_simulations\$replicate <- rep(1:Number_of_simulations_performed,each=nrow(dataframe_simulations)/Number_of_simulations_performed)

# Set vector with unique replicates
Replicate_vector <- unique(dataframe_simulations\$replicate)

```

We can then again specify which observations are in which bin, using the user specified bin limits.

```#########################################
######### Create bins in simulated dataset

Set bins by backward for-loop
for(i in Number_of_bins:1){
dataframe_simulations\$BIN[dataframe_simulations\$TIME <= bin_times[i+1]] <-i
} ```

Then, we loop trough all the replicates and calculate the distributions of the predictions which will be saved in the sim_PI dataframe. This dataframe now consists of the summary statistics per replicate (e.g. 500 different median values).

```sim_PI <- NULL

## Calculate predictions intervals per bin
for(i in Replicate_vector){

# Run this for each replicate
sim_vpc_ci <- dataframe_simulations %>%

filter(replicate %in% i) %>% # Select an individual replicate

group_by(BIN) %>% # Calculate everything per bin

summarize(C_median = median(DV),
C_lower = quantile(DV, perc_PI),
C_upper = quantile(DV, perc_PI)) %>% # Calculate prediction intervals

mutate(replicate = i) # Include replicate number

sim_PI <- rbind(sim_PI, sim_vpc_ci)
} ```

This dataframe can then be used to calculate the 95% confidence interval of each percentile (10%, 50% and 90%) between all the replicates. We also add the time of the boundaries of each bin to be able to create the rectangles in ggplot later on.

```###########################
# Calculate confidence intervals around these prediction intervals calculated with each replicate

sim_CI <- sim_PI %>%
group_by(BIN) %>%
summarize(C_median_CI_lwr = quantile(C_median, perc_CI), C_median_CI_upr = quantile(C_median, perc_CI), # Median
C_low_lwr = quantile(C_lower, perc_CI), C_low_upr = quantile(C_lower, perc_CI), # Lower percentages
C_up_lwr = quantile(C_upper, perc_CI), C_up_upr = quantile(C_upper, perc_CI) # High percentages
)

### Set bin boundaries in dataset
sim_CI\$x1 <- NA
sim_CI\$x2 <- NA

for(i in 1:Number_of_bins){
sim_CI\$x1[sim_CI\$BIN == i] <-bin_times[i]
sim_CI\$x2[sim_CI\$BIN == i] <-bin_times[i+1]
}  ```

### Graphics

Instead of depicting the simulated intervals with lines, we use the geom_rect function of ggplot to plot rectangles in our figure. This rectangle is the 95% confidence interval around the corresponding value.

For the observations, we add the distributions with geom_line (black), and add all observations as separate points. And we of course want to save it as a high resolution .png image.

```#################################################
############### Plotting #########################

### VPC on a linear scale
CI_VPC_lin <- ggplot() +

## Set the rectangles from the simulations
geom_rect(data=sim_CI, mapping=aes(xmin=x1, xmax=x2, ymin=C_median_CI_lwr, ymax=C_median_CI_upr), fill='red', alpha=0.25) + # Median
geom_rect(data=sim_CI, mapping=aes(xmin=x1, xmax=x2, ymin=C_low_lwr, ymax=C_low_upr), fill='blue', alpha=0.25) + # 10% percentile
geom_rect(data=sim_CI, mapping=aes(xmin=x1, xmax=x2, ymin=C_up_lwr, ymax=C_up_upr), fill='blue', alpha=0.25) + # 90% percentile

# Lines of the observations
geom_line(data = obs_vpc, aes(TIME, C_median), col = 'black', linetype = 'dashed', size = 1.25) +
geom_line(data = obs_vpc, aes(TIME, C_lower), col = 'black', linetype = 'dashed', size = 1.25) +
geom_line(data = obs_vpc, aes(TIME, C_upper), col = 'black', linetype = 'dashed', size = 1.25) +

geom_point(data = Obs,aes(x=TIME,y=DV),color='black')+

####### Set ggplot2 theme and settings
theme_bw()+
theme(axis.title=element_text(size=9.0))+
theme(strip.background = element_blank(),
strip.text.x = element_blank(),legend.position="none") +

# Set axis labels
labs(x="Time (h)",y="Concentration (mg/L)")+

# Add vertical lines to indicate dosing
geom_vline(xintercept = 0, linetype="dashed", size=1) +

# Set axis
scale_y_continuous(expand=c(0.01,0))+
scale_x_continuous(expand=c(0.01,0))+

ggtitle("Percentile VPC","")

#####################################
## Save as png
png("Figure_Simulation_Confidence_VPC_with_Observations.png",width = 10, height = 5, units='in',res=600)

print(CI_VPC_lin)

dev.off() ```

This analysis will result in the following confidence interval VPC for our developed model. We can clearly see that the lines of the observations are well within the simulated distributions, indicating adequate model performance. Black dots = observations, black dashed lines = 80%-interval and median of the observations, red shaded area = 95%-confidence interval (CI) of the median prediction, blue shaded area = 95%-CI of the 10 and 90th prediction interval.

### 2 thoughts on “A step-by-step guide to confidence interval visual predictive checks (VPC) of NONMEM models”

1. 2. 