Last time, we focused on the percentile VPC, which was already quite an improvement to the scatter VPC but one may wonder when looking at a percentile VPC, how certain are we about the simulated median and the 10-90% distributions?
In reality, there is a confidence interval around these percentile distributions (e.g. how sure are we about the median) and when is a deviation from the observations acceptable? Since every time we run the Monte Carlo simulation, the samples taken will be different. In order to visualize this, a confidence interval VPC can be generated. Luckily, the majority of the steps will be equal to the ones that we have already performed for the percentile VPC.
Our confidence VPC will be developed on the same data as we did for the percentile VPC. For this example, data was generated in which opportunistic sampling around a time point (e.g. 2, 4, 6, 8, 12, 16h) was performed. This results in a scatter of observations around each time point and we will again use binning during this analysis.
Analysis outline
When we use the PsN function vpc, our dataset used for modelling will be used as the basis for the simulation. This will result in every observation of the dataset being simulated 500/1000 times, depending on the chosen number of samples. The no_of_bins and bin_by_count will not be used in this tutorial since we will create our own binning function. In the end, we want to generate the 80% confidence/prediction interval of the simulated data with the 95% confidence interval around these simulated percentiles.
This is the PsN function to create the simulated dataset from Model1.mod:
vpc -samples=500 -no_of_bins=8 -bin_by_count=1 -dir=vpc_Model1 Model1.mod
Getting started
In R, we first want to define the working directory of our ‘final’ model and the prediction and confidence interval we want to use at a later stage:
################################################################################## ############################ CODING BLOCK ######################################## ################################################################################## # Run-Time Environment: R version 3.4.0 # Short title: Confidence VPC using ggplot2 # # ################################################################################## ################################################################################## ################ # Clean workspace rm(list=ls(all=TRUE)) ## Load libraries library(reshape) library(ggplot2) library(xpose) library(dplyr) ######################################### # Set working directory of model file setwd("...") Number_of_simulations_performed <- 500 ## Used as 'samples' argument in PsN vpc function. Needs to be a round number. ######################################### # Set the percentile intervals PI <- 80 #specify prediction interval in percentage. Common is 80% interval (10-90%) CI <- 95 #specify confidence interval in percentage. Common is 95% # Make a vector for the upper and lower percentile perc_PI <- c(0+(1-PI/100)/2, 1-(1-PI/100)/2) perc_CI <- c(0+(1-CI/100)/2, 1-(1-CI/100)/2)
Binning in R
Then, we need to define what kind of bins we want to use (how we specify which observations should be grouped together). We can, for example, divide the bins on different time intervals or by an equal number of observations. In this example we take the easy route, by manually defining the boundaries of the bins around each sampling time point.
######################################### # Specify the bin times manually bin_times <- c(0,3,5,7,10,14,18) Number_of_bins <- length(bin_times)-1
This results in bins between 0-3h, 3-5h, 5-7h, etc. and creates an object which includes the number of bins.
Distribution of observations
Since nothing has changed compared to the percentile VPC, the following code will be used to calculate the distribution of the observations. We again use the piping function from dplyr to calculate the statistics per bin:
############################################################ ######### Read dataset with original observations Obs <- read.csv("Original_dataset.csv",na.strings = ".") Obs<- Obs[Obs$CMT == 2,] # Remove dosing lines from the dataset ######################################### ######### Create bins in original dataset ### Set bins by backward for loop for(i in Number_of_bins:1){ Obs$BIN[Obs$TIME <= bin_times[i+1]] <-i } ############################################ ## Calculate confidence intervals of the observations obs_vpc <- Obs %>% group_by(BIN) %>% ## Set the stratification identifiers (e.g. TIME, DOSE, CMT, BIN) summarize(C_median = median(DV, na.rm = T), C_lower = quantile(DV, perc_PI[1], na.rm = T), C_upper = quantile(DV, perc_PI[2], na.rm = T)) ## Get median time of each bin of observations for plotting purposes bin_middle <- Obs %>% group_by(BIN) %>% summarize(bin_middle = median(TIME, na.rm = T)) # As a vector bin_middle <- bin_middle$bin_middle obs_vpc$TIME <- bin_middle
Distribution of Simulated replicates
In order to create the confidence interval around our percentiles, we first need to calculate the percentiles for each iteration separately. We therefore load in the data, and add a new column with the value of the replicate.
######################################### ######### Load in simulated data ## Search for the generated file by PsN files <- list.files(pattern = "1.npctab.dta", recursive = TRUE, include.dirs = TRUE) # This reads the VPC npctab.dta file and save it in dataframe_simulations dataframe_simulations <- read_nm_tables(paste('.\\',files,sep="")) dataframe_simulations <- dataframe_simulations[dataframe_simulations$MDV == 0,] ## Set the replicate number to the simulated dataset as a new column dataframe_simulations$replicate <- rep(1:Number_of_simulations_performed,each=nrow(dataframe_simulations)/Number_of_simulations_performed) # Set vector with unique replicates Replicate_vector <- unique(dataframe_simulations$replicate)
We can then again specify which observations are in which bin, using the user specified bin limits.
######################################### ######### Create bins in simulated dataset Set bins by backward for-loop for(i in Number_of_bins:1){ dataframe_simulations$BIN[dataframe_simulations$TIME <= bin_times[i+1]] <-i }
Then, we loop trough all the replicates and calculate the distributions of the predictions which will be saved in the sim_PI dataframe. This dataframe now consists of the summary statistics per replicate (e.g. 500 different median values).
sim_PI <- NULL ## Calculate predictions intervals per bin for(i in Replicate_vector){ # Run this for each replicate sim_vpc_ci <- dataframe_simulations %>% filter(replicate %in% i) %>% # Select an individual replicate group_by(BIN) %>% # Calculate everything per bin summarize(C_median = median(DV), C_lower = quantile(DV, perc_PI[1]), C_upper = quantile(DV, perc_PI[2])) %>% # Calculate prediction intervals mutate(replicate = i) # Include replicate number sim_PI <- rbind(sim_PI, sim_vpc_ci) }
This dataframe can then be used to calculate the 95% confidence interval of each percentile (10%, 50% and 90%) between all the replicates. We also add the time of the boundaries of each bin to be able to create the rectangles in ggplot later on.
########################### # Calculate confidence intervals around these prediction intervals calculated with each replicate sim_CI <- sim_PI %>% group_by(BIN) %>% summarize(C_median_CI_lwr = quantile(C_median, perc_CI[1]), C_median_CI_upr = quantile(C_median, perc_CI[2]), # Median C_low_lwr = quantile(C_lower, perc_CI[1]), C_low_upr = quantile(C_lower, perc_CI[2]), # Lower percentages C_up_lwr = quantile(C_upper, perc_CI[1]), C_up_upr = quantile(C_upper, perc_CI[2]) # High percentages ) ### Set bin boundaries in dataset sim_CI$x1 <- NA sim_CI$x2 <- NA for(i in 1:Number_of_bins){ sim_CI$x1[sim_CI$BIN == i] <-bin_times[i] sim_CI$x2[sim_CI$BIN == i] <-bin_times[i+1] }
Graphics
Instead of depicting the simulated intervals with lines, we use the geom_rect function of ggplot to plot rectangles in our figure. This rectangle is the 95% confidence interval around the corresponding value.
For the observations, we add the distributions with geom_line (black), and add all observations as separate points. And we of course want to save it as a high resolution .png image.
################################################# ############### Plotting ######################### ### VPC on a linear scale CI_VPC_lin <- ggplot() + ## Set the rectangles from the simulations geom_rect(data=sim_CI, mapping=aes(xmin=x1, xmax=x2, ymin=C_median_CI_lwr, ymax=C_median_CI_upr), fill='red', alpha=0.25) + # Median geom_rect(data=sim_CI, mapping=aes(xmin=x1, xmax=x2, ymin=C_low_lwr, ymax=C_low_upr), fill='blue', alpha=0.25) + # 10% percentile geom_rect(data=sim_CI, mapping=aes(xmin=x1, xmax=x2, ymin=C_up_lwr, ymax=C_up_upr), fill='blue', alpha=0.25) + # 90% percentile # Lines of the observations geom_line(data = obs_vpc, aes(TIME, C_median), col = 'black', linetype = 'dashed', size = 1.25) + geom_line(data = obs_vpc, aes(TIME, C_lower), col = 'black', linetype = 'dashed', size = 1.25) + geom_line(data = obs_vpc, aes(TIME, C_upper), col = 'black', linetype = 'dashed', size = 1.25) + ###### Add observations geom_point(data = Obs,aes(x=TIME,y=DV),color='black')+ ####### Set ggplot2 theme and settings theme_bw()+ theme(axis.title=element_text(size=9.0))+ theme(strip.background = element_blank(), strip.text.x = element_blank(),legend.position="none") + # Set axis labels labs(x="Time (h)",y="Concentration (mg/L)")+ # Add vertical lines to indicate dosing geom_vline(xintercept = 0, linetype="dashed", size=1) + # Set axis scale_y_continuous(expand=c(0.01,0))+ scale_x_continuous(expand=c(0.01,0))+ ## Add title and subtitle ggtitle("Percentile VPC","") ##################################### ## Save as png png("Figure_Simulation_Confidence_VPC_with_Observations.png",width = 10, height = 5, units='in',res=600) print(CI_VPC_lin) dev.off()
This analysis will result in the following confidence interval VPC for our developed model. We can clearly see that the lines of the observations are well within the simulated distributions, indicating adequate model performance.

COMMENT
Any suggestions or typo’s? Leave a comment or contact me at info@pmxsolutions.com!
This is incredibly helpful. Thank you very much for the step-by-step for plotting VPC without reliance on xpose or psn.
Thank you so much for sharing! It is very helpful!