A step-by-step guide to percentile visual predictive checks (VPC) of NONMEM models

This blog is an extension on the previously discussed scatter VPC.

A percentile visual predictive check (VPC) can be used to compare the distribution of the observations with the simulated distributions of your ‘final’ model. In this example, data was generated in which opportunistic sampling around a time point (e.g. 2, 4, 6, 8, 12, 16h) was performed. This results in a scatter of observations around each time point.

Due to this sampling strategy, we need to stratify the observations in bins.

Analysis outline

When we use the PsN function vpc, our dataset used for modelling will be used as the basis for the simulation. This will result in every observation of the dataset that will be simulated 500/1000 times, depending on the chosen number of samples. The no_of_bins and bin_by_count will not be used in this tutorial since we will create our own binning function. In the end, we want to generate the 80% confidence/prediction interval of the observation and the simulated data.

PsN function to create the simulated dataset from Model1:

vpc -samples=500 -no_of_bins=8 -bin_by_count=1 -dir=vpc_Model1 Model1.mod

Getting started

In R, we first want to define the working directory of our ‘final’ model and the prediction interval we want to use at a later stage:

##################################################################################
############################ CODING BLOCK ########################################
##################################################################################
# Run-Time Environment: R version 3.4.0
# Short title: Percentile VPC using ggplot2
#
#
##################################################################################
##################################################################################

################
# Clean workspace
rm(list=ls(all=TRUE))

## Load some libraries
library(reshape)
library(ggplot2)
library(xpose4)
library(dplyr)

#########################################
# Set working directory of model file
setwd("J:\\Model")

#########################################
# Set the percentile intervals

PI <- 80 # Specify prediction interval in percentage. Common is 80% interval (10-90%)

# Make a vector for upper and lower percentile to be used in the piping function
perc_PI <- c(0+(1-PI/100)/2, 1-(1-PI/100)/2)

Binning in R

Then, we need to define what kind of bins we want to use (how we specify which observations should be grouped together). We can, for example, divide the bins on different time intervals or by an equal number of observations. In this example we take the easy route, by manually defining the boundaries of the bins around each sampling time point.

#########################################
# Specify the bin times manually

bin_times <- c(0,3,5,7,10,14,18)

Number_of_bins <- length(bin_times)-1

This results in bins between 0-3h, 3-5h, 5-7h, etc. and creates an object which includes the number of bins.

Dataset transformations

When you used the PsN function for the dataset generation, a new folder including a file ending with ‘*1.npctab.dta’ will be created. This is the simulated dataset that we will use, e.g. 500/1000x larger than the original dataset. In the next step we want to load the original dataset and the simulated dataset in our R workspace and do some data cleaning (e.g. removing the dosing rows).

#########################################
######### Read dataset with original observations
Obs <- read.csv("Original_dataset.csv",na.strings = ".")

Obs<- Obs[Obs$CMT == 2,] # Remove dosing lines from the dataset

#########################################
######### Load in simulated data

## Search for the generated file by PsN in the sub-folders

files <- list.files(pattern = "1.npctab.dta", recursive = TRUE, include.dirs = TRUE)


# This reads the VPC npctab.dta file and save it in dataframe_simulations
dataframe_simulations <- read.nm.tables(paste('.\\',files,sep=""))
dataframe_simulations <- dataframe_simulations[dataframe_simulations$MDV == 0,]

After all the datasets are correctly loaded in R, we can start the calculations. We can add a new column BIN to the datasets by reversing a for-loop, dependent on the time. This will start with settings BIN = 6, then 5, 4, etc. It iterates towards the first bin untill all observations have been specified to a BIN.

#########################################
######### Create bins in simulated dataset

### Set bins by a backward for-loop
for(i in Number_of_bins:1){
dataframe_simulations$BIN[dataframe_simulations$TIME <= bin_times[i+1]] <-i
}

#########################################
######### Create bins in original dataset

### Set bins by a backward for loop
for(i in Number_of_bins:1){
Obs$BIN[Obs$TIME <= bin_times[i+1]] <-i
}

We also want to know the median time of each bin, which we can calculate using the following function:

## Get median time of each bin of observations for plotting purposes
bin_middle <- Obs %>%
group_by(BIN) %>%
summarize(bin_middle = median(TIME, na.rm = T))

# As a vector
bin_middle <- bin_middle$bin_middle

Calculate the distributions

We can use the stratification column BIN to calculate our summary statistics. This will be done using a piping function for both datasets. Furthermore, the median time points will be added to each row to be eventually used in the plot.

#########################################
######### Calculate percentiles of simulated data

sim_vpc <- dataframe_simulations %>%
group_by(BIN) %>%
summarize(C_median = median(DV, na.rm = T),
C_lower = quantile(DV, perc_PI[1], na.rm = T),
C_upper = quantile(DV, perc_PI[2], na.rm = T))

sim_vpc$TIME <- bin_middle

#########################################
######### Calculate percentiles of observations

obs_vpc <- Obs %>%
group_by(BIN) %>%
summarize(C_median = median(DV, na.rm = T),
C_lower = quantile(DV, perc_PI[1], na.rm = T),
C_upper = quantile(DV, perc_PI[2], na.rm = T))

obs_vpc$TIME <- bin_middle

Graphics

After these calculations, we have all the data that we need! We can specify the lines, points, and theme options that we want in our ggplot2. In this scenario, the red lines are the simulated distributions whereas the black lines are the observations. In the end, we save the image as a .png file with high resolution.

###########################################################################################
########################### Graphics ######################################################
# ----------------------------------------------------------------------------------------#
###########################################################################################

Plot_simu <-ggplot()+

###### Simulation percentiles
geom_line(data = sim_vpc,aes(x=TIME,y=C_median),color='red')+
geom_line(data = sim_vpc,aes(x=TIME,y=C_lower),color='red',lty='dashed')+
geom_line(data = sim_vpc,aes(x=TIME,y=C_upper),color='red',lty='dashed')+

###### Add observation percentiles
geom_line(data = obs_vpc,aes(x=TIME,y=C_median),color='black')+
geom_line(data = obs_vpc,aes(x=TIME,y=C_lower),color='black',lty='dashed')+
geom_line(data = obs_vpc,aes(x=TIME,y=C_upper),color='black',lty='dashed')+

###### Add observations
geom_point(data = Obs,aes(x=TIME,y=DV),color='black')+

####### Set ggplot2 theme and settings
theme_bw()+
theme(axis.title=element_text(size=9.0))+
theme(strip.background = element_blank(),
strip.text.x = element_blank(),legend.position="none") +

# Set axis labels
labs(x="Time (h)",y="Concentration (mg/L)")+

# Add vertical lines to indicate dosing
geom_vline(xintercept = 0, linetype="dashed", size=1) +

# Set axis
scale_y_continuous(expand=c(0.01,0))+
scale_x_continuous(expand=c(0.01,0))+

## Add title and subtitle
ggtitle("Percentile VPC","")

#####################################
## Save as png
png("Figure_Simulation_Percentile_VPC_with_Observations.png",width = 10, height = 5, units='in',res=600)

print(Plot_simu)

dev.off()

This analysis will result in the following percentile VPC for our model:

Percentile visual predictive check. Red lines indicate simulated observations. Black lines indicate observations. Dashed lines indicate 80% interval of data (black) and 80% prediction interval (red) for simulations.
COMMENT

Any suggestions or typo’s? Leave a comment or contact me at info@pmxsolutions.com!

Leave a Reply

Your email address will not be published. Required fields are marked *