A step-by-step guide to scatter visual predictive checks (VPC) of NONMEM models

There are many different ways to evaluate model performance, and sometimes it seems that there are even more ways to use a visual predictive check (VPC) for your population PK or PK/PD model.

This post will show how to create one of the most basic VPC’s available, the scatter VPC:

However, a scatter VPC is a very basic model evaluation tool. When there are highly influential covariates, many different dosing levels, or when there are irregular dosing intervals, a scatter VPC is not the correct VPC to use. A confidence interval VPC or a pcVPC would be better fit for purpose.

Recommended background information on VPC’s:

Model simulation

A key characteristic of a VPC is that it is based on a simulation. Therefore, lets assume we have already developed our PK model and we want to start the simulation.

Already here we have two flavors, we can use our dataset from model development to simulate the new observations or create a new dataset with observations at a time interval that was specified by the user (I choose for the latter here). This will make sure that the simulated model predictions are smooth over time and not only consist of a few sparse time points.

A basic script to create a NONMEM dataset for simulation can be found below:

##################################################################################
############################ CODING BLOCK ########################################
##################################################################################
# Run-Time Environment: R version 3.4.0
# Short title: Create dataset for PK simulations
#
#
##################################################################################
##################################################################################

################
# Observations dataframe
# --------------

# Create a dataset for 1 ID
ID <- "1"

# Make a time vector from 0 to 24h by 0.1h intervals
TIME<- seq(0.1, 24, by = 0.1) # Set observation interval

# Combine them in a data frame
DF <- data.frame(ID,TIME)

# Add NM columns
DF$AMT <- NA
DF$EVID <- 0
DF$DV <- NA
DF$CMT <- 2 # Observations in CMT 2
DF$MDV <- 0

################
# Dosing dataframe
# --------------

TIME <- 0
NONMEMdata <- data.frame(ID,TIME )
NONMEMdata$AMT <- 1000 # Put dosing here
NONMEMdata$EVID <- 1
NONMEMdata$DV <- NA
NONMEMdata$CMT <- 1 # Dose in CMT 1
NONMEMdata$MDV <- 1

################
# Combine dataframes
# --------------

NONMEMdataALL <- rbind(DF,NONMEMdata)

####################

NONMEMdataALL <- NONMEMdataALL [order(NONMEMdataALL$ID,NONMEMdataALL$TIME, -NONMEMdataALL$AMT ),]

## Save as NM dataset (.csv)

setwd("")

write.table(NONMEMdataALL , file=paste("VPC.Dataset.",Sys.Date(),".csv",sep=""),append =F, sep=",", na=".",row.names = FALSE,col.names = TRUE, quote = FALSE)

Then, we can change the $ESTIMATION in our NONMEM control stream to $SIMULATION to simulate this dataset 500/1000 times.

$SIM (12345) ONLYSIMULATION SUBPROBLEMS=1000

Which will result in usually quite a large dataset that needs to be analysed.

Analysis in R

First we need to set the correct working directory and read in the simulated dataset using the xpose function read_nm_tables(). Furthermore, we load in the observations (the original dataset) which we want to visualise on top of the simulated distribution.

# Read some libraries
library(reshape)
library(ggplot2)
library(xpose)
library(dplyr)

#### Read in simulations
# Set working directory of SDTAB output
setwd('')
# Read NM output and only select CMT = 2
SimOutput<- read_nm_tables("SDTABSim1")
SimOutput <- SimOutput[SimOutput$CMT == 2,]

######### Read observations dataset
Obs <- read.csv("Observations.csv",na.strings = ".")
Obs<- Obs[Obs$CMT == 2,] # Remove dosing lines from the dataset

Then, we have the SimOutput object in which we have 1000 simulations per timepoint which we need to analyse. We set the prediction interval for which we can calculate the upper and lower percentiles:

#########################################
# Set the percentile intervals

PI <- 80 #specify prediction interval in percentage. Common is 80

perc_PI <- c(0+(1-PI/100)/2, 1-(1-PI/100)/2)

This can then be implemented in a piping function of the dplyr package:
https://www.datacamp.com/community/tutorials/pipe-r-tutorial

In which we want to group the simulated dataset per timepoint and then calculate the median, the lower percentile and the upper percentile. This will all be included in the newly created object sim_vpc.

We can also choose to use the simulated DV or IPRED values, with or without residual variability included.

## Using DV = with residual error. Using IPRED = no residual error
sim_vpc <- SimOutput %>%
group_by(TIME) %>% ## Set the stratification identifiers (e.g. TIME, DOSE, CMT, etc.)
summarize(C_median = median(IPRED, na.rm = T),
C_lower = quantile(IPRED, perc_PI[1], na.rm = T),
C_upper = quantile(IPRED, perc_PI[2], na.rm = T))

Figure

When we have the sim_vpc object and the original observations, we have all the information we need. The median simulated value, the simulated prediction interval and the observations.

In ggplot2, we first add the median line and a 80% ribbon (this is of course dependent on the PI you used above). Then, we overlay the data points from our observations as a geom_point(). After adding some ggplot2 theme settings, axis, titles, and saving it as a .tiff, we are done!

Plot_simu <-ggplot()+

###### Simulation interval with ribbon
# Add median line
geom_line(data = sim_vpc,aes(x=TIME,y=C_median))+
# Add 80% ribbon
geom_ribbon(data = sim_vpc, aes(ymin=C_lower, ymax=C_upper, x=TIME, fill = "band"), alpha = 0.3)+

###### Add observations from dataset
geom_point(data = Obs,aes(x=TIME,y=DV))+

####### Set ggplot2 theme and settings
theme_bw()+
theme(axis.title=element_text(size=9.0))+
theme(strip.background = element_blank(),
strip.text.x = element_blank(),legend.position="none") +

# Set axis labels
labs(x="Time (h)",y="Concentration (mg/L)")+

# Add vertical lines to indicate dosing at time 0
geom_vline(xintercept = 0, linetype="dashed", size=1) +

# Set axis
scale_y_continuous(expand=c(0,0))+
scale_x_continuous(expand=c(0,0))+

## Add title and subtitle
ggtitle("Scatter VPC","")

#####################################
## Save as tiff
ggsave("Figure_Simulation_Scatter_VPC_with_Observations.tiff",plot=Plot_simu, width = 14, height = 8, compression = "lzw")


Additional remarks

The scatter VPC can be improved by calculating the distribution percentiles of the data. However, this is only possible when data is always sampled at the same timepoints (clinical trial setup). When this is not the case, binning should be used which will be discussed in a later post.

Furthermore, NONMEM has some problems when simulating a large number of observations. Therefore you may have to add the following block to your model control stream:

$SIZES
NO=1500 ; Change to number of observations
LIM6=100

The used model and dataset for this simulation can be downloaded below:

Simulation_Model

VPC.Dataset.2018-06-24

 Comment

Any suggestions or typo’s? Leave a comment or contact me at info@pmxsolutions.com!


Continue reading:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.