There are many different ways to evaluate model performance, and sometimes it seems that there are even more ways to use a visual predictive check (VPC) for your population PK or PK/PD model.
This post will show how to create one of the most basic VPC’s available, the scatter VPC:
However, a scatter VPC is a very basic model evaluation tool. When there are highly influential covariates, many different dosing levels, or when there are irregular dosing intervals, a scatter VPC is not the correct VPC to use. A confidence interval VPC or a pcVPC would be better fit for purpose.
Recommended background information on VPC’s:
- https://www.page-meeting.org/pdf_assets/8694-Karlsson_Holford_VPC_Tutorial_hires.pdf
- http://holford.fmhs.auckland.ac.nz/docs/vpc-tutorial-and-datatop.pdf
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3085712/
- https://ascpt.onlinelibrary.wiley.com/doi/full/10.1002/psp4.12161
Model simulation
A key characteristic of a VPC is that it is based on a simulation. Therefore, lets assume we have already developed our PK model and we want to start the simulation.
Already here we have two flavors, we can use our dataset from model development to simulate the new observations or create a new dataset with observations at a time interval that was specified by the user (I choose for the latter here). This will make sure that the simulated model predictions are smooth over time and not only consist of a few sparse time points.
A basic script to create a NONMEM dataset for simulation can be found below:
################################################################################## ############################ CODING BLOCK ######################################## ################################################################################## # Run-Time Environment: R version 3.4.0 # Short title: Create dataset for PK simulations # # ################################################################################## ################################################################################## ################ # Observations dataframe # -------------- # Create a dataset for 1 ID ID <- "1" # Make a time vector from 0 to 24h by 0.1h intervals TIME<- seq(0.1, 24, by = 0.1) # Set observation interval # Combine them in a data frame DF <- data.frame(ID,TIME) # Add NM columns DF$AMT <- NA DF$EVID <- 0 DF$DV <- NA DF$CMT <- 2 # Observations in CMT 2 DF$MDV <- 0 ################ # Dosing dataframe # -------------- TIME <- 0 NONMEMdata <- data.frame(ID,TIME ) NONMEMdata$AMT <- 1000 # Put dosing here NONMEMdata$EVID <- 1 NONMEMdata$DV <- NA NONMEMdata$CMT <- 1 # Dose in CMT 1 NONMEMdata$MDV <- 1 ################ # Combine dataframes # -------------- NONMEMdataALL <- rbind(DF,NONMEMdata) #################### NONMEMdataALL <- NONMEMdataALL [order(NONMEMdataALL$ID,NONMEMdataALL$TIME, -NONMEMdataALL$AMT ),] ## Save as NM dataset (.csv) setwd("") write.table(NONMEMdataALL , file=paste("VPC.Dataset.",Sys.Date(),".csv",sep=""),append =F, sep=",", na=".",row.names = FALSE,col.names = TRUE, quote = FALSE)
Then, we can change the $ESTIMATION in our NONMEM control stream to $SIMULATION to simulate this dataset 500/1000 times.
$SIM (12345) ONLYSIMULATION SUBPROBLEMS=1000
Which will result in usually quite a large dataset that needs to be analysed.
Analysis in R
First we need to set the correct working directory and read in the simulated dataset using the xpose function read_nm_tables(). Furthermore, we load in the observations (the original dataset) which we want to visualise on top of the simulated distribution.
# Read some libraries library(reshape) library(ggplot2) library(xpose) library(dplyr) #### Read in simulations # Set working directory of SDTAB output setwd('') # Read NM output and only select CMT = 2 SimOutput<- read_nm_tables("SDTABSim1") SimOutput <- SimOutput[SimOutput$CMT == 2,] ######### Read observations dataset Obs <- read.csv("Observations.csv",na.strings = ".") Obs<- Obs[Obs$CMT == 2,] # Remove dosing lines from the dataset
Then, we have the SimOutput object in which we have 1000 simulations per timepoint which we need to analyse. We set the prediction interval for which we can calculate the upper and lower percentiles:
######################################### # Set the percentile intervals PI <- 80 #specify prediction interval in percentage. Common is 80 perc_PI <- c(0+(1-PI/100)/2, 1-(1-PI/100)/2)
This can then be implemented in a piping function of the dplyr package:
https://www.datacamp.com/community/tutorials/pipe-r-tutorial
In which we want to group the simulated dataset per timepoint and then calculate the median, the lower percentile and the upper percentile. This will all be included in the newly created object sim_vpc.
We can also choose to use the simulated DV or IPRED values, with or without residual variability included.
## Using DV = with residual error. Using IPRED = no residual error sim_vpc <- SimOutput %>% group_by(TIME) %>% ## Set the stratification identifiers (e.g. TIME, DOSE, CMT, etc.) summarize(C_median = median(IPRED, na.rm = T), C_lower = quantile(IPRED, perc_PI[1], na.rm = T), C_upper = quantile(IPRED, perc_PI[2], na.rm = T))
Figure
When we have the sim_vpc object and the original observations, we have all the information we need. The median simulated value, the simulated prediction interval and the observations.
In ggplot2, we first add the median line and a 80% ribbon (this is of course dependent on the PI you used above). Then, we overlay the data points from our observations as a geom_point(). After adding some ggplot2 theme settings, axis, titles, and saving it as a .tiff, we are done!
Plot_simu <-ggplot()+ ###### Simulation interval with ribbon # Add median line geom_line(data = sim_vpc,aes(x=TIME,y=C_median))+ # Add 80% ribbon geom_ribbon(data = sim_vpc, aes(ymin=C_lower, ymax=C_upper, x=TIME, fill = "band"), alpha = 0.3)+ ###### Add observations from dataset geom_point(data = Obs,aes(x=TIME,y=DV))+ ####### Set ggplot2 theme and settings theme_bw()+ theme(axis.title=element_text(size=9.0))+ theme(strip.background = element_blank(), strip.text.x = element_blank(),legend.position="none") + # Set axis labels labs(x="Time (h)",y="Concentration (mg/L)")+ # Add vertical lines to indicate dosing at time 0 geom_vline(xintercept = 0, linetype="dashed", size=1) + # Set axis scale_y_continuous(expand=c(0,0))+ scale_x_continuous(expand=c(0,0))+ ## Add title and subtitle ggtitle("Scatter VPC","") ##################################### ## Save as tiff ggsave("Figure_Simulation_Scatter_VPC_with_Observations.tiff",plot=Plot_simu, width = 14, height = 8, compression = "lzw")
Additional remarks
The scatter VPC can be improved by calculating the distribution percentiles of the data. However, this is only possible when data is always sampled at the same timepoints (clinical trial setup). When this is not the case, binning should be used which will be discussed in a later post.
Furthermore, NONMEM has some problems when simulating a large number of observations. Therefore you may have to add the following block to your model control stream:
$SIZES NO=1500 ; Change to number of observations LIM6=100
The used model and dataset for this simulation can be downloaded below:
Comment
Any suggestions or typo’s? Leave a comment or contact me at info@pmxsolutions.com!