# Validate your model with NPDE analysis

As part of the series of tutorials on model validation, I will get you started on doing your own Normalized Prediction Distribution Errors (NPDE) analysis. Like the VPC and the bootstrap, the NPDE is a simulation-based evaluation tool. NPDEs are useful for investigating the accuracy of model predictions, and I find them particularly useful when working with models based on sparse data (e.g., peak and trough concentrations), where the focus is more on comparing the observations with their own simulated distributions than on capturing the full concentration-time profile.

I illustrated the concept behind creating an NPDE in the figure below.

#### Brief introduction to theory behind NPDEs

NPDEs, just like VPCs, are based on Monte Carlo simulations. For each observation (yobs) you obtain K simulated data points (ysim); together, these K simulated concentrations form the posterior predictive distribution of that observation. From its cumulative distribution function we calculate the prediction discrepancy: the percentile of the observation within its own predictive distribution. When the number of simulations is large enough, the prediction discrepancies follow a uniform distribution (under the null hypothesis, H0, that the model describes the data correctly). Usually, more than one sample/observation is available per subject, so the observations are correlated within each subject. Therefore, a decorrelation* step is performed on both the observed and the simulated data, and the prediction discrepancies are computed again; the results are called prediction distribution errors, which also follow a uniform distribution under H0.
NPDEs are then computed by applying the inverse of the normal cumulative distribution function to the prediction distribution errors. Under H0, the NPDEs should follow a normal distribution with mean 0 and variance 1.

*Decorrelation of multiple observations coming from the same subject can be achieved with several methods, which basically rely on computing the mean and the variance of your K simulations. For more information visit the following link.
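To make the recipe above concrete, here is a minimal Python sketch of the core computation: the percentile of each observation in its own simulated distribution (the prediction discrepancy), mapped through the inverse normal CDF to get an NPDE. The data are entirely made up, and the decorrelation step is omitted by assuming a single observation per subject, so this is an illustration of the principle, not a replacement for the npde package.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(12345)

# Hypothetical setup: 50 subjects, one observation each, K = 1000 simulations.
# With one observation per subject there is no within-subject correlation,
# so the decorrelation step described above can be skipped.
K = 1000
y_obs = rng.lognormal(mean=1.0, sigma=0.3, size=50)        # observed concentrations
y_sim = rng.lognormal(mean=1.0, sigma=0.3, size=(50, K))   # K simulations per observation

# Prediction discrepancy: the percentile of each observation within its own
# simulated (posterior predictive) distribution.
pd = (y_sim < y_obs[:, None]).mean(axis=1)

# Keep pd strictly inside (0, 1): 0 or 1 would map to -inf/+inf below.
pd = np.clip(pd, 1 / (2 * K), 1 - 1 / (2 * K))

# NPDE: inverse normal cumulative distribution function of the discrepancies.
npde = norm.ppf(pd)

# Because the "observations" were simulated from the same model (H0 is true),
# the NPDEs should be approximately N(0, 1).
print(round(npde.mean(), 2), round(npde.std(), 2))
```

Under H0 the printed mean should be near 0 and the standard deviation near 1; a real analysis would then run the formal tests on these values.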

#### How to get the NPDEs of your final model using Pirana & R?

The way I learned to perform an NPDE analysis could be considered “old-school”, but it has worked across different versions of NONMEM and R, which is why I prefer it.

First, make a copy of your final model file (if your model is called run01.mod, name the copy run01npde.mod).

`Tip: If you are using Pirana, when duplicating your model, tick the box that says: use the final parameter estimates`

In the \$ERROR block, add FSIM = Y as the last line:

\$ERROR
IPRED=F
Y=F*(1+ERR(1))+ERR(2) ; or prop or add error, keep it as it was
FSIM=Y

Manually check that the values in \$THETA, \$OMEGA and \$SIGMA are fixed to the final estimates, as in the example below:

\$THETA
0.15 FIX ;CL (L/hr)
120 FIX ;V1 (L)
;… etc

\$OMEGA
0.51 FIX ;IIV CL
0.32 FIX ;IIV V1
;… etc If it was fixed at zero, keep it fixed at zero.

\$SIGMA
0.125 FIX ;proportional error

Comment out the \$ESTIMATION, \$COVARIANCE and \$TABLE records (add a “;” in front of every line) and add a \$SIMULATION record:

; \$EST
; \$COV
; \$TABLE
\$SIMULATION (12345) ONLYSIMULATION SUBPROBLEMS=1000

Note: The number in parentheses is the seed. It ensures reproducibility of your results: the subjects you simulate now, by sampling from the ETA and EPS distributions, are the same every time you re-run this NPDE analysis.

As presented in the NPDE methodology part above, and as for the other simulation-based validation methods, the number of simulations performed influences the quality of the results. The current npde manual recommends 1000 simulations, but the appropriate number is highly dependent on the number of observations.
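One way to see why K matters: with K simulations, the prediction discrepancy for a single observation is a count divided by K, so it can only take values on a grid of width 1/K. The short Python sketch below (with made-up numbers) illustrates this granularity; it is not part of the actual workflow.

```python
import numpy as np

rng = np.random.default_rng(1)
y_obs = 1.0  # a single hypothetical observation

# The prediction discrepancy is (number of simulations below y_obs) / K,
# so with small K the resulting percentile is very coarse (steps of 1/K).
for K in (20, 1000):
    y_sim = rng.normal(loc=1.0, scale=0.5, size=K)
    pd = (y_sim < y_obs).mean()
    print(K, pd)  # pd is always a multiple of 1/K
```

With K = 20 the percentile can only move in steps of 0.05, which is far too coarse for the tails of the distribution; K = 1000 gives steps of 0.001.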

Add a new \$TABLE record to keep the results of your NPDE simulation, and change the run number in the file name accordingly.

\$TABLE ID TIME FSIM IPRED NOPRINT NOHEADER NOAPPEND FILE=run01NPDE.tab

Run the model as you would normally do (execute command in Pirana with PsN).

Once the K (in this case K = 1000) simulations are done, you need to import your results into R and use the autonpde function from the npde package. An example script with comments is provided below:

# ---------- NPDE analysis of run01 ----------

library(npde) # if you don't have this package yet, install it first

# --- Set your working directory to the directory where your files are located:

setwd("C:/PKmodel/Model_evaluation/NPDE")

# --- The code for calculating the NPDE:
# observed data
obs <- read.csv("Data.csv", na.strings=".")    # this is the dataset you use for modelling
# simulated data: the table written by the simulation run
# (the table was written with NOHEADER, so read it without a header line)
sim <- read.table("run01NPDE.tab", header=FALSE)
# --- Compute NPDE
d1 <- autonpde(obs, sim, iid=2, ix=3, iy=9, imdv=10, boolsave=F, units = list(x = "h", y = "ug/L"))
# iid: ID column in observed data
# ix: time column in observed data
# iy: DV column in observed data
# imdv: MDV column in observed data
# boolsave=F: does not save automatically to pdf
# please check the columns in your observed data, as these numbers might not match the columns in your dataset
# after this step you get the results of a few statistical tests for normality.

# --- The code for standard NPDE plots:
plot(d1, plot.type="qqplot")
plot(d1, plot.type="histogram", xlab="npde")
plot(d1, plot.type="x.scatter")
plot(d1, plot.type="pred.scatter", xlab="Predicted concentration (ug/L)")
plot(d1) # all four plots on 1 page
# --- end of script

#### Results interpretation

The normality of the NPDE distribution is checked with a few statistical tests, which are displayed immediately after running autonpde in the R console:

• Wilcoxon signed rank test (is the mean significantly different from 0?)
• Fisher variance test (is the variance significantly different from 1?)
• Shapiro-Wilk (SW) normality test (are the NPDEs normally distributed?)
• Global p-value (a Bonferroni correction: the smallest p-value of the 3 tests, multiplied by 3)
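To show what these checks amount to, here is a rough Python sketch on a stand-in set of NPDE values. The Wilcoxon and Shapiro-Wilk tests come straight from scipy; for the variance test I use a standard chi-square test of variance = 1 as a stand-in for the package's Fisher test, so the exact p-values will not match the npde output.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
npde = rng.standard_normal(500)  # stand-in for NPDEs from a well-behaved model

# Is the mean (location) significantly different from 0?
p_mean = stats.wilcoxon(npde).pvalue

# Is the variance significantly different from 1? Two-sided chi-square
# test of the sample variance against 1 (stand-in for the Fisher test).
n = len(npde)
chi2 = (n - 1) * npde.var(ddof=1) / 1.0
p_var = 2 * min(stats.chi2.cdf(chi2, n - 1), stats.chi2.sf(chi2, n - 1))

# Are the NPDEs normally distributed? Shapiro-Wilk test.
p_norm = stats.shapiro(npde).pvalue

# Global p-value: Bonferroni correction of the smallest of the three.
p_global = min(1.0, 3 * min(p_mean, p_var, p_norm))
print(round(p_mean, 3), round(p_var, 3), round(p_norm, 3), round(p_global, 3))
```

For a well-performing model you would expect all of these p-values to stay above your significance level.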

The standard graphs produced assess either the normality of the NPDE distribution:

• QQ plot and histogram (NPDE example – top row)

or the distribution of the variance:

• Scatter plots vs. time and vs. predicted concentrations (NPDE example – bottom row)

The NPDE example is obtained with the command plot(d1), with prediction bands around selected percentiles. By default, 95% prediction intervals are used in the npde package, and each prediction interval is plotted as a colored area (blue for the 2.5th and 97.5th percentiles and pink for the median). The corresponding 2.5th, 50th and 97.5th percentiles of the observed data are plotted as lines and should remain within the colored areas.

In the NPDE plots for a model that performs well, you would like to see:

• data points in the QQ plot following the theoretical line and staying within the confidence interval (upper left corner)
• an overlap between the obtained histogram and the theoretical normal distribution (upper right corner)
• data points scattered around the null line, with most of the points between -1.96 and 1.96 (bottom row)
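The -1.96/1.96 bounds follow directly from the N(0, 1) assumption: about 95% of the points should fall between them. A quick Python check on stand-in NPDE values:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
npde = rng.standard_normal(2000)  # stand-in NPDEs from a well-behaved model

# Fraction of points between -1.96 and 1.96, compared with the
# theoretical N(0, 1) probability of that interval (about 0.95).
inside = np.mean(np.abs(npde) < 1.96)
expected = norm.cdf(1.96) - norm.cdf(-1.96)
print(round(inside, 3), round(expected, 3))
```

A noticeably smaller fraction inside the bounds, or points drifting outside in one region of time or prediction, is the kind of trend the scatter plots are meant to reveal.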

Any trends observed in these plots (e.g., a cone-shaped graph) might indicate model misspecification and would require you to make changes to the structural model, the residual error model, or perhaps the IIV model.

This concludes the short introduction to performing an NPDE analysis for model validation. For more information about this topic you should visit this website, where all materials about NPDEs (from their conception until recently) are posted.
While writing up this post I came across a comprehensive summary on model evaluation which I also recommend (link).