# Publishing the covariance matrix of population models. Why not?

### The information in the covariance matrix

Everyone that has worked for longer than a day with NONMEM has a high probability of encountering error messages mentioning the covariance matrix (nonpositive semidefinite…, R matrix algorithmically singular…, etc.) or people ask and discuss whether or why the covariance step of the model was not successful. However, we barely pay any attention to it afterwards, nor do we see it published together with the final model.

The pieces of information that we commonly use from a successful covariance step is the standard errors on your parameters, used to derive the relative standard errors (RSE) that we report in our parameter table, and calculation of the condition number (ratio of largest to smallest eigenvalues).

However, it contains so much more information! Most importantly, if you want to simulate a new parameter set while accounting for their underlying covariance/correlation structure, we need to have access to this covariance matrix. Please look at this post to see how you can perform a simulation with parameter uncertainty based on your NONMEM model in R. Furthermore, it can be used to check the correlation between parameters and observe and judge any trends of strong correlations in the model. Additionally, it becomes important when multiple models in literature exist for the same compound and/or for different populations. Plotting and overlapping the PK curve of these models could result in different conclusions compared to plotting the PK curve with the corresponding 95% confidence interval, especially in small populations where RSEs are high and potential correlations between covariate relationships or other PK parameters may exist. Therefore, simulating literature models with the correct covariance matrix assists a correct external validation of a population PK/PD model.

Unfortunately, performing simulations with parameter uncertainty is not implemented in NONMEM or PsN, which might be the reason that only very few give the matrix the attention it deserves… Or perhaps the simulations performed with a model might suddenly not look that perfect once we add the existing levels of uncertainty on top of it (publish or perish right…). Once you start looking in literature for models to use for simulation purposes in your own study, you start noticing the lack of reporting of this information.

Note: In this post I will use the (variance-)covariance and correlation matrix terminology interchangeably.

### Covariance matrix in (recent) reporting guidelines of population PK/PD models

Dykstra et al. Reporting guidelines for population pharmacokinetic analyse 2015: https://dx.doi.org/10.1007%2Fs10928-015-9417-1

In the NONMEM model fitting procedure, we have two distinct steps. The minimization step, which finds the best estimate for each model parameter, after which a covariance step is performed to derive the standard errors of these parameters.

Interesting in this publication is the following paragraph on the results of their survey: “Fifty-nine percent (59 %) of respondents indicated that it is sometimes acceptable to report a final model that did not successfully estimate the model variance–covariance matrix (i.e. completion of the \$COV step in NONMEM), “with justification”, while 65 % would sometimes accept the use of first order (FO) estimation.”

The participants did unfortunately not discuss in what cases this would be acceptable to not have a covariance matrix (and therefore no uncertainty estimates) in a final model. Not having these estimates means that we can only simulate a median/mean profile with inter-individual variability, but no confidence intervals around them.

Furthermore, this is the only time in the paper that they mention the variance-covariance matrix. It is not suggested as something that should be included in a publication, which is unfortunate for a ‘reporting guideline’.

As a side note: for the guidelines of regulatory agencies they write that “At the minimum, the model code and outputs for the base and the final model should be included.”. If one defines the ‘outputs’ as the full .lst file of these models, it would contain the covariance matrix. However, the variance-covariance matrix is not specified as such in the text.

Reporting a Population Pharmacokinetic–Pharmacodynamic Study: A Journal’s Perspective: https://pubmed.ncbi.nlm.nih.gov/24327237/

The abstract of the following article states: “This guide is intended to enhance the readability, reproducibility and understanding of the work for a general readership,”. Let’s see whether these guidelines would indeed result in reproducible models, as we require the covariance matrix to reproduce the model with the intrinsic covariance structure.

Looking for the term ‘variance’ in the text results in the parapraph that the author should report the correlation or covariance between parameters, but their example focuses on the implementation in an omega block:

Then, they mention that we should provide enough information to be able to use the model for simulation purposes with a strong focus on the subject characteristics and the underlying covariance structure of the covariates:

“Further, if the derived model is to be used for simulation, it is also recommended that the relationships among these clinical/demographic variables are reported, either in the table or in an appendix or supplementary file. This may include covariances between continuous variables (e.g. height and weight for each sex), two-by-two tables of categorical variables (e.g. sex and disease status) or results of regression analyses. This information is essential for readers interested in simulating subjects from the reported study population.”

As I fully agree that the covariance structure behind the covariates would also be very meaningful, this would have been an excellent paragraph to also add that we should publish the covariance matrix of our final model, to simulate a new parameter set from the reported model.

### Examples of how to report the matrix in a publication

#### Parameter table

The information we commonly include in a parameter table is the description of the parameter, the parameter estimate, the relative standard error (RSE, in %), and perhaps the shrinkage for our inter-individual and residual variability. We can extend this table with the covariance/correlation between the parameters as new columns. However, it is recommended to do this only for models with a small number of parameters because the number of columns linearly increase with the number of parameters. Having to many parameters would be a reason to give the covariance table in the supplemental material of a publication. Have a look at the following table which include the covariance matrix in the same table as the parameter estimates:

As the covariance table in itself is difficult to interpret due to the values of the covariances that depend on the variances, this table can also be replaced with the correlation matrix which is easier to interpret. Both tables can be converted in one another if the diagonal variances (which can be calculated from the RSE) are given.

#### Reporting the covariance matrix in the supplemental material

.lst file of final model run

The easiest and in my opinion the most complete way is publishing the full output of the model run, the .lst file, as a supplemental material of your publication. This file contains the model code, the number of observations and number of individuals used, potential error messages or warnings, the significance numbers, shrinkages, and the covariance matrix!

.cov or .cor file

One could also just copy the contents of the .cov and .cor files, clearly annotated, in a word document and publish them as supplemental materials to your article. Please take care of this annotation, as THETA1 does not always correspond with the first row in the parameter table for example. One is therefore also encouraged to provide the full NONMEM model code (to let the readers figure it out themselves). Don’t forget that copy-and-pasting a matrix from a pdf document is difficult, therefore publish the supplemental as a Word or text document.

dput R code of covariance matrix

If you want to claim the price of ‘best reproducible model’, you can also add the R code required to load your covariance table in NONMEM. You can do this by using the dput function in R. The dput function allows to create R code from an R object or dataframe, which others can use in their scripts.

So if you have your covariance matrix as an object in R, one can simply call dput(cov_matrix_object) and copy the generated R code. As mentioned before, make sure the matrix/document/code is annotated clearly. Please find an example below on how to turn your final parameter estimates and your covariance matrix in R code for others to use:

```##########################################
## Parameter estimates

## Obtain model parameter estimates
filter(ITERATION == -1E9)

est\$ITERATION <- NULL
est\$OBJ <- NULL

## Compute R code for this dataframe
dput(est)

##########################################
## Read in covariance matrix generated by NONMEM

# Remove NAME column
cov\$NAME <- NULL

## Create a matrix
cov<-data.matrix(cov)

## Compute R code for this matrix
dput(cov)```

The functions dput(est) and dput(cov) result in the following pieces of R scripts in our console that we can use to load in our matrix in R immediately, without any risk of typo's:

```structure(list(THETA1 = 0.679332, THETA2 = 15.1588, THETA3 = 6.73738,
SIGMA.1.1. = 0.0480769, OMEGA.1.1. = 0.0343482, OMEGA.2.1. = 0,
OMEGA.2.2. = 0.282795), row.names = c(NA, -1L), class = "data.frame")

structure(c(0.000226877, 0.00354806, -0.000231722, -5.35541e-07,
-4.8394e-05, 0, -3.74119e-05, 0.00354806, 0.422326, 0.0446764,
0.000298066, -0.000784339, 0, 0.0015094, -0.000231722, 0.0446764,
0.33613, 0.000151759, 3.16997e-05, 0, 0.000474441, -5.35541e-07,
0.000298066, 0.000151759, 2.97217e-05, -1.41725e-05, 0, -3.29127e-06,
-4.8394e-05, -0.000784339, 3.16997e-05, -1.41725e-05, 0.000214389,
0, -1.37771e-06, 0, 0, 0, 0, 0, 0, 0, -3.74119e-05, 0.0015094,
0.000474441, -3.29127e-06, -1.37771e-06, 0, 0.00424649), .Dim = c(7L,
7L), .Dimnames = list(NULL, c("THETA1", "THETA2", "THETA3", "SIGMA.1.1.",
"OMEGA.1.1.", "OMEGA.2.1.", "OMEGA.2.2.")))

```

What are your thoughts on reporting the covariance matrix? Should it become a requirement for future publications or would it just be 'nice to have'? Unfortunately I can't remember the last time I published the covariance matrix of my model at the time of writing, so this is a good moment to change that and also to comment on it when I am asked for peer review of new articles.