Getting your NONMEM dataset to work – DATA ERROR summary

Getting your dataset into the right format for NONMEM can be a tedious job. Collecting all observations, doses, dosing times, etc. from the different Excel sheets used in the clinic can take a lot of time. Worse, just when you think your dataset is finished, your NONMEM run stops immediately with an error, often with a message that is difficult to interpret. This post does not show how to build a NONMEM dataset (there are already multiple examples of that). Instead, it shows which data errors you may encounter and how to fix them.

An example of a working dataset, with a dose administered in compartment 1 and observations in compartment 2.
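As a stand-in for that example, the sketch below writes a minimal dataset in the same layout using Python's standard csv module. The column order follows the $INPUT record used later in this post (ID TIME AMT EVID DV CMT MDV). The numbers are made up for illustration only.

```python
import csv
import io

# Column order must match the $INPUT record, e.g. $INPUT ID TIME AMT EVID DV CMT MDV
COLUMNS = ["ID", "TIME", "AMT", "EVID", "DV", "CMT", "MDV"]

# Illustrative values: one 100 mg dose into compartment 1 at TIME 0,
# followed by three concentration observations in compartment 2.
rows = [
    {"ID": 1, "TIME": 0, "AMT": 100, "EVID": 1, "DV": ".", "CMT": 1, "MDV": 1},
    {"ID": 1, "TIME": 1, "AMT": 0, "EVID": 0, "DV": 4.2, "CMT": 2, "MDV": 0},
    {"ID": 1, "TIME": 4, "AMT": 0, "EVID": 0, "DV": 2.9, "CMT": 2, "MDV": 0},
    {"ID": 1, "TIME": 12, "AMT": 0, "EVID": 0, "DV": 1.1, "CMT": 2, "MDV": 0},
]


def to_nonmem_csv(records, columns):
    """Write records as comma-separated text with a header row (skip it with IGNORE=@)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


text = to_nonmem_csv(rows, COLUMNS)
print(text)
```

The header row is convenient for humans. NONMEM itself needs it to be skipped, which is what the IGNORE option discussed in error 5 is for.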

 

Where to find the data errors?

The error messages are written to the generated .lst file. If you use PsN, this file can be found in the modelfit_dir1/NM_run1 folder (even if your run was unsuccessful) or in the main model folder.
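Long .lst files can make these messages hard to spot. A small helper like the hypothetical one below lists every line that looks like a data error or warning. It only searches for the message patterns quoted in this post ("DATA ERROR", "DATA WARNING", "0DATA REC"), so it is a convenience filter and not a complete parser of NONMEM output.

```python
from pathlib import Path


def find_data_messages(lst_text):
    """Return (line_number, line) for every data error/warning line in NONMEM output."""
    hits = []
    for number, line in enumerate(lst_text.splitlines(), start=1):
        if ("DATA ERROR" in line or "DATA WARNING" in line
                or line.lstrip().startswith("0DATA REC")):
            hits.append((number, line.strip()))
    return hits


def scan_lst_file(path):
    """Read a .lst file (e.g. the one in modelfit_dir1/NM_run1) and list its data messages."""
    return find_data_messages(Path(path).read_text(errors="replace"))


example = """ (DATA ERROR) RECORD 1, DATA ITEM 8, CONTENTS: MALE
 ITEM IS NOT A NUMBER.
0DATA REC 3: TIME DATA ITEM IS LESS THAN PREVIOUS TIME DATA ITEM
"""
for number, line in find_data_messages(example):
    print(number, line)
```

Only the first line of a multi-line message is reported. Look at the following line in the .lst file (e.g. ITEM IS NOT A NUMBER.) for the explanation.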

The examples provided below are the error messages copied from these .lst files. Every example also contains a dataset that can be used to reproduce the error.

 

Error messages

 

1. Non-numeric cells/columns in the dataset

All cells in your NONMEM dataset should be numeric. NONMEM cannot read text values and will throw an error when it encounters one. This can happen, for example, when a covariate is still stored as text and you forgot to convert it:

(DATA ERROR) RECORD 1, DATA ITEM 8, CONTENTS: MALE
ITEM IS NOT A NUMBER.

Our covariate column here contains the value “MALE”, which is not allowed.

FIX

Change your covariates to numeric values. For example, male/female can be coded as 0 and 1.
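A minimal sketch of such a recoding, assuming the coding 0 = male and 1 = female (any documented coding works). It raises an error on unexpected spellings instead of silently producing a wrong code.

```python
# Assumed coding (0 = male, 1 = female); record whichever coding you choose.
SEX_CODES = {"MALE": 0, "FEMALE": 1}


def encode_sex(value):
    """Map a text SEX value to a numeric code, failing loudly on unexpected spellings."""
    key = value.strip().upper()
    if key not in SEX_CODES:
        raise ValueError(f"Unexpected SEX value: {value!r}")
    return SEX_CODES[key]


raw_values = ["MALE", "female", " Male "]
print([encode_sex(v) for v in raw_values])  # [0, 1, 0]
```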

Dataset

Dataset.COV.Character.csv

 

2. NAs in the DV column

If you work in R, missing observations will usually be stored as NA. This gives the same error as above, since NONMEM reads NA as text, not as a number.

(DATA ERROR) RECORD 21, DATA ITEM 5, CONTENTS: NA
ITEM IS NOT A NUMBER.

FIX

When saving your dataset in R, specify na = "." so that all NA values are written as dots. NONMEM accepts these: a dot is read as a null value (zero). In addition, make sure that the MDV column equals 1 for every record with a missing observation.

I personally use the following settings to create my NONMEM dataset:

write.table(df, file = "NONMEMDATASET.csv", append = FALSE, sep = ",", na = ".", row.names = FALSE, col.names = TRUE, quote = FALSE)
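If you prepare your data in Python instead of R, a rough equivalent is sketched below using only the standard library. In this sketch, missing values are represented as None (an assumption). Because csv.DictWriter would write None as an empty field, they are converted to "." explicitly. MDV is set to 1 for observation records without a DV.

```python
import csv
import io

COLUMNS = ["ID", "TIME", "AMT", "EVID", "DV", "CMT", "MDV"]


def prepare_records(records):
    """Set MDV = 1 on observations without a DV, then write every None as '.'."""
    prepared = []
    for record in records:
        record = dict(record)  # copy so the input is not modified
        if record["EVID"] == 0 and record["DV"] is None:
            record["MDV"] = 1
        prepared.append({k: "." if v is None else v for k, v in record.items()})
    return prepared


def write_nonmem_csv(records, columns=COLUMNS):
    """Comma separator, header row, no quoting, '.' for missing values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(prepare_records(records))
    return buffer.getvalue()


records = [
    {"ID": 1, "TIME": 0, "AMT": 100, "EVID": 1, "DV": None, "CMT": 1, "MDV": 1},
    {"ID": 1, "TIME": 2, "AMT": 0, "EVID": 0, "DV": None, "CMT": 2, "MDV": 0},
    {"ID": 1, "TIME": 4, "AMT": 0, "EVID": 0, "DV": 3.1, "CMT": 2, "MDV": 0},
]
print(write_nonmem_csv(records))
```

To save the result, write the returned text to a file, e.g. with pathlib.Path("NONMEMDATASET.csv").write_text(...).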

Dataset

Dataset.NADV.csv

 

3. Time not sorted

The following error message appears when the time column is not sorted. Within each individual, NONMEM requires TIME to be non-decreasing from one record to the next. In other words, NONMEM cannot go back in time.

0DATA REC 3: TIME DATA ITEM IS LESS THAN PREVIOUS TIME DATA ITEM

This can also happen when your dataset contains multiple occasions that both start at time 0, which is not allowed. If you have data from 2 occasions, you can add a fixed time offset to the second occasion. For example, start occasion 2 at 168 hours. The time column is then still non-decreasing within each individual, and you can distinguish occasion 1 (TIME < 168) from occasion 2 (TIME ≥ 168).
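A minimal sketch of this offset approach, using the 168-hour offset from the text. The helper names are hypothetical. The offset must be larger than the last time point of occasion 1, otherwise the two occasions overlap and TIME is no longer sorted.

```python
OCCASION_OFFSET = 168  # hours; the example offset used in the text


def stack_occasions(occ1_times, occ2_times, offset=OCCASION_OFFSET):
    """Return (TIME, OCC) pairs with occasion 2 shifted by a fixed offset."""
    if occ1_times and max(occ1_times) >= offset:
        raise ValueError("offset must be larger than the last time of occasion 1")
    return [(t, 1) for t in occ1_times] + [(t + offset, 2) for t in occ2_times]


def occasion_from_time(time, offset=OCCASION_OFFSET):
    """Recover the occasion from TIME: occasion 1 before the offset, occasion 2 from it."""
    return 1 if time < offset else 2


sampling = [0, 1, 2, 4, 8, 24]  # same nominal times on both occasions
combined = stack_occasions(sampling, sampling)
print([t for t, _ in combined])  # [0, 1, 2, 4, 8, 24, 168, 169, 170, 172, 176, 192]
```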

FIX

Sort your data frame correctly: first on ID, then on TIME. This should fix the problem.

df <- df[order(df$ID, df$TIME), ]
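It can help to find the offending records before sorting, because an unsorted time is sometimes a typo rather than a sorting issue. The hypothetical check below compares each record with the previous record of the same ID. Record numbers count data rows from 1 with the header excluded. Python's sorted() is stable, like R's order(), so a dose and an observation at the same ID and TIME keep their original relative order.

```python
def time_order_problems(records):
    """Return (record_number, ID, TIME) where TIME drops within consecutive records of one ID."""
    problems = []
    previous_id, previous_time = None, None
    for number, record in enumerate(records, start=1):
        if record["ID"] == previous_id and record["TIME"] < previous_time:
            problems.append((number, record["ID"], record["TIME"]))
        previous_id, previous_time = record["ID"], record["TIME"]
    return problems


records = [
    {"ID": 1, "TIME": 0},
    {"ID": 1, "TIME": 2},
    {"ID": 1, "TIME": 1},  # out of order
    {"ID": 2, "TIME": 0},
    {"ID": 2, "TIME": 4},
]
print(time_order_problems(records))  # [(3, 1, 1)]

# Stable sort on ID, then TIME (the equivalent of df[order(df$ID, df$TIME), ]).
records_sorted = sorted(records, key=lambda r: (r["ID"], r["TIME"]))
print(time_order_problems(records_sorted))  # []
```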

Dataset

Dataset.TimeNotSorted.csv

 

4. EVID is always 0

The EVID column specifies what kind of record each row is: an observation, a dose, or something else. Each value has a specific meaning: EVID = 0 is an observation and EVID = 1 is a dosing record. The other values (2 for other-type events, 3 for a reset, 4 for a reset plus dose) are explained in the NONMEM manual.

You may get the following error:

0DATA REC 1: OBSERVATION EVENT RECORD MAY NOT SPECIFY DOSING INFORMATION

This means that EVID marks the record as an observation, but the record also contains dosing information. In this case, AMT was larger than 0 while EVID = 0.

FIX

Make sure that all observations have EVID = 0 and all dosing records have EVID = 1.
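A quick consistency check between EVID and AMT can catch this before NONMEM does. The sketch below is hypothetical and only covers EVID 0 and 1. A dose record with AMT <= 0 can be legitimate in special cases (for example, some steady-state infusion records), so review flagged rows manually instead of fixing them blindly.

```python
def evid_problems(records):
    """Flag records whose EVID contradicts AMT (only EVID 0 and 1 are checked)."""
    problems = []
    for number, record in enumerate(records, start=1):
        if record["EVID"] == 0 and record["AMT"] > 0:
            problems.append((number, "EVID=0 (observation) but AMT > 0"))
        elif record["EVID"] == 1 and record["AMT"] <= 0:
            problems.append((number, "EVID=1 (dose) but AMT <= 0"))
    return problems


records = [
    {"EVID": 0, "AMT": 100},  # a dose mislabelled as an observation
    {"EVID": 0, "AMT": 0},
    {"EVID": 1, "AMT": 100},
]
print(evid_problems(records))  # [(1, 'EVID=0 (observation) but AMT > 0')]
```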

Dataset

Dataset.EVID0.csv

 

5. Wrong IGNORE statement

The IGNORE option of the $DATA record in your NONMEM model specifies which rows of your dataset should be skipped.

In my model code I specified the following:

$INPUT ID TIME AMT EVID DV CMT MDV

$DATA Dataset.IGNORE.csv IGNORE=I

This resulted in the following error:

(DATA ERROR) RECORD 1, DATA ITEM 1, CONTENTS: SubjectNr
ITEM IS NOT A NUMBER.

This is because the ID column is actually named SubjectNr in the dataset. IGNORE=I only skips records that start with the letter I, so the header row, which starts with SubjectNr, was read as data.

FIX

IGNORE=@ skips every record whose first non-blank character is a letter or @, which covers header rows regardless of the first column name. In this case, IGNORE=S would also have worked, since the header row starts with SubjectNr.
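To see why IGNORE=I fails here while IGNORE=S and IGNORE=@ work, the hypothetical function below approximates both rules. It is a simplified model of the behaviour described above, not NM-TRAN's own implementation. Check the NONMEM help for $DATA for the full set of IGNORE options.

```python
def is_ignored(record, ignore_char):
    """Approximate whether $DATA IGNORE=<ignore_char> would skip this record.

    IGNORE=@ : skip if the first non-blank character is a letter or '@'.
    IGNORE=c : skip if the record starts with the character c.
    """
    if ignore_char == "@":
        stripped = record.lstrip()
        return bool(stripped) and (stripped[0].isalpha() or stripped[0] == "@")
    return record.startswith(ignore_char)


header = "SubjectNr,TIME,AMT,EVID,DV,CMT,MDV"
for option in ["I", "S", "@"]:
    # Prints False for I, True for S and True for @
    print(f"IGNORE={option}:", is_ignored(header, option))
```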

Dataset

Dataset.IGNORE.csv

 

6. Switched $INPUT labels

Sometimes the error messages seem unrelated to the actual problem, even though the dataset looks correct. For example, a combination of several error messages can occur:

0DATA REC 1: DOSE AMOUNT IS ZERO
0DATA REC 2: TIME DATA ITEM IS LESS THAN PREVIOUS TIME DATA ITEM
0DATA REC 2: OBSERVATION EVENT RECORD MAY NOT SPECIFY DOSING INFORMATION

This error was also previously encountered by others: https://www.pharmpk.com/PK11/PK2011112.html

FIX

Make sure that the $INPUT record in your NONMEM model matches your dataset. NONMEM reads the columns by position, not by header name. If the order of the columns in your dataset changes, the data are therefore read differently. For example, your TIME column may suddenly be read as your DV column.
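Since NONMEM ignores the header names, a mismatch is easy to miss. The hypothetical check below compares the $INPUT labels position by position with the header row of the CSV. It assumes that your header names are the same as your $INPUT labels (or their aliases, as in AMT=AMT_MG), which is a convention and not something NONMEM enforces. It also assumes the whole $INPUT record is on one line.

```python
import csv
import io


def input_pairs(input_record):
    """Split '$INPUT ID AMT=AMT_MG ...' into (label, alias) pairs such as ('AMT', 'AMT_MG')."""
    pairs = []
    for token in input_record.split()[1:]:  # skip the $INPUT keyword itself
        label, _, alias = token.partition("=")
        pairs.append((label, alias))
    return pairs


def header_mismatches(input_record, csv_text):
    """Return (position, $INPUT label, CSV header name) where the two disagree."""
    header = next(csv.reader(io.StringIO(csv_text)))
    mismatches = []
    for position, (label, alias) in enumerate(input_pairs(input_record), start=1):
        column = header[position - 1] if position <= len(header) else None
        accepted = {label, alias} - {"", "DROP"}
        if column not in accepted:
            mismatches.append((position, label, column))
    return mismatches


input_record = "$INPUT ID TIME AMT EVID DV CMT MDV"
dataset = "ID,TIME,DV,EVID,AMT,CMT,MDV\n1,0,.,1,100,1,1\n"
print(header_mismatches(input_record, dataset))  # [(3, 'AMT', 'DV'), (5, 'DV', 'AMT')]
```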

 

Remarks

Another check is whether the units of the AMT column and of your concentrations (DV) are consistent. For example, if the dose is in mg and the concentration in mg/L, the volume of distribution is estimated in liters. Otherwise, you may get strange estimates that are hard to explain at a later stage (and you may have to repeat the model development...).
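A quick worked example, using V = dose / C0 for a one-compartment IV bolus with illustrative numbers. It shows how mixing mg with ng/mL shifts the volume estimate by a factor of 1000.

```python
dose_mg = 100.0        # illustrative dose
c0_ng_per_ml = 2000.0  # illustrative concentration at time 0

# Mixing mg and ng/mL gives a number in mg/(ng/mL) = 1000 L, easily misread as litres.
v_mixed = dose_mg / c0_ng_per_ml   # 0.05

# Converting first: 1 ng/mL = 1 ug/L = 0.001 mg/L.
c0_mg_per_l = c0_ng_per_ml / 1000  # 2.0
v_litres = dose_mg / c0_mg_per_l   # 50.0
print(v_mixed, v_litres)
```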

 

COMMENT

Any suggestions for other errors you encountered, or typos? Leave a comment or contact me at info@pmxsolutions.com!

19 thoughts on “Getting your NONMEM dataset to work – DATA ERROR summary”

  1. Robert

    What should we do with DV=0 records?

    • MJvanEsdonk (post author)

      To the best of my knowledge, there are 2 ways of handling a DV of 0.
      1: A DV=0 can indicate that there was no measurement performed and therefore the MDV column should be set to 1. This will ignore the observed DV in the estimation.
      2: A DV=0 can be the result of an observation below the lower limit of quantification. There are multiple ways to handle this, each with its own (dis)advantages. I recommend looking at Chapter 2 of “MI212: Advanced Topics in Population PK-PD Modeling & Simulation”, which discusses the different methodologies in detail (https://www.metrumrg.com/course/mi212-advanced-topics-population-pk-pd-modeling-simulation/).
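A minimal sketch of option 1 from the reply above, assuming DV = 0 on an observation record means "no measurement performed". It should not be applied to below-LLOQ observations, which need one of the methods from option 2.

```python
def flag_unmeasured(records):
    """Treat DV = 0 on observation records as 'no measurement' by setting MDV = 1."""
    flagged = []
    for record in records:
        record = dict(record)  # copy so the input is not modified
        if record["EVID"] == 0 and record["DV"] == 0:
            record["MDV"] = 1
        flagged.append(record)
    return flagged


records = [
    {"EVID": 0, "DV": 0, "MDV": 0},    # no sample taken
    {"EVID": 0, "DV": 5.3, "MDV": 0},
]
print([r["MDV"] for r in flag_unmeasured(records)])  # [1, 0]
```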

  2. Graziela Lock

    I have a question: in my analysis I have 24 observations. If I lost the observations after observation 12, do I have to include points 13 to 24 in my dataset and mark them as missing?

  3. Permala

    I am trying to convert my model and data from amounts in mg and concentrations in ng/mL to amounts in nmol and concentrations in nM.
    The model does not seem to perform well, and the estimates of the 2 models are not the same or even in the same ballpark.
    Please help.

    • MJvanEsdonk (post author)

      Hello,
      Without having a look at your calculations I can’t troubleshoot your problem.
      The conversion from ng/mL to nM is a constant factor, which should not change the shape of your PK profile, only shift it up or down. The model fit should therefore not suddenly result in completely different estimates.
      Michiel
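A sketch of the unit conversions discussed in this reply, with a hypothetical molecular weight of 500 g/mol. If AMT and DV are both converted with the same molecular weight, the volume keeps the same unit (L). If only one of them is converted, V (and with it CL) shifts by a constant factor.

```python
def mg_to_nmol(amount_mg, mw_g_per_mol):
    """1 mg = 1e-3 g; divide by MW for mol, multiply by 1e9 for nmol."""
    return amount_mg * 1e6 / mw_g_per_mol


def ng_per_ml_to_nm(conc_ng_per_ml, mw_g_per_mol):
    """1 ng/mL = 1e-6 g/L; divide by MW for mol/L, multiply by 1e9 for nmol/L (nM)."""
    return conc_ng_per_ml * 1e3 / mw_g_per_mol


mw = 500.0                             # hypothetical molecular weight (g/mol)
dose_nmol = mg_to_nmol(100.0, mw)      # 200000.0 nmol
conc_nm = ng_per_ml_to_nm(2000.0, mw)  # 4000.0 nM
print(dose_nmol / conc_nm)             # 50.0 (L), the same V as with mg and mg/L
```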

  4. Graziela Lock

    Hello,

    I tried to run NONMEM, but this message appears:

    0PRED EXIT CODE = 1
    0INDIVIDUAL NO. 7 ID= 2.03000000000000E+02 (WITHIN-INDIVIDUAL) DATA REC NO. 1
    THETA=
    7.00E-01 3.12E-02 4.40E+00 1.49E-01 1.56E+00 3.68E-01 5.78E-01 4.57E-01 3.68E-01 3.03E-01
    2.20E-01 9.95E-02 1.46E-02 7.68E-02 4.46E-01 6.43E-01 9.87E-01 6.51E-01
    OCCURS DURING SEARCH FOR ETA AT INITIAL VALUE, ETA=0
    F OR DERIVATIVE RETURNED BY PRED IS INFINITE (INF) OR NOT A NUMBER (NAN).

    Only my first group of animals works. From the second group in the same dataset onwards, NONMEM gives that message.
    I tried removing individuals to find out which of them is wrong, but it didn't work.
    In R, all of my columns are numeric.

    • MJvanEsdonk (post author)

      This may happen when your F (the model prediction) equals 0 in a certain row. Perhaps this comes from a pre-dose sample that should be removed?
      Feel free to share the NM code and a minimally reproducible example that gives the error.

      • GRAZIELA

        Thanks for your help!
        I discovered today that the problem was in the dataset: on records with an AMT and EVID = 1, FLAG had to be 0, but I had FLAG = 2.

  5. Matthew

    Hello Michiel.

    I was googling an error I encountered earlier today and was directed here.

    The error message was similar to the one Graziela had above.

    “F OR DERIVATIVE RETURNED BY PRED IS INFINITE (INF) OR NOT A NUMBER (NAN).”

    I have encountered this error many times, but it was almost always due to a division by zero in the $ERROR block. This time the error occurred at a different record on each rerun, so I suppose it is not a division by zero, which should always occur at the first problematic record in the data file.

    Another possibility I can think of is the DV being extremely far from PRED. However, a simulation showed that the largest difference is about 3 to 4 times the SD of the RUV (F*W_PROP in my case), which I do not think is extreme enough to cause this error.

    I have attached a reproducible example below, reduced to a particularly problematic subject from my dataset. This subject has only 3 observations. The exit from PRED only seems to happen at 2 of them: record #3 or #33.

    I would be grateful if you or someone can shed some light on this.

    (P.S. I actually came to this problem from another one. I was using a combined proportional-additive RUV model and noticed extremely high prediction-corrected DVs that ruined my PsN-generated VPC. These occurred at very small PRED, where the DV/PRED ratio is hundreds- to thousands-fold. My impression is that the reason for adding an additive component to a proportional RUV model is to avoid overweighting observations at very low PRED, where a purely proportional error becomes very small. I wonder whether I have missed settings that the VPC tools need to accommodate this.
    Long story short: I had a problem with the VPC, so I wanted to try a proportional model instead. That's why I fixed the additive component to zero in the example below.)

    ================================================================

    $PROBLEM NICU_D620_52test_615

    $INPUT ID TIME PMA_DAY WT_KG SCR_UMOL_L AMT=AMT_MG DV=DV_MG_L RATE=RATE_MG_HR MDV EVID

    $DATA DATA_NICU_620_52test.csv IGNORE=@

    $SUBROUTINE ADVAN1 TRANS2

    $PK
    ;THETA
    TVLNCL=THETA(1)
    TVLNVD=THETA(2)
    LN_W_PROP=THETA(3)
    LN_W_ADD=THETA(4)
    B_WT_CL=THETA(5)
    B_WT_VD=THETA(6)
    B1_PMA_CL=THETA(7)
    B2_PMA_CL=THETA(8)
    B1_SCR_CL=THETA(9)
    B2_SCR_CL=THETA(10)
    ;RENAME THETA
    LNEXP_WT_CL=B_WT_CL
    LNEXP_WT_VD=B_WT_VD
    LNHILL_PMA_CL=B1_PMA_CL
    EG50_LNPMA_CL=B2_PMA_CL
    LNIC50_SCR_CL=B1_SCR_CL
    LNSLP_SCR_CL=B2_SCR_CL
    ;THETA TRANSFORMATION
    TVCL=EXP(TVLNCL)
    TVVD=EXP(TVLNVD)
    EXP_WT_CL=EXP(LNEXP_WT_CL)
    EXP_WT_VD=EXP(LNEXP_WT_VD)
    HILL_PMA_CL=EXP(LNHILL_PMA_CL)
    EG50_PMA_CL=EXP(EG50_LNPMA_CL)
    IC50_SCR_CL=EXP(LNIC50_SCR_CL)
    SLP_SCR_CL=EXP(LNSLP_SCR_CL)
    ;ETA
    ETA_CL=ETA(1)
    ETA_VD=ETA(2)
    ;EFFECT
    F_SIZE_CL=(WT_KG/1.7)**EXP_WT_CL
    F_SIZE_VD=(WT_KG/1.7)**EXP_WT_VD
    F_PMA_CL=(PMA_DAY**HILL_PMA_CL)/(PMA_DAY**HILL_PMA_CL+EG50_PMA_CL**HILL_PMA_CL)
    F_SCR_CL=1-0.5**((SCR_UMOL_L/IC50_SCR_CL)**(-SLP_SCR_CL))
    ;OCC SETTING FOR BOV
    ETA_BOV_CL=0
    ;CLEARANCE
    POPCL=TVCL*F_SIZE_CL*F_PMA_CL*F_SCR_CL
    INDCL=POPCL*EXP(ETA_CL)
    CL=INDCL*EXP(ETA_BOV_CL)
    ;VOLUME OF DISTRIBUTION
    POPVD=TVVD*F_SIZE_VD
    VD=POPVD*EXP(ETA_VD)
    V=VD
    S1=V
    ;LLOQ DEFINITION
    IF (NEWIND.EQ.0) THEN
    LLOQ=1
    ENDIF

    $THETA
    -1.40 ;LN_TVCL
    0.0205 ;LN_TVVD
    -1.49 ;LN_W_PROP
    0 FIX ;W_ADD
    -0.2876821 FIX ;LNEXP_WT_CL
    0 FIX ;LNEXP_WT_VD
    1.79 ;LNHILL_PMA_CL
    5.52 ;EG50_LNPMA_CL
    5.09 ;LNIC50_SCR_CL
    0.916 ;LNSLP_SCR_CL

    $ERROR
    W_PROP=EXP(LN_W_PROP)
    W_ADD=0
    SD=SQRT((F*W_PROP)**2+(W_ADD**2))
    IPRED=F
    IF (F.EQ.0) CUMD=1
    IF (F.GT.0) CUMD=PHI((LLOQ-F)/SD)
    IF (DV.GE.LLOQ) THEN
    F_FLAG=0
    Y=F+SD*EPS(1)
    ELSE
    F_FLAG=1
    Y=CUMD
    ENDIF

    $OMEGA BLOCK(1) 0.1 ;OMG_CL
    $OMEGA BLOCK(1) 0.05 ;OMG_VD

    $SIGMA
    1 FIXED ;EPS

    $ESTIMATION METHOD=1 INTERACTION LAPLACIAN MAXEVAL=0 PRINT=1 NOABORT

    $COVARIANCE PRINT=E

    $TABLE ID TIME PMA_DAY WT_KG SCR_UMOL_L AMT_MG DV_MG_L RATE_MG_HR MDV EVID POPCL INDCL CL VD PRED IPRED CWRES ECWRES EWRES NWRES UNCONDITIONAL NOPRINT ONEHEADER NOAPPEND FILE=NICU_D620_52test_615.txt

    =====================================================================

    ID,TIME,PMA_DAY,WT_KG,SCR_UMOL_L,AMT_MG,DV_MG_L,RATE_MG_HR,MDV,EVID
    52,0,373.18333333333334,4.88564668034018,40,70,0,70,1,1
    52,7.333333333333333,373.4888888888889,4.892256599111551,40,70,0,70,1,1
    52,14.616666666666667,373.7923611111111,4.898821450254939,40,0,10.38,0,0,0
    52,15.216666666666667,373.8173611111111,4.8993622617907775,40,70,0,70,1,1
    52,22.666666666666668,374.1277777777778,4.906077338360788,40,70,0,70,1,1
    52,30.65,374.4604166666667,4.913273136295986,40,70,0,70,1,1
    52,38.63333333333333,374.79305555555555,4.920468934231186,40,70,0,70,1,1
    52,46.85,375.1354166666667,4.927875047763658,40,70,0,70,1,1
    52,582.6166666666667,397.4590277777778,5.4107897041831485,40,0,0,0,1,2
    52,871.3333333333334,409.4888888888889,5.671025210720659,36,0,0,0,1,2
    52,988.3333333333334,414.3638888888889,5.776483460209391,37,0,0,0,1,2
    52,1614.5,440.45416666666665,6.340880388028722,27,0,0,0,1,2
    52,1621.7166666666667,440.7548611111111,6.347385149001461,22,0,0,0,1,2
    52,2239.2166666666667,466.4840277777778,6.903970354636442,27,0,0,0,1,2
    52,2407.3333333333335,473.4888888888889,7.055502742470182,21,0,0,0,1,2
    52,2436.6666666666665,474.7111111111111,7.081942417555675,40,0,0,0,1,2
    52,2443.1666666666665,474.9819444444445,7.087801209193939,43,0,0,0,1,2
    52,2455.2833333333333,475.4868055555555,7.098722597709367,41,0,0,0,1,2
    52,2459.6,475.66666666666663,7.1026134362588795,32,0,0,0,1,2
    52,2476.8166666666666,476.38402777777776,7.118131722828945,31,0,0,0,1,2
    52,2503.15,477.48125,7.141867340235242,26,0,0,0,1,2
    52,2526.2,478.44166666666666,7.1626435167370825,31,0,0,0,1,2
    52,2604.0666666666666,481.68611111111113,7.23282883605494,32,0,0,0,1,2
    52,2609.6833333333334,481.9201388888889,7.237891432932107,38,0,0,0,1,2
    52,2621.5,482.4125,7.248542415679616,28,0,0,0,1,2
    52,2645.2,483.4,7.269904471345281,26,0,0,0,1,2
    52,2671.9166666666665,484.5131944444445,7.293985607232809,25,0,0,0,1,2
    52,2678.7833333333333,484.7993055555555,7.30017489480964,24,0,0,0,1,2
    52,2693.633333333333,485.41805555555555,7.313559980321671,20,0,0,0,1,2
    52,2715.6666666666665,486.3361111111111,7.333419781721117,15,0,0,0,1,2
    52,2727.05,486.8104166666667,7.343680178359408,15,100,0,100,1,1
    52,2734.766666666667,487.13194444444446,7.350635615612013,7.5,100,0,100,1,1
    52,2742.5333333333333,487.4555555555555,7.357636120492602,7.5,0,8.41,0,0,0
    52,2742.8,487.4666666666667,7.3578764811752,7.5,100,0,100,1,1
    52,2743.383333333333,487.49097222222224,7.3584022701683764,7.5,0,0,0,1,2
    52,2752.2833333333333,487.8618055555555,7.3664243079499965,7.5,100,0,100,1,1
    52,2758.9333333333334,488.1388888888889,7.372418302472221,18,100,0,100,1,1
    52,2766.7833333333333,488.46597222222226,7.379493920066125,18,100,0,100,1,1
    52,2767.05,488.4770833333333,7.379734280748718,18,0,0,0,1,2
    52,2774.6,488.79166666666663,7.3865394925747,18,100,0,100,1,1
    52,2782.6833333333334,489.12847222222223,7.393825425765874,7.5,100,0,100,1,1
    52,2790.65,489.4604166666667,7.4010062011584115,7.5,100,0,100,1,1
    52,2790.8,489.4666666666667,7.401141404042372,7.5,0,0,0,1,2
    52,2798.65,489.79375,7.408217021636274,7.5,100,0,100,1,1
    52,2806.6,490.125,7.415382774486149,15,100,0,100,1,1
    52,2814.766666666667,490.46527777777777,7.422743820390632,15,100,0,100,1,1
    52,2814.9,490.4708333333333,7.422864000731931,15,0,0,0,1,2
    52,2822.9333333333334,490.80555555555554,7.4301048662951175,17,0,0,0,1,2
    52,2822.983333333333,490.8076388888889,7.430149933923104,17,100,0,100,1,1
    52,2830.75,491.13125,7.437150438803696,17,100,0,100,1,1
    52,2835.8166666666666,491.34236111111113,7.441717291773008,17,0,0,0,1,2
    52,2838.766666666667,491.46527777777777,7.444376281824219,17,100,0,100,1,1
    52,2846.85,491.80208333333337,7.451662215015393,21,100,0,100,1,1
    52,2854.6,492.125,7.458647697353321,21,100,0,100,1,1
    52,2862.8166666666666,492.46736111111113,7.466053810885793,21,100,0,100,1,1
    52,2863.3,492.4875,7.466489464622997,21,0,0,0,1,2
    52,2870.633333333333,492.79305555555555,7.47309938339437,21,100,0,100,1,1
    52,2878.8,493.1333333333333,7.480460429298855,21,100,0,100,1,1
    52,2886.8333333333335,493.46805555555557,7.487701294862042,21,100,0,100,1,1
    52,2894.983333333333,493.8076388888889,7.495047318223863,21,100,0,100,1,1
    52,2903.8,494.175,7.502994243292173,21,100,0,100,1,1
    52,2910.5333333333333,494.4555555555555,7.509063350527708,21,0,0,0,1,2
    52,2910.6666666666665,494.4611111111111,7.509183530869006,21,100,0,100,1,1
    52,2918.633333333333,494.79305555555555,7.5163643062615435,21,100,0,100,1,1
    52,2926.6666666666665,495.12777777777774,7.5236051718247285,29,100,0,100,1,1
    52,2932.983333333333,495.3909722222222,7.52929871549371,29,0,19.88,0,0,0

    • MJvanEsdonk (post author)

      Dear Matthew,

      The problem can be traced back to the following part in your $ERROR block:
      IF (DV.GE.LLOQ) THEN
      F_FLAG=0
      Y=F+SD*EPS(1)
      ELSE
      F_FLAG=1
      Y=CUMD
      ENDIF

      Using only the first branch of the IF/ELSE statement gives a correct estimation:
      F_FLAG=0
      Y=F+SD*EPS(1)

      However, including the following part introduces the error you described:
      Y=CUMD

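      This indicates that the CUMD branch causes the problem for certain parameter combinations in the model. Note that PHI() returns a probability between 0 and 1, so CUMD cannot be negative, but it can become (numerically) zero.
      You could check whether this M3 method is required for your dataset, and how the model performs with a standard error block.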
      Good luck,
      Michiel

  6. Emily

    I keep getting the following error message:
    (DATA WARNING 2) RECORD 1, DATA ITEM 9, CONTENTS:
    THE NUMBER OF DATA ITEMS SPECIFIED IN $INPUT EXCEEDS THE NUMBER OF VALUES
    IN A RECORD OF THE NM-TRAN DATA FILE. NULLS WERE SUPPLIED FOR MISSING
    VALUES, STARTING WITH THE ABOVE NUMBERED DATA ITEM.

    (DATA WARNING 2) RECORD 2, DATA ITEM 9, CONTENTS:
    THE NUMBER OF DATA ITEMS SPECIFIED IN $INPUT EXCEEDS THE NUMBER OF VALUES
    IN A RECORD OF THE NM-TRAN DATA FILE. NULLS WERE SUPPLIED FOR MISSING
    VALUES, STARTING WITH THE ABOVE NUMBERED DATA ITEM.

    I can’t find any missing values in my data file, and the number of data items in my $INPUT matches the number of columns in my data file.

    • MJvanEsdonk (post author)

      Hello Emily,

      Please check your $INPUT and whether the columns you specified really match the number of columns available in your dataset. Also check whether a comma is used as the separator in your dataset (e.g. open it in Notepad), as a row may not be split correctly, resulting in one cell per row.

      Feel free to post row 2 and your $INPUT line here.

      Good luck.
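The separator problem from this reply can be checked by counting the comma-separated fields on every line and comparing the count with the number of $INPUT items. The helper below is hypothetical, and its line numbers include the header as line 1. A file saved with ';' as a list separator (common with some regional Excel settings) shows up as one field per line.

```python
import csv
import io


def field_count_problems(csv_text, n_input_items):
    """Return (line_number, n_fields) for lines whose field count differs from $INPUT."""
    problems = []
    for number, fields in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(fields) != n_input_items:
            problems.append((number, len(fields)))
    return problems


input_record = "$INPUT ID TIME DV"
n_items = len(input_record.split()) - 1  # 3 data items

semicolon_file = "ID;TIME;DV\n1;0;5.2\n"  # saved with ';' instead of ','
print(field_count_problems(semicolon_file, n_items))  # [(1, 1), (2, 1)]
```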

  7. Julia

    Hello, I’m trying to run NONMEM with ADVAN7 for a model with a parent and a metabolite, but this message appears:
    0 PRED EXIT CODE = 1
    0INDIVIDUAL NO. 1 ID= 1 (WHITIN-INDIVIDUAL) DATA REC NO.1
    THETA=
    1.50E-01 5.00E+01 5.00E+01
    NÚMERO AL DIFFICULTIES OBTAINING THE SOLUTION

    I already tried changing to ADVAN5 and removing patients, but the error persists.

    Thanks for any help you can give me.

    • MJvanEsdonk (post author)

      Hi,

      This can be due to a number of different problems. Perhaps try ADVAN6, 9, or 13, in which you define the ordinary differential equations yourself. This makes the model structure explicit and limits the potential for typos.

      Let me know if you require an example NM control stream of such a model.
      Michiel

  8. Julia

    Sorry, the error message is:
    NUMERICAL DIFFICULTIES OBTAINING THE SOLUTION

  9. Parsshava Mehta

    Hi,

    I have a problem: a few covariates in my dataset have no value, and I have marked them as `.`. After the run, these values show up as `zero` in the output. Because they are categorical covariates, this `zero` is confused with my actual `0` values. What is a possible fix for this?

    • MJvanEsdonk (post author)

      Thank you for this interesting question! How to handle missing covariate values is always a matter of debate, and the right choice depends on your analysis and the number of missing covariate values.
      For example, you can set the covariate value to the most frequent group, remove the covariate altogether, or exclude the subject from the covariate analysis.
      The following references highlight some of these methodologies and will hopefully answer your question:
      https://link.springer.com/article/10.1208/s12248-013-9526-y
      https://cognigen.com/nonmem/nm/99sep112000.html

      Happy coding.
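Because NONMEM reads "." as 0, a missing categorical covariate becomes indistinguishable from category 0. One commonly used workaround, sketched here under the assumption that you keep these subjects in the analysis, is to code missing values with a dedicated value that no real category uses (-99 below is hypothetical). The model code then has to handle that value explicitly, for example by imputing it or treating it as a separate group, following one of the approaches in the references above.

```python
MISSING_CODE = -99  # hypothetical code; any value that no real category uses


def encode_missing(values, missing_code=MISSING_CODE):
    """Replace missing categorical values (None) by a dedicated code instead of '.'."""
    if missing_code in values:
        raise ValueError("missing_code collides with an observed category")
    return [missing_code if v is None else v for v in values]


sex = [0, 1, None, 0]
print(encode_missing(sex))  # [0, 1, -99, 0]
```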
