Mixed Hidden Markov Model for Heterogeneous Longitudinal Data with Missingness and Errors in the Outcome Variable


  • Dominique Dedieu
  • Cyrille Delpierre
  • Sébastien Gadat
  • Thierry Lang
  • Benoit Lepage
  • Nicolas Savy


Analysing longitudinal self-reported (declarative) data raises many difficulties, such as handling errors and missingness in the outcome variable. Moreover, long-term monitored cohorts (commonly encountered in life-course epidemiology) may exhibit time heterogeneity, especially in the way subjects respond to the investigator. We propose a Mixed Hidden Markov Model (MHMM) that accounts for several sources of randomness in the responses and allows a past health outcome to affect present responses through a memory state. We thus take into account errors and missing responses, time heterogeneity, and retrospective questions. To estimate the parameters of the MHMM, we propose a Stochastic Expectation Maximization (SEM) algorithm, which is less time-consuming than the usual EM algorithms. We carry out a simulation study, set in the context of cancer epidemiology with the British NCDS 1958 cohort, to assess the performance of this algorithm. The simulations show that the effect of covariates on the transition probabilities is estimated with moderate bias. Finally, we present a brief real-data application on the effect of early social class on cancer through smoking behaviour. In the female sample we used, early social class does not appear to act mainly on smoking behaviour. Moreover, more information is needed to compensate for missing data and reporting errors in order to improve our statistical analysis.
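To make the idea behind a Stochastic EM concrete, the sketch below applies it to a deliberately simplified model: a plain two-state hidden Markov chain with misclassified binary responses, where missing responses (encoded as `None`) are assumed ignorable. It is not the authors' MHMM. Covariates, random effects, the memory state and time heterogeneity are all omitted, and every function name (`ffbs`, `m_step`, `sem`, `simulate`) is hypothetical. The stochastic E-step draws one hidden path per subject by forward filtering, backward sampling, instead of computing expected counts. The M-step then uses closed-form count estimates on the completed data.

```python
import numpy as np

# Hypothetical sketch: SEM for a plain 2-state HMM with misclassified binary
# responses (not the paper's MHMM). Missing responses are None and assumed ignorable.

def ffbs(y, pi, A, B, rng):
    """S-step for one subject: draw a hidden path by forward filtering, backward sampling."""
    T, K = len(y), len(pi)
    alpha = np.zeros((T, K))

    def emission(t):
        # A missing response carries no information about the hidden state.
        return np.ones(K) if y[t] is None else B[:, y[t]]

    a = pi * emission(0)
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * emission(t)  # predict, then weight by the response
        alpha[t] = a / a.sum()
    h = np.zeros(T, dtype=int)
    h[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * A[:, h[t + 1]]  # P(h_t | h_{t+1}, y_1..y_t), unnormalised
        h[t] = rng.choice(K, p=w / w.sum())
    return h


def m_step(paths, data, K, M, eps=1e-6):
    """M-step: closed-form estimates from counts on the completed data."""
    pi, A, B = np.full(K, eps), np.full((K, K), eps), np.full((K, M), eps)
    for h, y in zip(paths, data):
        pi[h[0]] += 1
        for t in range(1, len(h)):
            A[h[t - 1], h[t]] += 1
        for t, obs in enumerate(y):
            if obs is not None:
                B[h[t], obs] += 1
    return (pi / pi.sum(),
            A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True))


def sem(data, pi, A, B, n_iter=40, burn_in=20, seed=0):
    """Alternate S- and M-steps, then average the post-burn-in iterates."""
    rng = np.random.default_rng(seed)
    kept = []
    for it in range(n_iter):
        paths = [ffbs(y, pi, A, B, rng) for y in data]
        pi, A, B = m_step(paths, data, len(pi), B.shape[1])
        if it >= burn_in:
            kept.append((pi, A, B))
    # SEM iterates fluctuate around the estimate instead of converging pointwise.
    return tuple(np.mean([k[i] for k in kept], axis=0) for i in range(3))


def simulate(n, T, pi, A, B, p_miss, rng):
    """Simulate responses with misclassification and random missingness."""
    data = []
    for _ in range(n):
        h = rng.choice(2, p=pi)
        y = []
        for t in range(T):
            if t > 0:
                h = rng.choice(2, p=A[h])
            obs = int(rng.choice(2, p=B[h]))
            y.append(None if rng.random() < p_miss else obs)
        data.append(y)
    return data


rng = np.random.default_rng(1)
true_pi = np.array([0.6, 0.4])
true_A = np.array([[0.9, 0.1], [0.2, 0.8]])
true_B = np.array([[0.9, 0.1], [0.15, 0.85]])  # rows: hidden state, cols: response
data = simulate(200, 8, true_pi, true_A, true_B, p_miss=0.2, rng=rng)

pi_hat, A_hat, B_hat = sem(data,
                           pi=np.array([0.5, 0.5]),
                           A=np.array([[0.7, 0.3], [0.3, 0.7]]),
                           B=np.array([[0.8, 0.2], [0.2, 0.8]]))
```

Starting from a near-diagonal misclassification matrix is a simple way to limit label switching between the two hidden states. Averaging the iterates after a burn-in is one common way to turn the SEM chain into a point estimate. The time saving over EM comes from replacing the expected counts with counts from a single simulated path per subject.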






Special issue: quantitative, event-based and incompletely observed longitudinal data