Abstract
Some public health surveys are poorly accepted by respondents, which leads to low response rates. To address this problem, we propose shortening individual questionnaires by randomly removing items. To impute the resulting missing data, we introduce a Bayesian model based on non-negative matrix factorization, together with an inference algorithm that combines a Gibbs sampler with a variational approach. Using the results of a survey on patient safety culture conducted at Grenoble University Hospital, we compare the performance of our new method with several classical approaches, a random forest method, and three additional matrix factorization methods. When the proportion of removed items is high (greater than 40%), the average reconstruction error of our method is lower than that of the other methods. When the proportion is lower (less than 40%), the histograms of the marginal distributions are reconstructed satisfactorily; in this respect, the best performance was obtained with the random forest approach. Overall, our results suggest that similar surveys could be carried out while substantially reducing the number of questions asked of each worker, with limited loss of information and limited impact on interpretation.
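The general idea of removing items at random and then imputing them by matrix factorization can be illustrated with a small sketch. This is not the Bayesian model or the Gibbs/variational inference described above: it swaps in a plain non-negative matrix factorization fitted on observed entries only, using standard multiplicative updates, and it runs on synthetic data rather than the hospital survey. The function names `mask_items` and `masked_nmf`, the matrix sizes, the rank, and the 40% removal rate are all assumptions chosen for illustration.

```python
import numpy as np


def mask_items(n_rows, n_cols, frac_removed, rng):
    """Boolean mask (True = observed); drops the same number of items per respondent."""
    n_remove = int(round(frac_removed * n_cols))
    observed = np.ones((n_rows, n_cols), dtype=bool)
    for i in range(n_rows):
        drop = rng.choice(n_cols, size=n_remove, replace=False)
        observed[i, drop] = False
    return observed


def masked_nmf(X, observed, rank, n_iter=500, eps=1e-9, seed=0):
    """Fit X ~ W @ H with W, H >= 0, using only the observed entries.

    Uses multiplicative updates for the masked squared error.
    This is a simple stand-in, not the paper's Bayesian model.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.uniform(0.1, 1.0, size=(n, rank))
    H = rng.uniform(0.1, 1.0, size=(rank, p))
    M = observed.astype(float)
    Xm = np.where(observed, X, 0.0)  # removed entries contribute nothing
    for _ in range(n_iter):
        H *= (W.T @ Xm) / (W.T @ (M * (W @ H)) + eps)
        W *= (Xm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H


# Synthetic non-negative "answers" with a low-rank structure (assumed, not real data).
rng = np.random.default_rng(42)
n, p, rank = 200, 30, 3
X = rng.gamma(2.0, 1.0, size=(n, rank)) @ rng.gamma(2.0, 1.0, size=(rank, p))

# Remove 40% of the items for each respondent, then impute them.
observed = mask_items(n, p, 0.4, rng)
W, H = masked_nmf(X, observed, rank)
X_hat = W @ H

# Reconstruction error on the removed items, against a column-mean baseline.
rmse_missing = np.sqrt(np.mean((X - X_hat)[~observed] ** 2))
col_means = np.where(observed, X, 0.0).sum(axis=0) / observed.sum(axis=0)
X_base = np.broadcast_to(col_means, X.shape)
rmse_baseline = np.sqrt(np.mean((X - X_base)[~observed] ** 2))
print(f"RMSE on removed items: NMF {rmse_missing:.3f}, column mean {rmse_baseline:.3f}")
```

Because the synthetic matrix is exactly low rank, this toy setting is far more favourable than real survey answers, so the printed errors say nothing about the results reported for the actual survey.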