Article
Validation and generalizability of machine learning prediction models on attrition in longitudinal studies
Abstract
Attrition in longitudinal studies is a major threat to the representativeness of the data and the generalizability of the findings. Typical approaches to address systematic nonresponse are either expensive and unsatisfactory (e.g., oversampling) or rely on the unrealistic assumption of data missing at random (e.g., multiple imputation). Thus, models that effectively predict who is most likely to drop out at subsequent occasions might offer the opportunity to take countermeasures (e.g., incentives). With the current study, we introduce a longitudinal model validation approach and examine whether attrition in two nationally representative longitudinal panel studies can be predicted accurately. We compare the performance of a basic logistic regression model with a more flexible, data-driven machine learning algorithm—gradient boosting machines. Our results show almost no difference in accuracy between the two modeling approaches, which contradicts claims of similar studies on survey attrition. Prediction models could not be generalized across surveys and were less accurate when tested at a later survey wave. We discuss the implications of these findings for survey retention and the use of complex machine learning algorithms, and give some recommendations for dealing with study attrition.
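As an illustration of the kind of baseline model the abstract refers to, a logistic regression predicting dropout can be sketched in plain Python. This is a minimal sketch on synthetic data — the single standardized predictor and all variable names are hypothetical and are not taken from the panel studies analyzed in the article:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression (intercept + one weight per feature)
    by plain batch gradient descent on the log-loss."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Predicted probability of dropping out at the next wave."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Hypothetical toy data: one standardized predictor per respondent,
# outcome 1 = dropped out at the next survey wave.
X = [[x] for x in [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5]]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w = fit_logistic(X, y)
acc = sum((predict(w, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

In the study, this kind of baseline is contrasted with gradient boosting machines, which fit an ensemble of shallow decision trees and can capture nonlinearities and interactions automatically; the abstract's finding is that this added flexibility yielded almost no accuracy gain for attrition prediction.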
Citation
In: International Journal of Behavioral Development (IJBD), Volume 46, Issue 2 (2022-02-07), pp. 169–176; eISSN: 1464-0651
Sponsorship
Funded under an open-access transformation agreement with the publisher.
Citation
@article{doi:10.17170/kobra-202203035823,
author={Jankowsky, Kristin and Schroeders, Ulrich},
title={Validation and generalizability of machine learning prediction models on attrition in longitudinal studies},
journal={International Journal of Behavioral Development (IJBD)},
year={2022},
volume={46},
number={2},
pages={169--176},
doi={10.1177/01650254221075034}
}
Subject headings: Maschinelles Lernen (machine learning); Längsschnittuntersuchung (longitudinal study); Prognosemodell (prediction model); Fehlende Daten (missing data)
Handle: http://hdl.handle.net/123456789/13763
Keywords: machine learning; attrition; longitudinal studies; predictive modeling; generalizability
Authors: Jankowsky, Kristin; Schroeders, Ulrich
DOI: 10.1177/01650254221075034 (publisher); 10.17170/kobra-202203035823 (repository)
Published: 2022-02-07; deposited: 2022-04-19; open access, published version
License: Creative Commons Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)