Article
Validation and generalizability of machine learning prediction models on attrition in longitudinal studies
Abstract
Attrition in longitudinal studies is a major threat to the representativeness of the data and the generalizability of the findings. Typical approaches to address systematic nonresponse are either expensive and unsatisfactory (e.g., oversampling) or rely on the unrealistic assumption of data missing at random (e.g., multiple imputation). Thus, models that effectively predict who is most likely to drop out at subsequent occasions might offer the opportunity to take countermeasures (e.g., incentives). With the current study, we introduce a longitudinal model validation approach and examine whether attrition in two nationally representative longitudinal panel studies can be predicted accurately. We compare the performance of a basic logistic regression model with a more flexible, data-driven machine learning algorithm—gradient boosting machines. Our results show almost no difference in accuracy between the two modeling approaches, which contradicts claims of similar studies on survey attrition. Prediction models could not be generalized across surveys and were less accurate when tested at a later survey wave. We discuss the implications of these findings for survey retention and the use of complex machine learning algorithms, and give some recommendations for dealing with study attrition.
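As an illustration of the kind of baseline model the abstract refers to, a logistic regression predicting dropout can be sketched in plain Python. This is a minimal sketch on synthetic data — the single standardized predictor and all variable names are hypothetical and are not taken from the panel studies analyzed in the article:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a logistic regression (intercept + one weight per feature)
    by plain batch gradient descent on the log-loss."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Predicted probability of dropping out at the next wave."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Hypothetical toy data: one standardized predictor per respondent,
# outcome 1 = dropped out at the next survey wave.
X = [[x] for x in [-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5]]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w = fit_logistic(X, y)
acc = sum((predict(w, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

In the study, this kind of baseline is contrasted with gradient boosting machines, which fit an ensemble of shallow decision trees and can capture nonlinearities and interactions automatically; the abstract's finding is that this added flexibility yielded almost no accuracy gain for attrition prediction.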
Citation
In: International Journal of Behavioral Development (IJBD), Volume 46, Issue 2 (2022-02-07), pp. 169–176; eISSN: 1464-0651
Sponsorship
Funded under an open-access transformation agreement with the publisher.
Citation
@article{doi:10.17170/kobra-202203035823,
author={Jankowsky, Kristin and Schroeders, Ulrich},
title={Validation and generalizability of machine learning prediction models on attrition in longitudinal studies},
journal={International Journal of Behavioral Development (IJBD)},
year={2022},
volume={46},
number={2},
pages={169--176},
doi={10.1177/01650254221075034}
}
Subject headings: Maschinelles Lernen (machine learning); Längsschnittuntersuchung (longitudinal study); Prognosemodell (prediction model); Fehlende Daten (missing data)
Handle: http://hdl.handle.net/123456789/13763
Keywords: machine learning; attrition; longitudinal studies; predictive modeling; generalizability
Authors: Jankowsky, Kristin; Schroeders, Ulrich
DOI: 10.1177/01650254221075034 (publisher); 10.17170/kobra-202203035823 (repository)
Published: 2022-02-07; deposited: 2022-04-19; open access, published version
License: Creative Commons Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)