Validation and generalizability of machine learning prediction models on attrition in longitudinal studies

dc.date.accessioned: 2022-04-19T12:05:04Z
dc.date.available: 2022-04-19T12:05:04Z
dc.date.issued: 2022-02-07
dc.description.sponsorship: Funded under an open-access transformation agreement with the publisher
dc.identifier: doi:10.17170/kobra-202203035823
dc.identifier.uri: http://hdl.handle.net/123456789/13763
dc.language.iso: eng
dc.relation.doi: doi:10.1177/01650254221075034
dc.rights: Namensnennung 4.0 International (Attribution 4.0 International)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: machine learning
dc.subject: attrition
dc.subject: longitudinal studies
dc.subject: predictive modeling
dc.subject: generalizability
dc.subject.ddc: 150
dc.subject.swd: Maschinelles Lernen (machine learning)
dc.subject.swd: Längsschnittuntersuchung (longitudinal study)
dc.subject.swd: Prognosemodell (prediction model)
dc.subject.swd: Fehlende Daten (missing data)
dc.title: Validation and generalizability of machine learning prediction models on attrition in longitudinal studies
dc.type: Aufsatz (article)
dc.type.version: publishedVersion
dcterms.abstract: Attrition in longitudinal studies is a major threat to the representativeness of the data and the generalizability of the findings. Typical approaches to address systematic nonresponse are either expensive and unsatisfactory (e.g., oversampling) or rely on the unrealistic assumption of data missing at random (e.g., multiple imputation). Thus, models that effectively predict who is most likely to drop out on subsequent occasions might offer the opportunity to take countermeasures (e.g., incentives). With the current study, we introduce a longitudinal model validation approach and examine whether attrition in two nationally representative longitudinal panel studies can be predicted accurately. We compare the performance of a basic logistic regression model with a more flexible, data-driven machine learning algorithm: gradient boosting machines. Our results show almost no difference in accuracy between the two modeling approaches, which contradicts the claims of similar studies on survey attrition. Prediction models could not be generalized across surveys and were less accurate when tested at a later survey wave. We discuss the implications of these findings for survey retention and the use of complex machine learning algorithms, and give recommendations for dealing with study attrition.
dcterms.accessRights: open access
dcterms.creator: Jankowsky, Kristin
dcterms.creator: Schroeders, Ulrich
dcterms.source.identifier: eissn:1464-0651
dcterms.source.issue: Issue 2
dcterms.source.journal: International Journal of Behavioral Development (IJBD)
dcterms.source.pageinfo: 169-176
dcterms.source.volume: Volume 46
kup.iskup: false
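
Illustrative sketch (not the authors' code): the abstract above describes comparing a basic logistic regression model with gradient boosting machines for predicting panel attrition and evaluating the models on held-out data. The following minimal Python example uses scikit-learn with simulated data and hypothetical settings to show what such a comparison can look like; sample sizes, predictors, and tuning are placeholders, not the specification used in the paper.

    # Minimal sketch, assuming simulated data: logistic regression vs. gradient
    # boosting for predicting dropout. All names and settings are hypothetical.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Simulated wave-1 predictors and a binary dropout indicator (about 20% dropout).
    X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                               weights=[0.8, 0.2], random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=1)

    models = {
        "logistic regression": make_pipeline(StandardScaler(),
                                             LogisticRegression(max_iter=1000)),
        "gradient boosting": GradientBoostingClassifier(random_state=1),
    }

    # Fit on the training cases and compare discrimination (AUC) on held-out cases.
    for name, model in models.items():
        model.fit(X_train, y_train)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"{name}: AUC = {auc:.3f}")

In the study itself, the predictors are survey variables from earlier waves and validation is carried out across later waves and across panels; the simple holdout split above only stands in for that longitudinal validation step.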

Files

Original bundle

Name: 01650254221075034.pdf
Size: 383.44 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.03 KB
Format: Item-specific license agreed to upon submission
