Data from: Developing and validating a survival prediction model for NSCLC patients through distributed learning across three countries

Title: Data from: Developing and validating a survival prediction model for NSCLC patients through distributed learning across three countries
Publication Type: Dataset
Year of Publication: 2017
Authors: Jochems, A, Deist, TM, El Naqa, I, Kessler, M, Mayo, C, Reeves, J, Jolly, S, Matuszak, M, Ten Haken, R, van Soest, J, Oberije, C, Faivre-Finn, C, Price, G, De Ruysscher, D, Lambin, P, Dekker, A
Publisher: CancerData
Publication Language: eng
Keywords: Bayesian network, lung cancer, NSCLC, prediction model
Abstract

Purpose

Tools for survival prediction for non-small cell lung cancer (NSCLC) patients treated with (chemo)radiotherapy are of limited quality. In this work, we develop a predictive model of survival at two years based on a large volume of historical patient data, as a proof of concept, using a distributed learning approach.

Patients and methods

Clinical data from 698 lung cancer patients treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected and stored at 2 different cancer institutes: 559 patients at Maastro clinic (Netherlands) and 139 at the University of Manchester (UK). The model was further validated on 196 patients from the University of Michigan (USA).

A Bayesian network model was adapted for distributed learning (watch the animation). Two-year post-treatment survival was chosen as the endpoint. The Institute 1 cohort data are publicly available, and the developed models can be found at PredictCancer.org.
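One common way to fit the conditional probability tables of a discrete Bayesian network without pooling patient records is for each site to share only aggregate counts. The sketch below illustrates that general idea and is not the authors' implementation; the function names, the variable names `t_stage` and `alive_2y`, the toy records, and the smoothing constant `alpha` are all assumptions made for illustration.

```python
from collections import Counter


def local_counts(records, child, parents):
    """Run at one site: count (parent values, child value) combinations.

    Only these aggregate counts would leave the site, not patient rows.
    """
    counts = Counter()
    for row in records:
        key = (tuple(row[p] for p in parents), row[child])
        counts[key] += 1
    return counts


def merge_and_estimate(site_counts, child_values, alpha=1.0):
    """Run centrally: sum the site counts, then estimate P(child | parents).

    alpha is a Laplace smoothing constant (an illustrative choice).
    """
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    parent_configs = {pa for (pa, _) in total}
    cpt = {}
    for pa in parent_configs:
        denom = sum(total[(pa, v)] for v in child_values)
        denom += alpha * len(child_values)
        cpt[pa] = {v: (total[(pa, v)] + alpha) / denom for v in child_values}
    return cpt


# Made-up toy records for two hypothetical sites.
site_a = [
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T4", "alive_2y": 0},
]
site_b = [
    {"t_stage": "T1", "alive_2y": 0},
    {"t_stage": "T4", "alive_2y": 0},
    {"t_stage": "T4", "alive_2y": 1},
]

per_site = [local_counts(s, "alive_2y", ["t_stage"]) for s in (site_a, site_b)]
cpt = merge_and_estimate(per_site, child_values=[0, 1])
```

Because counts are additive, merging the per-site counts gives the same table as counting over the pooled records, which is why no individual patient row has to be exchanged in this simplified setting.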

Results

Variables included in the final model were T and N stage, age, performance status, and total tumor dose. The model has an AUC of 0.66 on the external validation set and an AUC of 0.62 in 5-fold cross-validation. A model based on T and N stage achieved an AUC of 0.47 on the validation set, significantly worse than our model (P<0.001). The model presented in this study identifies high- and low-risk groups, and these groups have significantly different overall survival (P<0.01).
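For readers unfamiliar with the metric, the AUC is the probability that a randomly chosen patient who survived receives a higher predicted survival probability than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy labels and scores are made up for illustration and are unrelated to the study data.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for the positive class (e.g. alive at two years), 0 otherwise.
    scores: predicted probability of the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up toy data, not taken from the dataset.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.4, 0.4]
toy_auc = auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to ranking no better than chance, which gives context for the 0.47, 0.62, and 0.66 values reported above.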

Conclusion

Distributed learning from federated databases allows learning of predictive models on data originating from multiple institutions while avoiding many of the data sharing barriers. We believe that distributed learning is the future of sharing data in health care.

DOI: 10.17195/candat.2017.02.2
Original Publication: 10.1016/j.ijrobp.2017.04.021
File: Jochems-2017-MaastroDataUnbinned.csv (62.76 KB)