Developing and validating a survival prediction model for NSCLC patients through distributed learning across three countries

Authors: Arthur Jochems, Timo M. Deist, Issam El Naqa, Marc Kessler, Chuck Mayo, Jackson Reeves, Shruti Jolly, Martha Matuszak, Randall Ten Haken, Johan van Soest, Cary Oberije, Corinne Faivre-Finn, Gareth Price, Dirk de Ruysscher, Philippe Lambin, Andre Dekker
Year: 2017

 

Purpose

Tools for predicting survival in non-small cell lung cancer (NSCLC) patients treated with (chemo)radiotherapy are of limited quality. In this work, as a proof of concept, we use a distributed learning approach to develop a model that predicts survival at two years from a large volume of historical patient data.

Patients and methods

Clinical data from 698 lung cancer patients treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone were collected and stored at two different cancer institutes: 559 patients at Maastro clinic (Netherlands) and 139 at the University of Manchester (UK). The model was further validated on 196 patients from the University of Michigan (USA).

A Bayesian network model was adapted for distributed learning (watch the animation). Two-year post-treatment survival was chosen as the endpoint. The Institute 1 cohort data is publicly available, and the developed models can be found at PredictCancer.org.
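To illustrate the general idea of learning a Bayesian network without pooling patient records, the sketch below shows one plausible scheme: each site counts outcome and parent-value combinations locally, and a coordinator sums those counts to estimate a conditional probability table. This is an assumption for illustration only; the abstract does not describe the actual algorithm. The function names (`local_counts`, `aggregate_cpt`), the variable names (`t_stage`, `alive_2y`), the smoothing constant, and the toy records are all hypothetical and are not the study data.

```python
from collections import Counter

def local_counts(records, child, parents):
    """Run at each site: count (parent values, child value) combinations locally.
    Only these aggregate counts, never patient-level rows, leave the site."""
    counts = Counter()
    for rec in records:
        key = (tuple(rec[p] for p in parents), rec[child])
        counts[key] += 1
    return counts

def aggregate_cpt(site_counts, child_values, alpha=1.0):
    """Run at the coordinator: sum per-site counts and normalise them into
    P(child | parents), with add-alpha smoothing (alpha is an assumed choice)."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)  # Counter.update adds counts together
    parent_configs = {pc for (pc, _) in total}
    cpt = {}
    for pc in parent_configs:
        denom = sum(total[(pc, v)] + alpha for v in child_values)
        cpt[pc] = {v: (total[(pc, v)] + alpha) / denom for v in child_values}
    return cpt

# Toy, made-up records for two hypothetical sites (NOT the study data).
site_a = [
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T1", "alive_2y": 1},
    {"t_stage": "T4", "alive_2y": 0},
]
site_b = [
    {"t_stage": "T1", "alive_2y": 0},
    {"t_stage": "T4", "alive_2y": 0},
    {"t_stage": "T4", "alive_2y": 1},
]

counts = [local_counts(s, "alive_2y", ["t_stage"]) for s in (site_a, site_b)]
cpt = aggregate_cpt(counts, child_values=[0, 1])
```

Because the counts are simply added, the table obtained this way is the same one that would result from pooling both sites' records in one place, which is the property that makes a count-based scheme attractive when raw data cannot be shared.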

Results

Variables included in the final model were T and N stage, age, performance status, and total tumor dose. The model has an AUC of 0.66 on the external validation set and an AUC of 0.62 in 5-fold cross-validation. A model based on T and N stage alone performed with an AUC of 0.47 on the validation set, significantly worse than our model (P<0.001). The model presented in this study can be used to identify high-risk and low-risk groups, and these groups have significantly different overall survival (P<0.01).
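For readers less familiar with the AUC figures quoted above, the following minimal sketch computes AUC as the probability that a randomly chosen two-year survivor receives a higher predicted survival probability than a randomly chosen non-survivor, with ties counted as one half. It also shows one simple way to split patients into risk groups by thresholding the prediction. The 0.5 threshold, the function names, and the toy labels and scores are assumptions for illustration; the abstract does not state how the risk groups were defined.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison:
    fraction of (survivor, non-survivor) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def split_by_risk(scores, threshold=0.5):
    """Label each patient 'low' risk if the predicted survival probability
    reaches the (hypothetical) threshold, otherwise 'high' risk."""
    return ["low" if s >= threshold else "high" for s in scores]

# Toy, made-up example: 1 = alive at two years, 0 = not (NOT the study data).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]

toy_auc = auc(labels, scores)
groups = split_by_risk(scores)
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the value of 0.47 reported for the stage-only model indicates essentially no discriminative ability on the validation set.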

Conclusion

Distributed learning from federated databases allows predictive models to be learned on data originating from multiple institutions while avoiding many of the barriers to data sharing. We believe that distributed learning is the future of sharing data in health care.

 

File: Jochems-2017-MaastroDataUnbinned.csv (62.76 KB)