Ensemble Feature Learning to Identify Risk Factors for Predicting Secondary Cancer

Ye, Xiucai; Li, Hongmin; Sakurai, Tetsuya; Shueng, Pei-Wei

doi:10.7150/ijms.33820

Int J Med Sci 2019; 16(7):949-959. doi:10.7150/ijms.33820

Research Paper

Ensemble Feature Learning to Identify Risk Factors for Predicting Secondary Cancer

Xiucai Ye^1,2✉, Hongmin Li^1, Tetsuya Sakurai^1,2, Pei-Wei Shueng^3,4

1. Department of Computer Science, University of Tsukuba, Tsukuba, Japan
2. Center for Artificial Intelligence Research, University of Tsukuba, Tsukuba, Japan
3. Division of Radiation Oncology, Far Eastern Memorial Hospital, New Taipei City, Taiwan
4. Faculty of Medicine, School of Medicine, National Yang-Ming University, Taipei, Taiwan

Citation:

Ye X, Li H, Sakurai T, Shueng PW. Ensemble Feature Learning to Identify Risk Factors for Predicting Secondary Cancer. Int J Med Sci 2019; 16(7):949-959. doi:10.7150/ijms.33820. https://www.medsci.org/v16p0949.htm

Abstract

Background: In recent years, the development and diagnosis of secondary cancer have become the primary concern of cancer survivors. A number of studies have developed strategies to extract knowledge from clinical data, aiming to identify important risk factors that can be used to prevent the recurrence of disease. However, these studies do not focus on secondary cancer, which still lacks strategies both for clinical treatment and for identifying the risk factors needed to prevent its occurrence.

Methods: We propose an effective ensemble feature learning method to identify the risk factors for predicting secondary cancer while accounting for class imbalance and patient heterogeneity. We first divide the patients into several heterogeneous groups using spectral clustering. Within each group, we apply oversampling to balance the number of samples in each class and use the balanced samples as training data for ensemble feature learning. The purpose of ensemble feature learning is to identify the risk factors and construct a diagnosis model for each group. The importance of each risk factor is measured separately for each group, based on the properties of the patients in that group. To predict secondary cancer, we assign a patient to the corresponding group and apply that group's diagnosis model.
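The group-wise workflow described in the Methods can be illustrated with a minimal sketch, which is not the authors' implementation. Several parts are assumptions made only for illustration: the group labels are supplied by hand (standing in for the output of spectral clustering), oversampling is done by random duplication of minority-class samples, a new patient is routed to the group with the nearest centroid, and a small k-nearest-neighbor vote serves as the per-group diagnosis model. The ensemble feature-ranking step is omitted, and all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def oversample(rows, labels, rng):
    """Randomly duplicate minority-class rows until all classes are equally large."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def centroid(rows):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def knn_predict(train_rows, train_labels, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    pairs = sorted(zip(train_rows, train_labels),
                   key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in pairs).most_common(1)[0][0]


def fit_group_models(rows, labels, groups, seed=0):
    """Store one centroid and one class-balanced training set per patient group."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for row, label, group in zip(rows, labels, groups):
        members[group].append((row, label))
    models = {}
    for group, pairs in members.items():
        g_rows = [row for row, _ in pairs]
        g_labels = [label for _, label in pairs]
        bal_rows, bal_labels = oversample(g_rows, g_labels, rng)
        models[group] = {"centroid": centroid(g_rows),
                         "rows": bal_rows, "labels": bal_labels}
    return models


def predict(models, x, k=3):
    """Route x to the nearest group (assumed rule), then apply that group's model."""
    group = min(models, key=lambda g: math.dist(models[g]["centroid"], x))
    model = models[group]
    return group, knn_predict(model["rows"], model["labels"], x, k)


# Toy data: two features, label 1 = secondary cancer (rare), two hand-made groups.
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.9, 0.9],
        [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.9, 5.9]]
labels = [0, 0, 0, 1, 0, 0, 0, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

models = fit_group_models(rows, labels, groups)
print(predict(models, [0.8, 0.8]))  # (0, 1)
print(predict(models, [5.0, 5.1]))  # (1, 0)
```

In this toy example the patient at [0.8, 0.8] is assigned to group 0 and classified as positive. Without the oversampling step, the same 3-neighbor vote on group 0 would return the majority class 0, which is the class-imbalance effect the method is designed to counter.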

Results: The decision tree obtains the best results for predicting secondary cancer among the three classifiers. Its best results are an AUC of 0.72 when the patients are divided into 15 groups and an F₁ score of 0.38 when they are divided into 20 groups. In terms of AUC, the decision tree achieves a 67.4% improvement over using all 20 predictor variables and a 28.6% improvement over no group division. In terms of F₁ score, it achieves a 216.7% improvement over using all 20 predictor variables and an 80.9% improvement over no group division. Different groups yield different rankings of the predictor variables.
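To make the reported percentages easier to interpret, the short sketch below back-calculates the baseline scores they imply. It assumes that each improvement is a relative gain, (new - old) / old, which the abstract does not state explicitly. The resulting baselines are therefore approximate, derived values and are not figures quoted from the paper; the function names are hypothetical.

```python
def relative_improvement(new, old):
    """Percentage gain of `new` over `old`, relative to `old`."""
    return (new - old) / old * 100.0


def implied_baseline(new, improvement_pct):
    """Baseline score that would produce `new` after the given relative gain."""
    return new / (1.0 + improvement_pct / 100.0)


# Best decision-tree scores and improvements as reported in the Results.
auc_all_vars = implied_baseline(0.72, 67.4)    # about 0.43
auc_no_groups = implied_baseline(0.72, 28.6)   # about 0.56
f1_all_vars = implied_baseline(0.38, 216.7)    # about 0.12
f1_no_groups = implied_baseline(0.38, 80.9)    # about 0.21

for name, value in [("AUC, all 20 variables", auc_all_vars),
                    ("AUC, no group division", auc_no_groups),
                    ("F1, all 20 variables", f1_all_vars),
                    ("F1, no group division", f1_no_groups)]:
    print(f"{name}: {value:.2f}")
```

Under this reading, the large F₁ percentage reflects a low starting point (roughly 0.12) more than a high final score, so the relative and absolute figures are best read together.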

Conclusion: The accuracy of predicting secondary cancer with the k-nearest neighbor, decision tree, and support vector machine classifiers increased when the selected important risk factors were used as predictors. Dividing patients into groups and predicting secondary cancer with a separate model for each group further improved prediction accuracy. The information discovered in the experiments can provide important references on the personality and clinical symptom representations for guiding interventions in all phases of the recurrent trajectory, given the complexity of the multiple symptoms associated with secondary cancer.

Keywords: secondary cancer, risk factors, class imbalance, patient heterogeneity, spectral clustering, ensemble learning

This is an open access article distributed under the terms of the Creative Commons Attribution (CC BY-NC) license (https://creativecommons.org/licenses/by-nc/4.0/). See http://ivyspring.com/terms for full terms and conditions.