Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers.
MetadataShow full item record
REGNIER-COUDERT, O., MCCALL, J., LOTHIAN, R., LAM, T., MCCLINTON, S. and N'DOW, J. 2012. Machine learning for improved pathological staging of prostate cancer: a performance comparison on a range of classifiers. Artificial intelligence in medicine [online], 55(1), pages 25-35. Available from: https://doi.org/10.1016/j.artmed.2011.11.003
Objectives: Prediction of prostate cancer pathological stage is an essential step in a patient's pathway. It determines the treatment that will be applied further. In current practice, urologists use the pathological stage predictions provided in Partin tables to support their decisions. However, Partin tables are based on logistic regression (LR) and built from US data. Our objective is to investigate a range of both predictive methods and of predictive variables for pathological stage prediction and assess them with respect to their predictive quality based on UK data. Methods and material: The latest version of Partin tables was applied to a large scale British dataset in order to measure their performances by mean of concordance index (c-index). The data was collected by the British Association of Urological Surgeons (BAUS) and gathered records from over 1700 patients treated with prostatectomy in 57 centers across UK. The original methodology was replicated using the BAUS dataset and evaluated using concordance index. In addition, a selection of classifiers, including, among others, LR, artificial neural networks and Bayesian networks (BNs) was applied to the same data and compared with each other using the area under the ROC curve (AUC). Subsets of the data were created in order to observe how classifiers perform with the inclusion of extra variables. Finally a local dataset prepared by the Aberdeen Royal Infirmary was used to study the effect on predictive performance of using different variables. Results: Partin tables have low predictive quality (c-index = 0.602) when applied on UK data for comparison on patients with organ confined and extra prostatic extension conditions, patients at the two most frequently observed pathological stages. The use of replicate lookup tables built from British data shows an improvement in the classification, but the overall predictive quality remains low (c-index = 0.610). Comparing a range of classifiers shows that BNs generally outperform other methods. Using the four variables from Partin tables, naive Bayes is the best classifier for the prediction of each class label (AUC = 0.662 for OC). When two additional variables are added, the results of LR (0.675), artificial neural networks (0.656) and BN methods (0.679) are overall improved. BNs show higher AUCs than the other methods when the number of variables raises Conclusion: The predictive quality of Partin tables can be described as low to moderate on UK data. This means that following the predictions generated by Partin tables, many patients would received an inappropriate treatment, generally associated with a deterioration of their quality of life. In addition to demographic differences between UK and the original US population, the methodology and in particular LR present limitations. BN represents a promising alternative to LR from which prostate cancer staging can benefit. Heuristic search for structure learning and the inclusion of more variables are elements that further improve BN models quality.