Evaluation of the Reproducibility of Radiomic Intelligent Systems for Early Detection of Lung Cancer

PhD defence

Guillermo Torres successfully defended his dissertation in Computer Science on March 19, 2024, and now holds a Doctor of Philosophy from the Universitat Autònoma de Barcelona.

What is the thesis about?

Cancer incidence is rising, and lung cancer causes the most cancer-related deaths while ranking second in new cases, just behind breast cancer. Once lung cancer is detected, patients enter a follow-up circuit within the healthcare system, with check-ups scheduled every 3 months, every 6 months, or annually, depending on the case. Early detection of lung cancer is crucial: it increases survival chances, reduces patient anxiety, and alleviates the demand for healthcare resources.

To address research gaps, we created a reliable dataset with cases diagnosed histologically through biopsy, promoting transparency while respecting data confidentiality. Numerous studies using machine learning and deep learning report promising performances in lung cancer research. However, commonly used public datasets lack biopsy diagnoses and rely on visual classification by health experts. This constraint motivated us to create a dataset diagnosed through biopsy, adhering to globally accepted acquisition protocols. We also developed an infrastructure that facilitates multi-center data collection. Our dataset is publicly available, fostering research progress while ensuring data confidentiality.

We explored strategies to generate representation spaces characterising lung nodules from computed tomography scans, addressing challenges such as small sample size and data imbalance through dimensionality reduction and feature selection. Deep learning faces challenges in biomedical applications, particularly in screening benign nodules, due to limited annotated data and class imbalance, leading to overfitting.

To address these challenges, we developed a framework to explore the impact of representation spaces through three levels of data splitting in the experimental design. The framework provides insight into model performance and generalisation capabilities, and supports robust evaluation and reproducibility. Additionally, we conducted a statistical analysis of the impact of scanner acquisition parameters.

The experimental results allow us to analyse outcomes at different levels of generalisation using cross-validation, varying the experimental unit (slice or nodule) and relating the different visual representation spaces to the hyperparameters found.
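
To illustrate why the choice of experimental unit matters, the following is a minimal sketch, not the thesis implementation. It assumes a hypothetical toy dataset of 6 nodules with 4 CT slices each, and a hypothetical helper, grouped_k_fold, that assigns whole groups to folds in round-robin order. Splitting by slice lets slices of the same nodule fall on both sides of a split, whereas splitting by nodule keeps each nodule entirely in training or entirely in test.

```python
from collections import defaultdict


def grouped_k_fold(groups, n_splits=3):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides.

    Groups are assigned to folds in round-robin order for determinism;
    a real pipeline would normally shuffle the groups first.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    unique = sorted(by_group)
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of groups")
    for k in range(n_splits):
        test_groups = set(unique[k::n_splits])
        test_idx = sorted(i for g in test_groups for i in by_group[g])
        train_idx = sorted(
            i for g in unique if g not in test_groups for i in by_group[g]
        )
        yield train_idx, test_idx


def shared_nodules(splits, nodule_of_slice):
    """For each split, count the nodules present in both train and test."""
    counts = []
    for train_idx, test_idx in splits:
        train_nodules = {nodule_of_slice[i] for i in train_idx}
        test_nodules = {nodule_of_slice[i] for i in test_idx}
        counts.append(len(train_nodules & test_nodules))
    return counts


# Hypothetical toy data: slice i belongs to nodule nodule_of_slice[i].
nodule_of_slice = [nodule for nodule in range(6) for _ in range(4)]

# Nodule as experimental unit: all slices of a nodule stay together.
nodule_level = list(grouped_k_fold(nodule_of_slice, n_splits=3))

# Slice as experimental unit: every slice is its own group.
slice_ids = list(range(len(nodule_of_slice)))
slice_level = list(grouped_k_fold(slice_ids, n_splits=3))

print("nodule-level overlap per fold:", shared_nodules(nodule_level, nodule_of_slice))
print("slice-level overlap per fold:", shared_nodules(slice_level, nodule_of_slice))
```

In this toy setting the nodule-level folds share no nodules between training and test, while every slice-level fold does. Slice-level scores can therefore look optimistic relative to performance on unseen nodules, which is one reason to report results at both levels.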

Keywords: Lung Cancer, Early Lung Cancer Diagnosis, Features Embedding, Hyperparameter Optimization, Meta Learning, Machine Learning, Deep Learning, Computer Vision, Radiomics, Representation Spaces.

*The defence was held in Spanish.