ISIC 2024 – SKIN CANCER DETECTION

Challenge | Deep Learning | Computer Vision | Data Augmentation | PyTorch

CONTEXT AND OBJECTIVE

Skin cancer can be fatal if not detected early, but many people lack access to specialized dermatologic care. AI algorithms using dermoscopy have been effective in helping clinicians diagnose skin cancers. Triaging applications could greatly benefit underserved populations by identifying those who need clinical evaluation, improving early detection and patient outcomes.

As part of a competition hosted on Kaggle, the goal of this project was to develop and submit an AI algorithm that differentiates histologically confirmed malignant skin lesions from benign lesions on a patient. The overarching objective was to contribute to the collective effort to bring early diagnosis and disease prognosis to a broader population through automated skin cancer detection.

WHAT WAS DONE

The dataset consisted of diagnostically labelled images with additional metadata. After the initial exploratory data analysis (EDA) phase, a data pipeline was developed with scikit-learn to feed a custom neural network built and trained with PyTorch.

The custom model consists of a CNN section that converts the images into feature vectors, which are then combined with the structured metadata and fed to a fully connected stage for the final prediction.
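The two-branch design described above can be sketched in PyTorch as follows. This is a minimal illustration, not the project's actual network: the class name `ImageMetadataNet`, the layer sizes, and the use of concatenation to combine the two inputs are all assumptions made for the example.

```python
import torch
from torch import nn


class ImageMetadataNet(nn.Module):
    """Hypothetical sketch: CNN image features combined with tabular metadata."""

    def __init__(self, n_meta_features, n_image_features=64):
        super().__init__()
        # CNN section: turns an RGB image into a fixed-size feature vector.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # global pooling, so any image size works
            nn.Flatten(),
            nn.Linear(32, n_image_features),
            nn.ReLU(),
        )
        # Fully connected stage: consumes image features plus metadata.
        self.head = nn.Sequential(
            nn.Linear(n_image_features + n_meta_features, 32),
            nn.ReLU(),
            nn.Linear(32, 1),  # one logit: malignant vs. benign
        )

    def forward(self, image, metadata):
        image_features = self.cnn(image)
        # Assumed combination step: concatenate along the feature dimension.
        combined = torch.cat([image_features, metadata], dim=1)
        return self.head(combined).squeeze(1)
```

A batch of four 64x64 images with ten metadata columns each would be passed as `ImageMetadataNet(n_meta_features=10)(images, metadata)`, returning one logit per lesion. The logits would typically be paired with `nn.BCEWithLogitsLoss` during training.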

Because the dataset was extremely imbalanced (only 0.1% positive cases), a combination of metadata oversampling and data augmentation on images was used to train the model.
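As an illustration of the oversampling idea, the sketch below repeats minority-class examples until they reach a chosen share of the training set. It uses plain random oversampling with replacement, which may differ from the technique actually used in the project; the function name `oversample_indices` and the 20% target are hypothetical.

```python
import math
import random


def oversample_indices(labels, target_pos_fraction=0.2, seed=0):
    """Return shuffled example indices with positives repeated at random
    until they make up at least `target_pos_fraction` of the result."""
    if not 0.0 < target_pos_fraction < 1.0:
        raise ValueError("target_pos_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Smallest positive count k such that k / (k + len(neg)) >= target.
    needed_pos = math.ceil(
        target_pos_fraction * len(neg) / (1.0 - target_pos_fraction)
    )
    extra = max(0, needed_pos - len(pos))
    # Keep every original example once, then add random positive repeats.
    indices = pos + neg + rng.choices(pos, k=extra)
    rng.shuffle(indices)
    return indices
```

In a setup like this, each repeated positive index would load the same image again, so applying a random augmentation (flip, rotation, colour jitter) at load time keeps the duplicates from being pixel-identical.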

The source code is available on GitHub.

Training Data

More than 400k examples were available, of which only 0.1% were malignant cases.

Receiver Operating Characteristic

In the context of the competition, models were evaluated on a partial area under the ROC curve (pAUC): the area of the region above a True Positive Rate of 0.8, which gives a maximum possible score of 0.2. This model scored 0.123, and the winning submission scored 0.173.
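For reference, a partial AUC of this kind can be computed as sketched below, assuming the metric is the area between the ROC curve and the horizontal line at a true positive rate of 0.8 (the definition used by the ISIC 2024 challenge). This is a pure-Python illustration and not the official competition scorer; the function names are hypothetical.

```python
def roc_points(y_true, y_score):
    """ROC curve as (FPR, TPR) points, one per distinct score threshold."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Tied scores are consumed together, giving one point per threshold.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc_above_tpr(y_true, y_score, min_tpr=0.8):
    """Area between the ROC curve and the line TPR = min_tpr."""
    points = roc_points(y_true, y_score)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= min_tpr:
            continue  # segment lies entirely at or below the threshold
        if y0 < min_tpr:
            # Clip the segment where it crosses TPR = min_tpr.
            t = (min_tpr - y0) / (y1 - y0)
            x0 = x0 + t * (x1 - x0)
            y0 = min_tpr
        # Trapezoid rule on the part of the segment above the threshold.
        area += (x1 - x0) * ((y0 - min_tpr) + (y1 - min_tpr)) / 2.0
    return area
```

With `min_tpr=0.8`, a perfect ranking reaches the maximum of 0.2, while a model that gives every lesion the same score (a diagonal ROC curve) gets 0.02.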