Vision Transformers for Diagnostic Classification of Lymphomas: A Matched Comparison with a Convolutional Neural Network

Author(s): Daniel Rivera, Alexander Banerjee, Rongzhen Zhang, Hanadi El Achi, Amer Wahed, Lauren Ho, and Andy Nguyen*

Vision transformers (ViTs) have been shown to outperform convolutional neural networks (CNNs) when pre-trained on sufficient data. ViTs have a weaker inductive bias and therefore allow for more flexible feature detection; they also demonstrate good accuracy on large-scale datasets thanks to their self-supervised learning and multimodal training capabilities. Because of these promising feature detection capabilities, we deployed ViTs for morphological classification of anaplastic large cell lymphoma (ALCL) versus classical Hodgkin lymphoma (cHL) and compared the classification performance of the ViT model with that of our previously designed CNN on the same dataset. Our study presents the first direct comparison of predictive performance between a CNN and a ViT model on the same dataset of ALCL and cHL cases. Our algorithm achieved a diagnostic accuracy of 100% and an F1 score of 1.0 on the independent test set, matching the performance of our previously developed CNN model; the confusion matrix showed perfect classification, with zero false positives and zero false negatives for both diseases. These findings suggest that a ViT model can achieve diagnostic performance comparable to that of a CNN, even with small datasets.
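To make the setup concrete, the following is a minimal sketch, not the authors' published pipeline, of fine-tuning an ImageNet-pretrained ViT-B/16 for a two-class ALCL vs. cHL image classification task in PyTorch. The data directory layout, backbone variant, and hyperparameters are illustrative assumptions; the abstract does not specify them.

```python
# Hedged sketch: fine-tune a pretrained ViT-B/16 for binary lymphoma
# classification (ALCL vs. cHL). Paths and hyperparameters are assumptions.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing; ViT-B/16 expects 224x224 inputs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: data/train/ALCL and data/train/cHL (one folder per class).
train_set = datasets.ImageFolder("data/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load an ImageNet-pretrained ViT and replace the classification head
# with a new 2-class linear layer.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, 2)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):  # illustrative epoch count
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

As a check on the reported metrics: with zero false positives and zero false negatives in the confusion matrix, precision and recall both equal 1, so F1 = 2·(precision·recall)/(precision + recall) = 1.0, consistent with the 100% accuracy.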
