VISION TRANSFORMER-BASED DEEP LEARNING TECHNIQUES FOR IMPROPER FACE MASK-WEARING DETECTION

A. S. M. MUNTAHEEN

dc.contributor.author	A. S. M. MUNTAHEEN
dc.date.accessioned	2025-12-03T13:14:06Z
dc.date.available	2025-12-03T13:14:06Z
dc.date.issued	2024-03
dc.description	VISION TRANSFORMER-BASED DEEP LEARNING TECHNIQUES FOR IMPROPER FACE MASK-WEARING DETECTION	en_US
dc.description.abstract	The COVID-19 pandemic caused a global catastrophe that profoundly affected individual lives, society, and the economy. The world has adopted numerous defenses against this contagious disease, and wearing a face mask is one of the most crucial. Effective prevention relies on proper face mask use, yet fewer than 25% of individuals wear masks correctly. The prevalent methods for face mask detection involve image processing, machine learning, and deep learning; notably, the Vision Transformer (ViT) base model has outperformed traditional deep learning models in various domains. The ViT model has yet to be explored for face mask detection. This paper proposes applying the ViT, a recent deep learning-based image classification model, to automatically detect improper face mask wearing, i.e., whether face masks are being worn correctly or not. The experiment was conducted on a large custom dataset of 203,780 digital images with three class labels, namely ‘With Mask’ (the mask is worn properly), ‘Without Mask’ (no mask is worn), and ‘Incorrect Mask’ (the mask is worn improperly), which were used to train, validate, and test a ViT model that classifies face mask use. The results show that the pre-trained ViT base model achieves remarkably high accuracy on incorrect face mask detection. The same experiment was also conducted with a Convolutional Neural Network (CNN) model on the same three-class dataset, and the CNN results were compared with the ViT results. The findings indicate that the ViT model trained faster and reached higher training accuracy, making it the more time-efficient option for incorrect face mask identification.
This advantage in training time can be crucial for real-time applications and scenarios requiring quick response and decision-making. These findings contribute to advancing the field of image classification and offer valuable insights for future research and the development of improved image classification systems. The extended experiments involve five distinct CNN architectures (Xception, MobileNetV2, VGG16, Inception, and ResNet50) trained on a smaller dataset of 2,079 digital images with the same three class labels. The results demonstrate that the ViT model outperforms all of them. In conclusion, this study establishes that the ViT model achieves the highest accuracy among the evaluated models.	en_US
dc.identifier.uri	http://dspace.mist.ac.bd:8080/xmlui/handle/123456789/1051
dc.language.iso	en	en_US
dc.title	VISION TRANSFORMER-BASED DEEP LEARNING TECHNIQUES FOR IMPROPER FACE MASK-WEARING DETECTION	en_US
dc.type	Thesis	en_US
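
The three-class setup described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the 224-pixel input size, the 16-pixel patch size (the common ViT-Base/16 configuration), the index order of the labels, and the helper names `num_patch_tokens` and `predict_label` are all assumptions made here for illustration. The sketch shows how many tokens a ViT would process for one image and how a classifier's three output scores map to one of the labels named in the abstract.

```python
import math

# Class labels as named in the abstract; the index order is an assumption.
LABELS = ["With Mask", "Without Mask", "Incorrect Mask"]


def num_patch_tokens(image_size=224, patch_size=16):
    """Tokens a ViT processes per image: one per patch plus one class token.

    The defaults follow the common ViT-Base/16 setup (assumed, not stated
    in the record).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Map three raw classifier scores to (label, probability)."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one score per class label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    print(num_patch_tokens())              # 14 * 14 patches + 1 class token = 197
    print(predict_label([0.1, 0.2, 3.0]))  # hypothetical scores, not real model output
```

In practice the scores would come from a pre-trained ViT with a three-output classification head fine-tuned on the mask dataset; the scores used above are placeholders.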

Files

Original bundle

Name: Muntaheen_0419140003_Thesis_Msc_2024.pdf
Size: 2.21 MB
Format: Adobe Portable Document Format
Description: VISION TRANSFORMER-BASED DEEP LEARNING TECHNIQUES FOR IMPROPER FACE MASK-WEARING DETECTION

Collections

Master's Thesis